MHGen: Maternal Health Generator
The Story: A few years ago, I tried to build an app called the Women’s Advancement Neural Network. As the name suggests, it would have used real data to generate synthetic datasets for training AI models that advance women’s health. Fast forward to the present: I was working on the dataset for my Duncan Gamabunta project when I realized I could do more than generate datasets for a roleplaying humanoid robotic frog, so I got to work.
The Result: After just a few hours, I had a Python system that uses ChatGPT (and potentially other models) to create realistic synthetic conversation datasets. I released the first dataset (composed of 1,193 conversations) as open source, as a citizen science effort. My hope is that this dataset can be used to fine-tune or train a model that provides support to maternal health patients.
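For readers curious what a generator like this might look like, here is a minimal sketch. It is not the actual MHGen code. The topic list, the prompt wording, the "Patient:"/"Assistant:" transcript format, and every function name are assumptions made for illustration. The model call is passed in as a plain function, so a stub stands in for ChatGPT here and any real client could be swapped in.

```python
import json
import random
from typing import Callable, Dict, List

# Hypothetical topics; the real system's topics are not described in this post.
TOPICS = ["prenatal nutrition", "postpartum recovery", "breastfeeding"]


def build_prompt(topic: str, turns: int) -> str:
    """Build the instruction sent to the model (wording is an assumption)."""
    return (
        f"Write a realistic {turns}-turn conversation about {topic} between "
        "a maternal health patient and a supportive assistant. "
        "Start each line with 'Patient:' or 'Assistant:'."
    )


def parse_conversation(raw: str) -> List[Dict[str, str]]:
    """Turn 'Patient: ...' / 'Assistant: ...' lines into role/content dicts."""
    turns = []
    for line in raw.splitlines():
        speaker, sep, text = line.partition(":")
        role = speaker.strip().lower()
        # Skip lines that do not match the expected transcript format.
        if sep and role in ("patient", "assistant") and text.strip():
            turns.append({"role": role, "content": text.strip()})
    return turns


def generate_dataset(
    generate: Callable[[str], str], n: int, seed: int = 0
) -> List[dict]:
    """Request n conversations and keep those with at least two parsed turns."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        topic = rng.choice(TOPICS)
        turns = parse_conversation(generate(build_prompt(topic, turns=4)))
        if len(turns) >= 2:
            records.append({"id": i, "topic": topic, "turns": turns})
    return records


def fake_model(prompt: str) -> str:
    """Stub standing in for a real chat model call."""
    return (
        "Patient: I have been feeling very tired lately. Is that normal?\n"
        "Assistant: Fatigue is common, but please mention it to your provider."
    )


if __name__ == "__main__":
    # Print one JSON record per line, a common format for conversation datasets.
    for record in generate_dataset(fake_model, 3):
        print(json.dumps(record))
```

Passing the model in as a function keeps the prompt building and parsing testable without an API key, and makes it easy to try other models later.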
For More:
HuggingFace Open Source Dataset: https://huggingface.co/datasets/tuc111/mhgen-maternal-health-convos