Sounds like that statistical output would be heavily biased by whatever process you were using to turn names into genders. In short, a bad idea.
“Since the dataset isn’t 100% perfectly annotated for analysis, we should give up the whole project entirely.”
No. Since the dataset is bound to give nonsensical results, we should look for sources that are more precise. Hint: the already-mentioned “Andrea”, and Japanese names.
“Since the data is incomplete, we decided to make shit up”