data-raw/wiki/readme.md

Data from Wikipedia

The data were originally collected by a team led by Steven Skiena as part of a project to build a classifier for race and ethnicity based on names. The team scraped Wikipedia to produce a novel database of over 140k name/race associations. For details of how the data were collected, see "Name-ethnicity classification from open sources" (full reference below).

The team has two papers on novel ways of building a classifier: the first is referenced below, and the second is forthcoming. The team has also made its classifiers easy to use by providing public APIs. The classifier based on the methods in the first paper can be accessed at http://www.textmap.com/ethnicity, and the classifier based on the second paper at http://www.data-prism.com.

If you use this data, please cite:

@inproceedings{ambekar2009name,
  title={Name-ethnicity classification from open sources},
  author={Ambekar, Anurag and Ward, Charles and Mohammed, Jahangir and Male, Swapna and Skiena, Steven},
  booktitle={Proceedings of the 15th ACM SIGKDD international conference on Knowledge Discovery and Data Mining},
  pages={49--58},
  year={2009},
  organization={ACM}
}



appeler/ethnicolor documentation built on May 30, 2019, 4:20 p.m.