This function cross-references the 'name' field in the corpus files with a large database of baby name statistics drawn from two sources: the United States Social Security Administration (included in the R package 'babynames' by Hadley Wickham) and the Spanish Instituto Nacional de Estadística (INE). The function implements a cascade system: it first attempts to find exact matches, and then resorts to approximate string matching using Levenshtein distance.
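The cascade described above can be illustrated with a minimal base R sketch. It is not the package's implementation, which runs the approximate step in multithreaded C++. The `reference` table and the `matchName` helper are hypothetical names introduced only for this illustration, and `utils::adist` stands in for the approximate matching step.

```r
# Hypothetical reference table; the real function draws on the
# 'babynames' and INE data instead.
reference <- data.frame(
  name = c("mary", "john", "maria"),
  sex  = c("F", "M", "F"),
  year = c(1950L, 1960L, 1975L),
  stringsAsFactors = FALSE
)

# Cascade: try an exact match first, then fall back to the closest
# reference name within maxDistance Levenshtein edits.
# Returns a row index into 'reference', or NA if nothing is close enough.
matchName <- function(name, reference, maxDistance = 1) {
  name <- tolower(name)
  exact <- match(name, reference$name)
  if (!is.na(exact)) return(exact)
  d <- utils::adist(name, reference$name)[1, ]  # Levenshtein distances
  best <- which.min(d)
  if (d[best] <= maxDistance) best else NA_integer_
}

queries <- c("Mary", "Jon", "Zzzzz")
idx <- vapply(queries, matchName, integer(1),
              reference = reference, maxDistance = 1)

# Attach the matched sex and year; unmatched names get NA.
result <- data.frame(name = queries,
                     reference[idx, c("sex", "year")],
                     row.names = NULL)
print(result)
```

Here "Mary" is resolved by the exact step, "Jon" falls through to the approximate step and matches "john" at distance 1, and "Zzzzz" stays unmatched. Computing the distance from every query to every reference name is what makes the approximate step expensive on large inputs.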
addAgeGender(filtered_corpus, language = c("English", "Spanish"),
  maxDistance = 1, nthreads = parallel::detectCores())
maxDistance
maximum Levenshtein distance to use for approximate string matching. Defaults to 1.
nthreads
number of threads to use in the C++ code for approximate string matching. Defaults to the number of CPU cores on your machine, which is probably the best choice.
filtered_corpus
filtered corpus. Do not use on unfiltered data if you want to get results in this century.
a data.frame with two added columns: gender (column 'sex') and most likely year of birth (column 'year').