findGivenNames: Getting gender prediction data for a given text vector.


View source: R/findGivenNames.R

Description

findGivenNames extracts unique terms from a text and predicts the gender for each of them.
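To illustrate the "unique terms" step, here is a minimal sketch of reducing free text to lowercase, deduplicated candidate terms. The helper extractTerms is hypothetical and is not the package's textPrepare function, whose exact cleaning rules are not described on this page. It also assumes ASCII letters only, so accented names would be split apart.

```r
# Hypothetical helper, NOT genderizeR's textPrepare(): a rough approximation
# of turning free text into unique, lowercase candidate terms.
extractTerms <- function(x) {
  # Lowercase so that "Tom" and "tom" become the same term
  x <- tolower(x)
  # Split on every run of characters that are not ASCII letters
  terms <- unlist(strsplit(x, "[^a-z]+"))
  # Drop empty strings left by leading separators, then deduplicate
  unique(terms[nzchar(terms)])
}

extractTerms("Tom and Jim met Aunt Polly; Tom laughed.")
# returns c("tom", "and", "jim", "met", "aunt", "polly", "laughed")
```

Every remaining term, not only real given names, would be sent to the API; terms that are not found in the database simply do not appear in the result.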

Usage

findGivenNames(x, textPrepare = TRUE, country = NULL,
  language = NULL, apikey = NULL, queryLength = 10,
  progress = TRUE, ssl.verifypeer = TRUE)

Arguments

x

A text vector, or a character vector of unique terms that was pre-processed earlier, either manually or with the textPrepare function.

textPrepare

If TRUE (default), the textPrepare function is applied to the x vector. Set it to FALSE if you have already prepared a character vector of cleaned-up, deduplicated terms that you want to send to the API for gender checking.

country

A character string with a country code for localized search of names. Country codes follow the ISO 3166-1 alpha-2 standard: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

language

A character string with a language code for localized search of names. Language codes follow the ISO 639-1 standard: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

apikey

A character string with the API key obtained via https://store.genderize.io. The default is NULL, which uses the free API plan. If you reach the API limit, you can resume from the last checked term next time.

queryLength

How many terms are checked in a single query (default 10).

progress

If TRUE (default), a progress bar is displayed in the console.

ssl.verifypeer

Whether to verify the server's SSL certificate. Default is TRUE. You may set it to FALSE if you encounter errors that break the connection with the API, though this is not recommended.
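The queryLength argument implies that the terms are sent in batches. The sketch below shows one way to cut a term vector into batches of at most queryLength terms. The function splitIntoQueries is hypothetical and only illustrates the idea; it is not genderizeR's internal code.

```r
# Hypothetical sketch of query batching (not genderizeR's internal code).
splitIntoQueries <- function(terms, queryLength = 10) {
  # Batch id: 1 for the first queryLength terms, 2 for the next ones, etc.
  batchId <- ceiling(seq_along(terms) / queryLength)
  # split() returns a list with one character vector per batch
  unname(split(terms, batchId))
}

batches <- splitIntoQueries(letters[1:23], queryLength = 10)
lengths(batches)
# returns c(10L, 10L, 3L)
```

Under this scheme, 23 terms would need three requests, so a smaller queryLength means more requests against the same API limit.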
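The country, language, and apikey arguments localize or authorize each request. As a rough picture, the hypothetical buildQuery helper below assembles a request URL for one batch. The endpoint and the parameter names (name[], country_id, language_id, apikey) are assumptions based on the public genderize.io API, not taken from genderizeR's source, and newer versions of that API may no longer accept language_id.

```r
# Hypothetical sketch of a request URL for one batch of terms.
# Endpoint and parameter names are ASSUMPTIONS about the genderize.io API.
buildQuery <- function(terms, country = NULL, language = NULL, apikey = NULL) {
  # One name[] parameter per term in the batch
  params <- paste0("name[]=", terms)
  # Optional parameters are appended only when supplied
  if (!is.null(country))  params <- c(params, paste0("country_id=", country))
  if (!is.null(language)) params <- c(params, paste0("language_id=", language))
  if (!is.null(apikey))   params <- c(params, paste0("apikey=", apikey))
  paste0("https://api.genderize.io?", paste(params, collapse = "&"))
}

buildQuery(c("andrea", "tom"), country = "it")
# returns "https://api.genderize.io?name[]=andrea&name[]=tom&country_id=it"
```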
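The apikey description mentions resuming from the last checked term after hitting the API limit. A minimal sketch of that bookkeeping, assuming you know which term was checked last and kept the original term order, is shown below. The variable names are illustrative only.

```r
# Hypothetical sketch: resuming after the API limit was reached.
allTerms <- c("tom", "jim", "sid", "polly", "mark", "twain")
# Suppose the previous run stopped right after checking this term
lastChecked <- "sid"
# Drop everything up to and including the last checked term
remainingTerms <- allTerms[-seq_len(match(lastChecked, allTerms))]
remainingTerms
# returns c("polly", "mark", "twain")

# The remaining terms could then be checked on the next run, e.g.:
# findGivenNames(remainingTerms, textPrepare = FALSE, apikey = "<your key>")
```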

Value

A data.table with the given names found in the database, their gender predictions, the probabilities of those predictions, and counts of how many people with each given name are recorded in the database.
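Because findGivenNames needs network access, the sketch below builds a stand-in for the returned table from the values shown in the Examples output on this page and keeps only confident, well-supported predictions. It uses a plain data.frame so that it runs without the API or the data.table package; the thresholds are arbitrary illustrations. On the real data.table result, the equivalent filter would be foundNames[probability >= 0.95 & count > 1000].

```r
# Stand-in for the result, using values from the Examples output below.
foundNames <- data.frame(
  name        = c("jim", "mark", "polly", "tom"),
  gender      = c("male", "male", "female", "male"),
  probability = c(1.00, 1.00, 0.99, 1.00),
  count       = c(2291, 6178, 191, 3736),
  stringsAsFactors = FALSE
)
# Keep predictions that are both confident and backed by many records
confident <- subset(foundNames, probability >= 0.95 & count > 1000)
confident$name
# returns c("jim", "mark", "tom")
```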

Examples

x = "Tom did play hookey, and he had a very good time. He got back home 
     barely in season to help Jim, the small colored boy, saw next-day's wood 
     and split the kindlings before supper-at least he was there in time 
     to tell his adventures to Jim while Jim did three-fourths of the work. 
     Tom's younger brother (or rather half-brother) Sid was already through 
     with his part of the work (picking up chips), for he was a quiet boy, 
     and had no adventurous, trouble-some ways. While Tom was eating his
     supper, and stealing sugar as opportunity offered, Aunt Polly asked 
     him questions that were full of guile, and very deep-for she wanted 
     to trap him into damaging revealments. Like many other simple-hearted
     souls, it was her pet vanity to believe she was endowed with a talent 
     for dark and mysterious diplomacy, and she loved to contemplate her 
     most transparent devices as marvels of low cunning. 
     (from 'Tom Sawyer' by Mark Twain)"

xProcessed = textPrepare(x)

foundNames = findGivenNames(xProcessed, textPrepare = FALSE, 
                            ssl.verifypeer = FALSE)
foundNames[count > 100]

# (the results can differ due to new, updated data pulled from the API)
#    name gender probability count
# 1:   jim   male        1.00  2291
# 2:  mark   male        1.00  6178
# 3: polly female        0.99   191
# 4:   tom   male        1.00  3736


# localization
findGivenNames("andrea", country = "us", ssl.verifypeer = FALSE)
#      name gender probability count
# 1: andrea female        0.97  2308

findGivenNames("andrea", country = "it", ssl.verifypeer = FALSE)
#      name gender probability count
# 1: andrea   male        0.99  1070

genderizeR documentation built on Aug. 4, 2019, 5:02 p.m.