genderize: Predicting gender for character strings.

View source: R/genderize.R

Description

For each character string in the vector x, the genderize function uses the output of the findGivenNames function and returns a gender prediction for the whole string, based on the first names found inside it.

Usage

genderize(x, genderDB, blacklist = NULL, progress = TRUE)

Arguments

x

A vector of text strings.

genderDB

The data table returned by the findGivenNames function for the vector x.

blacklist

A character vector of terms (stopwords) that will be excluded from gender checking.

progress

If TRUE (default), a progress bar is displayed in the console.

Value

A data table with four columns: the text string (text); the term found in genderDB that is finally used as the given name to predict the gender of the string (givenName); the predicted gender (gender); and the number of potential gender indicators (genderIndicators), e.g. 1 if only one term from the text string is found in genderDB.
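To make the roles of genderDB, blacklist and the indicator count more concrete, here is a minimal base R sketch of the matching idea. It is not the package's implementation: the helper sketch_genderize and the lookup table toy_db are hypothetical, the column names name and gender are assumed for illustration, and the rule "take the first match" is a simplification that may differ from how genderize picks among several candidates.

```r
# Toy lookup table standing in for the output of findGivenNames().
# The column names `name` and `gender` are assumptions made for this sketch.
toy_db <- data.frame(
  name   = c("winston", "norbert", "maria", "med"),
  gender = c("male", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of genderizeR.
sketch_genderize <- function(x, genderDB, blacklist = NULL) {
  rows <- lapply(x, function(s) {
    # Split the string into lowercase terms on anything that is not a letter.
    terms <- unlist(strsplit(tolower(s), "[^[:alpha:]]+"))
    terms <- unique(terms[nzchar(terms)])
    # Drop blacklisted terms, then keep only terms present in the lookup table.
    terms <- setdiff(terms, blacklist)
    hits  <- genderDB[genderDB$name %in% terms, , drop = FALSE]
    if (nrow(hits) == 0) {
      return(data.frame(text = s, givenName = NA_character_,
                        gender = NA_character_, genderIndicators = 0L,
                        stringsAsFactors = FALSE))
    }
    # Simplification: use the first matching row of the table as the given name.
    data.frame(text = s, givenName = hits$name[1], gender = hits$gender[1],
               genderIndicators = nrow(hits), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- sketch_genderize(
  c("Prof. Dr. med. Norbert R. Roewer Wuerzburg", "Maria Sklodowska-Curie"),
  toy_db, blacklist = c("med")
)
print(res)
```

In this toy setup, leaving out blacklist = c("med") would count two indicators for the first string instead of one, because "med" also appears in the lookup table.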

Examples

library(genderizeR)

# Titles that contain personal names
x = c("Winston J. Durant, ASHP past president, dies at 84",
      "Gold Badge of Honour of the DGAI Prof. Dr. med. Norbert R. Roewer Wuerzburg",
      "The contribution of professor Yu.S. Martynov (1921-2008) to Russian neurology",
      "JAN BASZKIEWICZ (3 JANUARY 1930 - 27 JANUARY 2011) IN MEMORIAM",
      "Maria Sklodowska-Curie")

# Find candidate given names in x
givenNames = findGivenNames(x, ssl.verifypeer = FALSE)
# Keep only names with a count above 40
givenNames = givenNames[count > 40]
# Predict gender for each string, excluding the term 'med' from checking
genderize(x, genderDB = givenNames, blacklist = c('med'))

#                                                                             text
# 1:                            Winston J. Durant, ASHP past president, dies at 84
# 2:   Gold Badge of Honour of the DGAI Prof. Dr. med. Norbert R. Roewer Wuerzburg
# 3: The contribution of professor Yu.S. Martynov (1921-2008) to Russian neurology
# 4:                JAN BASZKIEWICZ (3 JANUARY 1930 - 27 JANUARY 2011) IN MEMORIAM
# 5:                                                        Maria Sklodowska-Curie

#    givenName gender genderIndicators
# 1:   winston   male                1
# 2:   norbert   male                1
# 3:        yu female                1
# 4:       jan   male                1
# 5:     maria female                1

genderizeR documentation built on Aug. 4, 2019, 5:02 p.m.