genderizeTrain: Training genderize function

View source: R/genderizeTrain.R

Description

The genderizeTrain function predicts gender for a vector of names and reports prediction errors for different combinations of probability and count parameters.

Usage

genderizeTrain(x, y, givenNamesDB, probs, counts, parallel = FALSE,
  cores = NULL)

Arguments

x

A text vector that we want to genderize.

y

A text vector of true gender labels for the x vector.

givenNamesDB

A dataset with gender data (for example, the output of the findGivenNames function).

probs

A numeric vector of probability values. Used to subset the givenNamesDB dataset.

counts

A numeric vector of count values. Used to subset the givenNamesDB dataset.

parallel

If TRUE, errors are computed using the parallel package and the available cores. Default is FALSE.

cores

An integer giving the number of cores designated for parallel processing, or NULL (default). If parallel is TRUE and cores is NULL, then the number of available cores is detected automatically.
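
To illustrate how probs and counts could define a grid of subsetting thresholds, here is a minimal sketch. It is not the package's implementation. It assumes that givenNamesDB has probability and count columns and that a name is kept when it meets both thresholds (>=). The toy table and the namesKept column are invented for illustration only.

```r
# Toy stand-in for a givenNamesDB table; all values are invented for illustration.
givenNamesDB <- data.frame(
  name = c("alex", "robin", "john"),
  gender = c("male", "female", "male"),
  probability = c(0.87, 0.55, 0.99),
  count = c(5000, 8, 12000),
  stringsAsFactors = FALSE
)
probs <- c(0.5, 0.9)
counts <- c(1, 10)

# One row per (prob, count) combination, with prob varying fastest.
grid <- expand.grid(prob = probs, count = counts)

# For each combination, count the names that meet both thresholds
# (the >= comparison is an assumption of this sketch).
grid$namesKept <- mapply(
  function(p, cnt) sum(givenNamesDB$probability >= p & givenNamesDB$count >= cnt),
  grid$prob, grid$count
)

print(grid)
```

Stricter thresholds leave fewer names available for prediction, which is why the share of unpredicted items tends to grow as prob and count increase.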

Value

A data frame with prediction indicators for each combination of parameters:

errorCoded

The classification error for predicted and unpredicted gender.

errorCodedWithoutNA

The classification error for items with predicted gender only.

naCoded

The proportion of items with manually coded gender for which no gender was predicted.

errorGenderBias

The net gender bias error.
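
The four indicators above can be sketched as follows. This is a hedged illustration, not the package's own code. The helper name errorIndicators is hypothetical, and the denominator and sign chosen for errorGenderBias are assumptions picked to agree with the sample output shown in the Examples section.

```r
# Hypothetical helper, not part of the genderizeR API.
# labels: manually coded gender; predictions: predicted gender (NA if unpredicted).
errorIndicators <- function(labels, predictions) {
  coded <- !is.na(labels)            # keep items with a manually coded gender
  lab <- labels[coded]
  pred <- predictions[coded]
  predicted <- !is.na(pred)          # items for which a gender was predicted
  n <- length(lab)
  nPred <- sum(predicted)
  wrong <- predicted & pred != lab   # predicted, but with the wrong gender
  maleAsFemale <- sum(wrong & lab == "male")
  femaleAsMale <- sum(wrong & lab == "female")
  c(
    # wrong or missing predictions among all coded items
    errorCoded = (sum(wrong) + sum(!predicted)) / n,
    # wrong predictions among predicted items only
    errorCodedWithoutNA = sum(wrong) / nPred,
    # share of coded items without a prediction
    naCoded = sum(!predicted) / n,
    # net difference between the two misclassification directions (assumed form)
    errorGenderBias = (maleAsFemale - femaleAsMale) / nPred
  )
}

labels <- rep("male", 8)
res1 <- errorIndicators(labels, c("female", rep("male", 7)))  # one wrong prediction
res2 <- errorIndicators(labels, c(NA, rep("male", 7)))        # one unpredicted item
```

Under these assumptions, an unpredicted item raises errorCoded and naCoded but leaves errorCodedWithoutNA and errorGenderBias unchanged.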

See Also

Implementation of parallel mclapply on Windows machines by Nathan VanHoudnos http://edustatistics.org/nathanvan/setup/mclapply.hack.R.

Examples

## Not run: 

x = c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', 'John', 'Tom')
y = rep('male', length(x))

givenNamesDB = findGivenNames(x)
probs = seq(from =  0.5, to = 0.9, by = 0.1)
counts = c(1, 10)

genderizeTrain(x = x, y = y, 
               givenNamesDB = givenNamesDB, 
               probs = probs, counts = counts, 
               parallel = TRUE) 

#     prob count errorCoded errorCodedWithoutNA naCoded errorGenderBias
#  1:  0.5     1      0.125               0.125   0.000           0.125
#  2:  0.6     1      0.125               0.000   0.125           0.000
#  3:  0.7     1      0.125               0.000   0.125           0.000
#  4:  0.8     1      0.375               0.000   0.375           0.000
#  5:  0.9     1      0.500               0.000   0.500           0.000
#  6:  0.5    10      0.125               0.125   0.000           0.125
#  7:  0.6    10      0.125               0.000   0.125           0.000
#  8:  0.7    10      0.125               0.000   0.125           0.000
#  9:  0.8    10      0.375               0.000   0.375           0.000
# 10:  0.9    10      0.500               0.000   0.500           0.000


## End(Not run)

genderizeR documentation built on Aug. 4, 2019, 5:02 p.m.