genderizeBootstrapError: Gender prediction errors on bootstrap samples
In genderizeR: Gender Prediction Based on First Names


View source: R/genderizeBootstrapError.R

genderizeBootstrapError calculates the Apparent Error Rate, the Leave-One-Out bootstrap error rate, and the .632+ error rate from Efron and Tibshirani (1997). The code is a modified version of several functions from the sortinghat package by John A. Ramey.
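The way the .632+ estimate combines the other two error rates can be sketched in a few lines of R. This is a minimal illustration of the basic weighted form of the estimator, not the code used by genderizeR or sortinghat; the helper names `no_info_rate` and `err_632plus` are hypothetical, and clamping the relative overfitting rate to [0, 1] is an assumption of this sketch.

```r
# No-information error rate (gamma): the error expected if labels and
# predictions were independent. Assumes no missing predictions.
no_info_rate <- function(y, pred) {
  classes <- union(y, pred)
  p <- vapply(classes, function(k) mean(y == k), numeric(1))    # observed class shares
  q <- vapply(classes, function(k) mean(pred == k), numeric(1)) # predicted class shares
  sum(p * (1 - q))
}

# Basic .632+ combination of the apparent and leave-one-out bootstrap errors.
err_632plus <- function(apparent, loo_boot, gamma) {
  overfit <- 0  # relative overfitting rate, kept in [0, 1] (assumption)
  if (gamma > apparent) {
    overfit <- (loo_boot - apparent) / (gamma - apparent)
    overfit <- min(max(overfit, 0), 1)
  }
  w <- 0.632 / (1 - 0.368 * overfit)  # weight moves from 0.632 toward 1
  (1 - w) * apparent + w * loo_boot
}

gamma <- no_info_rate(y    = c("female", "male", "male", "male"),
                      pred = c("male", "male", "male", "male"))
err_632plus(apparent = 0.1, loo_boot = 0.3, gamma = 0.5)
```

With no overfitting the weight is 0.632, which gives the plain .632 estimate; as the leave-one-out error approaches the no-information rate, the weight approaches 1 and the estimate approaches the leave-one-out error.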

genderizeBootstrapError(x, y, givenNamesDB, probs, counts,
  num_bootstraps = 50, parallel = FALSE)

`x`	A text vector that we want to genderize
`y`	A text vector of true gender labels ('female' or 'male') for the `x` vector
`givenNamesDB`	A dataset with gender data (for example, the output of the `findGivenNames` function)
`probs`	A numeric vector of probability values, used for subsetting the givenNamesDB dataset
`counts`	A numeric vector of count values, used for subsetting the givenNamesDB dataset
`num_bootstraps`	Number of bootstrap samples. Default is 50.
`parallel`	Passed to the `genderizeTrain` function. If TRUE, errors are computed using the `parallel` package and the available cores. Default is FALSE.
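To make the role of `probs` and `counts` more concrete, the sketch below counts how many names in a toy database survive each combination of thresholds. It is only an illustration under assumptions: the column names `probability` and `count`, the use of `>=` comparisons, and the evaluation of every probs/counts pair are guesses about how the subsetting might work, not the package's documented behaviour.

```r
# Toy stand-in for a findGivenNames() result; the column names are assumed.
toyNamesDB <- data.frame(
  name        = c("alex", "kale", "robin"),
  gender      = c("male", "male", "female"),
  probability = c(0.87, 0.55, 0.71),
  count       = c(5000, 3, 900),
  stringsAsFactors = FALSE
)

# Every combination of probability and count thresholds.
grid <- expand.grid(prob = c(0.5, 0.7), count = c(1, 100))

# Number of names that pass both thresholds in each combination.
grid$n_names <- mapply(function(p, cnt) {
  sum(toyNamesDB$probability >= p & toyNamesDB$count >= cnt)
}, grid$prob, grid$count)

grid
```

Stricter thresholds leave fewer names with a gender prediction, which is why the choice of `probs` and `counts` affects the resulting error rates.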

A list of bootstrap errors:

`apparent`	Apparent Error Rate
`loo_boot`	LOO-Boot Error Rate
`errorRate632plus`	.632+ Error Rate
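The difference between the apparent and the leave-one-out bootstrap error rates can be illustrated offline with a toy classifier. This is a rough sketch, not the genderizeR implementation: `train_lookup` and `predict_lookup` are hypothetical helpers that stand in for `genderizeTrain` and `genderize`, and the data are made up.

```r
# Toy classifier: most frequent training label per name, with the overall
# majority label as a fallback for names not seen in training.
train_lookup <- function(x, y) {
  majority <- function(lbl) names(which.max(table(lbl)))
  list(by_name = vapply(split(y, x), majority, character(1)),
       default = majority(y))
}
predict_lookup <- function(model, x) {
  pred <- unname(model$by_name[x])
  pred[is.na(pred)] <- model$default
  pred
}

set.seed(1)
x <- c("alex", "alex", "robin", "robin", "lee", "lee", "kim", "kim")
y <- c("male", "male", "female", "female", "male", "male", "female", "female")
n <- length(x)
B <- 50  # number of bootstrap samples

# Apparent error: train and evaluate on the same data.
apparent <- mean(predict_lookup(train_lookup(x, y), x) != y)

# Leave-one-out bootstrap: evaluate each model only on the observations
# that were not drawn into its bootstrap sample.
err <- matrix(NA_real_, nrow = B, ncol = n)
for (b in seq_len(B)) {
  idx <- sample.int(n, replace = TRUE)
  out <- setdiff(seq_len(n), idx)
  if (length(out) == 0) next
  model <- train_lookup(x[idx], y[idx])
  err[b, out] <- as.numeric(predict_lookup(model, x[out]) != y[out])
}
loo_boot <- mean(colMeans(err, na.rm = TRUE), na.rm = TRUE)

list(apparent = apparent, loo_boot = loo_boot)
```

The apparent error is optimistic because the model has already seen every observation, while the leave-one-out bootstrap error only scores observations held out of each resample.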

See also the sortinghat package.

## Not run: 

x <- c('Alex', 'Darrell', 'Kale', 'Lee', 'Robin', 'Terry', rep('Robin', 20))

y <- c(rep('female', 6), rep('male', 20))

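# Build the names database and predict genders for x
givenNamesDB <- findGivenNames(x)
pred <- genderize(x, givenNamesDB)
classificationErrors(labels = y, predictions = pred$gender)

# Probability and count thresholds to evaluate
probs <- seq(from = 0.5, to = 0.9, by = 0.05)
counts <- c(1)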

set.seed(23)
genderizeBootstrapError(x = x, y = y, 
                         givenNamesDB = givenNamesDB, 
                         probs = probs, counts = counts, 
                         num_bootstraps = 20, 
                         parallel = TRUE)


# $apparent
# [1] 0.9615385

# $loo_boot
# [1] 0.965812

# $errorRate632plus
# [1] 0.964225



## End(Not run)