expandPredictionResults: Expands predicted datasets to contain all allowed codes

View source: R/expandPredictionResults.R

expandPredictionResults    R Documentation

Expands predicted datasets to contain all allowed codes

Description

Starts with a data.table of class 'occupationalPredictions' (one prediction for each combination of pred.code and answer) and expands it to contain all allowed codes.

Usage

expandPredictionResults(occupationalPredictions, allowed.codes, method.name)

Arguments

allowed.codes

a character vector of all allowed codes.

method.name

the name by which the method shall be called.

occupationalPredictions

a data.table of class 'occupationalPredictions', created with a predict function from this package.

Details

The problem solved here is this: most algorithms do not provide predictions for every category of the classification, because doing so would require every category to appear in the training data. This function expands the dataset and assigns very small probabilities (or 0) to classification codes that the trained model could not predict.
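The idea of the expansion can be illustrated with a small sketch. This is not the package's implementation: the helper expand_sketch, the column names id and pred.prob, and the fill argument are assumptions made for illustration only, and base R data frames are used instead of data.table so that the snippet runs without extra packages.

```r
# Toy predictions: one row per (answer, code) pair the model could score.
# The column names id and pred.prob are illustrative assumptions.
preds <- data.frame(
  id        = c(1, 1, 2),
  pred.code = c("71402", "71403", "71402"),
  pred.prob = c(0.7, 0.3, 1.0),
  stringsAsFactors = FALSE
)
allowed.codes <- c("71402", "71403", "63302")

# Hypothetical helper, not part of occupationCoding.
expand_sketch <- function(preds, allowed.codes, fill = 0) {
  # Pair every answer with every allowed code.
  grid <- expand.grid(id = unique(preds$id), pred.code = allowed.codes,
                      stringsAsFactors = FALSE)
  # Left join: pairs the model never scored get NA as probability.
  out <- merge(grid, preds, by = c("id", "pred.code"), all.x = TRUE)
  # Replace the missing probabilities with the fill value (0 here).
  out$pred.prob[is.na(out$pred.prob)] <- fill
  out[order(out$id, out$pred.code), ]
}

expanded <- expand_sketch(preds, allowed.codes)
print(expanded)
```

After the expansion every answer has one row per allowed code, so results from methods trained on different subsets of categories can be compared on the same set of codes.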

Value

a data.table

See Also

produceResults

Examples

# set up data
data(occupations)
allowed.codes <- c("71402", "71403", "63302", "83112", "83124", "83131", "83132", "83193", "83194", "-0004", "-0030")
allowed.codes.titles <- c("Office clerks and secretaries (without specialisation)-skilled tasks", "Office clerks and secretaries (without specialisation)-complex tasks", "Gastronomy occupations (without specialisation)-skilled tasks",
 "Occupations in child care and child-rearing-skilled tasks", "Occupations in social work and social pedagogics-highly complex tasks", "Pedagogic specialists in social care work and special needs education-unskilled/semiskilled tasks", "Pedagogic specialists in social care work and special needs education-skilled tasks", "Supervisors in education and social work, and of pedagogic specialists in social care work", "Managers in education and social work, and of pedagogic specialists in social care work",
 "Not precise enough for coding", "Student assistants")
proc.occupations <- removeFaultyAndUncodableAnswers_And_PrepareForAnalysis(occupations, colNames = c("orig_answer", "orig_code"), allowed.codes, allowed.codes.titles)

## split sample
set.seed(3451345)
n.test <- 50
group <- sample(c(rep("test", n.test), rep("training", nrow(proc.occupations) - n.test)))
splitted.data <- split(proc.occupations, group)

# train model and make predictions
# use the full list name "training"; "$train" only works through partial matching
model <- trainLogisticRegressionWithPenalization(splitted.data$training, preprocessing = list(stopwords = tm::stopwords("de"), stemming = "de", countWords = FALSE), tuning = list(alpha = 0.05, maxit = 50^5, nlambda = 100, thresh = 1e-5))
res <- predictLogisticRegressionWithPenalization(model, splitted.data$test)

expandPredictionResults(res, allowed.codes, method.name = "Logistic Regression")

malsch/occupationCoding documentation built on March 14, 2024, 8:09 a.m.