predictWithCodingIndex: Code answers with a coding index
In malsch/occupationCoding: Supervised Learning for Occupation Coding

Description

Look up the code for each answer in a coding index. Depending on the answer, the lookup returns no code, exactly one code, or several possible codes.

Usage

predictWithCodingIndex(
  newdata,
  coding_index,
  include.substrings = FALSE,
  max.count.categories = Inf
)

Arguments

`newdata`	either a data.table created with `removeFaultyAndUncodableAnswers_And_PrepareForAnalysis` or a character vector.
`coding_index`	a data.table as created by the function `prepare_German_coding_index_Gesamtberufsliste_der_BA`
`include.substrings`	(default: `FALSE`). If `FALSE`, a match is found if, after preprocessing, the entry from the coding index and the string-element are exactly identical. If `TRUE` (caution: this is slow), a match is found if, after preprocessing, the entry from the coding index is a substring of the string-element.
`max.count.categories`	(default: `Inf`). Should the whole coding index be searched (default), or should entries with a large `count_categories`, an indicator of job title ambiguity, be excluded? Only entries in the coding index with `count_categories <= max.count.categories` are searched.
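
The matching rules described for `include.substrings` and `max.count.categories` can be illustrated with a small base R sketch. This is not the package's implementation: the helper `toy_lookup`, the data frame `toy_index`, and its column names `title` and `code` are hypothetical, and the title-to-code pairings are made up for illustration. Only the column name `count_categories` is taken from the argument description above.

```r
# Toy coding index (hypothetical data; pairings are illustrative only).
toy_index <- data.frame(
  title = c("ERZIEHER", "SEKRETAER", "BETREUER", "BETREUER"),
  code = c("83112", "71402", "83124", "83132"),
  count_categories = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns all codes whose index entry matches `answer`.
toy_lookup <- function(answer, index, include.substrings = FALSE,
                       max.count.categories = Inf) {
  # Exclude index entries that are considered too ambiguous.
  index <- index[index$count_categories <= max.count.categories, ]
  if (include.substrings) {
    # Match if the index entry occurs anywhere inside the answer.
    hit <- vapply(index$title, grepl, logical(1), x = answer, fixed = TRUE)
  } else {
    # Match only if the index entry and the answer are identical.
    hit <- index$title == answer
  }
  unique(index$code[hit])
}

exact_hit  <- toy_lookup("ERZIEHER", toy_index)
exact_miss <- toy_lookup("STAATLICH ANERKANNTER ERZIEHER", toy_index)
substr_hit <- toy_lookup("STAATLICH ANERKANNTER ERZIEHER", toy_index,
                         include.substrings = TRUE)
ambiguous  <- toy_lookup("BETREUER", toy_index)
filtered   <- toy_lookup("BETREUER", toy_index, max.count.categories = 1)
```

In this toy setting the exact lookup finds one code for the plain title, the longer answer is matched only when substrings are allowed, the ambiguous title yields two codes, and lowering `max.count.categories` to 1 removes both of them, so that nothing is returned.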

Value

a data.table with columns `id`, `ans`, and `pred.code`. (This format is not comparable to the other output formats in this package.)
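
Because `pred.code` may hold several codes for one answer, it can be convenient to reshape the result to one row per candidate code. The following base R sketch does this on made-up data. It assumes that multiple codes are stored in a single string joined by a comma; the separator is an assumption and is not stated on this page, so check the actual output before relying on it. The objects `res_toy` and `long` are hypothetical.

```r
# Toy result with the three documented columns (values are made up).
res_toy <- data.frame(
  id = 1:3,
  ans = c("ERZIEHER", "BETREUER", "XYZ"),
  pred.code = c("83112", "83124,83132", NA),
  stringsAsFactors = FALSE
)

# Split each entry on the assumed separator; NA stays a single NA.
codes <- strsplit(res_toy$pred.code, ",", fixed = TRUE)

# One row per (id, candidate code).
long <- data.frame(
  id = rep(res_toy$id, lengths(codes)),
  pred.code = unlist(codes),
  stringsAsFactors = FALSE
)
```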

Examples

# set up data
data(occupations)
allowed.codes <- c("71402", "71403", "63302", "83112", "83124", "83131",
                   "83132", "83193", "83194", "-0004", "-0030")
allowed.codes.titles <- c(
  "Office clerks and secretaries (without specialisation)-skilled tasks",
  "Office clerks and secretaries (without specialisation)-complex tasks",
  "Gastronomy occupations (without specialisation)-skilled tasks",
  "Occupations in child care and child-rearing-skilled tasks",
  "Occupations in social work and social pedagogics-highly complex tasks",
  "Pedagogic specialists in social care work and special needs education-unskilled/semiskilled tasks",
  "Pedagogic specialists in social care work and special needs education-skilled tasks",
  "Supervisors in education and social work, and of pedagogic specialists in social care work",
  "Managers in education and social work, and of pedagogic specialists in social care work",
  "Not precise enough for coding",
  "Student assistants"
)
proc.occupations <- removeFaultyAndUncodableAnswers_And_PrepareForAnalysis(
  occupations,
  colNames = c("orig_answer", "orig_code"),
  allowed.codes,
  allowed.codes.titles
)

# recommended default
res <- predictWithCodingIndex(proc.occupations,
                              coding_index = coding_index_excerpt,
                              include.substrings = FALSE,
                              max.count.categories = Inf)

# playing around with the parameters to obtain other results
res <- predictWithCodingIndex(proc.occupations,
                              coding_index = coding_index_excerpt,
                              include.substrings = TRUE,
                              max.count.categories = 15)

#################################
# Analysis: the standard functions from this package won't work here.
# Absolute numbers: either nothing is predicted (nPredictedCodes = NA), or one or more codes are predicted
res[ , .N, by = list(nPredictedCodes = 1 + nchar(pred.code) %/% 6 )]
# Relative Numbers
res[ , .N / res[, .N], by = list(nPredictedCodes = 1 + nchar(pred.code) %/% 6 )]
# Agreement rate among answers where only a single code was predicted
res[nchar(pred.code) == 5, mean(pred.code == code)]
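
The expression `1 + nchar(pred.code) %/% 6` used above counts codes from the string length. A minimal worked example in base R, assuming (as the arithmetic implies) that every code is 5 characters long and that multiple codes are joined by a single separator character; the comma and all values below are made up for illustration:

```r
# Toy predictions and true codes (hypothetical values).
pred <- c(NA_character_, "71402", "71402,71403", "83131")
code <- c("71402", "71402", "71403", "83112")

# 5 characters -> 5 %/% 6 = 0 -> 1 code; 11 characters -> 11 %/% 6 = 1 -> 2 codes.
# nchar() of a missing string is NA, so "nothing predicted" stays NA.
n_codes <- 1 + nchar(pred) %/% 6

# Agreement rate among answers with exactly one predicted code.
# which() drops the NA comparison, mirroring how data.table treats NA in i.
single <- which(nchar(pred) == 5)
agreement <- mean(pred[single] == code[single])
```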

malsch/occupationCoding documentation built on March 14, 2024, 8:09 a.m.