Introduction to labourR

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

To assist research on the labour market, ESCO has defined a taxonomy for occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO). labourR is a new package that performs occupations coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.

The initial motivation was to retrieve the work experience history from a Curriculum Vitae for the purpose of statistical analysis of data from the Europass online CV editor. In the approach followed, the first step is to generate the term frequency–inverse document frequency numerical statistic for each term found in the ESCO occupations corpus. Then, the input query receives a score for each ESCO occupation based on the matched terms found on the ESCO vocabulary. Given an ISCO level, the classification is performed by a plurality vote in the corresponding hierarchical level of the ESCO ontologies with the highest score.

The labourR package:

Installation

You can install the released version of labourR from CRAN with,

install.packages("labourR")

Examples

library(labourR)
corpus <- data.frame(
  id = 1:3,
  text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's")
)

For a given ISCO hierarchical level, the top suggested ISCO group is returned. num_leaves specifies the number of ESCO occupations used by the classifier to perform a plurality vote,

classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5)


Try the labourR package in your browser

Any scripts or data that you put into this service are public.

labourR documentation built on July 18, 2020, 5:06 p.m.