classify_occupation: Classify occupations
In labourR: Classify Multilingual Labour Market Free-Text to Standardized Hierarchical Occupations

View source: R/occupations_classify.R

This function takes advantage of the hierarchical structure of the ESCO-ISCO mapping and matches multilingual free-text with the ESCO occupations vocabulary in order to map semi-structured vacancy data into the official ESCO-ISCO classification.

classify_occupation(
  corpus,
  id_col = "id",
  text_col = "text",
  lang = "en",
  num_leaves = 10,
  isco_level = 3,
  max_dist = 0.1,
  string_dist = NULL
)

`corpus`	A data.frame or a data.table that contains the id and the text variables.
`id_col`	The name of the id variable.
`text_col`	The name of the text variable.
`lang`	The language that the text is in.
`num_leaves`	The number of occupations/neighbors that are kept when matching.
`isco_level`	The ISCO level of the suggested occupations. Can be 1, 2, 3, or 4 to return ISCO groups, or NULL to return ESCO occupations.
`max_dist`	Maximum string distance allowed for fuzzy matching. The `amatch` function from the stringdist package is used.
`string_dist`	String dissimilarity measure used for fuzzy matching. For the available string distance metrics, see `stringdist-metrics` in the stringdist package.
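
To illustrate how a distance threshold such as `max_dist` behaves, here is a minimal sketch in base R. It is not the package's implementation: labourR delegates to `stringdist::amatch`, whereas this stand-in uses `utils::adist` (Levenshtein distance) divided by the longer string's length so that it runs without extra packages. The vocabulary and the helper name `fuzzy_match` are hypothetical, and the meaning of a given threshold depends on the metric chosen.

```r
# Hypothetical vocabulary; the real one is built from ESCO labels.
vocabulary <- c("architect", "cashier", "engineer", "priest")

# Return the closest vocabulary entry for each token, or NA when the
# normalized edit distance exceeds max_dist. Assumption: Levenshtein
# distance divided by the length of the longer string.
fuzzy_match <- function(tokens, vocabulary, max_dist = 0.2) {
  d <- adist(tokens, vocabulary) /
    outer(nchar(tokens), nchar(vocabulary), pmax)
  best <- apply(d, 1, which.min)
  best_d <- d[cbind(seq_along(tokens), best)]
  ifelse(best_d <= max_dist, vocabulary[best], NA_character_)
}

matched <- fuzzy_match(c("cashir", "enginer", "pilot"), vocabulary)
print(matched)
```

A smaller `max_dist` makes matching stricter: in this sketch, lowering it to 0.1 rejects the two misspelled tokens as well, because each is one edit away from a seven- or eight-character word.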

First, the input text is cleaned and tokenized. The tokens are then matched against the ESCO occupations vocabulary, which is built from the preferred and alternative labels of the occupations. The matched tokens are joined with the tf-idf-weighted tokens of the ESCO occupations, and the sum of the tf-idf scores is used to retrieve the suggested occupations. Formally, the suggested ESCO occupations are retrieved by solving the optimization problem

\arg\max_d \left\{ \vec{u}_{binary} \cdot \vec{u}_d \right\}

where \vec{u}_{binary} is the binary representation of the query in the ESCO vocabulary space and \vec{u}_d is the normalized tf-idf vector of ESCO occupation d. If an ISCO level is specified, the k-nearest neighbors algorithm is used to determine the suggested occupation: the num_leaves best-scoring ESCO occupations act as neighbors, and the ISCO group at the requested hierarchical level is chosen by a plurality vote among them.
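
The scoring and voting steps can be sketched in base R as follows. This is an illustrative reconstruction under stated assumptions, not the package's code: the four-row occupation table, its URIs, the helper names `tokenize` and `suggest`, the use of log(N / df) as the idf term, and the first-in-sort-order tie-break are all assumptions made for the example.

```r
# Toy occupation table. URIs are hypothetical; labels are made up.
occupations <- data.frame(
  uri   = c("occ/architect", "occ/landscape-architect",
            "occ/civil-engineer", "occ/cashier"),
  isco  = c("2161", "2162", "2142", "5230"),
  label = c("architect building designer", "landscape architect",
            "civil engineer", "cashier checkout operator"),
  stringsAsFactors = FALSE
)

# Lower-case the text and split it on anything that is not a letter.
tokenize <- function(x) {
  tokens <- strsplit(tolower(gsub("[^[:alpha:]]+", " ", x)), " ", fixed = TRUE)[[1]]
  tokens[nzchar(tokens)]
}

doc_tokens <- lapply(occupations$label, tokenize)
vocab <- sort(unique(unlist(doc_tokens)))

# Term-frequency matrix: one row per occupation, one column per token.
tf <- t(vapply(
  doc_tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
idf   <- log(nrow(tf) / colSums(tf > 0))  # assumed idf definition
tfidf <- sweep(tf, 2, idf, `*`)
u_d   <- tfidf / sqrt(rowSums(tfidf^2))   # L2-normalize each occupation

suggest <- function(query, num_leaves = 3, isco_level = NULL) {
  u_binary <- as.numeric(vocab %in% tokenize(query))  # binary query vector
  scores <- as.vector(u_d %*% u_binary)               # dot product per occupation
  top <- head(order(scores, decreasing = TRUE), num_leaves)
  top <- top[scores[top] > 0]                         # drop non-matching occupations
  if (is.null(isco_level)) return(occupations$uri[top])
  # Plurality vote over the ISCO prefixes of the retained neighbors.
  votes <- table(substr(occupations$isco[top], 1, isco_level))
  names(votes)[which.max(votes)]
}

esco_suggestions <- suggest("Junior Architect Engineer")
isco_suggestion  <- suggest("Junior Architect Engineer", isco_level = 3)
print(esco_suggestions)
print(isco_suggestion)
```

In this toy data the best single ESCO match is the civil engineer entry, yet the level-3 vote returns group 216, because two of the three neighbors share that prefix. This shows how the vote can differ from the top-scoring occupation.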

Before the suggestions are returned, the preferred label of each suggested occupation is added to the result, using the occupations_bundle and isco_occupations_bundle as look-up tables.

Either a data.table with the id, the preferred label and the suggested ESCO occupation URIs (num_leaves predictions for each id), or a data.table with the id, the preferred label and the suggested ISCO group at the requested level (one for each id).

M.P.J. van der Loo (2014). The stringdist package for approximate string matching. R Journal 6(1) pp 111-122.

Gweon, H., Schonlau, M., Kaczmirek, L., Blohm, M., & Steiner, S. (2017). Three Methods for Occupation Coding Based on Statistical Learning, Journal of Official Statistics, 33(1), 101-122.

Arthur Turrell, Bradley J. Speigner, Jyldyz Djumalieva, David Copple, James Thurgood (2019). Transforming Naturally Occurring Text Data Into Economic Statistics: The Case of Online Job Vacancy Postings.

ESCO Service Platform - The ESCO Data Model documentation

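library(labourR)

# Free-text job titles to classify, one row per vacancy.
corpus <- data.frame(
  id = 1:3,
  text = c(
    "Junior Architect Engineer",
    "Cashier at McDonald's",
    "Priest at St. Martin Catholic Church"
  )
)

# Suggest one ISCO level-3 group per id, voting over the 5 best ESCO matches.
classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5)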

labourR documentation built on July 18, 2020, 5:06 p.m.
