knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette will go into the details of the ESCO/ISCO hierarchical relationship and explain how the labourR package takes advantage of that relationship to suggest occupations for multilingual free-text vacancy data.
ESCO is the multilingual classification of European Skills, Competences, Qualifications and Occupations. ESCO works as a dictionary, describing, identifying and classifying professional occupations, skills, and qualifications relevant for the EU labour market and education and training. Those concepts and the relationships between them can be understood by electronic systems, which allows different online platforms to use ESCO for services like matching jobseekers to jobs on the basis of their skills, suggesting training to people who want to reskill or upskill etc.
ISCO, the International Standard Classification of Occupations, is by contrast a four-level classification of occupation groups managed by the International Labour Organization (ILO). Its structure follows a grouping by education level. The two latest versions of ISCO are ISCO-88 (dating from 1988) and ISCO-08 (dating from 2008).
In ESCO, each occupation is mapped to exactly one ISCO-08 code. ISCO-08 can therefore be used as a hierarchical structure for the occupations pillar. ISCO-08 provides the top four levels for the occupations pillar. ESCO occupations are located at level 5 and lower.
knitr::include_graphics("../man/figures/ESCO_ISCO_hierarchy.png")
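Because ISCO-08 unit groups carry four-digit codes, the group an occupation belongs to at a given level can be read off as a prefix of that code. The following is a minimal sketch of that idea in base R; the helper name isco_group is hypothetical and is not part of labourR.

```r
# Hypothetical helper (not a labourR function): the ISCO-08 group at
# level n (1-4) is identified by the first n digits of the 4-digit code.
isco_group <- function(code, level) {
  stopifnot(all(nchar(code) == 4), level %in% 1:4)
  substr(code, 1, level)
}

# 2341 is the ISCO-08 unit group "Primary school teachers".
isco_code <- "2341"

# Walk up the hierarchy from level 1 (major group) to level 4 (unit group).
groups <- vapply(1:4, function(level) isco_group(isco_code, level), character(1))
groups
```

For the code 2341 this yields the groups 2, 23, 234 and 2341, that is, one ancestor per ISCO level, with the ESCO occupations themselves sitting below the unit group.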
To find out more about ESCO and its relationship with ISCO, visit ESCOpedia, the online reference to the ESCO classification.
The goal of labourR is to map multilingual free-text descriptions of occupations, such as a job title in a Curriculum Vitae, to the existing hierarchical ontologies of the ESCO and ISCO classifications, and to showcase their importance in understanding and analyzing the labour market. Computations are vectorized, and the data.table package is used for high performance and memory efficiency.
In the following we will explain how the classifier maps free-text vacancy data into the ESCO-ISCO official ontologies and takes advantage of their hierarchy.
The occupations classifier takes the following inputs:
A corpus in tabular form containing the id and the text variables.
Names of the id and the text variable.
Corpus language.
Number of ESCO occupations that are used as neighbors for k-NN.
ISCO level for the suggested occupations. If it is 1-4, the suggested ISCO group is returned. If NULL, the top ESCO occupations are returned.
First, the input text is cleansed and tokenized. The tokens are then matched with the ESCO occupations vocabulary, created from the preferred and alternative labels of the occupations (see occupations_bundle). They are joined with the tf-idf weighted tokens of the ESCO occupations (see tf_idf), and the sum of the tf-idf scores is used to retrieve the suggested ontologies. Precisely, the suggested ESCO occupations are retrieved by solving the optimization problem,
$$\arg \max_d \left\{ \vec{u}_{binary} \cdot \vec{u}_d \right\}$$
where $\vec{u}_{binary}$ is the binary representation of the query in the ESCO vocabulary space and $\vec{u}_d$ is the normalized vector of ESCO occupation $d$, generated by the tf-idf numerical statistic. If an ISCO level is specified, the k-NN algorithm is used to determine the suggested occupation group, which is chosen by a plurality vote among the neighbors at the corresponding hierarchical level.
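The retrieval and voting steps described above can be illustrated with a small base R sketch. It is not labourR's implementation: the three occupations, their labels and the query are made up, the ISCO codes are only illustrative, and the tf-idf variant (term frequency divided by label length, idf = log(N / df), L2 normalization) is an assumption chosen for simplicity.

```r
# Toy data: hypothetical occupations with illustrative 4-digit ISCO codes.
occupations <- data.frame(
  id    = c("occ_a", "occ_b", "occ_c"),
  label = c("primary school teacher",
            "secondary school teacher",
            "school bus driver"),
  isco  = c("2341", "2330", "8331"),
  stringsAsFactors = FALSE
)

# Cleanse and tokenize: keep letters only, lower-case, split on whitespace.
tokenize <- function(x) {
  x <- tolower(gsub("[^[:alpha:]]+", " ", x))
  lapply(strsplit(trimws(x), " +"), function(tok) tok[nzchar(tok)])
}

tokens <- tokenize(occupations$label)
vocab  <- sort(unique(unlist(tokens)))

# Term frequencies: one row per occupation, one column per vocabulary term.
tf <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)) / length(tok),
  numeric(length(vocab))
))
dimnames(tf) <- list(occupations$id, vocab)

# Assumed weighting: idf = log(N / df), then L2-normalize each row (u_d).
idf   <- log(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)
tfidf <- tfidf / sqrt(rowSums(tfidf^2))

# Binary representation of the query in the vocabulary space (u_binary).
query    <- "Primary school teacher (maths)"
u_binary <- as.numeric(vocab %in% tokenize(query)[[1]])

# Score every occupation by the dot product u_binary . u_d.
scores <- drop(tfidf %*% u_binary)
best   <- names(scores)[which.max(scores)]

# k-NN step: plurality vote of the top-k occupations at the ISCO level.
num_leaves <- 2
isco_level <- 2
neighbors  <- head(order(scores, decreasing = TRUE), num_leaves)
votes      <- table(substr(occupations$isco[neighbors], 1, isco_level))
suggested  <- names(votes)[which.max(votes)]
```

Here best holds the highest-scoring occupation (the arg max), while suggested holds the ISCO group that wins the vote among the num_leaves nearest occupations. Ties are broken by which.max, which takes the first maximum; a real classifier may resolve them differently.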
library(labourR)
library(data.table)
library(magrittr)

corpus <- data.table(
  id = 1:3,
  text = c(
    "Insegnante di scuola primaria",
    "Sales and marketing assistant manager",
    "Data Scientist"
  )
)
The package also provides language identification through the cld2 package, which is based on a Naive Bayes classifier.
corpus[, language := identify_language(text)]
With num_leaves set to 10 (ESCO occupations) and isco_level set to 3, the suggested occupation is returned for each identified language:
languages <- unique(corpus$language)
suggestions <- lapply(languages, function(lang) {
  classify_occupation(
    corpus = corpus[language == lang],
    lang = lang,
    isco_level = 3,
    num_leaves = 10
  )
}) %>% rbindlist
suggestions