In this vignette, I will demonstrate how to use LinkedInJobsScrapeR to extract and tidy data that was scraped with the scrape_job() function. This vignette assumes that your scraped data is stored in the standard directory structure generated by scrape_job():
data/
└── job title/
    └── experience level/
        └── location/
            └── file
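As a minimal sketch of how one search maps onto this layout, the folder for a single query can be built with file.path(). The job title, experience level, and location values below are hypothetical placeholders, not values defined by the package. The whitespace removal mirrors what the file-listing code later in this vignette does with job titles.

```r
# Hypothetical example values; substitute the ones used in your own scrape
job_title <- "data scientist"
experience_level <- "entry"
location <- "sydney"

# Job title folders are named without whitespace
job_title_no_space <- gsub("\\s", "", job_title)

# Build the path to the folder that holds the files for this search
search_dir <- file.path("data", job_title_no_space, experience_level, location)
print(search_dir)
#> [1] "data/datascientist/entry/sydney"
```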
First we can extract the job listing metadata using the get_job_ad_metadata() function.
This function will return:
- Job ID (LinkedIn internal ID)
- Job title
- Company advertising the job
- Location (city, state)
- URL link to the job ad
Importantly, each scrape file contains all of the job listings for a given search query. Because search queries are unique at the data/job title/experience level/location/ level, we only need to read one file from each of these folders to get the metadata for every job ad in that search.
First we can construct a list of files needed for metadata extraction as follows:
# Generate a list of files for metadata extraction.
# We only need one file per job search results page,
# so we take the first one from each location folder.
# job_titles, experience_levels, and locations are assumed to be
# the objects that were defined when the scrape was run.
files_for_metadata <- c()
for (i in seq_along(locations)) {
  for (k in seq_along(experience_levels)) {
    for (j in seq_along(job_titles)) {
      # Job title folders are named without whitespace
      job_title_no_space <- gsub("\\s", "", job_titles[j])
      file <- list.files(
        paste0("data/", job_title_no_space, "/", experience_levels[k], "/", locations[[i]][2]),
        full.names = TRUE
      )[[1]]
      files_for_metadata <- c(files_for_metadata, file)
    }
  }
}
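If the original search vectors are no longer in your session, the same "one file per folder" selection can be sketched directly from the directory tree. This is only an illustration under stated assumptions: it builds a throwaway tree in a temporary directory, and the folder and file names (including the .html extension) are hypothetical rather than what scrape_job() actually writes.

```r
# Build a small throwaway tree that mimics the layout described above
root <- tempfile("data_demo")
dirs <- file.path(root, "datascientist", c("entry", "senior"), "sydney")
for (d in dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  # Two hypothetical scrape files per search folder
  file.create(file.path(d, c("page_1.html", "page_2.html")))
}

# List every file, then keep only the first file found in each folder
all_files <- list.files(root, recursive = TRUE, full.names = TRUE)
first_per_folder <- all_files[!duplicated(dirname(all_files))]

print(basename(first_per_folder))
```

Because list.files() returns paths in sorted order, the file kept for each folder is the alphabetically first one.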
Then we can extract the data from these files and put them in the job_metadata data frame.
# Extract the metadata for all of the files
job_metadata <- data.frame()
for (i in seq_along(files_for_metadata)) {
  # Each call returns the metadata for every job ad in one search
  metadata <- get_job_ad_metadata(files_for_metadata[i])
  job_metadata <- rbind(job_metadata, metadata)
}
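Growing a data frame with rbind() inside a loop works, but it copies the accumulated data on every iteration. One common alternative is to collect the per-file results in a list and bind them once at the end. The sketch below uses a hypothetical stand-in function, fake_metadata(), in place of get_job_ad_metadata() so that it runs without any scraped files. Its columns are invented for illustration and are not the package's actual output.

```r
# Hypothetical stand-in for get_job_ad_metadata(): one row per input file
fake_metadata <- function(path) {
  data.frame(
    file = basename(path),
    folder = dirname(path),
    stringsAsFactors = FALSE
  )
}

# Hypothetical file paths
paths <- c("data/datascientist/entry/sydney/f1", "data/analyst/entry/perth/f2")

# Collect the results in a list, then bind all rows in a single call
metadata_list <- lapply(paths, fake_metadata)
combined <- do.call(rbind, metadata_list)

print(nrow(combined))
#> [1] 2
```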
Next we can extract the job descriptions and job criteria using the get_job_description() function. This function will return description and criteria as separate elements.
Unlike the metadata extraction above, this function needs to run on every file. We first list all of the scraped files, then loop over them and combine the results.
# Get a list of all scraped files
job_ads <- list.files("data", recursive = TRUE, full.names = TRUE)

# Extract the job descriptions and criteria
job_desc <- data.frame()
job_criteria <- data.frame()
for (i in seq_along(job_ads)) {
  # details holds two elements: description and criteria
  details <- get_job_description(job_ads[i])
  job_desc <- rbind(job_desc, details$description)
  job_criteria <- rbind(job_criteria, details$criteria)
}
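When looping over many files, a single unreadable file can stop the whole run. The sketch below shows one way to guard against that with tryCatch(), recording which files failed instead of aborting. Whether get_job_description() actually raises an error on a malformed file is an assumption. The fake_details() function here is a hypothetical stand-in that only imitates the two-element return value described above.

```r
# Hypothetical stand-in for get_job_description(): returns a list with
# description and criteria data frames, and fails on an empty path
fake_details <- function(path) {
  if (!nzchar(path)) stop("cannot read file")
  list(
    description = data.frame(file = path, text = "example text"),
    criteria = data.frame(file = path, level = "example level")
  )
}

# Hypothetical inputs; the empty string simulates a file that cannot be parsed
paths <- c("f1", "", "f3")

# Return NULL for files that fail instead of stopping the loop
details <- lapply(paths, function(p) {
  tryCatch(fake_details(p), error = function(e) NULL)
})

# Record the failures, then drop them before combining
failed <- paths[vapply(details, is.null, logical(1))]
details <- Filter(Negate(is.null), details)

job_desc_demo <- do.call(rbind, lapply(details, `[[`, "description"))
job_criteria_demo <- do.call(rbind, lapply(details, `[[`, "criteria"))

print(nrow(job_desc_demo))
#> [1] 2
```

Keeping the failed paths in their own vector makes it easy to inspect or re-scrape those files later.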
After running the above, we have three data frames containing the job ad metadata, descriptions, and criteria.
job_metadata
job_desc
job_criteria