knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

In this vignette, I demonstrate how to use LinkedInJobsScrapeR to extract and tidy data that was scraped with the scrape_job() function. The vignette assumes that your scraped data is stored in the standard directory structure generated by scrape_job():

data/
└── job title/
    └── experience level/
        └── location/
            └── file
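
Because the folder names encode the search query, the query behind any scraped file can be recovered from its path. The following is a minimal sketch of that idea; the path components ("DataScientist", "entry", "Toronto", "page1.html") are made-up examples, not names the package is known to produce.

```r
# Hypothetical path following data/<job title>/<experience level>/<location>/<file>
path <- file.path("data", "DataScientist", "entry", "Toronto", "page1.html")

# file.path() joins with "/", so we can split on it to recover each level
parts <- strsplit(path, "/", fixed = TRUE)[[1]]

search_info <- data.frame(
  job_title        = parts[2],
  experience_level = parts[3],
  location         = parts[4],
  file             = parts[5],
  stringsAsFactors = FALSE
)
search_info
```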

Extracting job metadata

First we can extract the job listing metadata using the get_job_ad_metadata() function.

This function will return:

- Job ID (LinkedIn internal ID)
- Job title
- Company advertising the job
- Location (city, state)
- URL link to the job ad

Importantly, each scraped file contains all of the job listings for a given search query. Because search queries are unique at the data/job title/experience level/location/ level, we only need to read one file from each of these folders to get the metadata for all of the job ads in that search.

We can construct the list of files needed for metadata extraction as follows:

# Generate a list of files for metadata extraction.
# We only need one file per job search results page,
# so we take the first file in each location folder.
# Assumes job_titles, experience_levels, and locations are the
# same objects that were used when scraping.
files_for_metadata <- c()
for (i in seq_along(locations)) {
  for (k in seq_along(experience_levels)) {
    for (j in seq_along(job_titles)) {

      # Folder names use the job title with whitespace removed
      job_title_no_space <- gsub("\\s", "", job_titles[j])

      file <- list.files(paste0("data/",
                                job_title_no_space, "/",
                                experience_levels[k], "/",
                                locations[[i]][2]),
                         full.names = TRUE)[[1]]

      files_for_metadata <- c(files_for_metadata, file)
    }
  }
}
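
If the original search-term objects are no longer at hand, the same one-file-per-folder selection could be derived from the directory tree itself. This is only a sketch of that alternative: it builds a throwaway tree under tempdir() with invented folder and file names, and it assumes every file sits in a leaf location folder.

```r
# Build a small throwaway tree that mimics the layout (all names are made up)
root <- file.path(tempdir(), "data_demo")
dirs <- file.path(root, "DataScientist", c("entry", "senior"), "Toronto")
for (d in dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(d, c("a.html", "b.html")))
}

# List every file, then keep only the first file found in each folder
all_files <- list.files(root, recursive = TRUE, full.names = TRUE)
one_per_folder <- all_files[!duplicated(dirname(all_files))]
one_per_folder
```

Here duplicated() marks every file after the first within the same parent folder, so negating it keeps exactly one file per search query.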

Then we can extract the data from these files and put them in the job_metadata dataframe.

# Extract the metadata for all of the files
job_metadata <- data.frame()
for (i in seq_along(files_for_metadata)) {
  metadata <- get_job_ad_metadata(files_for_metadata[i])
  job_metadata <- rbind(job_metadata, metadata)
}
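
Growing a data frame with rbind() inside a loop copies it on every iteration, which can become slow with many files. A common alternative is to collect the results in a list and bind them once. The sketch below uses fake_get_metadata(), a hypothetical stand-in for get_job_ad_metadata() with invented columns, so that it runs without any scraped files.

```r
# Hypothetical stand-in for get_job_ad_metadata(); the real function parses
# a scraped file. The columns here are invented for illustration only.
fake_get_metadata <- function(file) {
  data.frame(source_file = file, n_chars = nchar(file), stringsAsFactors = FALSE)
}

# Made-up file paths
files <- c("data/A/entry/X/f1", "data/B/entry/X/f1")

# Apply the extractor to each file, then bind all results in one step
metadata_list <- lapply(files, fake_get_metadata)
combined <- do.call(rbind, metadata_list)
combined
```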

Extracting job descriptions and criteria

Next we can extract the job descriptions and job criteria using the get_job_description() function. This function will return description and criteria as separate elements.

Unlike the metadata extraction above, this function needs to run on all files. So first we get a list of files, and then we run a loop over them and join the data together.

# Get a list of all scraped files
job_ads <- list.files("data", recursive = TRUE, full.names = TRUE)

# Extract the job descriptions and criteria
job_desc <- data.frame()
job_criteria <- data.frame()
for (i in seq_along(job_ads)) {
  details <- get_job_description(job_ads[i])
  job_desc <- rbind(job_desc, details$description)
  job_criteria <- rbind(job_criteria, details$criteria)
}
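
When looping over every scraped file, a single file that cannot be parsed would stop the whole loop. One way to guard against this is to wrap each call in tryCatch() and record which files failed. The sketch below uses fake_get_description(), a hypothetical stand-in for get_job_description(); the only thing it borrows from the real function is the assumption that the result has description and criteria elements. The file names and columns are invented.

```r
# Hypothetical stand-in for get_job_description(): returns a list with
# $description and $criteria, and fails on one input to simulate a bad file.
fake_get_description <- function(file) {
  if (grepl("broken", file)) stop("could not parse ", file)
  list(
    description = data.frame(file = file, text = "example description",
                             stringsAsFactors = FALSE),
    criteria    = data.frame(file = file, name = "example criterion",
                             value = "example value", stringsAsFactors = FALSE)
  )
}

files <- c("ad1.html", "broken.html", "ad2.html")

# Wrap each call so a failure yields NULL instead of stopping the loop
results <- lapply(files, function(f) {
  tryCatch(fake_get_description(f), error = function(e) {
    message("Skipping ", f, ": ", conditionMessage(e))
    NULL
  })
})

# Record the failures, drop them, then bind descriptions and criteria separately
failed <- files[vapply(results, is.null, logical(1))]
ok <- Filter(Negate(is.null), results)
desc_df     <- do.call(rbind, lapply(ok, `[[`, "description"))
criteria_df <- do.call(rbind, lapply(ok, `[[`, "criteria"))
```

The failed vector can then be inspected or re-scraped later without losing the results from the files that parsed successfully.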

End

After running the above, we have three data frames containing the job ad metadata, descriptions, and criteria.

job_metadata
job_desc
job_criteria


tylerburleigh/LinkedInJobsScrapeR documentation built on Nov. 5, 2019, 11:02 a.m.