```{css, echo = FALSE}
.source {
  font-size: 0.7em;
  background-color: #ffcc66;
  padding: 10px;
}

.source .sourceCode { background-color: #ffffe6 }
```

```r
library(knitr)
# Global chunk options for this vignette
opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  cache.path = "cache/",
  warning = FALSE,
  message = FALSE
)
# work around as per https://github.com/yihui/knitr/issues/1647
rc <- read_chunk
# Register the labelled chunks in the preprocessing script for reuse below
rc(here::here("data-raw/data-preprocessing.R"))
```

The data cleaning is carried out with the tidyverse suite of packages.


Reading the data {#read-data}

The source file that reads the data is quite long. It is provided by the NLSY79 database and performs the reading and initial processing of the data. We did not modify this script except for the location of the file.



This script creates two data sets, new_data_qnames and categories_qnames. As shown below, the column names encode the job number (HRP1 = job 1, HRP2 = job 2, ..., HRP5 = job 5) and the survey year.


```r
str(categories_qnames, list.len = 20)
```
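To illustrate how names of this kind can be split into a job number and a year, the sketch below reshapes a small wide table into long format with tidyr. It assumes names of the form `HRP<job>_<year>` and uses made-up values; the actual column names in `categories_qnames` may be formatted differently.

```r
library(tibble)
library(tidyr)

# Toy wide data: one row per individual, one column per job and year.
# The HRP<job>_<year> naming is an assumption for illustration only.
wide <- tibble(
  CASEID = c(1L, 2L),
  HRP1_1979 = c(350L, 420L),
  HRP2_1979 = c(NA, 300L),
  HRP1_1980 = c(375L, 450L)
)

# Split each HRP column name into its job number and survey year,
# giving one row per individual, job and year.
long <- pivot_longer(
  wide,
  cols = starts_with("HRP"),
  names_to = c("job", "year"),
  names_pattern = "HRP([0-9])_([0-9]{4})",
  names_transform = list(job = as.integer, year = as.integer),
  values_to = "value"
)
long
```

Missing values are kept by default; `values_drop_na = TRUE` would drop job and year combinations with no record.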

Demographic variables

Tidying the date of birth data

The month and year of birth were recorded in both 1979 and 1981 for each individual. Because the 1981 records are missing for some individuals, we take the month and year of birth from the 1979 records.

Where both the 1979 and 1981 records are present, we check that they match.
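A minimal sketch of this check on a toy table with hypothetical column names (`month_1979`, `year_1979`, `month_1981`, `year_1981`): a conflict is flagged only when both records are present and differ, and the 1979 values are kept as the date of birth. This is an illustration under those assumptions, not the code in `data-preprocessing.R`.

```r
library(dplyr)
library(tibble)

# Toy birth records; column names and values are hypothetical.
dob <- tibble(
  id = 1:4,
  month_1979 = c(1L, 5L, 7L, 12L),
  year_1979 = c(1958L, 1960L, 1961L, 1963L),
  month_1981 = c(1L, NA, 8L, 12L),
  year_1981 = c(1958L, NA, 1961L, 1963L)
)

dob_check <- dob %>%
  mutate(
    # Only compare when the 1981 record exists.
    both_present = !is.na(month_1981) & !is.na(year_1981),
    # FALSE & NA is FALSE, so missing 1981 records never count as conflicts.
    dob_conflict = both_present &
      (month_1979 != month_1981 | year_1979 != year_1981),
    # The 1979 record is always used as the date of birth.
    dob_month = month_1979,
    dob_year = year_1979
  )
```

In this toy data only individual 3 is flagged, because the recorded birth months differ between the two years.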


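```r
# Report whether any individual has conflicting 1979 and 1981 birth records
if (!any(dob_tidy$dob_conflict, na.rm = TRUE)) {
  cat("All birth months and years recorded in 1979 and 1981 match.")
} else {
  cat("The birth record does not match for the following individuals.\n")
  dob_tidy %>%
    filter(dob_conflict)
}
```

```r
as_tibble(dob_tidy)
```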

Getting the race and sex data


```r
as_tibble(demog_tidy)
```

Tidying the education data


```r
as_tibble(demog_education)
```

Getting the highest year completed


```r
as_tibble(highest_year)
```
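As a hypothetical illustration of taking the highest year completed, the sketch below keeps, for each individual, the maximum grade reported across survey years while ignoring missing values. The long-format layout and the column names `id`, `year` and `grade` are assumptions; the processing behind `highest_year` may differ.

```r
library(dplyr)
library(tibble)

# Toy education records (hypothetical columns): one row per
# individual and survey year, with the grade completed so far.
education <- tibble(
  id = c(1L, 1L, 1L, 2L, 2L),
  year = c(1979L, 1980L, 1981L, 1979L, 1980L),
  grade = c(10L, 11L, NA, 12L, 12L)
)

highest <- education %>%
  group_by(id) %>%
  summarise(
    # Guard against individuals with no recorded grade at all,
    # where max(..., na.rm = TRUE) would return -Inf with a warning.
    hgc = if (all(is.na(grade))) NA_integer_ else max(grade, na.rm = TRUE),
    .groups = "drop"
  )
highest
```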

demog_nlsy79


```r
as_tibble(demog_nlsy79)
```

Tidying the employment information


```r
as_tibble(hours_all)

as_tibble(rates_all)

as_tibble(st_work)

as_tibble(exp)

as_tibble(hours_wages)
```
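The vignette does not show how `hours_wages` is assembled. One plausible sketch, assuming long-format tables keyed by individual, survey year and job (hypothetical columns `id`, `year`, `job`), joins hours worked to hourly rates and then, purely as an example summary, computes an hours-weighted mean wage per individual and year.

```r
library(dplyr)
library(tibble)

# Toy long-format tables (hypothetical columns): hours worked and
# hourly rate for each individual, survey year and job.
hours <- tibble(
  id = c(1L, 1L, 2L),
  year = c(1979L, 1979L, 1979L),
  job = c(1L, 2L, 1L),
  hours = c(1000, 500, 2000)
)
rates <- tibble(
  id = c(1L, 1L, 2L),
  year = c(1979L, 1979L, 1979L),
  job = c(1L, 2L, 1L),
  rate = c(10, 16, 15)
)

# Match each job's hours to its rate.
hours_rates <- hours %>%
  inner_join(rates, by = c("id", "year", "job"))

# Example summary only: hours-weighted mean wage per person and year.
weighted <- hours_rates %>%
  group_by(id, year) %>%
  summarise(
    mean_wage = weighted.mean(rate, w = hours),
    .groups = "drop"
  )
weighted
```

For individual 1 the weighted mean is (1000 × 10 + 500 × 16) / 1500 = 12, so the longer-held job dominates the summary.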

Subsetting to the high school population


```r
as_tibble(wages_demog)
as_tibble(wages_before)
```


numbats/yowie documentation built on June 7, 2022, 10:29 a.m.