```{css, echo = FALSE}
.source {
  font-size: 0.7em;
  background-color: #ffcc66;
  padding: 10px;
}
.source .sourceCode {
  background-color: #ffffe6
}
```

```{r}
library(knitr)
opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  cache.path = "cache/",
  warning = FALSE,
  message = FALSE
)
# work around as per https://github.com/yihui/knitr/issues/1647
rc <- read_chunk
rc(here::here("data-raw/data-preprocessing.R"))
```
The data cleaning is carried out with the suite of packages in the tidyverse.
The code below, provided by the NLSY79 database, reads the data and does the initial processing (note that it is quite long). We did not modify this script except for the location of the file.
The source code above creates two data sets, `new_data_qnames` and `categories_qnames`.
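Extracts from the NLSY79 database commonly mark missing values with negative codes, and reading scripts of this kind typically convert them to `NA`. The sketch below illustrates that kind of step under stated assumptions: the column names and values are made up, the codes -1 to -5 (refused, don't know, invalid skip, valid skip, non-interview) are assumed from the usual NLSY79 convention rather than copied from the script, and `recode_missing()` is a hypothetical helper.

```r
# Toy stand-in for the raw extract; column names and values are hypothetical
raw <- data.frame(R0000100 = c(1, 2, 3), R0000300 = c(5, -3, -4))

# Hypothetical helper: replace the assumed missing-value codes with NA
# (-1 refused, -2 don't know, -3 invalid skip, -4 valid skip, -5 non-interview)
recode_missing <- function(df, codes = -1:-5) {
  df[] <- lapply(df, function(x) replace(x, x %in% codes, NA))
  df
}

clean <- recode_missing(raw)
clean
```

Assigning to `df[]` keeps the result a data frame with the original column names, so the helper can be applied to the whole extract at once.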
As shown below, the column names contain the job number (HRP1 = job 1, HRP2 = job 2, ..., HRP5 = job 5) and the year.
```{r}
str(categories_qnames, list.len = 20)
```
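The job number and year can be pulled out of such names with a regular expression. A minimal sketch, assuming names of the hypothetical form `HRP<job>_<year>` (the actual NLSY79 question names may be formatted differently):

```r
# Hypothetical column names following an assumed "HRP<job>_<year>" pattern
qnames <- c("HRP1_1979", "HRP2_1979", "HRP5_1980")

# Capture the job number (after "HRP") and the four-digit year (after "_")
pattern <- "^HRP([0-9]+)_([0-9]{4})$"
job  <- as.integer(sub(pattern, "\\1", qnames))
year <- as.integer(sub(pattern, "\\2", qnames))

data.frame(qname = qnames, job = job, year = year)
```

With the tidyverse, `tidyr::pivot_longer()` and its `names_pattern` argument can perform the same split while reshaping the data to long format.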
The month and year of birth are recorded in both 1979 and 1981 for each individual. The 1981 records are missing for some individuals, so we take the month and year of birth from the 1979 records.
Where both the 1979 and 1981 records are present, we check that they match.
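```{r}
# dob_conflict flags individuals whose 1979 and 1981 birth records differ;
# filter() and %>% come from dplyr (tidyverse)
if (!any(dob_tidy$dob_conflict, na.rm = TRUE)) {
  cat("All birth months and years recorded in 1979 and 1981 match.")
} else {
  cat("The birth record does not match for the following individuals.\n")
  dob_tidy %>% filter(dob_conflict)
}
```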
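The birth-date rule described above (take the 1979 record, and flag a conflict only where a 1981 record exists and differs) can be sketched in base R. The toy data and the column names `month_1979`, `year_1979`, `month_1981` and `year_1981` are hypothetical; only `dob_conflict` mirrors a name used in this document.

```r
# Toy birth records; column names and values are hypothetical
dob <- data.frame(
  id         = 1:4,
  month_1979 = c(3, 7, 11, 1),
  year_1979  = c(60, 62, 58, 64),
  month_1981 = c(3, NA, 12, 1),  # individual 2 has no 1981 record
  year_1981  = c(60, NA, 58, 64)
)

# Month and year of birth are always taken from the 1979 records
dob$dob_month <- dob$month_1979
dob$dob_year  <- dob$year_1979

# A conflict needs a complete 1981 record that disagrees with 1979;
# FALSE & NA is FALSE in R, so missing 1981 records are never flagged
has_1981 <- !is.na(dob$month_1981) & !is.na(dob$year_1981)
dob$dob_conflict <- has_1981 &
  (dob$month_1981 != dob$month_1979 | dob$year_1981 != dob$year_1979)

dob[dob$dob_conflict, ]
```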
```{r}
as_tibble(dob_tidy)
as_tibble(demog_tidy)
as_tibble(demog_education)
as_tibble(highest_year)
demog_nlsy79
as_tibble(demog_nlsy79)
as_tibble(hours_all)
as_tibble(rates_all)
as_tibble(st_work)
as_tibble(exp)
as_tibble(hours_wages)
as_tibble(wages_demog)
as_tibble(wages_before)
```