knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  comment = "#>"
)
rc <- knitr::read_chunk
rc(here::here("data-raw/data-valid.R"))

The wage data contain some values that are unlikely to be true. To detect them, we fitted a robust linear regression to each individual's observations. More specifically, for each individual we fitted the model

$$y_i = \beta_0 + \beta_1 x_i + e_i$$

using the iterated re-weighted least squares (IWLS) process.

Observations whose weights in the IWLS process are less than 0.12 are set to missing. For these censored values, an alternative wage is predicted from the fitted robust linear regression model and stored in a separate variable. The weight threshold was chosen by visualising the effect of different thresholds in a shiny app.
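The flagging step described above can be sketched for a single individual as follows. This is only an illustration under stated assumptions: it uses `MASS::rlm()` as one function that fits a robust linear model by IWLS (the text does not say which function was actually used), and the data, the column names (`year`, `wage`, `wage_clean`, `wage_alt`) and the helper `flag_unlikely()` are hypothetical.

```r
library(MASS)  # rlm() fits robust linear models by IWLS (assumed here, not confirmed by the text)

# Hypothetical data for one individual: one implausible wage in year 6.
one_id <- data.frame(
  year = 1:10,
  wage = c(10.0, 10.6, 10.9, 11.7, 12.1, 95.0, 12.8, 13.6, 14.1, 14.4)
)

# Hypothetical helper: fit the robust model, then flag low-weight observations.
flag_unlikely <- function(dat, threshold = 0.12) {
  fit <- rlm(wage ~ year, data = dat)

  # Final IWLS weights, one per observation.
  dat$weight <- fit$w

  # Fitted values from the robust model.
  dat$wage_pred <- as.numeric(predict(fit, newdata = dat))

  # Set low-weight observations to missing ...
  dat$wage_clean <- ifelse(dat$weight < threshold, NA_real_, dat$wage)

  # ... and keep the model prediction for them in a separate variable.
  dat$wage_alt <- ifelse(dat$weight < threshold, dat$wage_pred, dat$wage)

  dat
}

res <- flag_unlikely(one_id)
res[, c("year", "wage", "weight", "wage_clean", "wage_alt")]
```

To apply the same idea to every individual, the helper could be run on each group of a longer data frame, for example with `lapply(split(wages, wages$id), flag_unlikely)`, where `wages` and `id` are again assumed names.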



wages_after


numbats/yowie documentation built on June 7, 2022, 10:29 a.m.