```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
I use socR almost every day to handle common tasks that involve occupational or industrial codes. Most of those tasks involve working with coding systems. This vignette shows how I do common tasks with socR.
Loading a coding system is often the first thing you have to do. I keep my coding system data on GitHub Pages. It is public data, so feel free to use it. If you want to add a coding system to my GitHub repository, let me know; as long as there are no licensing issues, I'll be happy to add it.
As an example, I will load the soc2000 system from https://danielruss.github.io/codingsystems/soc2000_all.csv. I actually use soc2010 in my work, but that comes with socR, as do a few others that I use often.
```r
library(socR)
soc2000_all <- codingsystem("https://danielruss.github.io/codingsystems/soc2000_all.csv",
                            name = "soc2000")
soc2000_all
```
A coding system is an S3 class that wraps a tibble. The coding system is required to have a column named `code` and a column named `title`. The other columns are optional; however, if you want to move up the code hierarchy, the additional columns are useful. In this example, soc2000 has `Level`, which corresponds to the number of digits in the code, not counting trailing zeros (e.g. 11-0000 is a 2-digit code (Level = 2) and 11-1010 is a 5-digit code (Level = 5)). The `parent` column is the immediate parent in the hierarchy of the coding system. The columns `soc2d` through `soc6d` are the codes at the various levels. My coding systems use `NA` to mark cases that don't exist (e.g. the `soc6d` for 11-0000). The coding system also has a name that is printed out for your use.
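To make the relationship between `code`, `Level`, and the `soc2d` through `soc6d` columns concrete, here is a base R sketch. The helpers `soc_level()` and `soc_at_level()` are hypothetical and not part of socR; the coding system tables already carry these columns, so this only illustrates how they relate. The trailing-zero rule works for the SOC 2000 codes shown here, but treat it as a rule of thumb: SOC 2010, for example, has the minor group 15-1100, which the rule would count as a 4-digit code.

```r
# Hypothetical helpers (NOT socR functions) illustrating how Level and the
# socNd columns relate to a six-digit SOC code.

# Level: number of digits left after dropping the hyphen and trailing zeros.
soc_level <- function(code) {
  digits <- gsub("-", "", code, fixed = TRUE)  # "11-1010" -> "111010"
  nchar(sub("0+$", "", digits))                # "111010"  -> "11101" -> 5
}

# Code at a given level: keep the first `level` digits, pad with zeros to six
# digits, and restore the hyphen. Returns NA when the code is coarser than the
# requested level (e.g. the 6-digit code for "11-0000").
soc_at_level <- function(code, level) {
  digits <- gsub("-", "", code, fixed = TRUE)
  padded <- substr(paste0(substr(digits, 1, level), "000000"), 1, 6)
  out <- paste0(substr(padded, 1, 2), "-", substr(padded, 3, 6))
  out[soc_level(code) < level] <- NA_character_
  out
}

codes <- c("11-0000", "11-1000", "11-1010", "11-1011")
data.frame(
  code  = codes,
  Level = soc_level(codes),       # 2, 3, 5, 6
  soc2d = soc_at_level(codes, 2), # all "11-0000"
  soc3d = soc_at_level(codes, 3), # NA for the 2-digit code
  soc5d = soc_at_level(codes, 5), # "11-1010" for the last two rows
  soc6d = soc_at_level(codes, 6), # only "11-1011" has a 6-digit code
  stringsAsFactors = FALSE
)
```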
Here is the soc2010 coding system that comes with socR. There is also a soc2010_6d, which is deprecated and will be removed soon, since you can create it by filtering soc2010_all.
```r
soc2010_all
```
Given a vector of SOC codes, you may want to convert them to 2-digit SOCs. To do this, use the function factory `to_level()` to create the appropriate conversion function.
```r
## create a function to convert a vector of codes to the 2-digit level;
## notice we are using the name of the column that contains the
## 2-digit socs for each code
to_2d <- to_level(soc2000_all, soc2d)
to_2d(c("11-1021", "11-1031"))

## let's do it for a tibble...
my_data <- tibble::tibble(resp_id = c("A13254", "A33122"),
                          soc2000 = c("11-1021", "11-1031")) |>
  dplyr::mutate(soc2000_2d = to_2d(soc2000))
my_data
```
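A function factory is just a function that returns another function. The sketch below shows the idea in base R under my own assumptions: `make_to_level()` is a hypothetical name, not socR's `to_level()`, and unlike `to_level()` it takes the column name as a string. The tiny data frame is a stand-in for a coding system's table with only the columns the sketch needs.

```r
# Hypothetical to_level()-style factory (NOT socR's implementation).
# The lookup from code -> code-at-level is built once, when the factory runs;
# the returned function only has to index into it.
make_to_level <- function(table, level_col) {
  lookup <- stats::setNames(table[[level_col]], table$code)
  function(codes) unname(lookup[codes])  # codes not in the table map to NA
}

# Tiny stand-in for the soc2000 table
soc_table <- data.frame(
  code  = c("11-0000", "11-1000", "11-1020", "11-1021", "11-1030", "11-1031"),
  soc2d = "11-0000",
  stringsAsFactors = FALSE
)

to_2d <- make_to_level(soc_table, "soc2d")
to_2d(c("11-1021", "11-1031", "99-9999"))
# "11-0000" "11-0000" NA
```

Returning `NA` for unknown codes is one reasonable choice for this sketch; whether socR's `to_level()` behaves the same way is not shown here.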
Sometimes you want to check whether your data has invalid codes. socR has a few ways of checking codes. If you have a coding system, you can create a checking function with the provided factory `valid_code()`, which takes either a coding system or a vector of codes. This is why the data had to have a column named `code`: the codingsystem knows which column is the code column and can create a list of all the valid codes for you. If you prefer, you can replace the codingsystem object with a vector of valid codes.
```r
is_valid_soc2000 <- valid_code(soc2000_all)
is_valid_soc2000(c("11-0000", "11", "11-1021", "11-1030"))
```
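At its core, a validity check like this is a membership test. Here is a minimal base R sketch of a `valid_code()`-style factory; `make_valid_code()` is a hypothetical name, not socR's API, and the small data frame stands in for a coding system's table with only the `code` column. It accepts either a table or a vector of valid codes, mirroring the two inputs described above.

```r
# Hypothetical valid_code()-style factory (NOT socR's implementation).
# Accepts a table with a `code` column or a plain vector of valid codes,
# and returns a predicate built on %in%.
make_valid_code <- function(x) {
  valid <- if (is.data.frame(x)) x$code else as.character(x)
  valid <- unique(valid)
  function(codes) codes %in% valid
}

# Stand-in for a coding system table (only the `code` column is needed)
soc_table <- data.frame(
  code = c("11-0000", "11-1000", "11-1020", "11-1021", "11-1030", "11-1031"),
  stringsAsFactors = FALSE
)

is_valid <- make_valid_code(soc_table)
is_valid(c("11-0000", "11", "11-1021", "11-1030"))
# TRUE FALSE TRUE TRUE

# The same factory also accepts a bare vector of valid codes
is_valid_major <- make_valid_code(c("11-0000", "13-0000"))
is_valid_major(c("11-0000", "11-1021"))
# TRUE FALSE
```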
Sometimes you are not interested in the entire coding system, only in the codes at a particular level. Since a codingsystem is a thin wrapper around a tibble, you can use some of the dplyr verbs (currently `select()` and `filter()`; I can add others if needed). Now you see why I named the variable soc2000_all. If you get odd errors when you filter, you may be calling the wrong `filter()`: the stats package, which is loaded by default, has its own `filter()` function, so call `dplyr::filter()` explicitly.
```r
soc2000_5d <- soc2000_all |>
  dplyr::filter(Level == 5, name = "soc2000_5d")
soc2000_5d

## you can check for valid 5-digit soc codes
is_valid_5digit_soc2000 <- valid_code(soc2000_5d)
is_valid_5digit_soc2000(c("11-0000", "11", "11-1021", "11-1030"))
```
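The same level subsetting can be sketched in base R on a plain data frame. The toy table below is a hypothetical stand-in with only `code` and `Level` columns (the real soc2000 table has many more rows and columns). Base R subsetting also sidesteps the clash between `stats::filter()` and `dplyr::filter()` entirely.

```r
# Hypothetical stand-in for the table inside a coding system
soc_table <- data.frame(
  code  = c("11-0000", "11-1000", "11-1010", "11-1011", "11-1020", "11-1021"),
  Level = c(2, 3, 5, 6, 5, 6),
  stringsAsFactors = FALSE
)

# Keep only the 5-digit (broad occupation) rows, like filter(Level == 5)
soc_5d <- soc_table[soc_table$Level == 5, , drop = FALSE]
soc_5d$code
# "11-1010" "11-1020"

# A code is a valid 5-digit code only if it appears among those rows
c("11-0000", "11-1020", "11-1021") %in% soc_5d$code
# FALSE TRUE FALSE
```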
If you need a dplyr verb that I don't support, ask and I might be able to add it. Otherwise, the workaround is to get the tibble out of the codingsystem; it is the `table` entry of the S3 codingsystem object. Since you now have a tibble, you can continue working with it as with any other tibble, or convert it back to a codingsystem using the `as_codingsystem()` function. You will need to give the codingsystem a name, or it will default to something useless like "coding system".
```r
soc2000_3d <- soc2000_all$table |>
  dplyr::filter(Level == 3) |>
  as_codingsystem(name = "soc2000_3d")
soc2000_3d
```
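For readers curious about what a thin S3 wrapper like this involves, here is a minimal base R sketch. The constructor `new_coding_system()`, the `coding_system` class, and its print method are hypothetical and are not socR's API; they only mirror what the text describes: required `code` and `title` columns, a `table` entry, and a default name.

```r
# Hypothetical minimal S3 wrapper (NOT socR's codingsystem class).
new_coding_system <- function(table, name = "coding system") {
  stopifnot(is.data.frame(table))  # a tibble is also a data.frame
  missing <- setdiff(c("code", "title"), names(table))
  if (length(missing) > 0) {
    stop("a coding system needs columns: ", paste(missing, collapse = ", "))
  }
  structure(list(table = table, name = name), class = "coding_system")
}

# Print the name, then the first few rows of the wrapped table
print.coding_system <- function(x, ...) {
  cat(sprintf("Coding system: %s (%d codes)\n", x$name, nrow(x$table)))
  print(utils::head(x$table), ...)
  invisible(x)
}

cs <- new_coding_system(
  data.frame(code  = c("11-0000", "11-1000"),
             title = c("Management Occupations", "Top Executives"),
             Level = c(2, 3),
             stringsAsFactors = FALSE),
  name = "toy_soc"
)

# Round trip: pull out the table, work on it as a data frame, re-wrap it
sub_table <- cs$table[cs$table$Level == 3, , drop = FALSE]
cs_3d <- new_coding_system(sub_table, name = "toy_soc_3d")
cs_3d
```

Validating the required columns in the constructor means every way of creating the object, including re-wrapping a filtered table, goes through the same check.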