knitr::opts_chunk$set( collapse = TRUE # , comment = "#>" )
This example uses the National Study of Long-Term Care Providers (NSLTCP) Residential Care Community (RCC) Services User (SU) 2018 Public Use File (PUF) to replicate the estimates from a report called Residential Care Community Resident Characteristics: United States, 2018. "The survey used a sample of residential care community residents, obtained from a frame that was constructed from lists of licensed residential care communities acquired from the state licensing agencies in each of the 50 states and the District of Columbia."
The RCC SU 2018 survey comes with the surveytable
package, for use in examples, in an object called rccsu2018
.
Begin by loading the surveytable
package.
library(surveytable)
Now, specify the survey that you'd like to analyze.
set_survey(rccsu2018)
Check the survey name, survey design variables, and the number of observations to verify that it all looks correct.
For this example, we do want to turn on certain NCHS-specific options, such as identifying low-precision estimates. If you do not care about identifying low-precision estimates, you can skip this command. To turn on the NCHS-specific options:
set_opts(mode = "NCHS")
Alternatively, you can combine these two commands into a single command, like so:
set_survey(rccsu2018, mode = "NCHS")
This figure shows the percentage of residents by sex, race / ethnicity, and age group.
Sex.
tab("sex")
Race / ethnicity.
var_list("race") tab("raceeth2")
In the published figure, the Hispanic and Other categories have been merged into a single category called "Another race or ethnicity". We can do that using the var_collapse()
function.
var_collapse("raceeth2" , "Another race or ethnicity" , c("Hispanic", "Other")) tab("raceeth2")
Age group.
var_list("age")
age2
is a numeric variable. We need to create a categorical variable based on this numeric variable. This is done using the var_cut()
function.
var_cut("Age", "age2" , c(-Inf, 64, 74, 84, Inf) , c("Under 65", "65-74", "75-84", "85 and over") ) tab("Age")
This figure shows the percentage of residents with Medicaid, overall and by age group.
tab("medicaid2")
As we can see, for some observations, the value of this variable is unknown (it's missing or NA
). The above command calculates percentages based on all observations, including the ones with missing (NA
) values. However, in the published figure, the percentages are based on the knowns only. To exclude the NA
's from the calculation, use the drop_na
argument:
tab("medicaid2", drop_na = TRUE)
Note that the table title alerts you to the fact that you are using known values only.
By age group:
tab_subset("medicaid2", "Age", drop_na = TRUE)
Note that according to the NCHS presentation criteria, some of the percentages should be suppressed.
(Figure 3 is slightly more involved, so we'll do it next.)
Here's a table for high blood pressure.
tab("hbp")
Once again, unknown values (NA
) are present, while the figure is based on knowns only. Therefore, we again will use the drop_na
argument:
tab("hbp", "alz", "depress", "arth", "diabetes", "heartdise", "osteo" , "copd", "stroke", "cancer" , drop_na = TRUE)
surveytable
provides a number of functions to create or modify survey variables. var_collapse()
and var_cut()
.Occasionally, you might need to do advanced variable editing. Here's how:
Every survey object has an element called variables
class(rccsu2018$variables)
variables
data frame (which is part of the survey object).set_survey()
again. Any time you modify the variables
data frame, call set_survey()
.We go through these steps to count how many chronic conditions were present.
rccsu2018$variables$num_cc = 0 for (vr in c("hbp", "alz", "depress", "arth", "diabetes", "heartdise", "osteo" , "copd", "stroke", "cancer")) { idx = which(rccsu2018$variables[,vr]) rccsu2018$variables$num_cc[idx] = rccsu2018$variables$num_cc[idx] + 1 } set_survey(rccsu2018, mode = "NCHS")
num_cc
is a numeric variable with the number of chronic conditions. The published figure uses a categorical variable which is based on this numeric variable. Use var_cut()
, which converts numeric variables to categorical (factor
) variables.
var_cut("Number of chronic conditions", "num_cc" , c(-Inf, 0, 1, 3, 10, Inf) , c("0", "1", "2-3", "4-10", "??")) tab("Number of chronic conditions")
Here's a table for bathhlp
(help with bathing):
tab("bathhlp")
This variable has multiple levels.
"NEED NO ASSISTANCE"
) = does not need help"MISSING"
) = unknownWe want to show (resident needing help) as a percentage of knowns only (that is, excluding the unknowns).
To do this, convert the variable to having 2 levels (needs help / does not need help) plus NA
(for unknown); then use the drop_na
argument to base percentages on knowns only.
for (vr in c("bathhlp", "walkhlp", "dreshlp", "transhlp", "toilhlp", "eathlp")) { var_collapse(vr , "Needs assistance" , c("NEED HELP OR SUPERVISION FROM ANOTHER PERSON" , "USE OF AN ASSISTIVE DEVICE" , "BOTH")) var_collapse(vr, NA, "MISSING") } tab("bathhlp", "walkhlp", "dreshlp", "transhlp", "toilhlp", "eathlp", drop_na = TRUE)
Now, go through the "advanced variable editing" steps -- very similar to Figure 4 -- to count how many ADLs were present.
rccsu2018$variables$num_adl = 0 for (vr in c("bathhlp", "walkhlp", "dreshlp", "transhlp", "toilhlp", "eathlp")) { idx = which(rccsu2018$variables[,vr] %in% c("NEED HELP OR SUPERVISION FROM ANOTHER PERSON" , "USE OF AN ASSISTIVE DEVICE" , "BOTH")) rccsu2018$variables$num_adl[idx] = rccsu2018$variables$num_adl[idx] + 1 } set_survey(rccsu2018, mode = "NCHS")
For generating the figure, create a categorical variable based on num_adl
, which is numeric.
var_cut("Number of ADLs", "num_adl" , c(-Inf, 0, 2, 6, Inf) , c("0", "1-2", "3-6", "??")) tab("Number of ADLs")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.