knitr::opts_chunk$set( collapse = TRUE # , comment = "#>" )
This example uses the National Ambulatory Medical Care Survey (NAMCS) 2019 Public Use File (PUF) to replicate certain tables from the National Ambulatory Medical Care Survey: 2019 National Summary Tables. NAMCS is "an annual nationally representative sample survey of visits to non-federal office-based patient care physicians, excluding anesthesiologists, radiologists, and pathologists." Note that the unit of observation is visits, not patients -- this distinction is important since a single patient can make multiple visits.
Selected variables from NAMCS 2019 come with the surveytable
package, for use in examples, in an object called namcs2019sv
.
Begin by loading the surveytable
package.
library(surveytable)
Now, specify the survey that you'd like to analyze.
set_survey(namcs2019sv)
Check the survey name, survey design variables, and the number of observations to verify that it all looks correct.
For this example, we do want to turn on certain NCHS-specific options, such as identifying low-precision estimates. If you do not care about identifying low-precision estimates, you can skip this command. To turn on the NCHS-specific options:
set_opts(mode = "NCHS")
This table shows the overall estimated count as well as the counts and percentages by type of doctor, physician specialty, and metropolitan statistical area.
The variables that are necessary for creating this table are already in the survey, making the commands very straightforward.
total() tab("MDDO", "SPECCAT", "MSA")
The published table also shows several rates. To calculate rates, in addition to the survey, we need a source of information with population estimates. You would typically use a function such as read.csv()
to load the population estimates and get them into the correct format. The surveytable
package comes with an object called uspop2019
that contains several population estimates for use in these examples.
class(uspop2019) names(uspop2019)
Here is the overall population estimate:
uspop2019$total
Once we have the overall population estimate, the overall rate is:
total_rate(uspop2019$total)
To calculate the rates for a particular variable, we need to provide a data frame with a variable called Level
that matches the levels of the variable in the survey, and a variable called Population
that gives the population size (which is assumed to be a constant rather than a random variable).
For MSA
, we can see the levels of the variables just by using the tab()
command, just as we did above. Thus, to calculate rates, we need a data frame as follows:
uspop2019$MSA
Now that we have the appropriate population estimates, the rate is:
tab_rate("MSA", uspop2019$MSA)
We can also calculate rates of a specific variable based on the entire population:
tab_rate("MDDO", uspop2019$total) tab_rate("SPECCAT", uspop2019$total)
This table presents estimates for each age group, as well as for each age group by sex.
var_list("age")
The survey has a couple of relevant age-related variables. AGE
is the patient age in years. AGER
is a categorical variable based on AGE
. However, for this table, in addition to AGER
, we need another age group variable, with different age categories. We create it using the var_cut
function.
var_cut("Age group", "AGE" , c(-Inf, 0, 4, 14, 64, Inf) , c("Under 1", "1-4", "5-14", "15-64", "65 and over") )
Now that we've created the Age group
variable, we can create the tables:
tab("AGER", "Age group", "SEX") tab_cross("AGER", "SEX")
tab_rate("AGER", uspop2019$AGER) tab_rate("Age group", uspop2019$`Age group`) tab_rate("SEX", uspop2019$SEX)
To calculate the rates for one variable (AGER
) by another variable (SEX
), we need population estimates in the following format:
uspop2019$`AGER x SEX`
Once we have these population estimates, the rates are:
tab_subset_rate("AGER", "SEX", uspop2019$`AGER x SEX`)
This table gives the expected sources of payment. We use the PAY*
variables to create several new variables that are required by the table. Note that the PAY*
variables are logical (TRUE
or FALSE
), which simplifies the workflow. (The survey was imported into R using the importsurvey
package, which automatically detects binary variables and imports them as logical variables.)
# var_all("Medicare and Medicaid", c("PAYMCARE", "PAYMCAID")) # var_any("Payment used", c("PAYPRIV", "PAYMCARE", "PAYMCAID" , "PAYWKCMP", "PAYOTH", "PAYDK")) var_not("No other payment used", "Payment used") var_all("Self-pay", c("PAYSELF", "No other payment used")) var_all("No charge", c("PAYNOCHG", "No other payment used")) var_any("No insurance", c("Self-pay", "No charge")) # var_case("No pay", "NOPAY", "No categories marked") var_any("Unknown or blank", c("PAYDK", "No pay")) ## tab("PAYPRIV", "PAYMCARE", "PAYMCAID", "Medicare and Medicaid" , "No insurance", "Self-pay", "No charge" , "PAYWKCMP", "PAYOTH", "Unknown or blank")
Check the presentation standards flags! Under NCHS presentation standards rules, some of these estimates should not be shown.
This table shows the primary care provider and referral status, by prior-visit status.
In the table, the "Unknown" and "Blank" values are collapsed into a single value. We can collapse two or more levels of a factor into a single level using the var_collapse
function.
var_collapse("PRIMCARE", "Unknown if PCP", c("Unknown", "Blank")) var_collapse("REFER", "Unknown if referred", c("Unknown", "Blank"))
Now, for the table:
tab("PRIMCARE", "REFER", "SENBEFOR")
The percentages within each subset that is defined by SENBEFOR
add up to 100% -- for this reason, we want to use tab_subset()
, not tab_cross()
.
tab_subset("PRIMCARE", "SENBEFOR") tab_subset("REFER", "SENBEFOR")
This table shows the same information as Table 3, but only for preventive care visits. That is, estimates for each age group, as well as for each age group by sex, but only for preventive care visits.
Let's create Age group
from AGE
and cross AGER
and SEX
to create a variable called Age x Sex
:
var_cut("Age group", "AGE" , c(-Inf, 0, 4, 14, 64, Inf) , c("Under 1", "1-4", "5-14", "15-64", "65 and over") ) var_cross("Age x Sex", "AGER", "SEX")
To see the possible values of MAJOR
(Major reason for this visit), and to estimate the total count for preventive care visits:
tab("MAJOR")
To create the tables of age, sex, and their interaction, and limit them to only the preventive care visits:
tab_subset("AGER", "MAJOR", "Preventive care") tab_subset("Age group", "MAJOR", "Preventive care") tab_subset("SEX", "MAJOR", "Preventive care") tab_subset("Age x Sex", "MAJOR", "Preventive care")
As each of the above commands is similar, and differs only in the first variable that is passed to the tab_subset()
function, this code can be streamlined with a for
loop:
for (vr in c("AGER", "Age group", "SEX", "Age x Sex")) { print( tab_subset(vr, "MAJOR", "Preventive care") ) }
Note that when called from inside a for
loop, the print()
function needs to be called explicitly.
In addition, for each age-sex category, the published table shows the percentage of preventive care visits made to primary care physicians.
To calculate these percentages, a slightly more involved for
loop is needed. Below is the code, followed by an explanation:
set_opts(output = "csv", file = "my_output.csv")
set_opts(output = "csv", file = "my_output.csv", .file_temp = TRUE)
for (vr in c("AGER", "Age group", "SEX", "Age x Sex")) { var_cross("tmp", "MAJOR", vr) for (lvl in levels(surveytable:::env$survey$variables[,vr])) { print( tab_subset("SPECCAT", "tmp", paste0("Preventive care: ", lvl)) ) } } set_opts(output = "auto")
my_output.csv
.tab_subset()
is called from within a for
loop, to print the tables, we need to call the print()
function explicitly.vr
. vr
are crossed, with the result stored in a variable called tmp
.vr
, calling each of these levels lvl
. SPECCAT
(Type of specialty – Primary, Medical, Surgical) on a subset in which tmp
(which is MAJOR
crossed with vr
) is restricted to "Preventive care: "
followed by lvl
, which is some level of vr
, such as “Under 15 years” for AGER
. If you run this code, all of the tables should be stored in the CSV file. To give you an idea of what the tables should look like, here is just one of the tables:
vr = "AGER" var_cross("tmp", "MAJOR", vr) lvl = levels(surveytable:::env$survey$variables[,vr])[1] tab_subset("SPECCAT", "tmp", paste0("Preventive care: ", lvl))
To match the percentage in the published table, see the "Primary care specialty" row. Be sure to check the presentation standards flags.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.