This vignette shows how a Provider of Services (POS) layout file can be
quickly parsed to extract the descriptive variable names it contains. POS
datasets from 2010 and earlier have generic variable names like PROV0001,
PROV0002, ... that offer no insight into what each variable actually is. The
layout file contains a data dictionary explaining each variable's values and
also a COBOL descriptive name for each variable. The pos_names_extract()
function parses this file and returns the descriptive names in the order that
matches the variables in the dataset.
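To make the idea concrete, here is a minimal sketch of this kind of parsing in base R. It is not the package's implementation: the layout lines below are made up, the real layout format may differ, and converting the COBOL names to lowercase with underscores is only an illustrative choice.

```r
# Hypothetical layout lines; the real POS layout format may differ.
layout_lines <- c(
  "PROV0001  PROVIDER-CATEGORY-SUBTYPE   Identifies the subtype of the provider",
  "          1 = SHORT TERM",
  "PROV0002  PROVIDER-CATEGORY           Identifies the type of provider",
  "PROV0003  CHOW-COUNT                  Number of changes of ownership"
)

# A variable line starts with a generic name followed by a COBOL name.
pattern <- "^(PROV[0-9]+)[[:space:]]+([A-Z0-9-]+).*$"
is_var_line <- grepl(pattern, layout_lines)

# Pull out the two captured groups from the matching lines only.
generic     <- sub(pattern, "\\1", layout_lines[is_var_line])
descriptive <- sub(pattern, "\\2", layout_lines[is_var_line])

# Illustrative choice: turn COBOL-style names into syntactic R names.
descriptive <- tolower(gsub("-", "_", descriptive))

# Named vector: generic name -> descriptive name, in layout order.
toy_names <- setNames(descriptive, generic)
toy_names
```

Lines that belong to the data dictionary (such as the value label in the second line) do not start with a generic name, so the pattern skips them.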
I have included a sample of the 2010 Provider of Services data for hospices. The full 2010 file (along with many other years) is available from the NBER and contains data from other provider types as well.
library(medicare)
# load the package data
data(pos2010, package = "medicare")
names(pos2010)[1:10]
These variable names are useless, and with over 500 variables it is impractical to look up each one. Instead, we can parse the layout file to obtain useful names. In this example, I have bundled the Layout 2010 file with this package, but I expect the user to have the downloaded text file that corresponds to each dataset in use.
# filepath should be changed by user
filepath <- system.file("extdata", "layout10.txt", package = "medicare")
names_2010 <- pos_names_extract(filepath, pos2010)
names_2010[1:10]
These are much more descriptive variable names and worth using.
pos2010_renamed <- pos2010
names(pos2010_renamed) <- names_2010
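Because the data dictionary in the layout file is keyed by the generic names, it can be handy to keep a record of which generic name each descriptive name replaced. The following is one possible sketch using a toy data frame; the data and the descriptive names in it are invented for illustration.

```r
# Toy stand-in for a POS dataset; the values and names are made up.
toy <- data.frame(PROV0001 = c("01", "02"), PROV0002 = c(10L, 20L),
                  stringsAsFactors = FALSE)
toy_new_names <- c("provider_category", "bed_count")  # hypothetical names

# Record the mapping before overwriting the names.
lookup <- data.frame(generic = names(toy), descriptive = toy_new_names,
                     stringsAsFactors = FALSE)

# Rename a copy so the original stays available.
toy_renamed <- toy
names(toy_renamed) <- toy_new_names

# Find the generic name behind a descriptive one.
lookup$generic[lookup$descriptive == "bed_count"]
```

With the real data, the same pattern would use names(pos2010) and names_2010 in place of the toy vectors.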
Note that it is up to the user to make sure that the layout file is appropriate for the chosen data file. Each year's layout file is different, so each year must be parsed separately. The function checks whether the number of variables in the layout file and dataset match and whether the generic variable names are the same in both. It will stop if there's a problem. If the generic names from dataset 20XX are the same as in layout 20YY, the parsing should work, but won't necessarily be accurate. CMS is not 100% consistent with variable naming across years.
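# Fewer columns than the layout describes: the call stops with an error
pos2010_short <- pos2010[, 1:500]
names_2010_short <- pos_names_extract(filepath, pos2010_short)

# Generic names that differ from the layout: the call stops with an error
pos2010_wrong_names <- pos2010
names(pos2010_wrong_names)[1:3] <- c("wrong1", "wrong2", "wrong3")
names_2010_wrong_names <- pos_names_extract(filepath, pos2010_wrong_names)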
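The two checks described above can be sketched in a few lines of base R. The helper check_layout_matches() is hypothetical and is not the code inside pos_names_extract(); its error messages are also invented. The sketch also shows how tryCatch() lets a script report the failure instead of halting.

```r
# Hypothetical helper; not the package's actual implementation.
check_layout_matches <- function(layout_generic, data) {
  # Check 1: same number of variables in layout and dataset.
  if (length(layout_generic) != ncol(data)) {
    stop("layout has ", length(layout_generic),
         " variables but dataset has ", ncol(data))
  }
  # Check 2: generic names agree position by position.
  mismatched <- which(layout_generic != names(data))
  if (length(mismatched) > 0) {
    stop("generic names differ at position(s): ",
         paste(mismatched, collapse = ", "))
  }
  invisible(TRUE)
}

toy_data   <- data.frame(PROV0001 = 1, PROV0002 = 2, PROV0003 = 3)
toy_layout <- c("PROV0001", "PROV0002", "PROV0003")

check_layout_matches(toy_layout, toy_data)  # passes silently

# Capture the error message rather than stopping the script.
short_result <- tryCatch(
  check_layout_matches(toy_layout, toy_data[, 1:2]),
  error = function(e) conditionMessage(e)
)

wrong <- toy_data
names(wrong)[1] <- "wrong1"
wrong_result <- tryCatch(
  check_layout_matches(toy_layout, wrong),
  error = function(e) conditionMessage(e)
)
```

Neither check can detect a layout from a different year whose generic names happen to match the dataset, which is why choosing the right layout file remains the user's responsibility.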
To save the user the time and headaches of downloading each year's layout
file, I have pre-compiled dataset names for years 2000-2010. These can be
accessed via the pos_names() function. Looking at variables from the middle
of each dataset also illustrates how the dataset layouts change over time:
for (year in 2000:2010) {
  print(year)
  print(pos_names(year)[200:205])
}
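Printing slices shows that layouts drift, but a direct comparison makes the differences easier to pin down. Here is a small sketch using two made-up name vectors that stand in for the results of pos_names() for two different years; the names themselves are invented.

```r
# Made-up name vectors standing in for two years of pos_names() output.
names_a <- c("prov_num", "fac_name", "bed_count", "state_cd")
names_b <- c("prov_num", "fac_name", "chow_count", "bed_count", "state_cd")

# Names present in one year but not the other.
added   <- setdiff(names_b, names_a)
dropped <- setdiff(names_a, names_b)

# Position of each shared name in each year, to reveal shifts.
shared <- intersect(names_a, names_b)
positions <- data.frame(
  name  = shared,
  pos_a = match(shared, names_a),
  pos_b = match(shared, names_b),
  stringsAsFactors = FALSE
)
positions$shifted <- positions$pos_a != positions$pos_b
positions
```

A single inserted variable shifts every later position, so selecting columns by descriptive name is safer than selecting by index when working across years.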