process_accel: Process NHANES 2003-2004 and 2005-2006 accelerometry data


View source: R/process_accel.R

Description

This function processes the raw NHANES 2003-2004 and 2005-2006 accelerometry data as provided by the NHANES study on the CDC website. Because the unzipped .xpt files are large, this function uses a substantial amount of RAM (~12 GB at peak). Make sure that much RAM is available before running it, or your computer may crash.

Usage

process_accel(
  names_accel_xpt = c("PAXRAW_C", "PAXRAW_D"),
  local = FALSE,
  localpath = NULL,
  urls = NULL,
  zipped = TRUE,
  check_data = FALSE
)

Arguments

names_accel_xpt

A character vector of names for the zipped raw files. These should be of the form PAXRAW_* where * is the letter of the alphabet indicating which "wave" the data are from. For example, PAXRAW_C and PAXRAW_D correspond to the 2003-2004 and 2005-2006 waves of accelerometry data, respectively. The default is a vector containing PAXRAW_C and PAXRAW_D, which processes both the 2003-2004 and 2005-2006 waves.

local

Logical value indicating whether the zipped raw .xpt accelerometry files are stored locally. If FALSE, the data are downloaded from the CDC website into a temporary file and then processed. If TRUE, localpath must be specified and the zipped data are read from that local directory. Defaults to FALSE.

localpath

Character string giving the directory that contains the local raw .xpt files. If local=TRUE, then localpath must be a valid local directory.

urls

Character vector giving the website URLs from which the NHANES accelerometry data can be downloaded. The default contains the URLs that download the data directly from the CDC's website. This argument, if specified, must be the same length as the names_accel_xpt argument. Downloading the data through R is often slower than downloading the data outside of R. See the examples section below for how to download and process the data directly from the CDC.

zipped

Logical scalar indicating whether the physical activity files are in the zipped format downloaded directly from the CDC website (.ZIP). If local=FALSE and the data are downloaded from the CDC's website, this argument is ignored. Note that if the data are saved locally, processing speed is substantially increased by unzipping before calling the process_accel function.

check_data

Logical value indicating whether to perform some checks of the data. If TRUE, the function will incur additional processing time. The NHANES 2003-2006 data have already been tested and passed these checks. Defaults to FALSE.
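The local, localpath, urls and zipped arguments can be combined to download the archives once, unzip them, and then process the unzipped files from disk. The following is only a sketch under stated assumptions: zip_paths(), fetch_raw_zip() and data_dir are hypothetical names that are not part of rnhanesdata, and it assumes that the file names extracted from the CDC archives are the ones process_accel looks for in localpath when zipped=FALSE. The URLs are the ones used in the Examples section.

```r
# Sketch only: fetch the zipped files once, unzip them, then process locally.
# zip_paths(), fetch_raw_zip() and data_dir are hypothetical helpers/names,
# not part of the rnhanesdata package.
urls <- c("https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/PAXRAW_C.ZIP",
          "https://wwwn.cdc.gov/Nchs/Nhanes/2005-2006/PAXRAW_D.ZIP")
names_accel_xpt <- c("PAXRAW_C", "PAXRAW_D")

# urls must line up one-to-one with names_accel_xpt
stopifnot(length(urls) == length(names_accel_xpt))

# Local destination of each archive, derived from the file name in the URL
zip_paths <- function(urls, data_dir) file.path(data_dir, basename(urls))

fetch_raw_zip <- function(urls, data_dir) {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- zip_paths(urls, data_dir)
  for (i in seq_along(urls)) {
    if (!file.exists(dest[i])) {
      # mode = "wb" keeps the binary archive intact on Windows
      utils::download.file(urls[i], destfile = dest[i], mode = "wb")
    }
    utils::unzip(dest[i], exdir = data_dir)
  }
  invisible(dest)
}

if (FALSE) {  # not run: large download, needs rnhanesdata and ~12 GB of RAM
  # download.file() honours the "timeout" option; raise it for large files
  options(timeout = max(3600, getOption("timeout")))
  data_dir <- file.path(tempdir(), "nhanes_raw")
  fetch_raw_zip(urls, data_dir)
  accel_ls <- rnhanesdata::process_accel(
    names_accel_xpt = names_accel_xpt,
    local = TRUE,
    localpath = data_dir,
    zipped = FALSE  # assumption: unzipped files in data_dir are picked up
  )
}
```

Keeping the download in its own step means an interrupted or failed run of process_accel does not require fetching the archives again.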

Details

This function takes the long format of the NHANES 2003-2006 accelerometry data and transforms it into the 1440+ format, with one row per participant-day (7 rows per participant). Although process_accel will try to process any ".xpt" or ".ZIP" file that follows the NHANES accelerometry naming convention, it has only been tested on the accelerometry data from the NHANES 2003-2006 waves. As future NHANES accelerometry data are released, we intend to verify that process_accel correctly transforms the newly released data into our 1440+ format. The function documentation and, if necessary, the function itself will be updated as needed going forward.

If the data are directly downloaded from the CDC website, the raw data will be downloaded to a temporary folder and then deleted once it's been read into R.
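To make the long-to-wide transformation concrete, here is a small base R sketch on simulated data. It is not the code used inside process_accel. SEQN, PAXDAY and PAXINTEN are variable names from the raw NHANES files, while the minute column, the MIN1 to MIN1440 column names and the simulated counts are assumptions made for illustration. The toy data have only 2 days per participant rather than 7.

```r
# Toy illustration of reshaping long accelerometry data to one row per
# participant-day with one column per minute of the day.
# Simulated data; the `minute` column and MIN* names are assumptions.
set.seed(1)
n_min <- 1440  # minutes in a day

# Long format: one row per participant, day and minute
long <- expand.grid(minute = seq_len(n_min),
                    PAXDAY = 1:2,
                    SEQN   = c(21005, 21006))
long$PAXINTEN <- rpois(nrow(long), lambda = 5)  # fake activity counts

# Sort so that each participant-day occupies 1440 consecutive rows
long <- long[order(long$SEQN, long$PAXDAY, long$minute), ]

# Fill a matrix row by row: each row is one participant-day
mins <- matrix(long$PAXINTEN, ncol = n_min, byrow = TRUE)
colnames(mins) <- paste0("MIN", seq_len(n_min))

# Identifier columns, in the same order as the matrix rows
key  <- unique(long[, c("SEQN", "PAXDAY")])
wide <- data.frame(key, mins, row.names = NULL)

dim(wide)  # 4 participant-days by 2 identifier columns + 1440 minute columns
```

The matrix fill relies on every participant-day having exactly 1440 rows; real data with missing minutes would need to be padded before this step.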

Value

This function will return a list with number of elements less than or equal to the number of waves of data specified by the names_accel_xpt argument. The exact number of elements returned will depend on whether all files specified by the user are found in either: 1) the local directory indicated by the localpath argument; or 2) downloadable from the website(s) indicated by the "urls" argument. Each element of the list returned is a data frame with columns:

Examples

## Not run: 
library("rnhanesdata")
## download and process the data directly from the CDC
## the first element of accel_ls corresponds to PAXINTEN_C and
## the second element of accel_ls corresponds to PAXINTEN_D
accel_ls <- process_accel(
        names_accel_xpt = c("PAXRAW_C","PAXRAW_D"),
        local=FALSE,
        urls= c("https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/PAXRAW_C.ZIP",
                "https://wwwn.cdc.gov/Nchs/Nhanes/2005-2006/PAXRAW_D.ZIP")
)

## check to see that the data processed using the process_accel function
## are identical to the processed data included in the package
identical(accel_ls[[1]], PAXINTEN_C)
identical(accel_ls[[2]], PAXINTEN_D)

## End(Not run)

andrew-leroux/rnhanesdata documentation built on March 6, 2020, 11:35 p.m.