clean_perch_data: Clean PERCH data


View source: R/clean-perch-data.R

Description

clean_perch_data transforms a raw data table (one row per subject, one column per variable, usually named {pathogen name}_{specimen}, plus other covariate names) into a list. It is specific to the PERCH data format.

Usage

clean_perch_data(clean_options)

Arguments

clean_options

The list of options for cleaning PERCH data. Its elements are defined as follows:

  • raw_meas_dir: The file path to the raw data;

  • case_def: The variable name in the raw data for the case definition;

  • case_def_val: The value for case definition;

  • ctrl_def: The variable name in the raw data for the control definition;

  • ctrl_def_val: The value for control definition;

  • X_strat: A vector of variable names for stratifying the data to perform SEPARATE analyses;

  • X_strat_val: A list of values for X_strat. The output data will only correspond to those with identical(X_strat,X_strat_val)==TRUE. To perform analysis on a single site, say "02GAM", use X_strat="newSITE" and X_strat_val=list("02GAM");

  • pathogen_BrS_anyorder: The vector of pathogen names (arbitrary order) that have bronze-standard (BrS) measurements (cf. Wu et al. (2015) for definitions and examples of BrS, SS, and GS). It must be a subset of the pathogens listed in the taxonomy information at the file path patho_taxo_dir;

  • pathogen_SSonly: A vector of pathogens that only have SS data;

  • X_extra: A vector of covariate names for regression or visualization;

  • X_order_obs: A vector of variable names for ordering observations. For example, it can include site names or enrollment dates. It must be a subset of X_extra;

  • patho_taxo_dir: The file path to the pathogen category or taxonomy information (.csv). The information should be as complete as possible to display all pathogens considered in an actual study;

  • date_formats: Possible formats of dates; the default is c("%d%B%Y","%d%B%y"). See parse_date_time for a complete list of date formats;

  • allow_missing: TRUE to use an observation that has either BrS or SS measurements missing. Set it to TRUE to use the SS information from cases who are missing BrS measurements. In other words, all subjects' data will be used if allow_missing is set to TRUE;

  • extra_meas_nm: A list of (pathogen,specimen,test) names, each of which is considered an extra measurement informative for etiology.
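As a rough illustration of how these elements fit together, the sketch below assembles a clean_options list. Every file path, variable name, and pathogen name in it is a hypothetical placeholder, not a value taken from the PERCH data. Only "newSITE", "02GAM", "ENRLDATE", and the default date formats come from this page. extra_meas_nm is left out because its exact structure is not spelled out here, and the call itself is commented out since it needs the nplcm package and real input files.

```r
# Hypothetical placeholders throughout: paths, case/control variable names,
# and pathogen names are invented for illustration only.
clean_options <- list(
  raw_meas_dir          = "path/to/raw_data.csv",
  case_def              = "CASE_VAR",
  case_def_val          = 1,
  ctrl_def              = "CTRL_VAR",
  ctrl_def_val          = 1,
  X_strat               = "newSITE",
  X_strat_val           = list("02GAM"),
  pathogen_BrS_anyorder = c("PATHOGEN_A", "PATHOGEN_B"),
  pathogen_SSonly       = c("PATHOGEN_C"),
  X_extra               = c("newSITE", "ENRLDATE"),
  X_order_obs           = c("newSITE", "ENRLDATE"),
  patho_taxo_dir        = "path/to/pathogen_taxonomy.csv",
  date_formats          = c("%d%B%Y", "%d%B%y"),
  allow_missing         = FALSE
)

# Check the documented constraint: X_order_obs must be a subset of X_extra.
stopifnot(all(clean_options$X_order_obs %in% clean_options$X_extra))

# Requires the nplcm package and real files, so it is not run here:
# data_nplcm <- nplcm::clean_perch_data(clean_options)
```

Checking the subset constraint before the call is just a convenience of this sketch; whether clean_perch_data performs the same check internally is not stated on this page.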

Details

It deletes cases (Y==1) that have two positives for BCX measures. We suggest putting c("newSITE","ENRLDATE") in X_extra and X_order_obs to order cases and controls separately according to site and enrollment date. In the current implementation, the raw data must have both BrS and SS measurements.
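To make the suggested ordering concrete, here is a minimal base-R sketch on a toy data frame. It is not the package's implementation: the data are invented, the helper order_group is hypothetical, and stacking cases before controls is an assumption of this example.

```r
# Toy data; sites and dates are invented for illustration.
toy <- data.frame(
  Y        = c(1, 0, 1, 0, 1),
  newSITE  = c("02GAM", "01KEN", "01KEN", "02GAM", "01KEN"),
  ENRLDATE = as.Date(c("2012-03-01", "2012-01-15", "2012-02-10",
                       "2012-01-05", "2012-01-20")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort one group by site, then by enrollment date.
order_group <- function(d) d[order(d$newSITE, d$ENRLDATE), ]

# Order cases and controls separately, then stack them (cases first is
# an assumption of this sketch).
cases    <- order_group(toy[toy$Y == 1, ])
controls <- order_group(toy[toy$Y == 0, ])
ordered  <- rbind(cases, controls)
print(ordered)
```

Within each group, rows end up sorted by site and then by enrollment date, which is the effect the X_order_obs suggestion aims for.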

Value

A list: list(Mobs,Y,X,JSS,pathogen_BrS_ordered_by_MSS,pathogen_BrS_cat), with the additional elements JSSonly and pathogen_SSonly_cat if silver-standard-only pathogens are supplied.

This function does not re-order pathogens that only have silver-standard data.

See Also

parse_date_time


zhenkewu/nplcm documentation built on May 4, 2019, 10:19 p.m.