clean_perch_data: Clean PERCH data
In zhenkewu/nplcm: Nested Partially-Latent Class Models (nplcm)


clean_perch_data transforms a raw data table (one row per subject, one column per variable; columns are usually named {pathogen name}_{specimen}, plus other covariate names) into a list. It is specific to the PERCH data format.
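The {pathogen name}_{specimen} column convention can be illustrated with a small base-R sketch that separates measurement columns from covariates. This is not the package's own parsing code: the helper split_meas_names, the specimen labels "NPPCR" and "BCX", and the pathogen names P1 and P2 are hypothetical, and it assumes the specimen label follows the last underscore.

```r
# Hypothetical helper: split raw column names of the form {pathogen}_{specimen}.
# Assumptions: the specimen label follows the LAST underscore; the specimen
# labels and pathogen names used below are made up for illustration.
split_meas_names <- function(cols, specimens) {
  spec <- sub("^.*_", "", cols)                    # text after the last "_"
  is_meas <- grepl("_", cols) & spec %in% specimens
  data.frame(
    column   = cols[is_meas],
    pathogen = sub("_[^_]*$", "", cols[is_meas]),  # text before the last "_"
    specimen = spec[is_meas],
    stringsAsFactors = FALSE
  )
}

# Covariate columns such as "AGE" and "newSITE" are not treated as measurements.
cols <- c("P1_NPPCR", "P1_BCX", "P2_NPPCR", "AGE", "newSITE")
meas <- split_meas_names(cols, specimens = c("NPPCR", "BCX"))
print(meas)
```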

clean_perch_data(clean_options)

clean_options

The list of options for cleaning PERCH data. Its elements are defined as follows:

raw_meas_dir: The file path to the raw data;
case_def: The variable name in the raw data that defines cases;
case_def_val: The value of case_def that identifies cases;
ctrl_def: The variable name in the raw data that defines controls;
ctrl_def_val: The value of ctrl_def that identifies controls;
X_strat: A vector of variable names for stratifying the data to perform SEPARATE analyses;
X_strat_val: A list of values for X_strat. Only subjects whose X_strat values satisfy identical(X_strat, X_strat_val) == TRUE are kept in the output. To analyze a single site, say "02GAM", use X_strat = "newSITE" and X_strat_val = list("02GAM");
pathogen_BrS_anyorder: The vector of pathogen names (in arbitrary order) that have bronze-standard (BrS) measurements (cf. Wu et al. (2015) for definitions and examples of BrS, SS, and GS). It must be a subset of the pathogens listed in the taxonomy information at the file path patho_taxo_dir;
pathogen_SSonly: A vector of pathogens that only have SS data;
X_extra: A vector of covariate names for regression or visualization;
X_order_obs: A vector of variable names for ordering observations. For example, it can include site names or enrollment dates. It must be a subset of X_extra;
patho_taxo_dir: The file path to the pathogen category or taxonomy information (.csv). The information should be as complete as possible to display all pathogens considered in an actual study;
date_formats: Possible date formats; the default is c("%d%B%Y","%d%B%y"). See parse_date_time (lubridate package) for a complete list of date formats;
allow_missing: If TRUE, an observation missing either its BrS or its SS measurements is still used. Set it to TRUE to use the SS information from cases who missed BrS measurements; in other words, all subjects' data are used when allow_missing is TRUE;
extra_meas_nm: A list of (pathogen, specimen, test) names, each of which is considered an extra measurement informative for etiology.
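A minimal sketch of how the options above might be assembled, assuming placeholder values throughout: the file paths, the case/control variable name "CASECONT" and its values, and the pathogen names P1 to P5 are all hypothetical, not real PERCH settings. extra_meas_nm is left out because its exact structure is not spelled out here. The call to clean_perch_data is commented out because it needs nplcm and real files.

```r
# Hypothetical option values: paths, the case/control variable and its values,
# and pathogen names are placeholders, not real PERCH settings.
clean_options <- list(
  raw_meas_dir          = "path/to/raw_data.csv",           # hypothetical path
  case_def              = "CASECONT",                       # hypothetical variable
  case_def_val          = 1,                                # hypothetical value
  ctrl_def              = "CASECONT",
  ctrl_def_val          = 2,                                # hypothetical value
  X_strat               = "newSITE",
  X_strat_val           = list("02GAM"),                    # analyze one site only
  pathogen_BrS_anyorder = c("P3", "P1", "P4", "P2"),        # hypothetical names
  pathogen_SSonly       = c("P5"),                          # hypothetical SS-only
  X_extra               = c("newSITE", "ENRLDATE"),
  X_order_obs           = c("newSITE", "ENRLDATE"),
  patho_taxo_dir        = "path/to/pathogen_category.csv",  # hypothetical path
  date_formats          = c("%d%B%Y", "%d%B%y"),            # documented default
  allow_missing         = TRUE
)

# Documented requirement: X_order_obs must be a subset of X_extra.
stopifnot(all(clean_options$X_order_obs %in% clean_options$X_extra))

# With nplcm installed and real files at the paths above, the call would be:
# perch_data <- clean_perch_data(clean_options)
```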

It deletes cases (Y == 1) that have two positive BCX (blood culture) measurements. We suggest putting c("newSITE","ENRLDATE") in both X_extra and X_order_obs so that cases and controls are ordered separately by site and enrollment date. In the current implementation, the raw data must contain both BrS and SS measurements.
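The row-level steps described above (stratify with identical(), drop cases with two positive BCX results, then order cases and controls separately) can be sketched in base R. This is an illustration of the documented behavior, not the package's implementation. The helper filter_and_order, the toy data, and the BCX columns P1_BCX and P2_BCX are hypothetical; it assumes Y is already coded 1/0, reads "two positives" as two or more, and places cases before controls.

```r
# Hypothetical helper mirroring the documented behavior (not nplcm's code).
filter_and_order <- function(dat, X_strat, X_strat_val, X_order_obs) {
  # 1. Keep subjects whose X_strat values are identical() to X_strat_val.
  keep <- vapply(seq_len(nrow(dat)), function(i) {
    identical(lapply(X_strat, function(v) dat[[v]][i]), X_strat_val)
  }, logical(1))
  dat <- dat[keep, , drop = FALSE]
  # 2. Drop cases with two (assumed: two or more) positive BCX results.
  bcx_cols <- grep("_BCX$", names(dat), value = TRUE)
  n_pos <- rowSums(dat[, bcx_cols, drop = FALSE] == 1, na.rm = TRUE)
  dat <- dat[!(dat$Y == 1 & n_pos >= 2), , drop = FALSE]
  # 3. Cases first, then controls; within each group sort by X_order_obs.
  ord <- do.call(order, c(list(-dat$Y), unname(as.list(dat[X_order_obs]))))
  dat[ord, , drop = FALSE]
}

# Hypothetical toy data (Y already derived: 1 = case, 0 = control).
dat <- data.frame(
  Y        = c(1, 1, 0, 1, 0, 1),
  newSITE  = c("02GAM", "02GAM", "02GAM", "03MAL", "02GAM", "02GAM"),
  ENRLDATE = as.Date(c("2012-03-05", "2012-01-20", "2012-02-11",
                       "2012-01-02", "2012-01-15", "2012-02-01")),
  P1_BCX   = c(1, 0, 0, 0, 0, 1),
  P2_BCX   = c(1, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

out <- filter_and_order(dat, X_strat = "newSITE", X_strat_val = list("02GAM"),
                        X_order_obs = c("newSITE", "ENRLDATE"))
print(out[, c("Y", "newSITE", "ENRLDATE")])
```

Here the "03MAL" subject is removed by the stratification step, and the first case is removed because both of its BCX results are positive.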

A list: list(Mobs, Y, X, JSS, pathogen_BrS_ordered_by_MSS, pathogen_BrS_cat), with the additional elements JSSonly and pathogen_SSonly_cat if silver-standard-only pathogens are supplied.

Mobs A list of bronze- (MBS), silver- (MSS), and gold-standard (MGS, if available) measurements. If all pathogens have BrS measurements, MSS has the same number of columns as MBS; if some pathogens have only SS measurements, MSS has extra columns;
Y 1 for case; 0 for control;
X Data frame of covariates for cases and controls. The covariate names are specified in X_extra;
JSS Number of pathogens having both silver- and bronze-standard data;
pathogen_BrS_ordered_by_MSS Ordered vector of pathogen names. Pathogens with both SS and BrS measurements come first, followed by those with only BrS measurements. Note that for a pathogen name vector in arbitrary order (pathogen_BrS_anyorder), this function only picks out the pathogens with BrS+SS measurements and moves them to the front; the pathogens with only BrS measurements are not reordered. Pathogens with only silver-standard measurements are not included.
pathogen_BrS_cat Pathogen categories ordered according to pathogen_BrS_ordered_by_MSS.
JSSonly Number of pathogens with only silver-standard measures;
pathogen_SSonly_cat Category of pathogens with only silver-standard data.
extra_Mobs Extra measurements as requested by extra_meas_nm.
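The documented ordering of pathogen_BrS_ordered_by_MSS, and the JSS and pathogen_BrS_cat values that go with it, can be sketched as follows. This is a hypothetical illustration, not the package's code. The helper order_by_MSS, the pathogen names P1 to P5, and the named vector taxo (standing in for the taxonomy CSV at patho_taxo_dir) are made up, and it assumes the BrS+SS pathogens keep their relative input order.

```r
# Hypothetical sketch: BrS+SS pathogens move to the front, BrS-only pathogens
# follow in their input order. Assumption: the BrS+SS pathogens also keep
# their relative input order.
order_by_MSS <- function(pathogen_BrS_anyorder, pathogen_with_SS, taxo) {
  has_SS  <- pathogen_BrS_anyorder %in% pathogen_with_SS
  ordered <- c(pathogen_BrS_anyorder[has_SS], pathogen_BrS_anyorder[!has_SS])
  list(
    pathogen_BrS_ordered_by_MSS = ordered,
    JSS              = sum(has_SS),           # pathogens with both BrS and SS data
    pathogen_BrS_cat = unname(taxo[ordered])  # categories in the same order
  )
}

# Hypothetical taxonomy lookup; P5 has SS data only, so it is not included.
taxo <- c(P1 = "B", P2 = "V", P3 = "V", P4 = "B", P5 = "B")
res <- order_by_MSS(c("P3", "P1", "P4", "P2"), c("P1", "P2", "P5"), taxo)
str(res)
```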