pheno.list: List to describe the covariate and outcome data
In CGEN: An R package for analysis of case-control studies in genetic epidemiology

A list describing the covariate and outcome data for GxE.scan.

The format is: List of 14

file: Covariate data file. This file must contain variable names, two of which must be an id variable and a response variable (see id.var and response.var). No default.
id.var: Name of the id variable(s). No default.
response.var: Name of the binary response variable. This variable must be coded as 0 and 1. No default.
strata.var: Stratification variable name or a formula for variables in file. See the individual model documentation for the allowable stratifications. The default is NULL, so that all observations belong to the same stratum.
main.vars: Character vector of variable names or a formula for variables in file that will be included in the model as main effects. The default is NULL.
int.vars: Character vector of variable names or a formula for variables in file that will be included in the model as interactions with each SNP in the genotype data. The default is NULL.
file.type: 1, 3, or 4. 1 is for an R object file created with the save() function, 3 is for a table that will be read in with read.table(), and 4 is for a SAS data set. The default is 3.
delimiter: The delimiter in file. The default is "".
factor.vars: Vector of variable names to convert into factors. The default is NULL.
in.miss: Vector of character strings to define the missing values. This option corresponds to the option na.strings in read.table(). The default is "NA".
subsetData: List of sublists used to subset the phenotype data for the analyses. Each sublist should contain the names "var", "operator" and "value", giving a variable name, an operator, and the value(s) of that variable. Multiple sublists are combined with the logical AND operator. For example,
    subsetData=list(list(var="GENDER", operator="==", value="MALE"))
will include only subjects whose GENDER variable is the string "MALE", and
    subsetData=list(list(var="AGE", operator=">", value=50),
                    list(var="STUDY", operator="%in%", value=c("A", "B", "C")))
will include only subjects with AGE > 50 AND STUDY equal to A, B or C. The default is NULL.
cc.var: Name of the cc.var variable used in snp.matched. The default is NULL.
nn.var: Name of the nn.var variable used in snp.matched. The default is NULL.
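The AND semantics of the subsetData option described above can be illustrated with a small base-R sketch. The helper apply_subset() is hypothetical and is not part of CGEN; it assumes the phenotype data are already in a data frame, and treating an NA comparison as "exclude" is an assumption of this sketch rather than documented CGEN behaviour.

```r
# Hypothetical helper (not part of CGEN): shows how subsetData sublists
# can be turned into logical conditions and combined with AND.
apply_subset <- function(data, subsetData) {
  keep <- rep(TRUE, nrow(data))
  for (s in subsetData) {
    op   <- match.fun(s$operator)          # e.g. `==`, `>`, `%in%`
    cond <- op(data[[s$var]], s$value)     # one logical value per row
    keep <- keep & !is.na(cond) & cond     # AND across sublists; NA treated as FALSE
  }
  data[keep, , drop = FALSE]
}

# Toy phenotype data (made up for illustration)
pheno <- data.frame(
  ID    = 1:6,
  AGE   = c(45, 52, 61, 70, 38, 55),
  STUDY = c("A", "B", "D", "C", "A", "E"),
  stringsAsFactors = FALSE
)

# The second example from the subsetData description
subsetData <- list(list(var = "AGE",   operator = ">",    value = 50),
                   list(var = "STUDY", operator = "%in%", value = c("A", "B", "C")))

apply_subset(pheno, subsetData)$ID
```

Only subjects 2 and 4 satisfy both conditions: subjects 3 and 6 pass the AGE filter but are in studies D and E.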

In pheno.list, file, id.var, and response.var must be specified. The variable id.var is the link between the covariate data and the genotype data: a subject is included in the analysis only if the same subject id also appears in the genotype data. If the genotype data are in PLINK format, then id.var must be of length 2, corresponding to the family id and the subject id.
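As a sketch, a pheno.list for PLINK-format genotype data might be built as follows. The file name and all variable names (covars.txt, FAMID, SUBJID, CASE, AGE, SEX, SMOKING) are hypothetical placeholders, and the tab delimiter is an assumption about the file; only the element names come from the documentation above. The list would then be supplied to GxE.scan, which is not called here.

```r
# Hypothetical file and column names; element names follow the pheno.list
# documentation. Required elements: file, id.var, response.var.
pheno.list <- list(
  file         = "covars.txt",           # covariate/outcome file with variable names
  file.type    = 3,                      # table read with read.table()
  delimiter    = "\t",                   # assumes a tab-delimited file
  id.var       = c("FAMID", "SUBJID"),   # PLINK: family id and subject id
  response.var = "CASE",                 # binary response coded 0/1
  main.vars    = c("AGE", "SEX"),        # main effects
  int.vars     = "SMOKING",              # interacts with each SNP
  factor.vars  = "SEX",                  # converted to a factor
  in.miss      = c("NA", ".")            # strings treated as missing
)

str(pheno.list)
```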

Missing data: If a subject has a missing value in any of the variables defined in main.vars, int.vars, strata.var, or response.var, that subject is removed from the covariate and outcome data. Only after these subjects have been removed are the remaining subject ids matched with the genotype data.
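The order of these two steps (drop incomplete subjects, then match ids with the genotype data) can be sketched in base R. This is an illustration, not CGEN's internal code; the data frame, its column names, and the vector geno_ids standing in for the genotype subject ids are all hypothetical.

```r
# Hypothetical covariate data; BMI is not a model variable.
pheno <- data.frame(
  ID   = c("s1", "s2", "s3", "s4", "s5"),
  CASE = c(1, 0, NA, 1, 0),
  AGE  = c(50, NA, 60, 45, 33),
  SEX  = c("M", "F", "F", "M", "F"),
  BMI  = c(NA, 22, 25, 30, 28),
  stringsAsFactors = FALSE
)
geno_ids <- c("s1", "s4", "s6")   # hypothetical ids present in the genotype data

# Variables that would come from response.var, main.vars, int.vars, strata.var
model_vars <- c("CASE", "AGE", "SEX")

# Step 1: remove subjects with a missing value in any model variable
# (the NA in BMI for s1 is ignored because BMI is not in the model).
pheno <- pheno[complete.cases(pheno[, model_vars]), , drop = FALSE]

# Step 2: keep only subjects whose id also appears in the genotype data.
pheno <- pheno[pheno$ID %in% geno_ids, , drop = FALSE]

pheno$ID
```

Subjects s2 and s3 are dropped in step 1, and s5, although complete, is dropped in step 2 because it has no genotype data, leaving s1 and s4.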

CGEN documentation built on April 28, 2020, 8:08 p.m.