pheno.list: List to describe the covariate and outcome data


Description

The list to describe the covariate and outcome data for GxE.scan.

Format

The format is: List of 14

file

Covariate data file. This file must have variable names, two of which must be an id variable and a response variable (see id.var and response.var). No default.

id.var

Name of the id variable(s). No default.

response.var

Name of the binary response variable. This variable must be coded as 0 and 1. No default.

strata.var

Stratification variable name or a formula for variables in file. See the individual model documentation for the allowable stratifications. The default is NULL, so that all observations belong to the same stratum.

main.vars

Character vector of variable names or a formula for variables in file that will be included in the model as main effects. The default is NULL.

int.vars

Character vector of variable names or a formula for variables in file that will be included in the model as interactions with each SNP in the genotype data. The default is NULL.

file.type

One of 1, 3, or 4. 1 is for an R object file created with the save() function. 3 is for a table that will be read in with read.table(). 4 is for a SAS data set. The default is 3.

delimiter

The delimiter in file. The default is "".

factor.vars

Vector of variable names to convert into factors. The default is NULL.

in.miss

Vector of character strings to define the missing values. This option corresponds to the option na.strings in read.table(). The default is "NA".

subsetData

List of sublists to subset the phenotype data for analyses. Each sublist should contain the names "var", "operator" and "value" corresponding to a variable name, operator and values of the variable. Multiple sublists are logically connected by the AND operator. For example,
subsetData=list(list(var="GENDER", operator="==", value="MALE"))
will only include subjects with the string "MALE" for the GENDER variable.
subsetData=list(list(var="AGE", operator=">", value=50),
list(var="STUDY", operator="%in%", value=c("A", "B", "C")))
will include subjects with AGE > 50 AND in STUDY A, B or C. The default is NULL.

cc.var

Name of the cc.var variable used in snp.matched. The default is NULL.

nn.var

Name of the nn.var variable used in snp.matched. The default is NULL.
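
As an illustration of how the fields above fit together, the following is a minimal sketch in base R. The data, file, and variable names (ID, CASE, AGE, GENDER, STUDY) are hypothetical, and the loop that applies subsetData is only an approximation of the AND semantics described above, not the package's internal code.

```r
# Hypothetical covariate data written to a temporary tab-delimited file.
pheno_df <- data.frame(
  ID     = c("s1", "s2", "s3", "s4"),
  CASE   = c(1, 0, 1, 0),
  AGE    = c(62, 45, 58, 71),
  GENDER = c("MALE", "FEMALE", "MALE", "MALE"),
  STUDY  = c("A", "B", "D", "C"),
  stringsAsFactors = FALSE
)
pheno_file <- tempfile(fileext = ".txt")
write.table(pheno_df, pheno_file, sep = "\t", quote = FALSE, row.names = FALSE)

# file, id.var and response.var are required; the rest are optional.
pheno.list <- list(
  file         = pheno_file,
  id.var       = "ID",
  response.var = "CASE",
  main.vars    = c("AGE", "GENDER"),
  int.vars     = "AGE",
  file.type    = 3,
  delimiter    = "\t",
  factor.vars  = "GENDER",
  in.miss      = "NA",
  subsetData   = list(
    list(var = "AGE",   operator = ">",    value = 50),
    list(var = "STUDY", operator = "%in%", value = c("A", "B", "C"))
  )
)

# Illustration only: apply each subsetData rule and combine them with AND.
dat <- read.table(pheno.list$file, header = TRUE, sep = pheno.list$delimiter,
                  na.strings = pheno.list$in.miss, stringsAsFactors = FALSE)
keep <- rep(TRUE, nrow(dat))
for (rule in pheno.list$subsetData) {
  op   <- match.fun(rule$operator)          # look up ">" or "%in%" as a function
  keep <- keep & op(dat[[rule$var]], rule$value)
}
kept_ids <- dat[[pheno.list$id.var]][keep]  # subjects passing every rule
```

In this toy data set only subjects older than 50 who also belong to study A, B or C remain in kept_ids.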

Details

In this list, file, id.var, and response.var must be specified. The variable id.var is the link between the covariate data and the genotype data. For each subject id, there must be the same subject id in the genotype data for that subject to be included in the analysis. If the genotype data is in a PLINK format, then id.var must be of length 2, corresponding to the family id and subject id.

Missing data: If any of the variables defined in main.vars, int.vars, strata.var, or response.var contain missing values, then those subjects will be removed from the covariate and outcome data. After the subjects with missing values are removed, the subject ids are matched with the genotype data.
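
The order of operations described above (drop subjects with missing model variables first, then match ids against the genotype data) can be sketched in base R as follows. The data frame, the genotype id vector geno_ids, and the choice of model variables are all hypothetical. This is an assumption-laden illustration, not the code GxE.scan runs.

```r
# Hypothetical covariate data; NA marks a missing value.
pheno <- data.frame(
  ID   = c("s1", "s2", "s3", "s4", "s5"),
  CASE = c(1, 0, NA, 1, 0),
  AGE  = c(62, NA, 58, 71, 49),
  BMI  = c(NA, 24.1, 27.3, 30.2, 22.8),  # not used in the model
  stringsAsFactors = FALSE
)
geno_ids <- c("s1", "s4", "s6")  # hypothetical subject ids in the genotype data

# Variables that take part in the model: response.var plus main.vars here.
model_vars <- c("CASE", "AGE")

# Step 1: remove subjects with a missing value in any model variable.
# A missing value in BMI does not matter because BMI is not a model variable.
complete <- pheno[complete.cases(pheno[, model_vars, drop = FALSE]), ]

# Step 2: keep only subjects whose id also appears in the genotype data.
analysis <- complete[complete$ID %in% geno_ids, ]
```

Here s2 and s3 are dropped in step 1 because of missing AGE and CASE, and s5 is dropped in step 2 because it has no matching genotype id.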


CGEN documentation built on April 28, 2020, 8:08 p.m.