Heterogeneous Subtype analysis

Description

Subset-based analysis of case-control studies with heterogeneous disease subtypes.

Usage

h.types(dat, response.var, snp.vars, adj.vars, types.lab, cntl.lab, 
        subset=NULL, method=NULL, side=2, logit=FALSE, test.type="Score", 
        zmax.args=NULL, meth.pval=c("DLM", "IS", "B"), pval.args=NULL)

Arguments

dat

A data frame containing individual-level data for phenotype (disease status/subtype information), covariates and SNPs. No default.

response.var

A character string containing the name/position of the response variable column in the data frame. This variable needs to contain disease status/subtype information in the data frame. No default.

snp.vars

A character vector giving the names of the SNP variables. Missing values for SNP genotypes are indicated by NA. No default.

adj.vars

A character vector containing the names/positions of the columns in the data frame that would be used as adjusting covariates in the analysis. Use NULL if no covariates are used for adjustment.

types.lab

NULL or a character vector giving the names/identifiers of the disease subtypes in response.var to be included in the analysis. If NULL, then all subtypes will be included. No default.

cntl.lab

A single character string giving the name/identifier of controls (disease-free subjects) in response.var. No default.

subset

A logical vector of length nrow(dat) indicating the subset of rows of the data frame to be included in the analysis. Default is NULL, in which case all rows are used.
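As an illustration of the subset argument, the following is a minimal sketch of building such a logical vector. The toy data frame and its AGE column are hypothetical and are not part of the package's example data. Mapping missing comparisons to FALSE is a cautious choice made here, not a documented requirement of h.types.

```r
# Toy data frame (hypothetical values, for illustration only)
dat <- data.frame(
  TYPE = c("SUBTYPE_0", "SUBTYPE_1", "SUBTYPE_2", "SUBTYPE_0", "SUBTYPE_1"),
  AGE  = c(45, 62, NA, 58, 39),
  stringsAsFactors = FALSE
)

# One logical value per row; rows with a missing AGE are explicitly excluded
keep <- !is.na(dat$AGE) & dat$AGE >= 40

# The vector must have one entry per row of the data frame
length(keep) == nrow(dat)

# Number of rows that would be retained
sum(keep)
```

A vector built this way could then be passed as subset=keep.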

method

A single character string indicating the choice of method as "case-control" or "case-complement". The default is NULL, which carries out both types of analysis. For the case-complement analysis of disease subtype i, the set of control subjects is formed by taking the complement of disease subtype i, i.e., the original controls and the cases not defined by disease subtype i.
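The difference between the two comparison groups can be sketched in base R for a single subtype. This is only an illustration of how the outcome would be coded under each definition, using a hypothetical response vector. It is not the package's internal implementation.

```r
# Toy response vector (hypothetical), with SUBTYPE_0 denoting controls
type <- c("SUBTYPE_0", "SUBTYPE_1", "SUBTYPE_2", "SUBTYPE_0",
          "SUBTYPE_1", "SUBTYPE_3", "SUBTYPE_2", "SUBTYPE_0")
cntl.lab <- "SUBTYPE_0"
target   <- "SUBTYPE_1"   # the subtype i under study

# Case-control: cases of subtype i versus the original controls only;
# subjects with any other subtype are set to NA (left out of the comparison)
y.cc <- ifelse(type == target, 1L,
               ifelse(type == cntl.lab, 0L, NA_integer_))

# Case-complement: cases of subtype i versus everyone else
# (original controls plus cases of the other subtypes)
y.comp <- as.integer(type == target)

table(y.cc, useNA = "ifany")
table(y.comp)
```

In this toy vector the case-complement coding keeps all eight subjects, while the case-control coding leaves out the three subjects with other subtypes.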

side

A numeric value of either 1 or 2 indicating whether one or two-sided p-values should be computed, respectively. The default is 2.

logit

If TRUE, results are returned from an overall case-control analysis using standard logistic regression. Default is FALSE.

test.type

A character string indicating the type of tests to be performed. The current options are "Score" and "Wald". The default is "Score".

zmax.args

Optional arguments to be passed to z.max as a named list. This option can be useful if the user wants to restrict subset searches in some structured way, for example, incorporating ordering constraints.

meth.pval

A character string indicating the method of evaluating the p-value. Currently the options are "DLM" (Discrete Local Maximum), "IS" (Exact Importance Sampling) and "B" (Bonferroni), with "DLM" as the default. The IS method is currently computationally feasible for analysis of at most k=10 studies/traits.

pval.args

Optional arguments to be passed to p.dlm or p.tube as a named list. This option can be useful if the user wants to restrict subset searches in some structured way, for example, incorporating ordering constraints.

Details

The output standard errors are approximate (based on inverting DLM p-values) and are used for constructing confidence intervals in h.summary and h.forestPlot. For a particular SNP, subjects with a missing genotype at that SNP are removed from the analysis for that SNP only.
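Because subjects are dropped SNP by SNP, the effective sample size can differ between SNPs. A minimal sketch of checking this beforehand, using a hypothetical genotype data frame:

```r
# Toy genotype data (hypothetical values)
dat <- data.frame(
  SNP_1 = c(0, 1, NA, 2, 1),
  SNP_2 = c(1, NA, NA, 0, 2)
)

# Number of subjects with an observed genotype, separately for each SNP
n.used <- colSums(!is.na(dat))
n.used
```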

Value

A list containing 3 component lists named:

(1) "Overall.Logistic" (output for overall case-control analysis using standard logistic regression): This list is non-null when logit is TRUE and contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars.

(2) "Subset.Case.Control" (output for subset-based case-control analysis): This list is non-null when method is NULL or "case-control". The output contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars, and a logical matrix named "pheno" with one row for each SNP and one column for each disease subtype. For a particular SNP and disease subtype, the corresponding entry is TRUE if that disease subtype is included in the best subset of disease subtypes identified as associated with the SNP in the subset-based case-control analysis. In the output, the p-value is automatically adjusted for multiple testing due to the subset search. The beta and sd correspond to the estimate of the log-odds-ratio and its standard error for a SNP from a logistic regression analysis involving the cases of the identified disease subtypes and the controls.

(3) "Subset.Case.Complement" (output for subset-based case-complement analysis): This list is non-null when method is NULL or "case-complement". The output contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars, and a logical matrix named "pheno" with one row for each SNP and one column for each disease subtype. For a particular SNP and disease subtype, the corresponding entry is TRUE if that disease subtype is included in the best subset of disease subtypes identified as associated with the SNP in the subset-based case-complement analysis. In the output, the p-value is automatically adjusted for multiple testing due to the subset search. The beta and sd correspond to the estimate of the log-odds-ratio and its standard error for the SNP from a logistic regression analysis involving the cases of the selected disease subtypes and the whole complement set of subjects, which includes the original controls and the cases of the unselected disease subtypes.
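To show how a component such as "Subset.Case.Control" might be read, the sketch below builds a stand-in list by hand with the documented elements (pval, beta, sd, pheno). All numbers are made up. It is an assumption of this sketch, not a documented fact, that the vectors are named by SNP and that pheno carries row and column names.

```r
# Hand-built stand-in for res$Subset.Case.Control (all values are made up)
snps  <- paste("SNP_", 1:3, sep = "")
types <- paste("SUBTYPE_", 1:5, sep = "")
cc <- list(
  pval  = setNames(c(0.002, 0.40, 0.03), snps),
  beta  = setNames(c(0.35, 0.05, -0.21), snps),
  sd    = setNames(c(0.11, 0.06, 0.10), snps),
  pheno = matrix(c(TRUE,  FALSE, TRUE,  FALSE, FALSE,
                   FALSE, FALSE, FALSE, TRUE,  FALSE,
                   FALSE, TRUE,  TRUE,  FALSE, TRUE),
                 nrow = 3, byrow = TRUE, dimnames = list(snps, types))
)

# Subtypes in the selected subset for each SNP (rows of the logical matrix)
selected <- setNames(lapply(snps, function(s) types[cc$pheno[s, ]]), snps)
selected[["SNP_1"]]

# Odds ratios implied by the log-odds-ratio estimates
or <- exp(cc$beta)
or
```

With a real result object, the same indexing would be applied to the corresponding component of the list returned by h.types, after checking its actual names with str().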

References

Bhattacharjee S, Chatterjee N and others. A subset-based approach improves power and interpretation for combined-analysis of genetic association studies of heterogeneous traits. Submitted.

See Also

h.summary, h.forestPlot

Examples

 # Use the example data
 data(ex_types, package="ASSET")

 # Display the first 10 rows of the data and a table of the subtypes
 data[1:10, ]
 table(data[, "TYPE"])
 
 # Define the input arguments to h.types. 
 snps     <- paste("SNP_", 1:3, sep="")
 adj.vars <- c("CENTER_1", "CENTER_2", "CENTER_3")
 types <- paste("SUBTYPE_", 1:5, sep="")

 # SUBTYPE_0 will denote the controls
 res <- h.types(data, "TYPE", snps, adj.vars, types, "SUBTYPE_0", subset=NULL, 
        method="case-control", side=2, logit=FALSE, test.type="Score", 
        zmax.args=NULL, meth.pval="DLM", pval.args=NULL)

 
 h.summary(res)