# Heterogeneous Subtype analysis

### Description

Subset-based analysis of case-control studies with heterogeneous disease subtypes.

### Usage


### Arguments

`dat`
A data frame containing individual-level data for phenotype (disease status/subtype information), covariates and SNPs. No default.

`response.var`
A character string containing the name/position of the response variable column in the data frame. This column must contain the disease status/subtype information. No default.

`snp.vars`
A character vector giving the names of the SNP variables. Missing values for SNP genotypes are indicated by NA. No default.

`adj.vars`
A character vector containing the names/positions of the columns in the data frame to be used as adjusting covariates in the analysis. Use NULL if no covariates are used for adjustment.

`types.lab`
NULL or a character vector giving the names/identifiers of the disease subtypes in

`cntl.lab`
A single character string giving the name/identifier of controls (disease-free subjects) in

`subset`
A logical vector with length=nrow(

`method`
A single character string indicating the choice of method as "case-control" or "case-complement". The default option is NULL, which carries out both types of analysis. For the case-complement analysis of disease subtype

`side`
A numeric value of either 1 or 2 indicating whether one-sided or two-sided p-values should be computed, respectively. The default is 2.

`logit`
If TRUE, results are returned from an overall case-control analysis using standard logistic regression. Default is FALSE.

`test.type`
A character string indicating the type of tests to be performed. The current options are "Score" and "Wald". The default is "Score".

`zmax.args`
Optional arguments to be passed to

`meth.pval`
A character string indicating the method of evaluating the p-value. Currently the options are "DLM" (Discrete Local Maximum), "IS" (Exact Importance Sampling) and "B" (Bonferroni), with "DLM" as the default. The IS method is currently computationally feasible for analysis of at most k=10 studies/traits.

`pval.args`
Optional arguments to be passed to

### Details

The output standard errors are approximate (based on inverting DLM p-values) and are used for constructing confidence intervals in `h.summary` and `h.forestPlot`. For a particular SNP, subjects with a missing genotype are removed from the analysis for that SNP.
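The per-SNP handling of missing genotypes can be illustrated with a small base R sketch. The data frame `toy` and its values are made up for illustration and are not part of the package; the code only shows how the number of usable subjects can differ from one SNP to the next, not how `h.types` implements the filtering internally.

```r
# Hypothetical toy data: 6 subjects, 2 SNPs coded 0/1/2, NA marks a missing genotype.
toy <- data.frame(
  TYPE  = c("SUBTYPE_0", "SUBTYPE_1", "SUBTYPE_0",
            "SUBTYPE_2", "SUBTYPE_1", "SUBTYPE_0"),
  SNP_1 = c(0, 1, NA, 2, 0, 1),
  SNP_2 = c(1, NA, NA, 0, 2, 1)
)

# For each SNP, count the subjects whose genotype for that SNP is observed.
n_used <- sapply(c("SNP_1", "SNP_2"), function(snp) sum(!is.na(toy[[snp]])))
print(n_used)

# Subjects that would remain for an analysis of SNP_1 alone.
toy_snp1 <- toy[!is.na(toy$SNP_1), ]
```

A subject missing `SNP_2` but not `SNP_1` is dropped only when `SNP_2` is analysed, so sample sizes are SNP-specific.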

### Value

A list containing 3 component lists named:

(1) "Overall.Logistic" (output for the overall case-control analysis using standard logistic regression):
This list is non-null when `logit` is TRUE and contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars.

(2) "Subset.Case.Control" (output for the subset-based case-control analysis):
This list is non-null when `method` is NULL or "case-control". The output contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars, and a logical matrix named "pheno" with one row for each SNP and one column for each disease subtype. For a particular SNP and disease subtype, the corresponding entry is TRUE if that disease subtype is included in the best subset of disease subtypes identified as associated with the SNP in the subset-based case-control analysis. In the output, the p-value is automatically adjusted for multiple testing due to the subset search. The beta and sd correspond to the estimate of the log-odds-ratio and its standard error for a SNP from a logistic regression analysis involving the cases of the identified disease subtypes and the controls.

(3) "Subset.Case.Complement" (output for the subset-based case-complement analysis):
This list is non-null when `method` is NULL or "case-complement". The output contains 3 vectors named (pval, beta, sd), each of the same length as snp.vars, and a logical matrix named "pheno" with one row for each SNP and one column for each disease subtype. For a particular SNP and disease subtype, the corresponding entry is TRUE if that disease subtype is included in the best subset of disease subtypes identified as associated with the SNP in the subset-based case-complement analysis. In the output, the p-value is automatically adjusted for multiple testing due to the subset search. The beta and sd correspond to the estimate of the log-odds-ratio and its standard error for the SNP from a logistic regression analysis involving the cases of the selected disease subtypes and the whole complement set of subjects, which includes the original controls and the cases of the unselected disease subtypes.
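The difference between the two comparison groups described in (2) and (3) can be sketched with base R's `glm`. This is a minimal illustration on simulated data, assuming a subset of subtypes has already been chosen; the names `toy`, `selected` and `controls` are hypothetical. It does not reproduce the subset search or the multiple-testing adjustment performed by the package.

```r
set.seed(1)

# Hypothetical simulated data: one SNP (0/1/2) and a subtype label per subject.
n   <- 600
toy <- data.frame(
  TYPE = sample(c("SUBTYPE_0", "SUBTYPE_1", "SUBTYPE_2", "SUBTYPE_3"),
                n, replace = TRUE, prob = c(0.4, 0.2, 0.2, 0.2)),
  SNP  = rbinom(n, size = 2, prob = 0.3)
)

controls <- "SUBTYPE_0"                   # label of disease-free subjects
selected <- c("SUBTYPE_1", "SUBTYPE_2")   # assumed "best subset" of subtypes

# Case-control: selected cases versus controls; unselected cases are dropped.
cc   <- toy[toy$TYPE %in% c(selected, controls), ]
cc$y <- as.integer(cc$TYPE %in% selected)
fit_cc <- glm(y ~ SNP, family = binomial, data = cc)

# Case-complement: selected cases versus everyone else
# (controls plus cases of the unselected subtypes).
comp   <- toy
comp$y <- as.integer(comp$TYPE %in% selected)
fit_comp <- glm(y ~ SNP, family = binomial, data = comp)

# Log-odds-ratio estimate and standard error for the SNP in each design.
summary(fit_cc)$coefficients["SNP", c("Estimate", "Std. Error")]
summary(fit_comp)$coefficients["SNP", c("Estimate", "Std. Error")]
```

The case-complement fit uses every subject, whereas the case-control fit excludes the cases of the unselected subtypes, so the two estimates generally differ.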

### References

Bhattacharjee S, Chatterjee N and others. A subset-based approach improves power and interpretation for combined-analysis of genetic association studies of heterogeneous traits. Submitted.

### See Also

`h.summary`, `h.forestPlot`

### Examples

```
# Load the package so that h.types and h.summary are available
library(ASSET)

# Use the example data (loads a data frame named `data`)
data(ex_types, package="ASSET")

# Display the first 10 rows of the data and a table of the subtypes
data[1:10, ]
table(data[, "TYPE"])

# Define the input arguments to h.types
snps     <- paste("SNP_", 1:3, sep="")
adj.vars <- c("CENTER_1", "CENTER_2", "CENTER_3")
types    <- paste("SUBTYPE_", 1:5, sep="")

# SUBTYPE_0 will denote the controls
res <- h.types(data, "TYPE", snps, adj.vars, types, "SUBTYPE_0", subset=NULL,
               method="case-control", side=2, logit=FALSE, test.type="Score",
               zmax.args=NULL, meth.pval="DLM", pval.args=NULL)

# Summarize the results
h.summary(res)
```
