glmWrapper: Perform glm test for all gene probes

View source: R/glmWrapper.R

Description

Perform glm test for all gene probes.

Usage

glmWrapper(es, 
           formula = FEV1 ~ xi + age + gender, 
           pos.var.interest = 1,
           family = gaussian, 
           logit = FALSE, 
           pvalAdjMethod = "fdr", 
           alpha = 0.05, 
           probeID.var = "ProbeID", 
           gene.var = "Symbol", 
           chr.var = "Chromosome", 
           applier = lapply,
           verbose = TRUE) 

Arguments

es

A LumiBatch object. fData(es) should contain information about probe ID, chromosome number and gene symbol.

formula

An object of class formula. The left-hand side of ~ is the response variable. The gene probe must be represented by the variable xi. For example, xi ~ age + gender (the gene probe is the response variable) or FEV1 ~ xi + age + gender (the gene probe is a predictor).

pos.var.interest

integer. Indicates which covariate on the right-hand side of ~ in formula is of interest. pos.var.interest = 0 means the intercept is of interest. If the covariate of interest is a factor or interaction term with more than 2 levels, the smallest p-value will represent the p-value for the covariate of interest.
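The positional indexing described above can be illustrated with plain glm. This is a minimal sketch on hypothetical toy data, assuming that position 0 maps to the intercept row of the coefficient table and position k to row k + 1; the package's internal indexing is not shown in this page and may differ in detail.

```r
# Hypothetical toy data; variable names mirror the formula in Usage.
set.seed(1)
n <- 30
dat <- data.frame(
  FEV1   = rnorm(n),
  xi     = rnorm(n),
  age    = rnorm(n, mean = 50, sd = 10),
  gender = factor(rep(c("F", "M"), length.out = n))
)

fit <- glm(FEV1 ~ xi + age + gender, data = dat, family = gaussian)
coefTab <- summary(fit)$coefficients

# Row 1 is the intercept; the covariates follow in formula order.
print(rownames(coefTab))

# Assumption: position 0 is the intercept, so position k is row k + 1.
pos.var.interest <- 1
print(coefTab[pos.var.interest + 1, ])
```

With pos.var.interest = 1 the selected row is the one for xi, the first covariate on the right-hand side.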

family

By default, gaussian. Refer to glm for the available families.

logit

logical. Indicates whether the gene probes will be logit transformed. For example, for DNA methylation data, one might want to apply a logit transformation to the beta-value (methylated/(methylated+unmethylated)).
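As an illustration of the transformation itself, the following sketch applies a natural-log logit to a few hypothetical beta-values. The base of the logarithm used inside glmWrapper is not stated in this page, so the natural log here is an assumption.

```r
# Hypothetical beta-values, strictly between 0 and 1.
beta <- c(0.05, 0.25, 0.50, 0.90)

# logit(beta) = log(beta / (1 - beta)); stats::qlogis() computes the same.
m <- log(beta / (1 - beta))
print(m)
print(all.equal(m, qlogis(beta)))
```

Beta-values of exactly 0 or 1 map to -Inf or Inf, so such values may need to be handled before transforming.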

pvalAdjMethod

One of the p-value adjustment methods provided by the R function p.adjust in the R package stats: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".

alpha

Significance level. A test is claimed to be significant if the adjusted p-value < alpha.
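The interplay of pvalAdjMethod and alpha can be sketched with p.adjust directly. The p-values below are hypothetical; in p.adjust, "fdr" is an alias for "BH".

```r
# Hypothetical raw p-values from six tests.
p <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

p.fdr  <- p.adjust(p, method = "fdr")
p.bh   <- p.adjust(p, method = "BH")
p.bonf <- p.adjust(p, method = "bonferroni")

print(cbind(raw = p, fdr = p.fdr, bonferroni = p.bonf))

# A test is called significant if its adjusted p-value is below alpha.
alpha <- 0.05
n.sig <- sum(p.fdr < alpha)
print(n.sig)
```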

probeID.var

character string. Name of the variable indicating probe ID in feature data set.

gene.var

character string. Name of the variable indicating gene symbol in feature data set.

chr.var

character string. Name of the variable indicating chromosome number in feature data set.

applier

By default, it is lapply. If the library multicore is available, mclapply can be used in place of lapply.
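In current versions of R, mclapply is provided by the base package parallel. The sketch below defines a hypothetical wrapper, myApplier, with the same calling convention as lapply. Whether glmWrapper passes additional arguments to applier is not documented here, so fixing mc.cores inside a wrapper is an assumption made for convenience.

```r
library(parallel)

# mc.cores > 1 relies on forking, which is not available on Windows.
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical wrapper: same interface as lapply(X, FUN, ...).
myApplier <- function(X, FUN, ...) {
  mclapply(X, FUN, ..., mc.cores = nCores)
}

squares.serial   <- lapply(1:5, function(i) i^2)
squares.parallel <- myApplier(1:5, function(i) i^2)
print(identical(squares.serial, squares.parallel))
```

A function of this shape could then be supplied as applier = myApplier.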

verbose

logical. Determines whether intermediate output is printed. By default verbose=TRUE, and intermediate output will be printed.

Details

This function applies the R function glm to each gene probe.
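The per-probe procedure can be sketched in base R as follows. This is not the package's implementation; it is a minimal illustration on hypothetical simulated data (exprMat, pheno and fitOne are made-up names) of fitting one glm per probe and collecting the statistic and p-value for xi.

```r
set.seed(42)
nProbes <- 5
nArrays <- 20

# Hypothetical expression matrix: rows are probes, columns are arrays.
exprMat <- matrix(rnorm(nProbes * nArrays), nrow = nProbes,
                  dimnames = list(paste0("probe", 1:nProbes), NULL))

# Hypothetical phenotype data, one row per array.
pheno <- data.frame(
  FEV1   = rnorm(nArrays),
  age    = rnorm(nArrays, mean = 50, sd = 10),
  gender = factor(rep(c("F", "M"), length.out = nArrays))
)

# Fit the model for the i-th probe and return its coefficient table.
fitOne <- function(i) {
  dat <- cbind(pheno, xi = exprMat[i, ])
  fit <- glm(FEV1 ~ xi + age + gender, data = dat, family = gaussian)
  summary(fit)$coefficients
}
resList <- lapply(seq_len(nProbes), fitOne)

# Column 3 holds the test statistic, column 4 the p-value.
stats <- sapply(resList, function(tab) tab["xi", 3])
pval  <- sapply(resList, function(tab) tab["xi", 4])
p.adj <- p.adjust(pval, method = "fdr")

res <- data.frame(probeIDs = rownames(exprMat), stats, pval, p.adj)
print(res[order(res$pval), ])
```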

Value

A list with the following elements:

n.sig

Number of significant tests after p-value adjustment.

frame

A data frame containing test results sorted in ascending order of the unadjusted p-values for the covariate of interest. The data frame contains 7 columns: probeIDs, geneSymbols (gene symbols of the genes where the probes come from), chr (numbers of the chromosomes where the probes are located), stats (z-value), pval (p-values of the tests for the covariate of interest), p.adj (adjusted p-values), pos (row numbers of the probes in the expression data matrix).

statMat

A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates. The rows are sorted in ascending order of the unadjusted p-values for the covariate of interest.

pvalMat

A matrix containing p-values for all covariates and for all probes. Rows are probes and columns are covariates. The rows are sorted in ascending order of the unadjusted p-values for the covariate of interest.

pval.quantile

Quantiles of the p-values (starting with the minimum and the 25th percentile) for each covariate, including the intercept, provided in the input argument formula.

frame.unsorted

A data frame containing test results in the original (unsorted) probe order. The data frame contains 7 columns: probeIDs, geneSymbols (gene symbols of the genes where the probes come from), chr (numbers of the chromosomes where the probes are located), stats (z-value for the covariate of interest), pval (p-values of the tests for the covariate of interest), p.adj (adjusted p-values), pos (row numbers of the probes in the expression data matrix).

statMat.unsorted

A matrix containing test statistics for all covariates and for all probes. Rows are probes and columns are covariates.

pvalMat.unsorted

A matrix containing p-values for all covariates and for all probes. Rows are probes and columns are covariates.

memGenes

A numeric vector indicating the cluster membership of probes (unsorted). memGenes[i]=1 if the i-th probe is significant (adjusted p-value < alpha) with a positive z-value for the covariate of interest; memGenes[i]=2 if the i-th probe is nonsignificant; memGenes[i]=3 if the i-th probe is significant with a negative z-value for the covariate of interest.

memGenes2

A numeric vector indicating the cluster membership of probes (unsorted). memGenes2[i]=1 if the i-th probe is significant (adjusted p-value < alpha); memGenes2[i]=0 if the i-th probe is nonsignificant.

mu1

Mean expression levels for arrays for probe cluster 1 (average taken across all probes with memGenes value equal to 1).

mu2

Mean expression levels for arrays for probe cluster 2 (average taken across all probes with memGenes value equal to 2).

mu3

Mean expression levels for arrays for probe cluster 3 (average taken across all probes with memGenes value equal to 3).
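The membership vectors and cluster means described above can be reproduced from adjusted p-values and test statistics. The sketch below uses hypothetical values and is meant only to illustrate the definitions, not the package's internal code.

```r
set.seed(7)

# Hypothetical expression matrix: 6 probes (rows) by 4 arrays (columns).
exprMat <- matrix(rnorm(6 * 4), nrow = 6)

# Hypothetical statistics and adjusted p-values for the covariate of interest.
stats <- c(3.1, -2.8, 0.4, 2.9, -0.2, -3.5)
p.adj <- c(0.01, 0.02, 0.70, 0.03, 0.90, 0.004)
alpha <- 0.05

# 1 = significant and positive, 2 = nonsignificant, 3 = significant and negative.
memGenes <- rep(2, length(p.adj))
memGenes[p.adj < alpha & stats > 0] <- 1
memGenes[p.adj < alpha & stats < 0] <- 3

# 1 = significant, 0 = nonsignificant.
memGenes2 <- as.numeric(p.adj < alpha)

# Per-array means across the probes in each cluster.
mu1 <- colMeans(exprMat[memGenes == 1, , drop = FALSE])
mu2 <- colMeans(exprMat[memGenes == 2, , drop = FALSE])
mu3 <- colMeans(exprMat[memGenes == 3, , drop = FALSE])

print(memGenes)
print(memGenes2)
```

Each of mu1, mu2 and mu3 has one entry per array; drop = FALSE keeps the subset a matrix even when a cluster contains a single probe.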

resMat

A matrix with 2p columns, where p is the number of covariates (including the intercept; for a nominal variable with 3 levels, say, there are 2 dummy covariates). The first p columns are p-values. The remaining p columns are test statistics.
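To make the 2p layout concrete, the sketch below builds one such row for a single hypothetical probe and a 3-level factor, giving p = 3 (intercept plus 2 dummy covariates). The element names are made up for readability; the column names actually used in resMat are not given in this page.

```r
set.seed(3)

# Hypothetical data: one probe and a nominal variable with 3 levels.
dat <- data.frame(
  xi  = rnorm(30),
  grp = factor(rep(c("a", "b", "c"), times = 10))
)

fit <- glm(xi ~ grp, data = dat, family = gaussian)
coefTab <- summary(fit)$coefficients
p <- nrow(coefTab)

# First p entries: p-values (column 4); next p entries: statistics (column 3).
resRow <- c(coefTab[, 4], coefTab[, 3])
names(resRow) <- c(paste0("pval.", rownames(coefTab)),
                   paste0("stat.", rownames(coefTab)))
print(resRow)
```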

Note

If the covariate of interest is a factor or interaction term with more than 2 levels, then the p-value of the likelihood ratio test might be more appropriate than the smallest p-value for the covariate of interest.
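A likelihood ratio test of this kind can be obtained in base R by comparing nested glm fits with anova. The following is a minimal sketch on hypothetical simulated data for one probe and a 3-level factor; it is not something glmWrapper is documented to return.

```r
set.seed(123)
n <- 60

# Hypothetical 3-level factor and one probe whose mean shifts in level "c".
grp <- factor(rep(c("a", "b", "c"), each = n / 3))
xi  <- rnorm(n) + ifelse(grp == "c", 1, 0)

fit1 <- glm(xi ~ grp, family = gaussian)  # model with the factor
fit0 <- glm(xi ~ 1, family = gaussian)    # intercept-only model

# Smallest p-value among the dummy covariates (intercept row dropped).
coefTab <- summary(fit1)$coefficients
p.min <- min(coefTab[-1, 4])

# Single test of the whole factor via the likelihood ratio test.
lrt <- anova(fit0, fit1, test = "LRT")
p.lrt <- lrt[2, "Pr(>Chi)"]

print(c(p.min = p.min, p.lrt = p.lrt))
```

The likelihood ratio test yields one p-value for the factor as a whole, whereas the smallest dummy-covariate p-value only compares each level with the reference level.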

Author(s)

Weiliang Qiu <stwxq@channing.harvard.edu>, Brandon Guo <brandowonder@gmail.com>, Christopher Anderson <christopheranderson84@gmail.com>, Barbara Klanderman <BKLANDERMAN@partners.org>, Vincent Carey <stvjc@channing.harvard.edu>, Benjamin Raby <rebar@channing.harvard.edu>

Examples

    # load the package that provides genSimData.BayesNormal and glmWrapper
    library(iCheck)

    # generate simulated data set from conditional normal distribution
    set.seed(1234567)
    es.sim = genSimData.BayesNormal(nCpGs = 100,
      nCases = 20, nControls = 20,
      mu.n = -2, mu.c = 2,
      d0 = 20, s02 = 0.64, s02.c = 1.5, testPara = "var",
      outlierFlag = FALSE,
      eps = 1.0e-3, applier = lapply)
    print(es.sim)

    # fit a glm for each probe, with the probe (xi) as the response
    res.glm = glmWrapper(
      es = es.sim,
      formula = xi ~ as.factor(memSubj),
      pos.var.interest = 1,
      family = gaussian,
      logit = FALSE,
      pvalAdjMethod = "fdr",
      alpha = 0.05,
      probeID.var = "probe",
      gene.var = "gene",
      chr.var = "chr",
      applier = lapply,
      verbose = TRUE)

iCheck documentation built on Nov. 8, 2020, 11:09 p.m.