lmPerGene: Fit linear model for each gene
In GSEAlm: Linear Model Toolset for Gene Set Enrichment Analysis

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/lm2.R

For each gene, lmPerGene fits the same, user-specified linear model. It returns the estimates of the model parameters and their variances for each fitted model. The function uses matrix algebra so it is much faster than repeated calls to lm.

1	lmPerGene(eSet, formula, na.rm=TRUE,pooled=FALSE)

`eSet`	An `ExpressionSet` object.
`formula`	an object of class `formula` (or one that can be coerced to that class), specifying only the right-hand side starting with the '~' symbol. The LHS is automatically set as the expression levels provided in `eSet`. The names of all predictors must exist in the phenotypic data of `eSet`.
`na.rm`	Whether to remove missing observations.
`pooled`	Whether to pool the variance calculation across all genes.

This function efficiently computes the least squares fit of a linear regression to a set of gene expression values. We assume that there are G genes, on n samples, and that there are p variables in the regression equation. So the result is that G different regressions are computed, and various summary statistics are returned.

Since the independent variables are the same in each model fitting, instead of repeatedly fitting linear model for each gene, lmPerGene accelarates the fitting process by calculating the hat matrix X(X'X)^(-1)X' first. Then matrix multiplication, and solve are to compute estimates of the model parameters.

Leaving the formula blank (the default) will calculate an intercept-only model, useful for generic pattern and outlier identification.

A list with components:

`eS`	The `ExpressionSet` used in the model fitting.
`x`	The design matrix of the coded predictor variables.
`Hmat`	The Hat matrix.
`coefficients`	A matrix of dimension `p` by `G` containing the estimated model parameters.
`pooled`	Whether the variance was pooled (this affects “coef.var” and “tstat”, but not “sigmaSqr”).
`sigmaSqr`	A vector of length $G$ containing the mean square error for that model, the sum of the residuals squared divided by `n - p`.
`coef.var`	A matrix of dimension `p` by `G` containing the estimated variances for the model parameters, for each regression.
`tstat`	A matrix of the same dimension as `coefficients`, containing the $t$-statistics for each model estimate. This is simply `coefficients` divided by the square root of `coef.var`, and is provided for convenience.

Robert Gentleman, Assaf Oron

getResidPerGene to extract row-by-row residuals; gsealmPerm for code that utilizes 'lmPerGene' for gene-set-enrichment analysis (GSEA); and CooksDPerGene for diagnostic functions on an object produced by 'lmPerGene'. Applying a by-gene regression in the manner performed here is a special case of a more generic linear-model framework available in the GlobalAncova package; our assumption here is equivalent to a diagonal covariance structure between genes, with unequal variances.

data(sample.ExpressionSet)
layout(1)
lm1 = lmPerGene(sample.ExpressionSet,~sex)
qqnorm(lm1$coefficients[2,]/sqrt(lm1$coef.var[2,]),main="Sample Dataset: Sex Effect by Gene",ylab="Individual Gene t-statistic",xlab="Normal Quantile")
abline(0,1,col=2)
lm2 = lmPerGene(sample.ExpressionSet,~type+sex)
qqnorm(lm2$coefficients[2,]/sqrt(lm2$coef.var[2,]),main="Sample Dataset: Case vs. Control Effect by Gene, Adjusted for Sex",ylab="Individual Gene t-statistic",xlab="Normal Quantile")
abline(0,1,col=2)

Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.