getLimmaInput: Fit a linear model to each nucleotide

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/getLimmaInput.R

Description

Fits the linear model log2(count+scalefac) = beta0 + beta1*group + beta2*library.size + [optional confounders] to each nucleotide. From these models, this function constructs an object which can be directly passed to getTstats to obtain limma's moderated t statistics for each nucleotide, which we use as a measure of strength of association between group and count (expression). Reads coverage file from a SQLite database (see makeDb) and relies heavily on the limma package, using lmFit as the main workhorse.

Usage

1
2
3
getLimmaInput(dbfile, tablename, comparison = c("twogroup", "multigroup",
  "expression"), group = NULL, chunksize = 1e+05, adjustvars = NULL,
  colsubset = c(-1), scalefac = 32, nonzero = FALSE, colmeds = NULL)

Arguments

dbfile

Name/location (as character string) of database (usually ".db") file containing nucleotide by sample coverage.

tablename

Name of the table the database contains

comparison

Either twogroup, multigroup or expression. multigroup will use the F-statistic and expression tests the intercept-only model.

group

a 0/1 vector grouping the samples (columns) in the database.

chunksize

How many rows of the database should be processed at a time?

adjustvars

Optional matrix of adjustment variables (e.g. measured confounders, output from SVA, etc.) to use in fitting linear models to each nucleotide.

colsubset

Optional vector of column indices of the input file that denote samples you wish to include in analysis. Should NEVER include 1 (genomic position).

scalefac

A log transformation is used on the count tables, so zero counts present a problem. What number should we add to the entire matrix before running the models? Defaults to 32.

nonzero

If TRUE, use the median of only the nonzero counts as the library size adjustment.

colmeds

If NULL, the column medians are calculated using getColmeds. Otherwise, the output of getColmeds is expected.

Details

It is assumed that the first column in the database is called pos and contains genomic position. Note that group must have the one fewer entries than the database denoted by dbfile has columns (or, if colsubset is used, one fewer entries than the length of colsubset). Also, adjustvars must have the same number of rows as group has entries. Larger values of chunksize require more memory; smaller values of chunksize require more computation time.

Value

A list with elements

ebobject

A list of five vectors (coefficients, stdev.unscaled, sigma, df.residual, and Amean), mimicking the MArrayLM class in limma. Here, coefficients and stdev.unscaled are only returned for beta1, the coefficient for group, as it is assumed this is the only covariate of interest.

pos

A vector of the same length as those contained in ebobject, giving the genomic positions of each linear model.

Author(s)

Alyssa Frazee

References

Smyth G (2004). "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments." Statistical Applications in Genetics and Molecular Biology 3(1): Article 3.

See Also

getTstats, makeDb, getColmeds

Examples

1
## add example here when we have a vignette

leekgroup/derfinder documentation built on May 20, 2019, 11:30 p.m.