snm: Perform a supervised normalization of microarray data

Description

Implements Supervised Normalization of Microarrays (SNM) on a gene expression matrix. Requires a set of biological covariates of interest and at least one probe-specific or intensity-dependent adjustment variable.

Usage

snm(raw.dat, bio.var=NULL, adj.var=NULL, int.var=NULL,
    weights=NULL, spline.dim = 4, num.iter = 10, nbins=50,
    rm.adj=FALSE, verbose=TRUE, diagnose=TRUE)

Arguments

raw.dat

An m probes by n arrays matrix of expression data. If intensity-dependent effects are to be removed, the matrix should contain the raw, log-transformed data.

bio.var

A model matrix (see model.matrix) or data frame with n rows of the biological variables. If NULL, then all probes are treated as "null" in the algorithm.

adj.var

A model matrix (see model.matrix) or data frame with n rows of the probe-specific adjustment variables. If NULL, a model with an intercept term is used.

int.var

A data frame with n rows of type factor with the unique levels of intensity-dependent effects. Each column parametrizes a unique source of intensity-dependent effect (e.g., array effects for column 1 and dye effects for column 2).

weights

A vector of length m. The values are not changed by the algorithm; they control the influence of each probe on the estimated intensity-dependent array effects.

spline.dim

Dimension of basis spline used for array effects.

num.iter

Number of iterations to run.

nbins

Number of bins used by the binning strategy. Array effects are calculated from an nbins x n data matrix, where the (i,j) value is the average intensity of bin i on array j.

rm.adj

If FALSE, only the intensity-dependent effects are removed from the normalized data, so the effects of the adjustment variables are still present. If TRUE, both the adjustment variable effects and the intensity-dependent effects are removed from the returned normalized data.

verbose

A logical flag. If TRUE, a report is displayed after each iteration.

diagnose

A logical flag. If TRUE, diagnostic output is produced as a series of consecutive plots.
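As an illustration of the argument shapes described above, the following sketch assembles the four main inputs using base R only. The design (8 arrays, a two-level treatment, a two-level batch), the simulated data, and the choice to drop the intercept column from bio.var are assumptions made for this example, not requirements stated by the package.

```r
# Hypothetical design: 8 arrays, 2 treatment groups, 2 batches.
n <- 8
design <- data.frame(
  treatment = factor(rep(c("ctrl", "trt"), each = 4)),
  batch     = factor(rep(c("b1", "b2"), times = 4))
)

# Biological variable of interest, one row per array.
# Assumption of this sketch: the intercept is dropped here so that it
# appears only once, in adj.var.
bio.var <- model.matrix(~ treatment, data = design)[, -1, drop = FALSE]

# Probe-specific adjustment variables (intercept plus batch).
adj.var <- model.matrix(~ batch, data = design)

# One factor column per source of intensity-dependent effect;
# here every array is its own level.
int.var <- data.frame(array = factor(seq_len(n)))

# Simulated log-scale expression matrix: m probes by n arrays.
set.seed(1)
m <- 100
raw.dat <- matrix(rnorm(m * n, mean = 8), nrow = m, ncol = n)

# Every design object must have one row per array (column of raw.dat).
stopifnot(nrow(bio.var) == n, nrow(adj.var) == n,
          nrow(int.var) == n, ncol(raw.dat) == n)

# With the snm package installed, the call would then be:
# fit <- snm(raw.dat, bio.var, adj.var, int.var)
```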

Details

This function implements the supervised normalization of microarrays algorithm described in Mecham, Nelson, and Storey (2010).
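To make the nbins x n matrix mentioned under the nbins argument concrete, here is a minimal base R sketch of one possible binning. It assumes probes are grouped into equal-count bins by the rank of their mean intensity; how snm actually forms its bins is not stated on this page, so treat the bin assignment as a stand-in.

```r
set.seed(2)
m <- 1000; n <- 4; nbins <- 50
dat <- matrix(rnorm(m * n, mean = 8), nrow = m, ncol = n)

# Assumption: equal-count bins by rank of each probe's mean intensity.
probe.mean <- rowMeans(dat)
bin <- ceiling(rank(probe.mean, ties.method = "first") * nbins / m)

# rowsum() adds up the rows belonging to each bin; dividing by the bin
# sizes turns the sums into per-bin averages on each array.
bin.size <- as.vector(table(bin))
binned <- rowsum(dat, group = bin) / bin.size

dim(binned)  # nbins rows, n columns
```

Entry (i, j) of binned is the average intensity of bin i on array j, which is a much smaller object to fit a spline to than the full m x n matrix.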

Value

norm.dat

The matrix of normalized data. The default setting is rm.adj=FALSE, which means that only the intensity-dependent effects have been subtracted from the data. If the user wants the adjustment variable effects removed as well, then set rm.adj=TRUE when calling the snm function.

pvalues

A vector of p-values testing the association of the biological variables with each probe. These p-values are obtained from an ANOVA comparing models where the full model contains both the probe-specific biological and adjustment variables versus a reduced model that just contains the probe-specific adjustment variables. The data used for this comparison has the intensity-dependent variables removed. These returned p-values are calculated after the final iteration of the algorithm.

pi0

The estimated proportion of true null probes pi_0, calculated after the final iteration of the algorithm.

iter.pi0s

A vector of length num.iter containing the estimated pi_0 value at each iteration of the snm algorithm. These values should converge; non-convergence suggests a problem with the data, the assumed model, or both.

nulls

A vector indexing the probes utilized in estimating the intensity-dependent effects on the final iteration.

M

A matrix containing the estimated probe intensities for each array utilized in estimating the intensity-dependent effects on the final iteration. To save memory, only a subset of values spanning the range is returned, currently nbins*100 values.

array.fx

A matrix of the final estimated intensity-dependent array effects. To save memory, only a subset of values spanning the range is returned, currently nbins*100 values.

bio.var

The processed version of the same input variable.

adj.var

The processed version of the same input variable.

int.var

The processed version of the same input variable.

df0

Degrees of freedom of the adjustment variables.

df1

Degrees of freedom of the full model matrix, which includes the biological variables and the adjustment variables.

raw.dat

The input data.

rm.var

Same as the input (useful for later analyses).

call

Function call.
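The pvalues entry above describes a full-versus-reduced model comparison for each probe. The sketch below shows that kind of F-test in base R on simulated data. It is an illustration of the general technique, not the package's code: the design, the helper rss(), and the reading of df0 and df1 as column counts of the two model matrices are assumptions of this example.

```r
set.seed(3)
n <- 12; m <- 200
group <- factor(rep(c("a", "b"), each = 6))        # biological variable
batch <- factor(rep(c("x", "y", "z"), times = 4))  # adjustment variable

adj.var <- model.matrix(~ batch)                        # reduced model
full    <- cbind(adj.var, model.matrix(~ group)[, -1])  # full model

# Simulated data with no biological signal: every probe is a true null.
dat <- matrix(rnorm(m * n), nrow = m, ncol = n)

# Residual sum of squares for every probe (row) after a least-squares
# fit on the n x p model matrix X. Hypothetical helper for this sketch.
rss <- function(X, Y) {
  res <- Y - Y %*% X %*% solve(crossprod(X)) %*% t(X)
  rowSums(res^2)
}

df0 <- ncol(adj.var)  # assumed meaning: columns of the reduced model
df1 <- ncol(full)     # assumed meaning: columns of the full model
rss0 <- rss(adj.var, dat)
rss1 <- rss(full, dat)

# Standard F statistic for nested linear models, one per probe.
fstat   <- ((rss0 - rss1) / (df1 - df0)) / (rss1 / (n - df1))
pvalues <- pf(fstat, df1 - df0, n - df1, lower.tail = FALSE)
```

For a single probe this agrees with anova(lm(y ~ batch), lm(y ~ batch + group)); the matrix form simply handles all probes at once.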

Note

It is necessary for adj.var and adj.var+bio.var to be valid model matrices (e.g., the models cannot be over-determined).
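One way to check this requirement before calling snm is to compare the rank of each model matrix with its number of columns. The helper is.full.rank() below is a hypothetical name introduced for this sketch, and the confounded design is made up to show a failing case.

```r
# Hypothetical helper: TRUE when the columns of X are linearly independent.
is.full.rank <- function(X) qr(X)$rank == ncol(X)

# Made-up design in which group is perfectly confounded with batch.
batch <- factor(rep(c("x", "y"), each = 4))
group <- factor(rep(c("a", "b"), each = 4))

adj.var <- model.matrix(~ batch)
bio.var <- model.matrix(~ group)[, -1, drop = FALSE]

is.full.rank(adj.var)                  # TRUE
is.full.rank(cbind(adj.var, bio.var))  # FALSE: group duplicates batch
```

When the combined matrix fails the check, the biological effect cannot be separated from the adjustment variables and the design needs to be revised.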

We suggest that the probe level data be analyzed on the log-transformed scale, particularly if the user wishes to remove intensity-dependent effects. It is recommended that the normalized data (and resulting inference) be inspected for latent structure using Surrogate Variable Analysis (Leek and Storey 2007, PLoS Genetics).

Author(s)

Brig Mecham <brig.mecham@sagebase.org> and John D. Storey <jstorey@princeton.edu>

References

Mecham BH, Nelson PS, Storey JD (2010) Supervised normalization of microarrays. Bioinformatics, 26: 1308-1315.

See Also

model.matrix, plot.snm, fitted.snm, summary.snm, sim.singleChannel, sim.doubleChannel, sim.preProcessed, sim.refDesign

Examples

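library(snm)

# Simulate a single-channel data set with a fixed seed
singleChannel <- sim.singleChannel(12345)
snm.obj <- snm(singleChannel$raw.data,
               singleChannel$bio.var,
               singleChannel$adj.var,
               singleChannel$int.var)
# Compare the p-values of the true null probes with the Uniform(0, 1) distribution
ks.test(snm.obj$pvalues[singleChannel$true.nulls], "punif")
plot(snm.obj)
summary(snm.obj)
snm.fit <- fitted(snm.obj)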

Sage-Bionetworks/snm documentation built on May 9, 2019, 12:14 p.m.