gPCA.batchdetect: Guided Principal Components Analysis

View source: R/gPCA.batchdetect.R

Description

Tests for batch effects in an n \times p data set, with the batch vector given by batch, using the δ statistic resulting from guided principal components analysis (gPCA).

Usage

gPCA.batchdetect(x, batch, filt = NULL, nperm = 1000, center = FALSE,
                 scaleY = FALSE, seed = NULL)

Arguments

x

an n x p matrix of data where n denotes the number of observations and p denotes the number of features (e.g. probes, genes, SNPs).

batch

a length n vector that indicates batch (group or class) for each observation.

filt

(optional) the number of features to retain after applying a variance filter. If NULL, no filter is applied. Filtering can significantly reduce the processing time in the case of very large data sets.

nperm

the number of permutations to perform for the permutation test, default is 1000.

center

(logical) Is your data x already centered? If not, set center=FALSE (default) and gPCA.batchdetect will center it for you.

scaleY

(logical) Do you want to scale the Y matrix by the number of samples in each batch? If not, then scaleY=FALSE (default); otherwise, scaleY=TRUE.

seed

the seed number for set.seed(). Default is NULL.
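
The effect of filt and center on the input can be sketched in a few lines of base R. This is only an illustration under stated assumptions: prep_matrix is a hypothetical helper, not a gPCA function, and the exact filtering rule used inside gPCA.batchdetect may differ (here the filt highest-variance columns are kept).

```r
# Hypothetical helper (not part of gPCA): keep the `filt` highest-variance
# columns of x, then column-center the result if it is not centered yet.
prep_matrix <- function(x, filt = NULL, center = FALSE) {
  if (!is.null(filt)) {
    v <- apply(x, 2, var)                          # per-feature variance
    k <- min(filt, ncol(x))                        # cannot keep more than p
    keep <- order(v, decreasing = TRUE)[seq_len(k)]
    x <- x[, sort(keep), drop = FALSE]             # preserve column order
  }
  if (!center) {
    # center = FALSE means "x is not centered yet", so center it here
    x <- scale(x, center = TRUE, scale = FALSE)
  }
  x
}

set.seed(1)
x <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)  # simulated data
x_prep <- prep_matrix(x, filt = 10)
dim(x_prep)
```

Reducing p this way shrinks every SVD performed during the permutation test, which is where the time saving mentioned under filt would come from.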

Details

Guided principal components analysis (gPCA) is an extension of principal components analysis (PCA) that guides the singular value decomposition (SVD) of PCA by applying SVD to \mathbf{Y}'\mathbf{X}, where \mathbf{Y} is an n \times b batch indicator matrix whose entry (i, k) is one when observation i (i=1,…,n) is in batch k (k=1,…,b) and zero otherwise.
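
The construction described above can be sketched in base R. The data here are simulated and the variable names are illustrative assumptions; this is not the package's internal code.

```r
set.seed(1)
n <- 12; p <- 5
# Simulated, column-centered data matrix X (n x p)
x <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
batch <- rep(c("a", "b", "c"), each = 4)

# n x b indicator matrix: Y[i, k] = 1 if observation i is in batch k
Y <- outer(batch, unique(batch), "==") * 1

sv_g <- svd(t(Y) %*% x)   # guided: SVD of Y'X, a b x p matrix
sv_u <- svd(x)            # unguided: ordinary PCA via SVD of X

PCg <- x %*% sv_g$v       # guided principal components (n x b here)
PCu <- x %*% sv_u$v       # unguided principal components (n x p here)
```

Because Y'X sums the rows of X within each batch, its leading right singular vector points in the direction along which the batch sums differ most.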

A call to gPCA.batchdetect() returns the test statistic δ and a one-sided p-value, along with the δ_p values from the permutation test. The δ_p values can be used to visualize the permutation distribution of your test with the gDist function. For more information on gPCA, please see Reese et al. (listed under References).
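
A permutation test of this kind can be sketched as follows. This page does not define δ, so the sketch assumes it is the ratio of the variance of the first guided principal component to the variance of the first unguided one, and that the one-sided p-value is the share of permuted δ_p values at least as large as the observed δ. delta_stat is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: delta as a ratio of first-PC variances (assumed form).
delta_stat <- function(x, batch) {
  Y <- outer(batch, unique(batch), "==") * 1   # n x b batch indicator
  v_g <- svd(crossprod(Y, x))$v[, 1]           # first guided loading (Y'X)
  v_u <- svd(x)$v[, 1]                         # first unguided loading
  as.numeric(var(x %*% v_g) / var(x %*% v_u))
}

set.seed(42)
n <- 30; p <- 20
batch <- rep(1:3, each = 10)
x <- matrix(rnorm(n * p), n, p)
x[batch == 2, ] <- x[batch == 2, ] + 1.5       # inject a simulated batch shift
x <- scale(x, center = TRUE, scale = FALSE)

delta <- delta_stat(x, batch)

# Permute the batch labels to build the null distribution of delta
nperm <- 200
delta_p <- replicate(nperm, delta_stat(x, sample(batch)))

# One-sided p-value: how often a permuted delta reaches the observed one
p_val <- mean(delta_p >= delta)
```

Under this assumed definition δ lies between 0 and 1, since no direction can carry more variance than the first unguided principal component; values near 1 indicate that the batch-guided direction explains almost as much variance as the dominant direction in the data.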

Value

delta

test statistic δ from gPCA.

p.val

p-value associated with δ resulting from gPCA.

delta.p

length nperm vector of δ_p values resulting from the permutation test.

batch

returns your length n batch vector.

filt

returns the number of features the variance filter retained.

n

the number of observations

p

the number of features

b

the number of batches

PCu

principal component matrix from unguided PCA.

PCg

principal component matrix from gPCA.

varPCu1

the proportion of the total variance associated with the first principal component of unguided PCA.

varPCg1

the proportion of the total variance associated with the first principal component of gPCA.

cumulative.var.u

length n vector of the cumulative variance of the i=1,…,n principal components from unguided PCA.

cumulative.var.g

length b vector of the cumulative variance of the k=1,…,b principal components from gPCA.

Author(s)

Sarah Reese reesese@vcu.edu

References

Reese, S. E., Archer, K. J., Therneau, T. M., Atkinson, E. J., Vachon, C. M., de Andrade, M., Kocher, J. A., and Eckel-Passow, J. E. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal components analysis. Bioinformatics, (in review).

See Also

gDist, PCplot, CumulativeVarPlot

Examples

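# Load the example data set shipped with the package
data(caseDat)
batch <- caseDat$batch
data <- caseDat$data

# Test for batch effects with 250 permutations;
# center = FALSE tells the function to center the data itself
out <- gPCA.batchdetect(x = data, batch = batch, center = FALSE, nperm = 250)
out$delta; out$p.val

## Plots:
gDist(out)
CumulativeVarPlot(out, ug = "unguided", col = "blue")
PCplot(out, ug = "unguided", type = "1v2")
PCplot(out, ug = "unguided", type = "comp", npcs = 4)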

gPCA documentation built on May 2, 2019, 4:02 p.m.