kill.pc: Removes principal components from a data matrix


Description

Does not destroy your personal computer. Really. (No warranty).

Usage

kill.pc(g, pc, imputeknn = F, center = T)

Arguments

g

the input data in the form of a matrix, with features as rows and samples as columns.

pc

the principal components to be removed, given as a numeric vector of length 1 or more. For example, to remove pc1 and pc3 use pc=c(1,3); to remove only pc3 use pc=3.

imputeknn

default=FALSE. Missing values in the data matrix can be imputed by setting imputeknn=TRUE. The function impute.knn from the package impute is used with default settings.

center

default=TRUE. The features are mean-centered before singular value decomposition. This is a prerequisite for principal component analysis; change it only if you are convinced that centering is not necessary.
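To illustrate why centering matters, here is a small base R sketch on simulated data. It is not part of swamp; the variable names and the offset of 10 are assumptions chosen for illustration. Without centering, the first component mostly captures the overall feature mean rather than variation between samples.

```r
# Illustrative sketch only: simulated data, not the swamp implementation.
set.seed(2)
x <- matrix(rnorm(50 * 8), nrow = 50) + 10  # features as rows, all shifted by 10

# Share of the total sum of squares carried by the first component, uncentered
d_raw <- svd(x)$d
frac_uncentered <- d_raw[1]^2 / sum(d_raw^2)

# Mean-center each feature (row), then repeat
xc <- x - rowMeans(x)
d_cen <- svd(xc)$d
frac_centered <- d_cen[1]^2 / sum(d_cen^2)

print(c(uncentered = frac_uncentered, centered = frac_centered))
```

In the uncentered case the first component is dominated by the constant offset, so removing it would mainly remove the feature means.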

Details

A specific principal component may be associated with several interrelated batch surrogate variables while being free of biological associations. In such a case it may be useful to remove that principal component from the data. The svd() function decomposes the data matrix X into X = U D V^T (in R: u %*% diag(d) %*% t(v)). The entries of D that belong to the unwanted principal components are then set to zero, and X is recalculated from the modified decomposition. If you use the default center=TRUE, make sure you also call prince() with its default center=TRUE. Using different settings for center in the two functions is not fully compatible.
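The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual source: remove_pcs is a hypothetical helper name, it omits the imputation step, and the simulated matrix is only there to exercise it.

```r
# Hypothetical helper sketching the described steps; not the swamp source code.
remove_pcs <- function(x, pc, center = TRUE) {
  stopifnot(is.matrix(x), all(pc >= 1), all(pc <= min(dim(x))))
  if (center) {
    x <- x - rowMeans(x)          # mean-center each feature (row)
  }
  dec <- svd(x)                   # x = u %*% diag(d) %*% t(v)
  dec$d[pc] <- 0                  # zero the singular values of the unwanted components
  adj <- dec$u %*% diag(dec$d, nrow = length(dec$d)) %*% t(dec$v)
  dimnames(adj) <- dimnames(x)    # keep feature and sample names
  adj
}

# Small simulated matrix: 20 features, 6 samples, with a shift in samples 4 to 6
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)
x[1:10, 4:6] <- x[1:10, 4:6] + 3

adj <- remove_pcs(x, pc = 1)

# Singular values before and after removing the first component
d_before <- svd(x - rowMeans(x))$d
d_after  <- svd(adj)$d
```

After removal, the leading singular values of the adjusted matrix should match the second and later singular values of the centered input, which is a quick way to check that only the requested component was taken out.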

Value

a matrix which is the new data with the specified principal components removed.

Note

requires the package impute.

Author(s)

Martin Lauss

Examples

library(swamp)  # provides kill.pc, prince and prince.plot

# data as a matrix
set.seed(100)
g<-matrix(nrow=1000,ncol=50,rnorm(1000*50),dimnames=list(paste("Feature",1:1000),
   paste("Sample",1:50)))
g[1:100,26:50]<-g[1:100,26:50]+1 # the first 100 features show
# higher values in the samples 26:50
# patient annotations as a data.frame, annotations should be numbers and factors
# but not characters.
# rownames have to be the same as colnames of the data matrix 
set.seed(200)
o<-data.frame(Factor1=factor(c(rep("A",25),rep("B",25))),
              Factor2=factor(rep(c("A","B"),25)),
              Numeric1=rnorm(50),row.names=colnames(g))
  
## pca on unadjusted data
res1<-prince(g,o,top=10)
prince.plot(res1)

## take out pc1
gadj3<-kill.pc(g,pc=1)
prince.plot(prince(gadj3,o,top=10))  

Example output

Loading required package: impute
Loading required package: amap
Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: MASS

swamp documentation built on Dec. 6, 2019, 5:09 p.m.