prince.var.plot: ScreePlot of the data variation covered by the principal...
In swamp: Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations


Description

To identify the number of top principal components that carry relevant variation, this function plots the percentage of variation contained in each principal component for both the observed data and reshuffled (permuted) data.

Usage

prince.var.plot(g, show.top = dim(g)[2], imputeknn = F, center = T, npermute = 10)

Arguments

`g`	the input data as a matrix with features as rows and samples as columns.
`show.top`	the number of top principal components to show in the plot. Default: ncol(g). Must not exceed ncol(g) or nrow(g).
`imputeknn`	default=FALSE. If TRUE, missing values in the data matrix are imputed with the function impute.knn from the package impute, using its default settings.
`center`	default=TRUE. If TRUE, the features are mean-centered before the singular value decomposition. Centering is a prerequisite for principal component analysis; change this only if you are certain that centering is not necessary.
`npermute`	default=10. The number of reshuffled datasets. Each permuted data matrix is generated by shuffling the values of each feature across the samples. For each principal component, the median percentage of variation over the permuted datasets is taken.
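The reshuffling described for `npermute` can be sketched in a few lines of base R. This is not the swamp source: the helpers `pc_percent()` and `permute_features()` are hypothetical, and running the PCA on `t(g)` after centering each feature (so that samples are the observations) is an assumption about how the package orients the matrix. Shuffling within each feature keeps every feature's distribution but destroys any structure shared across samples, which is what makes the permuted percentages a null reference.

```r
# Hypothetical helper: percentage of variation per principal component.
# Assumes features are rows and samples are columns; each feature is
# mean-centered, then PCA is run with samples as observations.
pc_percent <- function(g) {
  gc <- g - rowMeans(g)
  sdev <- prcomp(t(gc), center = FALSE)$sdev
  100 * sdev^2 / sum(sdev^2)
}

# Hypothetical helper: one reshuffled dataset, where the values of each
# feature (row) are permuted independently across the samples
permute_features <- function(g) {
  gp <- t(apply(g, 1, sample))
  dimnames(gp) <- dimnames(g)
  gp
}

# Toy data resembling the example on this page
set.seed(100)
g <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)
g[1:100, 26:50] <- g[1:100, 26:50] + 1   # signal in samples 26:50

npermute <- 10
real <- pc_percent(g)
# One row per principal component, one column per permutation
perm <- sapply(seq_len(npermute), function(i) pc_percent(permute_features(g)))
# Median percentage of variation per component over the permutations
perm_median <- apply(perm, 1, median)

round(head(cbind(real, perm_median), 3), 2)
```

Components whose observed percentage stays clearly above the permuted median are the candidates for carrying real structure.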

Details

The function prcomp() is used to calculate the variation of the data contained in the principal components. Because prcomp() cannot handle missing values, they have to be imputed beforehand, for example with imputeknn=TRUE.
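A minimal sketch of how a percentage of variation can be read off prcomp(), assuming (the page does not say so explicitly) that PCA is run on the transposed matrix so samples are the observations and features are the variables. With features as rows, "mean-centering the features" means subtracting each row mean; the check below shows that doing this by hand gives the same component variances as letting prcomp() center the columns of t(g). The variable names are illustrative only.

```r
set.seed(1)
# Toy data: 200 features (rows) x 20 samples (columns)
g <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)

# Mean-center each feature: rowMeans(g) is recycled column by column,
# so every element in row i has the mean of row i subtracted
g_centered <- g - rowMeans(g)

# PCA with samples as observations, so features are the columns of t(g)
pc <- prcomp(t(g_centered), center = FALSE)

# Letting prcomp() center the columns of t(g) gives the same variances
stopifnot(isTRUE(all.equal(pc$sdev, prcomp(t(g), center = TRUE)$sdev)))

# Percentage of variation per principal component
pct <- 100 * pc$sdev^2 / sum(pc$sdev^2)
names(pct) <- paste("PC", seq_along(pct))
round(head(pct, 3), 2)
```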

Value

A list with components:

`real.variation`	a vector containing the percentage of variation for each principal component in the observed data.
`permuted.variation`	a matrix containing the percentages of variation for each principal component in the reshuffled datasets, with principal components as rows and permutations as columns.
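The returned list can be post-processed to decide how many leading components to keep. The helper `n_above_permuted()` below is hypothetical and not part of swamp, and its rule (count leading components whose observed percentage exceeds the per-component median of the permuted percentages) is only one possible heuristic; this page does not prescribe a cutoff. The toy `res` merely mimics the documented structure of the return value.

```r
# Hypothetical helper (not part of swamp): number of leading principal
# components whose observed variation exceeds the median permuted variation
n_above_permuted <- function(res) {
  perm_median <- apply(res$permuted.variation, 1, median)
  above <- res$real.variation > perm_median
  # position of the first component that does not exceed the permuted median
  first_below <- match(FALSE, above)
  if (is.na(first_below)) length(above) else first_below - 1L
}

# Toy object with the same structure as the documented return value
res <- list(
  real.variation = c("PC 1" = 6, "PC 2" = 4, "PC 3" = 2.5, "PC 4" = 2.6),
  permuted.variation = matrix(c(3.0, 2.9, 2.8, 2.7,
                                3.1, 2.8, 2.7, 2.6),
                              nrow = 4,
                              dimnames = list(paste("PC", 1:4),
                                              paste("Permutation", 1:2)))
)
n_above_permuted(res)  # 2
```

Applied to `res2` from the example at the end of this page, the same call would give the count for that simulated dataset.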

Note

Requires the package impute.
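The following sketch illustrates why imputation is needed before prcomp(). It assumes that impute::impute.knn() from the Bioconductor package impute is the function behind imputeknn=TRUE; the row-mean fallback is a hypothetical, simpler substitute (not what swamp does) used only so the sketch still runs when impute is not installed.

```r
set.seed(2)
# Toy data: 100 features (rows) x 10 samples (columns) with 20 missing values
g <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
g[sample(length(g), 20)] <- NA

# prcomp() stops with an error when the matrix contains missing values
failed <- inherits(try(prcomp(t(g)), silent = TRUE), "try-error")

if (requireNamespace("impute", quietly = TRUE)) {
  # k-nearest-neighbour imputation with default settings
  g_imp <- impute::impute.knn(g)$data
} else {
  # Hypothetical fallback: replace each NA by the mean of its feature (row)
  g_imp <- g
  na_idx <- which(is.na(g_imp), arr.ind = TRUE)
  g_imp[na_idx] <- rowMeans(g, na.rm = TRUE)[na_idx[, 1]]
}

# After imputation the PCA runs
pc <- prcomp(t(g_imp))
```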

Author(s)

Martin Lauss

Examples

library(swamp)

## data as a matrix
set.seed(100)
g <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50,
            dimnames = list(paste("Feature", 1:1000), paste("Sample", 1:50)))
# the first 100 features show higher values in samples 26:50
g[1:100, 26:50] <- g[1:100, 26:50] + 1

## to plot the variations
res2 <- prince.var.plot(g, show.top = 50, npermute = 10)
str(res2)

Loading required package: impute
Loading required package: amap
Loading required package: gplots

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

Loading required package: MASS
[1] "Perm = 1"
[1] "Perm = 2"
[1] "Perm = 3"
[1] "Perm = 4"
[1] "Perm = 5"
[1] "Perm = 6"
[1] "Perm = 7"
[1] "Perm = 8"
[1] "Perm = 9"
[1] "Perm = 10"
List of 2
 $ real.variation    : Named num [1:50] 4.54 2.86 2.82 2.72 2.66 ...
  ..- attr(*, "names")= chr [1:50] "PC 1" "PC 2" "PC 3" "PC 4" ...
 $ permuted.variation: num [1:50, 1:10] 2.98 2.86 2.76 2.68 2.65 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:50] "PC 1" "PC 2" "PC 3" "PC 4" ...
  .. ..$ : chr [1:10] "Permutation 1" "Permutation 2" "Permutation 3" "Permutation 4" ...