MetaPCA implements simultaneous dimension reduction using PCA when multiple studies are combined. We propose two basic ideas to find a common PC subspace by eigenvalue maximization approach and angle minimization approach, and we extend the concept to incorporate Robust PCA and Sparse PCA in the meta-analysis realm.

1 2 3 4 5 | ```
MetaPCA(DList, method=c("Angle","Eigen","RobustAngle","SparseAngle"), robust.var=c("qn","mad"), nPC=2,
.weight=rep(1/length(DList),length(DList)), sparse.maxFeatures=NULL, sparse.lambda=NULL,
sparse.max.iter=100, sparse.eps=1e-3, .scale=FALSE, .scaleAdjust=TRUE, doPreprocess=TRUE,
cutRatioByMean=.4, cutRatioByVar=.4, doImpute=TRUE, na.rm.pct=.1, na.rm.pct.each=.5,
verbose=FALSE)
``` |

`DList` |
A list of all data matrices; Each data name should be set as the name of each list element. Each data should be a numeric matrix that has genes in the rows and samples in the columns. Row names should be official gene symbols and column names be sample labels. |

`method` |
A vector of four meta PCA methods. The first two methods are basic approaches; the last two are extended approaches of robust PCA and sparse PCA but may be rather slower than the basic methods. Default is "Angle", which is angle minimization method. See the details in the reference. |

`robust.var` |
Robust measure of variance when "RobustAngle" method was selected in the method. |

`nPC` |
The number of returned PC's, i.e. the number of dimension reduced by PCA. |

`.weight` |
Weight for each data if information is available. Default is equal weight. |

`sparse.maxFeatures` |
The number of genes left for the Sparse PCA approach. If NULL (default), it is determined based on the default lambda. |

`sparse.lambda` |
The parameter lambda which determines the sparsity of loading vectors. The default is calculated as the number of data divided by square root of the number of overall genes. |

`sparse.max.iter` |
The number of maximum iteration for achieving convergence of sparse loading vectors. Default is 100. |

`sparse.eps` |
The convergence decision precision level. Default is 1e-3. |

`.scale` |
Whether to apply gene based normalization. Default is FALSE. But for the "Eigen" method, gene scaling is recommended for the comparability reason of covariance matrix. |

`.scaleAdjust` |
Whether to apply scaling adjustment for a comparable visualization. Default is TRUE. |

`doPreprocess` |
Whether to apply gene filtering. Default is TRUE. However "SparseAngle" method do not use gene filtering. |

`cutRatioByMean` |
Proportion of genes filtered by study-wise mean. Default is 40%. |

`cutRatioByVar` |
Proportion of genes filtered by study-wise variance. Default is 40%. |

`doImpute` |
Whether to impute missing genes. Default is TRUE, and default imputation method is knn. |

`na.rm.pct` |
Proportion of genes filtered by study-wise missing proportion. Default is 10%. |

`na.rm.pct.each` |
Proportion of genes filtered by each study's missing proportion. Default is 50%. |

`verbose` |
Whether to print logs. Default is FALSE. |

list object having the specified number of PC's of all data sets and loading matrix of meta subspace.

Don Kang (donkang75@gmail.com) and George Tseng (ctseng@pitt.edu)

Dongwan D. Kang and George C. Tseng. (2011) Meta-PCA: Meta-analysis in the Dimension Reduction of Genomic data.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | ```
## Not run:
#Spellman, 1998 Yeast cell cycle data set
#Consider each synchronization method as a separate data
data(Spellman)
pc <- list(alpha=prcomp(t(Spellman$alpha))$x, cdc15=prcomp(t(Spellman$cdc15))$x,
cdc28=prcomp(t(Spellman$cdc28))$x, elu=prcomp(t(Spellman$elu))$x)
#There are currently 4 meta-pca methods. Run either one of following four.
metaPC <- MetaPCA(Spellman, method="Eigen", doPreprocess=FALSE)
metaPC <- MetaPCA(Spellman, method="Angle", doPreprocess=FALSE)
metaPC <- MetaPCA(Spellman, method="RobustAngle", doPreprocess=FALSE)
metaPC <- MetaPCA(Spellman, method="SparseAngle", doPreprocess=FALSE)
#Comparing between usual pca and meta-pca
#The first lows are four data sets based on usual PCA, and
#the second rows are by MetaPCA
#We're looking for a cyclic pattern.
par(mfrow=c(2,4), cex=1, mar=c(0.2,0.2,0.2,0.2))
for(i in 1:4) {
plot(pc[[i]][,1], pc[[i]][,2], type="n", xlab="", ylab="", xaxt="n", yaxt="n")
text(pc[[i]][,1], pc[[i]][,2], 1:nrow(pc[[i]]), cex=1.5)
lines(pc[[i]][,1], pc[[i]][,2])
}
for(i in 1:4) {
plot(metaPC$x[[i]]$coord[,1], metaPC$x[[i]]$coord[,2], type="n", xlab="", ylab="", xaxt="n", yaxt="n")
text(metaPC$x[[i]]$coord[,1], metaPC$x[[i]]$coord[,2], 1:nrow(metaPC$x[[i]]$coord), cex=1.5)
lines(metaPC$x[[i]]$coord[,1], metaPC$x[[i]]$coord[,2])
}
#4 prostate cancer data which have three classes: normal, primary, metastasis
data(prostate)
#There are currently 4 meta-pca methods. Run either one of following four.
metaPC <- MetaPCA(prostate, method="Eigen", doPreprocess=FALSE, .scale=TRUE)
metaPC <- MetaPCA(prostate, method="Angle", doPreprocess=FALSE)
metaPC <- MetaPCA(prostate, method="RobustAngle", doPreprocess=FALSE)
metaPC <- MetaPCA(prostate, method="SparseAngle", doPreprocess=FALSE)
#Plotting 4 data in the same space!
coord <- foreach(dd=iter(metaPC$x), .combine=rbind) %do% dd$coord
PlotPC2D(coord[,1:2], drawEllipse=F, dataset.name="Prostate", .class.order=c("Metastasis","Primary","Normal"),
.class.color=c('red','#838383','blue'), .annotation=T, newPlot=T,
.class2=rep(names(metaPC$x), times=sapply(metaPC$x,function(x)nrow(x$coord))),
.class2.order=names(metaPC$x), .points.size=1)
#In the case of "SparseAngle" method, the top contributing genes for all studies can be determined
#For instance, top 20 genes in 1st PC and their coefficients
metaPC$v[order(abs(metaPC$v[,1]), decreasing=TRUE),1][1:20]
## End(Not run)
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.