TFA.estimate: Prediction of Transcription Factor Activities using PLS
In plsgenomics: PLS Analyses for Genomics


The function TFA.estimate estimates the transcription factor activities from gene expression data and ChIP data using the PLS multivariate regression approach described in Boulesteix and Strimmer (2005).

TFA.estimate(CONNECdata, GEdata, ncomp=NULL, nruncv=0, alpha=2/3, unit.weights=TRUE)

`CONNECdata`	a (n x p) matrix containing the ChIP data for the n genes and the p predictors. The n genes must be the same as the n genes of `GEdata`, in the same order. Each row of `CONNECdata` corresponds to a gene, each column to a transcription factor. `CONNECdata` may have either binary (e.g. 0-1) or numeric entries.
`GEdata`	a (n x m) matrix containing the gene expression levels of the n considered genes for m samples. Each row of `GEdata` corresponds to a gene, each column to a sample.
`ncomp`	if `nruncv=0`, `ncomp` is the number of latent components to be constructed. If `nruncv>0`, the number of latent components to be used for PLS regression is chosen from 1,...,`ncomp` using the cross-validation procedure described in Boulesteix and Strimmer (2005). If `ncomp=NULL`, `ncomp` is set to min(n,p).
`nruncv`	the number of cross-validation iterations to be performed for the choice of the number of latent components. If `nruncv=0`, cross-validation is not performed and `ncomp` latent components are used.
`alpha`	the proportion of genes to be included in the training set for the cross-validation procedure.
`unit.weights`	if `TRUE`, the latent components are constructed from weight vectors that are standardized to length 1; otherwise the weight vectors do not have length 1 but the latent components have norm 1.
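
The requirements on the two input matrices (same genes, same ordering) can be checked before calling the function. The following is a minimal sketch in base R; the helper name `check_tfa_inputs` and the toy matrices are assumptions made for illustration and are not part of plsgenomics.

```r
# Hypothetical helper (not part of plsgenomics): verify that the connectivity
# matrix and the expression matrix describe the same genes in the same order.
check_tfa_inputs <- function(CONNECdata, GEdata) {
  stopifnot(is.matrix(CONNECdata), is.matrix(GEdata))
  # both matrices must have one row per gene
  if (nrow(CONNECdata) != nrow(GEdata)) {
    stop("CONNECdata and GEdata must have the same number of rows (genes)")
  }
  # if gene names are available on both matrices, the ordering must match too
  if (!is.null(rownames(CONNECdata)) && !is.null(rownames(GEdata)) &&
      !identical(rownames(CONNECdata), rownames(GEdata))) {
    stop("the genes of CONNECdata and GEdata must be in the same order")
  }
  invisible(TRUE)
}

# toy data: 4 genes, 2 transcription factors, 3 samples
genes <- paste0("gene", 1:4)
connec <- matrix(c(1, 0, 1, 0, 0, 1, 1, 0), nrow = 4,
                 dimnames = list(genes, c("TF1", "TF2")))
set.seed(1)
ge <- matrix(rnorm(12), nrow = 4,
             dimnames = list(genes, paste0("s", 1:3)))

ok <- check_tfa_inputs(connec, ge)
```

When the matrices carry no row names, only the row counts can be compared, so the ordering has to be verified by other means.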

The gene expression data as well as the ChIP data are assumed to have been properly normalized. However, they do not have to be centered or scaled, since centering and scaling are performed by the function TFA.estimate as a preliminary step.
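
To illustrate what centering and scaling mean in this context, the sketch below standardizes the columns of a toy matrix with base R's `scale`. It is an assumption that this resembles the preprocessing done inside TFA.estimate; the internal steps are not shown on this page, and the matrix `X` is made up for the example.

```r
# toy matrix with non-zero column means and non-unit variances
set.seed(1)
X <- matrix(rnorm(20, mean = 5, sd = 3), nrow = 5)

# center each column to mean 0 and scale it to standard deviation 1
Xs <- scale(X, center = TRUE, scale = TRUE)

col_means <- colMeans(Xs)      # numerically zero
col_sds <- apply(Xs, 2, sd)    # numerically one
```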

The matrix `CONNECdata` containing the ChIP data for the n genes and p transcription factors may be replaced by any 'connectivity' matrix whose entries give the strength of the interactions between the genes and transcription factors. For instance, a connectivity matrix obtained by aggregating qualitative information from various genomic databases may be passed in place of ChIP data.
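
As a rough sketch of how such a connectivity matrix could be assembled, the example below turns a list of gene/transcription factor pairs into a binary (n x p) matrix using base R only. The gene names, transcription factor names and the `edges` table are hypothetical and stand in for whatever information is collected from the databases.

```r
# hypothetical gene / transcription factor interactions
edges <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneC"),
  tf   = c("TF1", "TF2", "TF2", "TF1"),
  stringsAsFactors = FALSE
)

# all genes under study (same order as the rows of the expression matrix)
genes <- c("geneA", "geneB", "geneC", "geneD")
tfs <- c("TF1", "TF2")

# start from an all-zero matrix and mark each known interaction with 1
connec <- matrix(0, nrow = length(genes), ncol = length(tfs),
                 dimnames = list(genes, tfs))
connec[cbind(edges$gene, edges$tf)] <- 1
```

Genes without any known interaction, such as `geneD` here, keep an all-zero row.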

A list with the following components:

`TFA`	a (p x m) matrix containing the estimated transcription factor activities for the p transcription factors and the m samples.
`metafactor`	a (m x `ncomp`) matrix containing the metafactors for the m samples. Each row corresponds to a sample, each column to a metafactor.
`ncomp`	the number of latent components used in the PLS regression.
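
The components of the returned list can be inspected as sketched below. The example assumes that the plsgenomics package and its `Ecoli` data set are installed, and skips the call otherwise; the variable name `res` is chosen only for illustration.

```r
# run only if plsgenomics is available
if (requireNamespace("plsgenomics", quietly = TRUE)) {
  data("Ecoli", package = "plsgenomics")

  # fixed number of latent components, no cross-validation
  res <- plsgenomics::TFA.estimate(Ecoli$CONNECdata, Ecoli$GEdata,
                                   ncomp = 3, nruncv = 0)

  print(dim(res$TFA))         # transcription factors x samples
  print(dim(res$metafactor))  # samples x latent components
  print(res$ncomp)            # number of latent components used
}
```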

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html) and Korbinian Strimmer (http://strimmerlab.org).

A. L. Boulesteix and K. Strimmer (2005). Predicting Transcription Factor Activities from Combined Analysis of Microarray and ChIP Data: A Partial Least Squares Approach. Theoretical Biology and Medical Modelling 2:23.

A. L. Boulesteix and K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 8:32-44.

S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.

pls.regression, pls.regression.cv.

# load plsgenomics library
library(plsgenomics)

# load Ecoli data
data(Ecoli)

# estimate TFAs based on 3 latent components
TFA.estimate(Ecoli$CONNECdata,Ecoli$GEdata,ncomp=3,nruncv=0)

# estimate TFAs and determine the best number of latent components simultaneously
TFA.estimate(Ecoli$CONNECdata,Ecoli$GEdata,ncomp=1:5,nruncv=20)