README.md
In rdevito/MSFA: Maximum Likelihood Estimation and Bayesian analysis of MSFA models

MSFA

author: Roberta De Vito, Ruggero Bellio

Fits the Multi-Study Factor Analysis model via ECM algorithm and via a Bayesian approach.

The following example illustrates how to fit a MSFA model via the ECM Algorithm, using a data set available in the Bioconductor repository (www.bioconductor.org).

Some pre-processing is required to get the data into a form suitable for the analysis. This was already done, and the resulting data frame is saved into the data_immune object. The commands that were used to form it are included in the help file for the data object.

```{r help2, echo = TRUE, results = TRUE, tidy = TRUE} library(MSFA) data(data_immune) help(data_immune)


## Obtaining suitable starting values for model parameters
Then we get suitable starting values for model parameters, selecting K=3 common
factors and (3, 4) study-specific factors for the two studies, respectively.

```{r, starting values, messages = FALSE}
start_value <- start_msfa(X_s = data_immune, k = 3, j_s = c(3, 4))

Now everything is in place for estimating the model parameters via the ECM algorithm

```{r get estimate, results = FALSE} mle <- ecm_msfa(data_immune, start_value, trace = FALSE)


The estimated matrix of common loadings can be represented by a suitable heatmap:

```{r heatmap, fig.show = 'hold', fig.width = 7.5, fig.height = 6.5, message = FALSE}
library(gplots)
heatmap.2(mle$Phi,dendrogram='none', Rowv=FALSE, Colv=FALSE,trace='none', density.info="none", col=heat.colors(256))

## 2 Bayesian Analysis of a MSFA model

The following example illustrates a Bayesian analysis of a MSFA model. Although the methodology has been developed targeting the $p>n$ case, for the sake of simplicity we illustrate the analysis of the same data set employed for maximum likelihood estimation. The data set is available in the Bioconductor repository (www.bioconductor.org).

```{r help3, echo = TRUE, results = TRUE, tidy = TRUE} library(MSFA) data(data_immune) help(data_immune)



## Sampling from the posterior distribution 
We fist estimate a model with a somewhat large dimension of the various loading matrices,
so we set a dimension 10 for both the common factor loadings and the study-specific loadings. In order to get reproducible results, we set the random seed.

```{r posterior, messages = FALSE}
set.seed(1971)
out10_1010 <- sp_msfa(data_immune,  k = 10,  j_s = c(10, 10), trace = FALSE)

We take as the estimated $\Sigma_\Phi$ the posterior median

```{r Phi } p <- ncol(data_immune[[1]]) nrun <- dim(out10_1010$Phi)[3] SigmaPhi <- SigmaLambda1 <- SigmaLambda2 <- array(0, dim=c(p, p, nrun)) for(j in 1:nrun) { SigmaPhi[,,j] <- tcrossprod(out10_1010$Phi[,,j]) SigmaLambda1[,,j] <- tcrossprod(out10_1010$Lambda[[1]][,,j]) SigmaLambda2[,,j] <- tcrossprod(out10_1010$Lambda[[2]][,,j]) } SigmaPhi <- apply(SigmaPhi, c(1, 2), median) SigmaLambda1 <- apply(SigmaLambda1, c(1, 2), median) SigmaLambda2 <- apply(SigmaLambda2, c(1, 2), median) Phi <- apply(out10_1010$Phi, c(1, 2), median)


## Choice of the number of factors
Then we proceed to the choice of the number of common latent factors.

```{r SigmaPhi, fig.width=4.5, fig.height=4.5}
plot(sp_eigen(SigmaPhi), pch = 16)
abline(h = 0.05, col = 2)

We note that 5 factors are above the $5\%$ threshold, so we choose $K=5$. We proceed in a similar way for the two study-specific loading matrices: ```{r SigmaLambda, fig.width=6.5, fig.height=4.5} par(mfrow=c(1, 2)) plot(sp_eigen(SigmaLambda1), pch=16) abline(h = 0.05, col = 2) plot(sp_eigen(SigmaLambda2), pch=16) abline(h = 0.05, col = 2)


We end up with dimensions 3 and 4 for the study-specific factor loadings.

##  OP prostprocessing
We post-process the estimated loading matrix by the OP procedure
```{r OP}
Phi_OP10 <- sp_OP(out10_1010$Phi[,1:5,], itermax = 10)

For larger data size, we recommend to reduce the output level in the call to sp_msfa.

rdevito/MSFA documentation built on March 18, 2020, 2:57 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com