gfa: Gibbs sampling for group factor analysis
In GFA: Group Factor Analysis

gfa	R Documentation

Description

gfa returns posterior samples of the group factor analysis model.

Usage

gfa(Y, opts, K = NULL, projection = NULL, filename = "")

Arguments

`Y`	Either (1) data sources with co-occurring samples: a list of data matrices, where Y[[m]] is a numeric `N \times D_m` matrix; or (2) data sources paired in two modes (some data sources share the samples of the first data source, and some share its features): a list with two elements, each structured as in (1). The data collections Y[[1]] and Y[[2]] should be connected by sharing their first data source, i.e. Y[[1]][[1]] should equal the transpose of Y[[2]][[1]]. NOTE: The data features should have roughly zero mean and unit variance. If this is not the case, preprocessing with function `normalizeData` is recommended.
`opts`	List of model options; see function `getDefaultOpts`.
`K`	The number of components (i.e. latent variables). Recommended to be set somewhat higher than the expected number of components, so that the sampler can determine the model complexity by shutting down excess components. High values result in high CPU time. Default: half of the minimum of the sample size and total data dimensionality.
`projection`	Fixed projections. Only intended for sequential prediction use via function `sequentialGfaPrediction`. Default: NULL.
`filename`	A string. If provided, the sampling chain is saved to this file every 100 iterations. Default: "" (nothing is saved).
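
As a rough sketch of the two layouts accepted for `Y`, the base R snippet below builds both forms from random data and checks the pairing condition. The matrix sizes, the variable names, and the rounding used in the default-`K` calculation are illustrative assumptions and are not taken from the package.

```r
set.seed(1)

# Form 1: data sources with co-occurring samples.
# Every matrix has the same N rows; the column counts D_m may differ.
N <- 20
Y_single <- list(
  matrix(rnorm(N * 10), N, 10),  # source 1: N x 10
  matrix(rnorm(N * 15), N, 15)   # source 2: N x 15
)
stopifnot(all(vapply(Y_single, nrow, integer(1)) == N))

# Form 2: data sources paired in two modes.
# Mode 1 shares the samples of the first source; mode 2 shares its features,
# so the first source appears transposed in the second collection.
shared <- Y_single[[1]]
Y_paired <- list(
  list(shared, matrix(rnorm(N * 15), N, 15)),     # rows: the 20 samples
  list(t(shared), matrix(rnorm(10 * 8), 10, 8))   # rows: the 10 features of `shared`
)
stopifnot(isTRUE(all.equal(Y_paired[[1]][[1]], t(Y_paired[[2]][[1]]))))

# Default K as described for the `K` argument: half of
# min(sample size, total data dimensionality).
# The use of ceiling() for rounding is an assumption.
D_total <- sum(vapply(Y_single, ncol, integer(1)))
K_default <- ceiling(min(N, D_total) / 2)
```

Either `Y_single` or `Y_paired` could then be passed as the `Y` argument. Real data would usually need centering and scaling first, as noted for `Y`.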

Details

GFA allows factor analysis of multiple data sources (i.e. data sets). The priors of the model can be set to infer bicluster structure from the data sources; see `getDefaultOpts`. Missing values (NAs) are inherently supported. They do not affect the model parameters, but can be predicted with function `reconstruction`, based on the observed values of the corresponding sample and feature. The association of a data source with each component is inferred from the data. Because only a subset of the components may explain a given data source, the posterior can identify relationships between any subset of the data sources. In the extreme cases, a component can explain relationships within a single data source only ("structured noise"), or across all the data sources.
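
The workflow described above (fit the model to several data sources, some with missing entries, then predict those entries) might look roughly like the following sketch. The simulated data, the sizes, and the shortened chain are illustrative assumptions. The option names `iter.burnin`, `iter.max` and `verbose` are assumed to be present in the list returned by `getDefaultOpts`. The model is only fitted if the GFA package is installed.

```r
set.seed(2)

# Simulate two data sources that share 3 latent components (sizes are arbitrary).
N <- 20
X <- matrix(rnorm(N * 3), N, 3)     # latent variables
W <- matrix(rnorm(30 * 3), 30, 3)   # loadings
Y_full <- tcrossprod(X, W) + matrix(rnorm(N * 30), N, 30)
Y_full <- scale(Y_full)             # roughly zero mean, unit variance per feature
Y <- list(Y_full[, 1:10], Y_full[, 11:30])

# Hide a few entries of the first source; gfa() is documented to accept NAs.
missing_idx <- sample(length(Y[[1]]), 15)
Y[[1]][missing_idx] <- NA

if (requireNamespace("GFA", quietly = TRUE)) {
  opts <- GFA::getDefaultOpts()
  # Short, quiet chain for a quick demo only (assumed option names);
  # the default options are the safer choice for real analyses.
  opts$iter.burnin <- 500
  opts$iter.max <- 1000
  opts$verbose <- 0
  res <- GFA::gfa(Y, opts = opts, K = 5)
  # Reconstruction of the data, which also covers the hidden entries.
  rec <- GFA::reconstruction(res)
}
```

Setting `K = 5` here is deliberately higher than the 3 simulated components, in line with the recommendation for `K`.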

Value

A list containing the model parameters. In the case of pairing in two modes, each element is a list of length 2, with one element for each mode. For most parameters, the final posterior sample is provided to aid in initial checks; all the posterior samples should be used for model analysis. The list elements are:

`W`	The loading matrix (final posterior sample); `D \times K` matrix.
`X`	The latent variables (final sample); `N \times K` matrix.
`Z`	The spike-and-slab parameters (final sample); `D \times K` matrix.
`r`	The probability of slab in Z (final sample).
`rz`	The probability of slab in the spike-and-slab prior of X (final sample).
`tau`	The noise precisions (final sample); D-element vector.
`alpha`	The precisions of the projection weights W (final sample); `D \times K` matrix.
`beta`	The precisions of the latent variables X (final sample); `N \times K` matrix.
`groups`	A list denoting which features belong to each data source.
`D`	Data dimensionalities; M-element vector.
`K`	The number of components inferred. May be less than the initial K.

and the following elements:

`posterior`	The posterior samples of, by default, X, W and tau.
`cost`	The likelihood of all the posterior samples.
`aic`	The Akaike information criterion of all the posterior samples.
`opts`	The options used for the GFA model.
`conv`	An estimate of the convergence of the model's reconstruction, based on the Geweke diagnostic. Values significantly above 0.05 imply a non-converged model, and hence the need for a longer sampling chain.
`time`	The CPU time (in seconds) used to sample the model.
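
A first look at a fitted model could start from the `K`, `conv` and `time` elements listed above. The helper below is a hypothetical sketch and not part of the package. It assumes a single-mode model in which `conv` is a single number, and it treats 0.05 as a hard cut-off, which is cruder than the "significantly above 0.05" wording for `conv`. It is demonstrated on a mock list with made-up values rather than on a real `gfa` result.

```r
# Hypothetical helper: summarise a list shaped like the return value of gfa().
summarize_gfa_fit <- function(res, conv_threshold = 0.05) {
  list(
    components  = res$K,                       # components kept by the sampler
    converged   = res$conv <= conv_threshold,  # crude cut-off (assumption)
    cpu_seconds = res$time
  )
}

# Mock object with made-up values, standing in for a real gfa() result.
mock_res <- list(K = 4, conv = 0.02, time = 12.5)
fit_summary <- summarize_gfa_fit(mock_res)
```

If `converged` comes out `FALSE` for a real fit, the description of `conv` suggests rerunning with a longer sampling chain.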