fit.dependency.model: Fit dependency model between two data sets.
In dmt: Dependency Modeling Toolkit


Fit a generative latent variable model (see the vignette for the model specification) on two data sets. The solutions can be regularized with priors, including constraints on the marginal covariance structures, the structure of W, and the latent dimensionality. Probabilistic versions of PCA, factor analysis and CCA are available as special cases.

fit.dependency.model(X, Y, zDimension = 1, marginalCovariances = "full",
                     epsilon = 1e-3,
                     priors = list(), matched = TRUE,
                     includeData = TRUE, calculateZ = TRUE, verbose = FALSE)
ppca(X, Y = NULL, zDimension = NULL, includeData = TRUE, calculateZ = TRUE)
pfa(X, Y = NULL, zDimension = NULL, includeData = TRUE, calculateZ = TRUE, priors = NULL)
pcca(X, Y, zDimension = NULL, includeData = TRUE, calculateZ = TRUE)

`X, Y`	Data sets X and Y, arranged as 'variables x samples' (variables in rows, samples in columns). The second data set (`Y`) is optional.
`zDimension`	Dimensionality of the shared latent variable.
`marginalCovariances`	Structure of marginal covariances, assuming multivariate Gaussian distributions for the dataset-specific effects. Options: `"identical isotropic"`, `"isotropic"`, `"diagonal"` and `"full"`. The difference between the isotropic and identical isotropic options is that in the isotropic model phi$X != phi$Y in general, whereas in the identical isotropic model phi$X = phi$Y.
`epsilon`	Convergence limit.
`priors`	Prior parameters for the model. A list, which can contain some of the following elements:
	`W`: Rate parameter for the exponential distribution (should be positive). Used to specify the prior for Wx and Wy in the dependency model. The exponential prior is used to produce non-negative solutions for W; small values of the rate parameter correspond to an uninformative prior distribution.
	`Nm.wxwy.mean`: Mean of the matrix normal prior distribution for the transformation matrix `T`. Must be a matrix of size (variables in first data set) x (variables in second data set). If the value is `1`, `Nm.wxwy.mean` will be set to an identity matrix of appropriate size.
	`Nm.wxwy.sigma`: Variance parameter for the matrix normal prior distribution of the transformation matrix `T`. Describes the allowed scale of deviation of the transformation matrix `T` from the mean matrix `Nm.wxwy.mean`.
`matched`	Logical indicating whether the variables (dimensions) are matched between X and Y. Applicable only when dimX = dimY. Affects the results only when a prior on the relationship Wx ~ Wy is set, i.e. when priors$Nm.wx.wy.sigma < Inf.
`includeData`	Logical indicating whether the original data is included in the model output. Setting this to `FALSE` saves memory.
`calculateZ`	Logical indicating whether an expectation of the latent variable Z is included in the model output. Otherwise the expectation can be calculated with `getZ` or `z.expectation`. Using `FALSE` speeds up the calculation of the dependency model.
`verbose`	Logical indicating whether intermediate messages are printed while the model is being fitted.
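The four `marginalCovariances` structures listed above can be pictured with a small base R sketch. This is only an illustration of the shapes involved, built from sample covariances of simulated data; it is an assumption made for the example and does not show how dmt estimates phi. In particular, pooling the variances for the "identical isotropic" case is a hypothetical choice.

```r
# Illustrative only: the shape of each marginal covariance structure.
set.seed(3)
X <- matrix(rnorm(3 * 100), nrow = 3)  # variables x samples
Y <- matrix(rnorm(2 * 100), nrow = 2)

Sx <- cov(t(X))  # sample covariance between the variables of X
Sy <- cov(t(Y))

# "full": unrestricted covariance matrices
phi_full <- list(X = Sx, Y = Sy)

# "diagonal": one variance per variable, no covariances
phi_diagonal <- list(X = diag(diag(Sx)), Y = diag(diag(Sy)))

# "isotropic": one variance per data set, so phi$X != phi$Y in general
phi_isotropic <- list(X = mean(diag(Sx)) * diag(nrow(Sx)),
                      Y = mean(diag(Sy)) * diag(nrow(Sy)))

# "identical isotropic": one variance shared by both data sets
# (pooling the variances here is a hypothetical choice for the sketch)
s2 <- mean(c(diag(Sx), diag(Sy)))
phi_identical <- list(X = s2 * diag(nrow(Sx)),
                      Y = s2 * diag(nrow(Sy)))
```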

The fit.dependency.model function fits the dependency model X ~ N(W$X * Z, phi$X); Y ~ N(W$Y * Z, phi$Y), where Z is the latent variable shared by the two data sets, with the possibility to tune the model structure and parameter priors.
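As a rough illustration of the generative model above, the following base R sketch draws two data sets that share a latent variable. The dimensions, seed, noise level, the standard normal distribution for Z and the isotropic noise are all assumptions chosen for the example; none of them is taken from dmt.

```r
# Simulate X and Y from a shared latent variable (illustrative only).
set.seed(1)
n  <- 200  # number of samples
dx <- 4    # variables in X
dy <- 3    # variables in Y
dz <- 1    # latent dimensionality (zDimension)

Z  <- matrix(rnorm(dz * n), nrow = dz)   # shared latent variable, dz x n
Wx <- matrix(rnorm(dx * dz), nrow = dx)  # loadings for X, dx x dz
Wy <- matrix(rnorm(dy * dz), nrow = dy)  # loadings for Y, dy x dz

# Data-set-specific Gaussian noise; isotropic here, with sd = 0.5
X <- Wx %*% Z + matrix(rnorm(dx * n, sd = 0.5), nrow = dx)
Y <- Wy %*% Z + matrix(rnorm(dy * n, sd = 0.5), nrow = dy)

dim(X)  # variables x samples, the layout expected for the X argument
dim(Y)
```

Data simulated this way has the 'variables x samples' layout described for the `X` and `Y` arguments.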

In particular, the dataset-specific covariance structure phi can be defined; non-negative priors for W are possible; the relation between W$X and W$Y can be tuned. For a comprehensive set of examples, see the example scripts in the tests/ directory of this package.

Special cases of the model, obtained with particular prior assumptions, include probabilistic canonical correlation analysis (pcca; Bach & Jordan 2005), probabilistic principal component analysis (ppca; Tipping & Bishop 1999), probabilistic factor analysis (pfa; Rubin & Thayer 1982), and a regularized version of canonical correlation analysis (pSimCCA; Lahti et al. 2009).

The standard probabilistic PCA and factor analysis are methods for a single data set (X ~ N(WZ, phi)), with isotropic and diagonal covariance (phi) for pPCA and pFA, respectively. Analogous models for two data sets are obtained by concatenating the two data sets, and performing pPCA or pFA.
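Because the data sets are stored as 'variables x samples', concatenating them means stacking the variables of both data sets over the same samples. A minimal base R sketch with simulated matrices follows; whether ppca and pfa perform exactly this step internally when given both X and Y is an assumption.

```r
# Concatenate two 'variables x samples' data sets (illustrative only).
set.seed(2)
X <- matrix(rnorm(4 * 50), nrow = 4)  # 4 variables, 50 samples
Y <- matrix(rnorm(3 * 50), nrow = 3)  # 3 variables, same 50 samples

# Both data sets must describe the same samples (columns)
stopifnot(ncol(X) == ncol(Y))

XY <- rbind(X, Y)  # 7 variables x 50 samples
dim(XY)
```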

Such special cases are obtained with the following choices in the fit.dependency.model function:

pPCA: marginalCovariances = "identical isotropic" (Tipping & Bishop 1999)
pFA: marginalCovariances = "diagonal" (Rubin & Thayer 1982)
pCCA: marginalCovariances = "full" (Bach & Jordan 2005)
pSimCCA: marginalCovariances = "full", priors = list(Nm.wxwy.mean = I, Nm.wxwy.sigma = 0). This is the default method and corresponds to the case with W$X = W$Y. (Lahti et al. 2009)
pSimCCA with T prior: marginalCovariances = "isotropic", priors = list(Nm.wxwy.mean = 1, Nm.wx.wy.sigma = 1) (Lahti et al. 2009)
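The choices listed above can be collected into a small lookup table. The sketch below is a hypothetical convenience wrapper and not part of dmt: `special_cases` and `fit_special_case` are names invented for the example, the prior element names are copied as spelled in the list above, and `Nm.wxwy.mean = 1` stands in for the identity matrix, following the `priors` argument description.

```r
# Hypothetical lookup of the special cases listed above (not part of dmt).
special_cases <- list(
  pPCA    = list(marginalCovariances = "identical isotropic"),
  pFA     = list(marginalCovariances = "diagonal"),
  pCCA    = list(marginalCovariances = "full"),
  pSimCCA = list(marginalCovariances = "full",
                 priors = list(Nm.wxwy.mean = 1, Nm.wxwy.sigma = 0))
)

# Hypothetical helper: fit one of the special cases by name.
# dmt is only needed when the function is actually called.
fit_special_case <- function(name, X, Y, zDimension = 1) {
  args <- c(list(X, Y, zDimension = zDimension), special_cases[[name]])
  do.call(dmt::fit.dependency.model, args)
}

# Usage, assuming dmt is installed and X, Y are loaded:
# model <- fit_special_case("pCCA", X, Y)
```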

To avoid computational singularities, the covariance matrix phi is regularised by adding a small constant to the diagonal.
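The effect of this kind of regularisation can be sketched in base R. The helper name `regularize_cov` and the constant `eps = 1e-3` are assumptions made for the example; the constant actually used by dmt is not stated here.

```r
# Add a small constant to the diagonal of a covariance matrix
# (hypothetical helper, illustrative only).
regularize_cov <- function(phi, eps = 1e-3) {
  phi + diag(eps, nrow(phi))
}

# A singular 2 x 2 matrix: both variables perfectly correlated
phi <- matrix(1, nrow = 2, ncol = 2)
det(phi)  # zero, so solve(phi) would fail

phi_reg <- regularize_cov(phi)
det(phi_reg) > 0            # the regularised matrix is invertible
phi_inv <- solve(phi_reg)   # inversion now succeeds
```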

An object of class DependencyModel.

Olli-Pekka Huovilainen ohuovila@gmail.com and Leo Lahti leo.lahti@iki.fi

Dependency Detection with Similarity Constraints, Lahti et al., 2009 Proc. MLSP'09 IEEE International Workshop on Machine Learning for Signal Processing, http://arxiv.org/abs/1101.5919

A Probabilistic Interpretation of Canonical Correlation Analysis, Bach Francis R. and Jordan Michael I. 2005. Technical Report 688, Department of Statistics, University of California, Berkeley. http://www.di.ens.fr/~fbach/probacca.pdf

Probabilistic Principal Component Analysis, Tipping Michael E. and Bishop Christopher M. 1999. Journal of the Royal Statistical Society, Series B, 61, Part 3, pp. 611–622. http://research.microsoft.com/en-us/um/people/cmbishop/downloads/Bishop-PPCA-JRSS.pdf

EM Algorithms for ML Factor Analysis, Rubin D. and Thayer D. 1982. Psychometrika, vol. 47, no. 1.

Output class for this function: DependencyModel. Special cases: ppca, pfa, pcca

library(dmt)
data(modelData) # Load example data X, Y

# probabilistic CCA
model <- pcca(X, Y)

# dependency model with priors (W>=0; Wx = Wy; full marginal covariances)
model <- fit.dependency.model(X, Y, zDimension = 1,
                              priors = list(W = 1e-3, Nm.wx.wy.sigma = 0),
                              marginalCovariances = "full")

# Getting the latent variable Z when it has been calculated with the model
#getZ(model)