Fit a probabilistic principal components analysis model to a metabolomic data set, and assess uncertainty via the jackknife.

Description

Fit a probabilistic principal components analysis (PPCA) model to a metabolomic data set via the EM algorithm, and assess uncertainty in the obtained loadings estimates via the jackknife.

Usage

ppca.metabol.jack(Y, minq = 1, maxq = 2, scale = "none",
                  epsilon = 0.1, conflevel = 0.95)

Arguments

Y

An N x p data matrix where each row is a spectrum.

minq

The minimum number of principal components to be fit. By default minq is 1.

maxq

The maximum number of principal components to be fit. By default maxq is 2.

scale

Type of scaling to apply to the data. The default is "none". Other options are "pareto" and "unit" scaling. See scaling for further details.

epsilon

Value on which the convergence assessment criterion is based. Set by default to 0.1.

conflevel

Level of confidence required for the loadings confidence intervals. By default 95% confidence intervals are computed.
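The two scaling options can be illustrated with a small base R sketch. It assumes the usual definitions (unit scaling divides each variable by its standard deviation, Pareto scaling divides by the square root of the standard deviation); whether the package also mean-centres the data is not stated here, so no centring is applied. The simulated matrix and the object names are hypothetical.

```r
# Hypothetical data: 20 spectra (rows) by 4 spectral bins (columns).
set.seed(3)
Y <- matrix(rexp(20 * 4, rate = 0.5), nrow = 20, ncol = 4)

# Column-wise standard deviations.
s <- apply(Y, 2, sd)

# Unit scaling: each column ends up with standard deviation 1.
Y_unit <- sweep(Y, 2, s, "/")

# Pareto scaling: each column is divided by sqrt(sd), so large peaks
# are shrunk less aggressively than under unit scaling.
Y_pareto <- sweep(Y, 2, sqrt(s), "/")
```

Pareto scaling is a compromise between no scaling and unit scaling: it reduces the dominance of high-intensity bins while keeping more of the original data structure.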

Details

One or more PPCA models are fitted, with the number of principal components q ranging from minq to maxq, and the optimal model (i.e. the optimal q) is selected. Confidence intervals for the loadings are then obtained via the jackknife: a model with q principal components is fitted to the dataset N times, with a different observation removed each time.

On convergence of the algorithm, the number of loadings significantly different from zero is printed on screen. The user may then further examine the significant loadings when prompted by selecting a cutoff value from the table printed on screen. Bar plots detailing the resulting significantly high loadings are provided.
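The leave-one-out idea can be sketched in base R as follows. This is not the package's implementation: prcomp is used as a stand-in for the EM-based PPCA fit, the data are simulated, and the normal-quantile interval centred on the full-data estimate is an assumption about how the intervals could be formed. All object names are hypothetical.

```r
# Hypothetical data with a shared signal in the first two columns.
set.seed(1)
N <- 30; p <- 5; q <- 2
Y <- matrix(rnorm(N * p), N, p)
Y[, 1:2] <- Y[, 1:2] + 3 * rnorm(N)

# Stand-in for the PPCA fit: the first q PCA loading vectors.
fit_loadings <- function(X, q) {
  prcomp(X, center = TRUE, scale. = FALSE)$rotation[, 1:q, drop = FALSE]
}

W_full <- fit_loadings(Y, q)

# Refit N times, leaving out one observation each time.
jack <- array(NA_real_, dim = c(p, q, N))
for (i in seq_len(N)) {
  W_i <- fit_loadings(Y[-i, , drop = FALSE], q)
  # Loadings are identified only up to sign: align with the full-data fit.
  flip <- sign(colSums(W_i * W_full))
  flip[flip == 0] <- 1
  jack[, , i] <- sweep(W_i, 2, flip, `*`)
}

# Jackknife standard errors: sqrt((N - 1) / N * sum of squared deviations).
W_bar <- apply(jack, c(1, 2), mean)
se <- sqrt((N - 1) / N * apply(sweep(jack, c(1, 2), W_bar)^2, c(1, 2), sum))

# Assumed normal-theory intervals at the chosen confidence level.
conflevel <- 0.95
z <- qnorm(1 - (1 - conflevel) / 2)
lower <- W_full - z * se
upper <- W_full + z * se

# A loading is flagged when its interval excludes zero.
signif <- (lower > 0) | (upper < 0)
```

The sign alignment step matters because each refit may return a loading vector pointing in the opposite direction, which would otherwise inflate the jackknife variance.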

Value

A list containing:

q

The number of principal components in the optimal PPCA model, selected by the BIC.

sig

The posterior mode estimate of the variance of the error terms.

scores

An N x q matrix of estimates of the latent locations of each observation in the principal subspace.

loadings

The maximum likelihood estimate of the p x q loadings matrix.

SignifW

The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero.

SignifHighW

The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero and higher than a user selected cutoff point.

Lower

The lower limit of the confidence interval for those loadings significantly different from zero.

Upper

The upper limit of the confidence interval for those loadings significantly different from zero.

Cutoffs

A table detailing a range of cutoff points and the associated number of selected spectral bins.

number

The number of spectral bins selected by the user.

cutoff

The cutoff value selected by the user.

BIC

A vector containing the BIC values for the fitted models.

AIC

A vector containing the AIC values for the fitted models.
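As a rough illustration of how BIC and AIC vectors over a range of q can drive model selection, the sketch below evaluates the PPCA log-likelihood at its closed-form maximum (Tipping and Bishop's eigenvalue solution) rather than via EM. The parameter count, the "smaller is better" sign convention and all object names are assumptions; the package may define and orient its criteria differently.

```r
# Hypothetical data: 50 observations on 6 variables.
set.seed(2)
N <- 50; p <- 6
Y <- matrix(rnorm(N * p), N, p)

# Eigenvalues of the maximum likelihood covariance (divisor N).
S <- cov(Y) * (N - 1) / N
lambda <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

# Maximised PPCA log-likelihood for q components (requires q < p).
ppca_loglik <- function(lambda, q, N) {
  p <- length(lambda)
  sigma2 <- mean(lambda[(q + 1):p])  # ML error variance
  -N / 2 * (p * log(2 * pi) + sum(log(lambda[1:q])) +
              (p - q) * log(sigma2) + p)
}

# Assumed free parameters: mean (p), loadings up to rotation, error variance.
n_par <- function(p, q) p + p * q - q * (q - 1) / 2 + 1

qs <- 1:3
ll <- sapply(qs, function(q) ppca_loglik(lambda, q, N))
k <- sapply(qs, function(q) n_par(p, q))

# Smaller is better under this convention.
bic <- -2 * ll + k * log(N)
aic <- -2 * ll + 2 * k
q_best <- qs[which.min(bic)]
```

Because the models are nested, the log-likelihood never decreases as q grows; the penalty term is what allows a smaller q to be preferred.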

Author(s)

Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.

References

Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.

See Also

ppca.metabol, loadings.jack.plot, ppca.scores.plot

Examples

data(UrineSpectra)
## Not run: 
mdlfit<-ppca.metabol.jack(UrineSpectra[[1]], minq=2, maxq=2, scale="none")
loadings.jack.plot(mdlfit)
ppca.scores.plot(mdlfit, group=UrineSpectra[[2]][,1])
## End(Not run)
