Fit a probabilistic principal components and covariates analysis model to a metabolomic data set, and assess uncertainty via the jackknife.

Description

Fit a probabilistic principal components and covariates analysis (PPCCA) model to a metabolomic data set via the EM algorithm, and assess uncertainty in the obtained loadings estimates and the regression coefficients via the jackknife.

Usage

ppcca.metabol.jack(Y, Covars, minq=1, maxq=2, scale="none", epsilon=0.1, 
conflevel=0.95)

Arguments

Y

An N x p data matrix in which each row is a spectrum.

Covars

An N x L covariate data matrix where each row is a set of covariates.

minq

The minimum number of principal components to be fit. By default minq is 1.

maxq

The maximum number of principal components to be fit. By default maxq is 2.

scale

The type of scaling to apply to the data. The default is "none". Other options are "pareto" and "unit" scaling. See scaling for further details.

epsilon

Value on which the convergence assessment criterion is based. Set by default to 0.1.

conflevel

Level of confidence required for the loadings and regression coefficients confidence intervals. By default 95% confidence intervals are computed.
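To make the scale options concrete, the following is a minimal base-R sketch under the usual metabolomics definitions: Pareto scaling divides each variable by the square root of its standard deviation, and unit scaling divides each variable by its standard deviation. The helper scale_columns and the toy matrix are hypothetical and not part of the package. The sketch does not mean-centre the data, and the package's own scaling function may differ in detail.

```r
# Hypothetical helper (not part of the package): column-wise scaling of a
# spectra matrix under the usual definitions of Pareto and unit scaling.
scale_columns <- function(Y, type = c("none", "pareto", "unit")) {
  type <- match.arg(type)
  s <- apply(Y, 2, sd)                 # per-variable standard deviation
  divisor <- switch(type,
                    none   = rep(1, ncol(Y)),
                    pareto = sqrt(s),  # assumption: divide by sqrt(sd)
                    unit   = s)        # assumption: divide by sd
  sweep(Y, 2, divisor, "/")
}

set.seed(1)
Y_toy <- matrix(rnorm(40, mean = 5, sd = 2), nrow = 10, ncol = 4)  # toy data
Y_unit <- scale_columns(Y_toy, "unit")
Y_pareto <- scale_columns(Y_toy, "pareto")
apply(Y_unit, 2, sd)     # each column now has standard deviation 1
```

Pareto scaling shrinks high-variance bins less aggressively than unit scaling, which is why it is often preferred for spectra where large peaks carry real signal.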

Details

A range of PPCCA models, with minq to maxq principal components, is fitted and an optimal model (i.e. an optimal number of principal components, q) is selected. Confidence intervals for the resulting loadings and regression coefficients are then obtained via the jackknife: a model with q principal components is fitted to the data N times, with a different observation removed from the data set each time.
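The leave-one-out idea can be sketched without the package by applying it to ordinary least-squares coefficients instead of a PPCCA fit. The function jackknife_ci and the toy data below are hypothetical. The interval uses a normal quantile with the jackknife standard error, which is an assumption; the package may construct its intervals differently.

```r
# Hypothetical illustration: jackknife confidence intervals for the
# coefficients of a simple linear model (a stand-in for a PPCCA fit).
jackknife_ci <- function(x, y, conflevel = 0.95) {
  N <- length(y)
  full <- coef(lm(y ~ x))                       # estimate on all N rows
  # Refit N times, leaving out one observation each time
  loo <- t(sapply(seq_len(N), function(i) coef(lm(y[-i] ~ x[-i]))))
  # Jackknife standard error: sqrt((N - 1) / N * sum of squared deviations)
  centred <- sweep(loo, 2, colMeans(loo))
  se <- sqrt((N - 1) / N * colSums(centred^2))
  z <- qnorm(1 - (1 - conflevel) / 2)           # assumption: normal quantile
  cbind(estimate = full, lower = full - z * se, upper = full + z * se)
}

set.seed(42)
x_toy <- runif(30)
y_toy <- 1 + 2 * x_toy + rnorm(30, sd = 0.3)    # toy data, not from the package
ci <- jackknife_ci(x_toy, y_toy)
ci
```

Because the model is refitted N times, the running time of a jackknife grows roughly linearly with the number of observations.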

Care should be taken with the form of the covariates supplied. All covariates are standardized (to lie in [0,1]) within the ppcca.metabol.jack function for stability reasons. Hence continuous covariates and binary valued categorical covariates are easily handled. A categorical covariate with V levels should be passed to ppcca.metabol.jack as its equivalent representation in V-1 dummy variables.
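As an illustration of preparing such a covariate matrix, the base-R sketch below builds the V-1 dummy columns for a three-level factor with model.matrix. The data frame and the helper rescale01 are hypothetical. Reading "standardized to lie in [0,1]" as min-max rescaling is an assumption, and the rescaling is shown only to illustrate the idea, since the function performs its own standardization.

```r
# Toy covariates (hypothetical, not from the package)
covars <- data.frame(
  weight = c(20.1, 25.3, 22.8, 30.0, 27.4, 24.6),
  group  = factor(c("A", "B", "C", "A", "B", "C"))   # V = 3 levels
)

# V - 1 = 2 dummy columns; model.matrix uses the first level as baseline
dummies <- model.matrix(~ group, data = covars)[, -1, drop = FALSE]

# Assumption: "standardized to lie in [0,1]" means min-max rescaling
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

Covars_toy <- cbind(weight = rescale01(covars$weight), dummies)
Covars_toy   # an N x L matrix with L = 3, every column within [0, 1]
```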

On convergence of the algorithm, the number of loadings significantly different from zero is printed on screen. The user may then further examine the significant loadings when prompted by selecting a cutoff value from the table printed on screen. Bar plots detailing the resulting significantly high loadings are provided.
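The relationship between significance, the cutoff and the number of selected bins can be sketched as follows. All values are hypothetical toy numbers. Treating a loading as significant when its confidence interval excludes zero, and applying the cutoff to the absolute value of the loading, are both assumptions about how the selection works rather than a description of the package internals.

```r
# Toy loadings for one principal component with jackknife-style intervals
# (hypothetical values for illustration only)
w     <- c(0.80, -0.05, 0.30, -0.60, 0.02, 0.45)
lower <- c(0.60, -0.20, 0.10, -0.85, -0.10, 0.25)
upper <- c(1.00,  0.10, 0.50, -0.35,  0.14, 0.65)

# A loading is taken as significant when its interval excludes zero
signif_idx <- which(lower > 0 | upper < 0)

# Assumption: cutoffs are applied to the absolute value of the loading
cutoffs <- c(0.2, 0.4, 0.6)
selected <- sapply(cutoffs, function(cut) sum(abs(w[signif_idx]) > cut))
cutoff_table <- data.frame(cutoff = cutoffs, n_selected = selected)
cutoff_table
```

Raising the cutoff trades coverage for focus: fewer spectral bins are retained, but those that remain have the largest loadings.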

Value

A list containing:

q

The number of principal components in the optimal PPCCA model, selected by the BIC.

sig

The posterior mode estimate of the variance of the error terms.

scores

An N x q matrix of estimates of the latent locations of each observation in the principal subspace.

loadings

The maximum likelihood estimate of the p x q loadings matrix.

SignifW

The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero.

SignifHighW

The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero and above the user selected cutoff point.

LowerCI_W

The lower limit of the confidence interval for those loadings significantly different from zero.

UpperCI_W

The upper limit of the confidence interval for those loadings significantly different from zero.

coefficients

The maximum likelihood estimates of the regression coefficients.

coeffCI

A matrix detailing the upper and lower limits of the confidence intervals for the regression parameters.

Cutoffs

A table detailing a range of cutoff points and the associated number of selected spectral bins.

number

The number of spectral bins selected by the user.

cutoff

The cutoff value selected by the user.

BIC

A vector containing the BIC values for the fitted models.

AIC

A vector containing the AIC values for the fitted models.

Author(s)

Nyamundanda Gift, Isobel Claire Gormley and Lorraine Brennan.

References

Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.

See Also

ppcca.metabol, ppcca.scores.plot, loadings.jack.plot

Examples

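data(UrineSpectra)
## Not run: 
# Fit a PPCCA model with q = 2, using the second covariate column (weight)
mdlfit <- ppcca.metabol.jack(UrineSpectra[[1]], UrineSpectra[[2]][,2], minq=2, maxq=2)
# Plot the loadings from the jackknife fit
loadings.jack.plot(mdlfit)
# Plot the scores, using the first covariate column as the group label
ppcca.scores.plot(mdlfit, UrineSpectra[[2]][,2], group=UrineSpectra[[2]][,1], covarnames="Weight")

## End(Not run)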
