lngca: Decompose the original data through LNGCA method.
In singR: Simultaneous Non-Gaussian Component Analysis

lngca

R Documentation

Decompose the original data through LNGCA method.

Description

Implements the methods of linear non-Gaussian component analysis (LNGCA) and likelihood component analysis (when using a density, e.g., tilted Gaussian) from the LNGCA paper

Usage

lngca(
  xData,
  n.comp = NULL,
  Ux.list = NULL,
  whiten = c("sqrtprec", "eigenvec", "none"),
  maxit = 1000,
  eps = 1e-06,
  verbose = FALSE,
  restarts.pbyd = 0,
  restarts.dbyd = 0,
  distribution = c("JB", "tiltedgaussian", "logistic"),
  density = FALSE,
  out.all = FALSE,
  orth.method = c("svd", "givens"),
  df = 0,
  stand = FALSE,
  ...
)

Arguments

`xData`	the original dataset for decomposition, matrix of n x px.
`n.comp`	the number of components to be estimated.
`Ux.list`	list of user specified initial values for Ux. If null, will generate random orthogonal matrices. See restarts.pbyd and restarts.dbyd
`whiten`	whitening method. Defaults to "svd" which uses the n left eigenvectors divided by sqrt(px-1) by 'eigenvec'. Optionally uses the square root of the n x n "precision" matrix by 'sqrtprec'.
`maxit`	max iteration, defalut = 1000
`eps`	default = 1e-06
`verbose`	default = FALSE
`restarts.pbyd`	default = 0. Generates p x d random orthogonal matrices. Use a large number for large datasets. Note: it is recommended that you run lngca twice with different seeds and compare the results, which should be similar when a sufficient number of restarts is used. In practice, stability with large datasets and a large number of components can be challenging.
`restarts.dbyd`	default = 0. These are d x d initial matrices padded with zeros, which results in initializations from the principal subspace. Can speed up convergence but may miss low variance non-Gaussian components.
`distribution`	distribution methods with default to tilted Gaussian. "logistic" is similar to infomax ICA, JB is capable of capture super and sub Gaussian distribution while being faster than tilted Gaussian. (tilted Gaussian tends to be most accurate, but computationally much slower.)
`density`	return the estimated tilted Gaussian density? default = FALSE
`out.all`	default = FALSE
`orth.method`	default = 'svd'. Method to generate random initial matrices. See [gen.inits()]
`df`	default = 0, df of the spline used in fitting the non-parametric density. use df=8 or so for tilted gaussian. set df=0 for JB and logistic.
`stand`	whether to standardize the data to have row and column means equal to 0 and the row standard deviation equal to 1 (i.e., all variables on same scale). Often used when combined with singR for data integration.
`...`	other arguments to tiltedgaussian estimation

Value

Function outputs a list including the following:

U: matrix rx x n, part of the expression that Ax = Ux x Lx and Ax x Xc = Sx, where Lx is the whitener matrix.
loglik: the value of log-likelihood in the lngca method.
S: the variable loading matrix r x px, each row is a component, which can be used to measure nongaussianity
df: egree of freedom.
distribution: the method used for data decomposition.
whitener: A symmetric whitening matrix n x n from dX, the same with whitenerXA = est.sigmaXA %^% -0.5
M: Mx Mtrix with n x rx.
nongaussianity: the nongaussianity score for each component saved in S matrix.

Examples


#get simulation data
data(exampledata)
data=exampledata

# To get n.comp value, we can use NG_number function.

# use JB statistic as the measure of nongaussianity to run lngca with df=0
estX_JB = lngca(xData = data$dX, n.comp = 4,
 whiten = 'sqrtprec', restarts.pbyd = 20, distribution='JB',df=0)

# use the tiltedgaussian distribution to run lngca with df=8. This takes a long time:
estX_tilt = lngca(xData = data$dX, n.comp = 4,
 whiten = 'sqrtprec', restarts.pbyd = 20, distribution='tiltedgaussian',df=8)

# true non-gaussian component of Sx, include individual and joint components
trueSx = rbind(data$sjX,data$siX)

# use pmse to compare the difference of the two methods
pmse(S1 = t(trueSx),S2=t(estX_JB$S),standardize = TRUE)
pmse(S1 = t(trueSx),S2=t(estX_tilt$S),standardize = TRUE)

# the lngca using tiltedgaussian tends to be more accurate
# with smaller pmse value, but takes longer to run.

singR documentation built on April 3, 2025, 10:31 p.m.