lngca: Decompose the original data through LNGCA method.

View source: R/lngca.R

lngcaR Documentation

Decompose the original data through LNGCA method.

Description

Implements the methods of linear non-Gaussian component analysis (LNGCA) and likelihood component analysis (when using a density, e.g., tilted Gaussian) from the LNGCA paper

Usage

lngca(
  xData,
  n.comp = NULL,
  Ux.list = NULL,
  whiten = c("sqrtprec", "eigenvec", "none"),
  maxit = 1000,
  eps = 1e-06,
  verbose = FALSE,
  restarts.pbyd = 0,
  restarts.dbyd = 0,
  distribution = c("JB", "tiltedgaussian", "logistic"),
  density = FALSE,
  out.all = FALSE,
  orth.method = c("svd", "givens"),
  df = 0,
  stand = FALSE,
  ...
)

Arguments

xData

the original dataset for decomposition, matrix of n x px.

n.comp

the number of components to be estimated.

Ux.list

list of user specified initial values for Ux. If null, will generate random orthogonal matrices. See restarts.pbyd and restarts.dbyd

whiten

whitening method. Defaults to "svd" which uses the n left eigenvectors divided by sqrt(px-1) by 'eigenvec'. Optionally uses the square root of the n x n "precision" matrix by 'sqrtprec'.

maxit

max iteration, defalut = 1000

eps

default = 1e-06

verbose

default = FALSE

restarts.pbyd

default = 0. Generates p x d random orthogonal matrices. Use a large number for large datasets. Note: it is recommended that you run lngca twice with different seeds and compare the results, which should be similar when a sufficient number of restarts is used. In practice, stability with large datasets and a large number of components can be challenging.

restarts.dbyd

default = 0. These are d x d initial matrices padded with zeros, which results in initializations from the principal subspace. Can speed up convergence but may miss low variance non-Gaussian components.

distribution

distribution methods with default to tilted Gaussian. "logistic" is similar to infomax ICA, JB is capable of capture super and sub Gaussian distribution while being faster than tilted Gaussian. (tilted Gaussian tends to be most accurate, but computationally much slower.)

density

return the estimated tilted Gaussian density? default = FALSE

out.all

default = FALSE

orth.method

default = 'svd'. Method to generate random initial matrices. See [gen.inits()]

df

default = 0, df of the spline used in fitting the non-parametric density. use df=8 or so for tilted gaussian. set df=0 for JB and logistic.

stand

whether to standardize the data to have row and column means equal to 0 and the row standard deviation equal to 1 (i.e., all variables on same scale). Often used when combined with singR for data integration.

...

other arguments to tiltedgaussian estimation

Value

Function outputs a list including the following:

U

matrix rx x n, part of the expression that Ax = Ux x Lx and Ax x Xc = Sx, where Lx is the whitener matrix.

loglik

the value of log-likelihood in the lngca method.

S

the variable loading matrix r x px, each row is a component, which can be used to measure nongaussianity

df

egree of freedom.

distribution

the method used for data decomposition.

whitener

A symmetric whitening matrix n x n from dX, the same with whitenerXA = est.sigmaXA %^% -0.5

M

Mx Mtrix with n x rx.

nongaussianity

the nongaussianity score for each component saved in S matrix.

Examples


#get simulation data
data(exampledata)
data=exampledata

# To get n.comp value, we can use NG_number function.

# use JB statistic as the measure of nongaussianity to run lngca with df=0
estX_JB = lngca(xData = data$dX, n.comp = 4,
 whiten = 'sqrtprec', restarts.pbyd = 20, distribution='JB',df=0)

# use the tiltedgaussian distribution to run lngca with df=8. This takes a long time:
estX_tilt = lngca(xData = data$dX, n.comp = 4,
 whiten = 'sqrtprec', restarts.pbyd = 20, distribution='tiltedgaussian',df=8)

# true non-gaussian component of Sx, include individual and joint components
trueSx = rbind(data$sjX,data$siX)

# use pmse to compare the difference of the two methods
pmse(S1 = t(trueSx),S2=t(estX_JB$S),standardize = TRUE)
pmse(S1 = t(trueSx),S2=t(estX_tilt$S),standardize = TRUE)

# the lngca using tiltedgaussian tends to be more accurate
# with smaller pmse value, but takes longer to run.



singR documentation built on May 29, 2024, 7:30 a.m.