singR: SImultaneous Non-Gaussian Component analysis for data...

View source: R/singR.R

singR R Documentation

SImultaneous Non-Gaussian Component analysis for data integration.

Description

This function combines all steps from the SING paper.

Usage

singR(
  dX,
  dY,
  n.comp.X = NULL,
  n.comp.Y = NULL,
  df = 0,
  rho_extent = c("small", "medium", "large"),
  Cplus = TRUE,
  tol = 1e-10,
  stand = FALSE,
  distribution = "JB",
  maxiter = 1500,
  individual = FALSE,
  whiten = c("sqrtprec", "eigenvec", "none"),
  restarts.dbyd = 0,
  restarts.pbyd = 20
)

Arguments

dX

original dataset for decomposition, matrix of n x px.

dY

original dataset for decomposition, matrix of n x py.

n.comp.X

the number of non-Gaussian components in dataset X. If NULL, the number is estimated using ICtest::FOBIasymp.

n.comp.Y

the number of non-Gaussian components in dataset Y. If NULL, the number is estimated using ICtest::FOBIasymp.

df

degrees of freedom. The default value 0 is used with "JB". If df > 0, a density for the loadings is estimated using a tilted Gaussian (non-parametric density estimate).

rho_extent

Controls the similarity of the scores in the two datasets. Accepts either a numeric value or one of three character options: "small", "medium" or "large", which are defined from the JB statistic. Try "small" first and check whether the loadings are equal, then try the others if needed. A numeric input is multiplied by JBall to obtain rho.
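To make the JB statistic mentioned above more concrete, the following base R sketch computes a Jarque-Bera-type measure of non-Gaussianity for a vector of loadings: a weighted sum of squared skewness and squared excess kurtosis. The helper name jb_sketch and the weight alpha = 0.8 are assumptions made for illustration; the exact form used inside singR, and how JBall is derived from it, may differ.

```r
# Hypothetical helper (not part of singR): JB-type statistic of a vector s
jb_sketch <- function(s, alpha = 0.8) {
  z <- (s - mean(s)) / sd(s)     # standardize to mean 0 and sd 1
  skew <- mean(z^3)              # sample skewness
  exkurt <- mean(z^4) - 3        # sample excess kurtosis
  alpha * skew^2 + (1 - alpha) * exkurt^2
}

set.seed(2)
gaussian_loadings <- rnorm(10000)  # Gaussian: statistic near 0
skewed_loadings <- rexp(10000)     # right-skewed: statistic clearly larger

jb_sketch(gaussian_loadings)
jb_sketch(skewed_loadings)
```

A larger value indicates a loading vector that is further from Gaussian, which is the sense in which the rho options are scaled.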

Cplus

whether to use C code (faster) in curvilinear search.

tol

difference tolerance in curvilinear search.

stand

whether to standardize the data. If TRUE, the column and row means are set to 0 and the column standard deviations to 1. If FALSE, only the row means are set to 0.
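The effect of the two settings of stand on a data matrix can be sketched in base R. This is only an illustration under assumptions: standardize_sketch is a hypothetical helper, and alternating column scaling with row centering is one simple way to reach the stated properties, not necessarily the procedure singR uses internally.

```r
set.seed(1)
X <- matrix(rnorm(20 * 50, mean = 3, sd = 2), nrow = 20, ncol = 50)

# stand = FALSE analogue: only the row means are set to 0
X_rows <- X - rowMeans(X)

# stand = TRUE analogue (hypothetical helper, not the package's code)
standardize_sketch <- function(X, iters = 50) {
  for (i in seq_len(iters)) {
    X <- scale(X)         # column means 0, column sd 1
    X <- X - rowMeans(X)  # row means 0; column means stay 0 (grand mean is 0)
  }
  X
}
X_std <- standardize_sketch(X)

max(abs(rowMeans(X_rows)))  # numerically zero
max(abs(rowMeans(X_std)))   # numerically zero
max(abs(colMeans(X_std)))   # numerically zero
range(apply(X_std, 2, sd))  # close to 1 once the alternation has settled
```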

distribution

"JB" or "tiltedgaussian"; "JB" is much faster. In SING, this refers to the "density" formed from the vector of loadings. "tiltedgaussian" with large df can potentially model more complicated patterns.

maxiter

the maximum number of iterations for the curvilinear search.

individual

whether to return the individual non-Gaussian components. Default is FALSE.

whiten

whitening method used in lngca: one of "sqrtprec", "eigenvec" or "none", as listed in Usage. "eigenvec" is the SVD-based option and uses the n left eigenvectors divided by sqrt(px-1). "sqrtprec" uses the square root of the n x n "precision" matrix.
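As a rough illustration of whitening with the square root of the n x n precision matrix, the base R sketch below builds the inverse symmetric square root of the covariance of a row-centered n x px matrix and applies it. The sizes and variable names are assumptions chosen for the example, and the internals of lngca may differ in detail.

```r
set.seed(3)
n <- 5; px <- 200                      # illustrative sizes, with px > n
X <- matrix(rexp(n * px), nrow = n, ncol = px)
X <- X - rowMeans(X)                   # row means set to 0

S <- tcrossprod(X) / (px - 1)          # n x n covariance across the px variables
e <- eigen(S, symmetric = TRUE)
# symmetric square root of the precision matrix, S^(-1/2)
W <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

Z <- W %*% X                           # whitened data, still n x px
round(tcrossprod(Z) / (px - 1), 6)     # identity matrix up to rounding
```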

restarts.dbyd

default = 0. These are d x d initial matrices padded with zeros, which results in initializations from the principal subspace. Can speed up convergence but may miss low variance non-Gaussian components.

restarts.pbyd

default = 20. Generates p x d random orthogonal matrices. Use a large number for large datasets. Note: it is recommended that you run lngca twice with different seeds and compare the results, which should be similar when a sufficient number of restarts is used. In practice, stability with large datasets and a large number of components can be challenging.
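The two kinds of initialization described for restarts.dbyd and restarts.pbyd can be sketched in base R. The helper random_orthonormal and the sizes p and d are hypothetical, the QR-based draw is just one common way to obtain orthonormal columns, and the sketch assumes the coordinates are ordered by principal component; lngca may generate its starting matrices differently.

```r
set.seed(4)
p <- 10; d <- 3  # illustrative sizes

# Hypothetical helper: random matrix with orthonormal columns via QR
random_orthonormal <- function(nrow, ncol) {
  qr.Q(qr(matrix(rnorm(nrow * ncol), nrow, ncol)))
}

# restarts.pbyd analogue: a p x d start drawn from the full p-dimensional space
W_pbyd <- random_orthonormal(p, d)

# restarts.dbyd analogue: a d x d orthogonal matrix padded with zeros to p x d,
# so the start lies in the span of the first d coordinates
W_dbyd <- rbind(random_orthonormal(d, d), matrix(0, p - d, d))

round(crossprod(W_pbyd), 6)  # d x d identity: columns are orthonormal
round(crossprod(W_dbyd), 6)  # also the d x d identity
```

Because the padded start has no weight outside the first d coordinates, it cannot begin in directions carried only by the remaining coordinates, which is consistent with the warning about missing low variance components.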

Value

Function outputs a list including the following:

Sjx

variable loadings for joint NG components in dataset X with matrix rj x px.

Sjy

variable loadings for joint NG components in dataset Y with matrix rj x py.

Six

variable loadings for individual NG components in dataset X with matrix riX x px.

Siy

variable loadings for individual NG components in dataset Y with matrix riY x py.

Mix

scores of individual NG components in X with matrix n x riX.

Miy

scores of individual NG components in Y with matrix n x riY.

est.Mjx

Estimated subject scores for joint components in dataset X with matrix n x rj.

est.Mjy

Estimated subject scores for joint components in dataset Y with matrix n x rj.

est.Mj

Average of est.Mjx and est.Mjy as the subject scores for joint components in both datasets with matrix n x rj.

C_plus

whether the C version of the curvilinear search was used.

rho_extent

the weight of rho used in the search.

df

degrees of freedom: 0 when using JB, > 0 when using tiltedgaussian.

Examples


library(singR)

# load the simulation data
data(exampledata)

# run singR with the JB statistic
output_JB <- singR(
  dX = exampledata$dX, dY = exampledata$dY,
  df = 0, rho_extent = "small", distribution = "JB", individual = TRUE
)

# run singR with the tiltedgaussian distribution.
# tiltedgaussian may be more accurate but is considerably slower,
# and is not recommended for large datasets.
output_tilted <- singR(
  dX = exampledata$dX, dY = exampledata$dY,
  df = 5, rho_extent = "small", distribution = "tiltedgaussian", individual = TRUE
)

# use pmse to measure the difference between the estimated joint scores and the truth
pmse(M1 = t(output_JB$est.Mj), M2 = t(exampledata$mj), standardize = TRUE)

pmse(M1 = t(output_tilted$est.Mj), M2 = t(exampledata$mj), standardize = TRUE)



singR documentation built on May 29, 2024, 7:30 a.m.