# jackstraw_subspace: Jackstraw for the User-Defined Dimension Reduction Methods In jackstraw: Statistical Inference for Unsupervised Learning

 jackstraw_subspace R Documentation

## Jackstraw for the User-Defined Dimension Reduction Methods

### Description

Test association between the observed variables and their latent variables, captured by a user-defined dimension reduction method.

### Usage

``````
jackstraw_subspace(
  dat,
  r,
  FUN,
  r1 = NULL,
  s = NULL,
  B = NULL,
  covariate = NULL,
  noise = NULL,
  verbose = TRUE
)
``````

### Arguments

| Argument | Description |
|---|---|
| `dat` | A data matrix with `m` rows as variables and `n` columns as observations. |
| `r` | The number of significant latent variables. |
| `FUN` | A function that estimates the latent variables (LVs). It must return `r` estimated LVs as an `n*r` matrix. |
| `r1` | A numeric vector of latent variables of interest. |
| `s` | The number of “synthetic” null variables. Out of `m` variables, `s` variables are independently permuted. |
| `B` | The number of resampling iterations. |
| `covariate` | A model matrix of covariates with `n` observations. Must include an intercept in the first column. |
| `noise` | A parametric distribution used to generate the noise term. If `NULL`, a non-parametric jackstraw test is performed. |
| `verbose` | A logical specifying whether to print the computational progress. |
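The shape requirement on `FUN` (an `n*r` matrix, one row per observation) is easy to get wrong when `r = 1`, because R drops a single column to a vector unless `drop = FALSE` is used. The following is a minimal sketch of two candidate functions in base R. The helper names `make_svd_fun` and `make_pca_fun` are hypothetical and are not part of the jackstraw package.

```r
## Hypothetical helper: returns a FUN giving the top r right singular vectors.
## For an m x n input, svd(x)$v is n x n, so the result is n x r.
make_svd_fun <- function(r) {
  function(x) svd(x)$v[, seq_len(r), drop = FALSE]
}

## Hypothetical alternative: PCA scores of the observations.
## prcomp() expects observations in rows, hence the transpose.
make_pca_fun <- function(r) {
  function(x) prcomp(t(x), center = TRUE, scale. = FALSE)$x[, seq_len(r), drop = FALSE]
}

set.seed(1)
toy <- matrix(rnorm(200 * 10), nrow = 200)  # m = 200 variables, n = 10 observations

lv_svd <- make_svd_fun(2)(toy)
lv_pca <- make_pca_fun(2)(toy)

## Both results should have n = 10 rows and r = 2 columns.
dim(lv_svd)
dim(lv_pca)
```

Checking `dim()` on a small matrix like this before passing a function as `FUN` is a cheap way to confirm it returns the expected orientation.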

### Details

This function computes `m` p-values of linear association between `m` variables and their latent variables, captured by a user-defined dimension reduction method. Its resampling strategy accounts for the over-fitting that arises because the latent variables are estimated directly from the observed data, and so protects against an anti-conservative bias.

This function also allows you to specify a parametric distribution for the noise term; this is an experimental feature. When a distribution is specified, a small number `s` of observed variables are replaced by synthetic null variables generated from that distribution, instead of being permuted.
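The resampling strategy described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: it ignores `covariate` and `r1`, uses an F-statistic against an intercept-only model, and floors the empirical p-values at `1/length(null)`. The names `f_stat` and `jackstraw_sketch` are hypothetical, and treating `noise` as a function such as `rnorm` that takes a sample size is also an assumption.

```r
## Simplified jackstraw resampling sketch (NOT the package's implementation).
## Assumptions: no covariates, F-test of y ~ 1 + lv against y ~ 1.

f_stat <- function(y, lv) {
  n <- length(y)
  r <- ncol(lv)
  rss1 <- sum(lm.fit(cbind(1, lv), y)$residuals^2)  # full model
  rss0 <- sum((y - mean(y))^2)                      # intercept-only model
  ((rss0 - rss1) / r) / (rss1 / (n - r - 1))
}

jackstraw_sketch <- function(dat, r, FUN, s, B, noise = NULL) {
  m <- nrow(dat)
  n <- ncol(dat)
  lv <- FUN(dat)
  stopifnot(is.matrix(lv), nrow(lv) == n, ncol(lv) == r)
  obs <- apply(dat, 1, f_stat, lv = lv)             # m observed statistics

  null <- numeric(0)
  for (b in seq_len(B)) {
    idx <- sample.int(m, s)                         # rows to turn into nulls
    perm <- dat
    if (is.null(noise)) {
      ## non-parametric: permute each chosen row independently
      perm[idx, ] <- t(apply(dat[idx, , drop = FALSE], 1, sample))
    } else {
      ## parametric (assumed interface): draw n * s values from `noise`
      perm[idx, ] <- matrix(noise(n * s), nrow = s, ncol = n)
    }
    lv_b <- FUN(perm)                               # re-estimate the LVs
    null <- c(null, apply(perm[idx, , drop = FALSE], 1, f_stat, lv = lv_b))
  }

  ## empirical p-value: share of null statistics at least as large as observed
  p <- vapply(obs, function(o) max(mean(null >= o), 1 / length(null)), numeric(1))
  list(p.value = p, obs.stat = obs, null.stat = null)
}

## Toy data: the first 60 of 300 variables load on one latent variable.
set.seed(1)
loadings <- c(rep(2, 60), rep(0, 240))
L <- rnorm(20)
toy <- loadings %*% t(L) + matrix(rnorm(300 * 20), nrow = 300)

res <- jackstraw_sketch(toy, r = 1,
                        FUN = function(x) svd(x)$v[, 1, drop = FALSE],
                        s = 30, B = 10)
length(res$null.stat)  # s * B null statistics
```

Because the latent variables are re-estimated after each replacement, the null statistics inherit the same over-fitting as the observed ones, which is the point of the procedure. Keeping `s` small relative to `m` leaves the estimated latent variables largely intact in each iteration.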

### Value

`jackstraw_subspace` returns a list consisting of

| Component | Description |
|---|---|
| `p.value` | `m` p-values of association tests between variables and their latent variables |
| `obs.stat` | `m` observed statistics |
| `null.stat` | `s*B` null statistics |

### Author(s)

Neo Christopher Chung nchchung@gmail.com

### References

Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554 https://academic.oup.com/bioinformatics/article/31/4/545/2748186

Chung (2020) Statistical significance of cluster membership for unsupervised evaluation of cell identities. Bioinformatics, 36(10): 3107–3114 https://academic.oup.com/bioinformatics/article/36/10/3107/5788523

### See Also

`jackstraw_pca`, `jackstraw`

### Examples

``````
## simulate data from a latent variable model: Y = BL + E
## the first 100 of 1000 variables load on the latent variable L
B = c(rep(1,50),rep(-1,50), rep(0,900))
L = rnorm(20)
E = matrix(rnorm(1000*20), nrow=1000)
dat = B %*% t(L) + E
## center and scale each variable (row)
dat = t(scale(t(dat), center=TRUE, scale=TRUE))

## apply the jackstraw with the svd as a function
## drop=FALSE keeps the first right singular vector as an n*1 matrix
out = jackstraw_subspace(dat, FUN = function(x) svd(x)$v[,1,drop=FALSE], r=1, s=100, B=50)
``````

jackstraw documentation built on June 22, 2024, 7:17 p.m.