jackstraw_lfa: Non-Parametric Jackstraw for Logistic Factor Analysis

View source: R/jackstraw_lfa.R

jackstraw_lfa    R Documentation

Description

Test association between the observed variables and their latent variables captured by logistic factors (LFs).

Usage

jackstraw_lfa(
  dat,
  r,
  FUN = function(x) lfa::lfa(x, r),
  r1 = NULL,
  s = NULL,
  B = NULL,
  covariate = NULL,
  permute_alleles = FALSE,
  verbose = TRUE
)

Arguments

dat

either a genotype matrix with m rows as variables and n columns as observations, or a BEDMatrix object (see the BEDMatrix package). BEDMatrix objects are transposed relative to the regular matrix layout, but they can be passed as-is (see the example); there is no need to modify a BEDMatrix input. A BEDMatrix input triggers a low-memory mode in which permuted data is also written to and processed from disk, whereas a regular matrix input keeps permutations in memory. The tradeoff is that the BEDMatrix version typically runs considerably slower, but it enables analysis of very large data that would otherwise be impossible.

r

the number of significant LFs.

FUN

a function to use for LFA (by default, it uses the lfa package)

r1

a numeric vector of LFs of interest (implying you are not interested in all r LFs).

s

the number of “synthetic” null variables. Out of the m variables, s are independently permuted.

B

the number of resampling iterations. There will be a total of s*B null statistics.

covariate

a data matrix of covariates with corresponding n observations (do not include an intercept term).

permute_alleles

If TRUE, alleles (rather than genotypes) are permuted, which yields a synthetic null that is closer to Binomial when the data is highly structured. Default FALSE.

verbose

a logical specifying whether to print the computational progress.
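
The roles of s and permute_alleles can be illustrated with a small base-R sketch. It is not the package's internal code: the helper names permute_genotypes and permute_alleles_row are hypothetical, and the allele-level scheme shown (splitting each genotype into two alleles, shuffling all 2n alleles, and re-pairing them) is an assumption about what permuting alleles means here.

```r
set.seed(1)

# Toy genotype matrix: m variables (rows) by n observations (columns), values 0/1/2
m <- 20; n <- 10
dat <- matrix(rbinom(m * n, 2, 0.3), nrow = m, ncol = n)

# Hypothetical helper: shuffle the genotypes of one variable across observations
permute_genotypes <- function(x) sample(x)

# Hypothetical helper: split each genotype into two alleles, shuffle all 2n alleles,
# then re-pair them into n genotypes (assumed meaning of allele-level permutation)
permute_alleles_row <- function(x) {
  alleles <- c(as.integer(x >= 1), as.integer(x == 2))  # 2n alleles, sum equals sum(x)
  alleles <- sample(alleles)
  alleles[seq_along(x)] + alleles[length(x) + seq_along(x)]
}

# Choose s of the m variables and permute each one independently
s <- 3
idx <- sample(m, s)
dat_null <- dat
dat_null[idx, ] <- t(apply(dat[idx, , drop = FALSE], 1, permute_genotypes))
```

Both schemes preserve the allele count of each permuted variable; the allele-level version can additionally change how many observations are heterozygous, which the genotype-level shuffle cannot.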

Details

This function uses logistic factor analysis (LFA) from Wei et al. (2014). In particular, the deviance in logistic regression (the full model with r LFs vs. the intercept-only model) is used to assess significance.
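
To make the test statistic concrete, the following base-R sketch computes a deviance of this kind for a single variable using glm. It is an illustration only, not the package's implementation: the vectors lf1 and lf2 are simulated stand-ins for estimated LFs, and modelling each genotype as a Binomial(2, p) count is an assumption made for the example.

```r
set.seed(1)
n <- 100
lf1 <- rnorm(n)   # stand-in for the first estimated LF (hypothetical)
lf2 <- rnorm(n)   # stand-in for the second estimated LF (hypothetical)

# One genotype variable whose allele frequency depends on lf1
p <- plogis(-0.5 + 0.8 * lf1)
x <- rbinom(n, 2, p)

# Binomial response: allele count and its complement out of 2
y <- cbind(x, 2 - x)
fit_full <- glm(y ~ lf1 + lf2, family = binomial())
fit_null <- glm(y ~ 1, family = binomial())

# Deviance statistic: drop in deviance from the intercept-only model to the full model
dev_stat <- deviance(fit_null) - deviance(fit_full)
dev_stat
```

A larger drop in deviance indicates that the LFs explain more of the variation in that variable than an intercept alone.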

The random outputs of the regular matrix versus the BEDMatrix versions are equal in distribution. However, fixing a seed and providing the same data to both versions does not result in the same exact outputs. This is because the BEDMatrix version permutes loci in a different order by necessity.

Value

jackstraw_lfa returns a list consisting of

p.value

m p-values of association tests between variables and their LFs

obs.stat

m observed deviances

null.stat

s*B null deviances
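
As a rough illustration of how these components relate, the sketch below computes empirical p-values by comparing each observed statistic against the pooled null statistics. The numbers are simulated placeholders rather than output of jackstraw_lfa, the helper emp_pvalue is hypothetical, and the (count + 1) / (N + 1) formula is one common convention; the package may compute p-values differently.

```r
set.seed(1)
m <- 1000; s <- 50; B <- 10

# Placeholder statistics: the first 100 observed values are inflated, the rest follow the null
obs_stat  <- c(rchisq(100, df = 1, ncp = 8), rchisq(m - 100, df = 1))
null_stat <- rchisq(s * B, df = 1)   # s * B pooled null statistics

# Empirical p-value: share of null statistics at least as large as the observed one
emp_pvalue <- function(obs, null) {
  (sapply(obs, function(o) sum(null >= o)) + 1) / (length(null) + 1)
}
p_value <- emp_pvalue(obs_stat, null_stat)
```

With this convention the smallest attainable p-value is 1 / (s*B + 1), so the product s*B determines the resolution of the resulting p-values.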

Author(s)

Neo Christopher Chung nchchung@gmail.com

References

Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554 https://academic.oup.com/bioinformatics/article/31/4/545/2748186

See Also

jackstraw_pca jackstraw jackstraw_subspace

Examples

## Not run: 
## simulate genotype data from a logistic factor model:
## genotypes are Binomial(2, prob) draws with prob = inverse-logit(BL)
library(jackstraw)
m <- 5000; n <- 100; pi0 <- .9
m0 <- round(m*pi0)  # number of null variables
m1 <- m - m0        # number of variables associated with the latent variable
B <- matrix(0, nrow=m, ncol=1)
## non-zero coefficients for the first m1 variables (B has a single column)
B[1:m1,] <- runif(m1, min=-.5, max=.5)
L <- matrix(rnorm(n), nrow=1, ncol=n)
BL <- B %*% L
prob <- exp(BL)/(1+exp(BL))

dat <- matrix(rbinom(m*n, 2, as.numeric(prob)), m, n)

## apply the jackstraw_lfa
out <- jackstraw_lfa(dat, r = 2)

# if you had very large genotype data in plink BED/BIM/FAM files,
# use BEDMatrix and save memory by reading from disk (at the expense of speed)
library(BEDMatrix)
dat_BM <- BEDMatrix( 'filepath' ) # assumes filepath.bed, .bim and .fam exist
# run jackstraw!
out <- jackstraw_lfa(dat_BM, r = 2)

## End(Not run)


ncchung/jackstraw documentation built on Aug. 22, 2023, 12:12 p.m.