ligera2_f: LIGERA2: LIght GEnetic Robust Association main function


ligera2_f    R Documentation

LIGERA2: LIght GEnetic Robust Association main function

Description

This function performs genetic association tests on every locus of a genotype matrix against a trait, implicitly computing the kinship matrix in a way that scales better than an explicit kinship estimate. The function returns a tibble containing association statistics and several intermediates. This version calculates p-values using an F-test, which gives calibrated statistics for both quantitative and binary traits. Compared to ligera2(), which uses the faster Wald test (calibrated for quantitative but not binary traits), this F-test version is considerably slower. It is currently optimized for m >> n and remains a work in progress.

Usage

ligera2_f(
  X,
  trait,
  mean_kinship,
  covar = NULL,
  loci_on_cols = FALSE,
  mem_factor = 0.7,
  mem_lim = NA,
  m_chunk_max = 1000,
  V = 0,
  tol = 1e-15
)

Arguments

X

The m-by-n genotype matrix, containing dosage values in (0, 1, 2, NA) for the reference allele at each locus.

trait

The length-n trait vector, which may be real valued and contain missing values.

mean_kinship

An estimate of the mean kinship produced externally, to ensure internal estimates of kinship are unbiased.

covar

An optional n-by-K matrix of K covariates, aligned with the individuals.

loci_on_cols

If TRUE, X has loci on columns and individuals on rows; if FALSE (the default), loci are on rows and individuals on columns. If X is a BEDMatrix object, loci_on_cols = TRUE is set automatically.
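
As a small illustration of the two layouts, the base R sketch below builds the same toy genotypes with loci on rows and, transposed, with loci on columns. The object names X and X_t are illustrative only. The commented call shows how the transposed matrix would be passed, assuming the ligera package is loaded and that trait and mean_kinship have been defined for these individuals.

```r
# Toy genotypes: m = 3 loci (rows) by n = 4 individuals (columns),
# which is the default layout (loci_on_cols = FALSE)
X <- matrix(
    c(0, 1, 2, 1,
      1, 0, 1, NA,
      1, 0, 2, 2),
    nrow = 3,
    byrow = TRUE
)
dim( X ) # m-by-n

# Same data with individuals on rows and loci on columns
X_t <- t( X )
dim( X_t ) # n-by-m

# With the ligera package loaded, the transposed layout would be passed as:
# tib <- ligera2_f( X_t, trait, mean_kinship, loci_on_cols = TRUE )
```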

mem_factor

Proportion of available memory to use for loading and processing genotypes. Ignored if mem_lim is not NA.

mem_lim

Memory limit in GB, used to break up genotype data into chunks for very large datasets. Note that memory usage is somewhat underestimated and is not strictly controlled. The default on Linux and Windows is mem_factor times the free system memory; on OSX and other systems it is 1 GB.

m_chunk_max

Sets the maximum number of loci to process at a time. The actual number of loci loaded may be lower if memory is limiting.
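
To give a rough sense of how a memory limit and m_chunk_max can interact, the sketch below estimates how many loci fit in a budget when each genotype is stored as an 8-byte double. This is a back-of-the-envelope assumption, not the package's actual accounting, which is not described here and presumably tracks additional intermediate objects. The variable names n_ind, bytes_per_value, m_fit and m_chunk are hypothetical.

```r
# Hypothetical inputs for illustration
n_ind <- 500000        # number of individuals
mem_lim <- 1           # memory budget in GB
m_chunk_max <- 1000    # upper bound on loci per chunk

# Assumption: one double (8 bytes) per genotype, one matrix in memory
bytes_per_value <- 8

# Loci that fit in the budget under this simplified model
m_fit <- floor( mem_lim * 1024^3 / ( bytes_per_value * n_ind ) )

# The chunk size is the smaller of the memory-based value and the cap
m_chunk <- min( m_chunk_max, m_fit )
m_chunk
```

Under these assumptions the memory budget, not m_chunk_max, is the binding constraint.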

V

Algorithm version (0, 1, or 2). These select experimental features and are not documented further.

tol

Tolerance value passed to the conjugate gradient solver.

Details

Suppose there are n individuals and m loci.

Value

A tibble containing the following association statistics:

  • pval: The p-value of the association test

  • beta: The estimated effect size coefficient for the trait vector at this locus

  • f_stat: The F statistic

  • df: The degrees of freedom: the number of non-missing individuals minus the number of parameters of the full model
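
To show how an F statistic, its degrees of freedom and a p-value relate in general, the base R sketch below runs a nested-model F-test on toy data using ordinary least squares. It is only an analogy: it applies no kinship correction and is not the package's implementation. The toy vectors and the names fit0, fit1, rss0 and rss1 are illustrative, and the single numerator degree of freedom reflects the assumption that one parameter is tested per locus.

```r
# Toy data: one locus and a quantitative trait for n = 6 individuals
x <- c(0, 1, 2, 1, 0, 2)
y <- c(1.1, 1.9, 3.2, 2.1, 0.8, 2.9)

# Null model (intercept only) and full model (intercept + one extra term)
fit0 <- lm( y ~ 1 )
fit1 <- lm( y ~ x )

# Residual sums of squares of each model
rss0 <- sum( residuals( fit0 )^2 )
rss1 <- sum( residuals( fit1 )^2 )

# df: non-missing individuals minus parameters of the full model (6 - 2)
df <- length( y ) - 2

# F statistic for the single added parameter, and its upper-tail p-value
f_stat <- ( rss0 - rss1 ) / ( rss1 / df )
pval <- pf( f_stat, 1, df, lower.tail = FALSE )

# Cross-check against base R's nested-model comparison
anova( fit0, fit1 )
```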

See Also

The popkin package.

Examples

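# Load the package that provides ligera2_f()
library(ligera)

# Construct toy data
# genotype matrix: 3 loci (rows) by 3 individuals (columns)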
X <- matrix(
    c(0, 1, 2,
      1, 0, 1,
      1, 0, 2),
    nrow = 3,
    byrow = TRUE
)
trait <- 1 : 3
mean_kinship <- mean( diag( 3 ) / 2 ) # unstructured case

tib <- ligera2_f( X, trait, mean_kinship )
tib


OchoaLab/ligera documentation built on Jan. 5, 2023, 8:29 p.m.