ligera: LIGERA: LIght GEnetic Robust Association main function
In OchoaLab/ligera: LIght GEnetic Robust Association

Description

This function performs a genetic association test at every locus of a genotype matrix against a quantitative trait, given a precomputed kinship matrix. It returns a tibble containing association statistics and several intermediates. This version calculates p-values using a Wald test.

Usage

ligera(
  X,
  trait,
  kinship,
  kinship_inv = NULL,
  inbr = popkin::inbr(kinship),
  covar = NULL,
  loci_on_cols = FALSE,
  mem_factor = 0.7,
  mem_lim = NA,
  m_chunk_max = 1000,
  tol = 1e-15,
  maxIter = 1e+06
)

Arguments

`X`	The `m`-by-`n` genotype matrix, containing dosage values in (0, 1, 2, NA) for the reference allele at each locus.
`trait`	The length-`n` trait vector, which may be real valued and contain missing values.
`kinship`	The `n`-by-`n` kinship matrix, estimated by other methods (e.g. the `popkin` package).
`kinship_inv`	The optional matrix inverse of the kinship matrix. Setting this parameter is not recommended, as internally a conjugate gradient method (`cPCG::cgsolve`) is used to implicitly invert this matrix, which is much faster. However, for very large numbers of traits without missingness and the same kinship matrix, inverting once might be faster.
`inbr`	An optional length-`n` vector of inbreeding coefficients. Defaults to the inbreeding coefficients extracted from the provided `kinship` matrix. This parameter, intended for internal use only, enables direct comparison to the `ligera2` version.
`covar`	An optional `n`-by-`K` matrix of `K` covariates, aligned with the individuals.
`loci_on_cols`	If `TRUE`, `X` has loci on columns and individuals on rows; if `FALSE` (the default), loci are on rows and individuals on columns. If `X` is a BEDMatrix object, `loci_on_cols = TRUE` is set automatically.
`mem_factor`	Proportion of available memory to use loading and processing genotypes. Ignored if `mem_lim` is not `NA`.
`mem_lim`	Memory limit in GB, used to break up genotype data into chunks for very large datasets. Note memory usage is somewhat underestimated and is not controlled strictly. Default in Linux and Windows is `mem_factor` times the free system memory, otherwise it is 1GB (OSX and other systems).
`m_chunk_max`	Sets the maximum number of loci to process at a time. The actual number of loci loaded may be lower if memory is limiting.
`tol`	Tolerance value passed to `cPCG::cgsolve`.
`maxIter`	Maximum number of iterations passed to `cPCG::cgsolve`.
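
The `kinship_inv`, `tol` and `maxIter` arguments all relate to solving linear systems in the kinship matrix without forming its inverse. The following is a minimal base R sketch of a conjugate gradient solver, compared against an explicit inverse. It is only an illustration of the idea: `cg_solve_sketch` is a hypothetical helper, not the `cPCG::cgsolve` implementation, and the small structured kinship matrix, the tolerance and the iteration cap are made-up values chosen for the example.

```r
# Hypothetical helper (not part of ligera or cPCG): solve A z = b for a
# symmetric positive-definite A by conjugate gradients, never forming solve(A).
cg_solve_sketch <- function(A, b, tol = 1e-10, maxIter = 1000) {
    z <- rep(0, length(b))        # initial guess
    r <- b - drop(A %*% z)        # residual of the initial guess
    p <- r                        # first search direction
    rs_old <- sum(r^2)
    for (i in seq_len(maxIter)) {
        if (sqrt(rs_old) < tol) break    # stop once the residual norm is small
        Ap <- drop(A %*% p)
        alpha <- rs_old / sum(p * Ap)    # step length along p
        z <- z + alpha * p
        r <- r - alpha * Ap
        rs_new <- sum(r^2)
        p <- r + (rs_new / rs_old) * p   # next conjugate direction
        rs_old <- rs_new
    }
    z
}

# Made-up kinship matrix with some relatedness between neighbors
kinship_toy <- matrix(
    c(0.5, 0.1, 0.0,
      0.1, 0.5, 0.1,
      0.0, 0.1, 0.5),
    nrow = 3,
    byrow = TRUE
)
trait_toy <- c(1, 2, 3)

z_cg <- cg_solve_sketch(kinship_toy, trait_toy)        # implicit inversion
z_inv <- drop(solve(kinship_toy) %*% trait_toy)        # explicit inversion
```

Both approaches should agree up to numerical tolerance. The explicit inverse pays a one-time cost that can be reused across right-hand sides, which matches the case described for `kinship_inv`.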

Details

Suppose there are n individuals and m loci.
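
As a rough illustration of how `n` and `m` tie the arguments together, the sketch below checks that the inputs agree on the number of individuals under either orientation of `X`. The helper `check_dims_sketch` is hypothetical and written for this example only; it is not a function of the package and does not claim to reproduce the checks `ligera` performs internally.

```r
# Hypothetical helper: confirm that trait, kinship and covar agree with X on
# n (individuals), and report m (loci) and n.
check_dims_sketch <- function(X, trait, kinship, covar = NULL, loci_on_cols = FALSE) {
    # default orientation: loci on rows (m), individuals on columns (n)
    n <- if (loci_on_cols) nrow(X) else ncol(X)
    m <- if (loci_on_cols) ncol(X) else nrow(X)
    stopifnot(
        length(trait) == n,
        nrow(kinship) == n,
        ncol(kinship) == n
    )
    # covariates, if given, must have one row per individual
    if (!is.null(covar)) stopifnot(nrow(covar) == n)
    c(m = m, n = n)
}

# Toy data with m = 4 loci and n = 3 individuals
X_toy <- matrix(
    c(0, 1, 2,
      1, 0, 1,
      1, 0, 2,
      2, 1, 0),
    nrow = 4,
    byrow = TRUE
)
trait_toy <- 1 : 3
kinship_toy <- diag( 3 ) / 2

dims <- check_dims_sketch(X_toy, trait_toy, kinship_toy)
# The transposed matrix describes the same data when loci_on_cols = TRUE
dims_t <- check_dims_sketch(t(X_toy), trait_toy, kinship_toy, loci_on_cols = TRUE)
```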

Value

A tibble containing the following association statistics:

pval: The p-value of the association test
beta: The estimated effect size coefficient for the trait vector at this locus
beta_std_dev: The estimated standard deviation of the coefficient at this locus (varies due to dependence on minor allele frequency)
p_q: The allele variance estimate (estimate of p*(1-p)). The number of heterozygotes, weighted by inbreeding coefficient, and with pseudocounts included, is used in this estimate (in other words, it does not equal MAF * ( 1 - MAF ), where MAF is the marginal allele frequency).
t_stat: The test statistic, equal to beta / beta_std_dev.
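
The relationship between `beta`, `beta_std_dev`, `t_stat` and `pval` can be sketched in base R as follows. The input values are made up, and the two-sided p-value uses a standard normal reference distribution, which is an assumption: the documentation states that a Wald test is used but does not say which reference distribution `ligera` applies.

```r
# Made-up per-locus estimates, for illustration only
beta <- c(0.8, -0.1, 0.3)
beta_std_dev <- c(0.25, 0.20, 0.30)

# Test statistic as documented: beta / beta_std_dev
t_stat <- beta / beta_std_dev

# Two-sided p-value, ASSUMING a standard normal reference distribution
pval <- 2 * pnorm(-abs(t_stat))

# Collect results in a data frame (ligera itself returns a tibble)
res <- data.frame(pval = pval, beta = beta, beta_std_dev = beta_std_dev, t_stat = t_stat)
```

Loci with a larger absolute `t_stat` get smaller p-values, regardless of the sign of `beta`.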

Examples

# Construct toy data
# genotype matrix
X <- matrix(
    c(0, 1, 2,
      1, 0, 1,
      1, 0, 2),
    nrow = 3,
    byrow = TRUE
)
trait <- 1 : 3
kinship <- diag( 3 ) / 2 # unstructured case

tib <- ligera( X, trait, kinship )
tib

OchoaLab/ligera documentation built on Jan. 5, 2023, 8:29 p.m.