ligera_f_multi: LIGERA_multi: LIght GEnetic Robust Association multiscan...

View source: R/ligera_f_multi.R

ligera_f_multi    R Documentation

LIGERA_multi: LIght GEnetic Robust Association multiscan function

Description

This function performs multiple genetic association scans, adding significant loci to the model as covariates after each iteration to increase power in the final model (one locus or all significant loci per iteration, depending on one_per_iter). The function returns a tibble containing association statistics and several intermediates. This version calculates p-values using an F-test, which gives calibrated statistics for both quantitative and binary traits. Compared to ligera(), which uses the faster Wald test (calibrated for quantitative but not binary traits), this F-test version is considerably slower. It is optimized for m >> n and remains a work in progress.
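To make the F-test idea concrete, here is a minimal sketch in base R that compares a null model (intercept plus one covariate) against a full model that adds a single locus. It uses ordinary least squares on simulated data and deliberately ignores kinship, so it illustrates the nested-model F-test in general and is not the computation performed inside ligera_f_multi. All variable names are illustrative assumptions.

```r
# Toy data (simulated): one quantitative trait, one covariate, one locus
set.seed(1)
n <- 50
x <- rbinom(n, 2, 0.5)   # genotype dosages at one locus
z <- rnorm(n)            # a hypothetical covariate
y <- 0.8 * x + 0.3 * z + rnorm(n)

# Null model (intercept + covariate) versus full model (adds the locus)
fit0 <- lm(y ~ z)
fit1 <- lm(y ~ z + x)

# F statistic from the two residual sums of squares
rss0 <- sum(resid(fit0)^2)
rss1 <- sum(resid(fit1)^2)
df <- n - 3              # individuals minus parameters of the full model
f_stat <- (rss0 - rss1) / (rss1 / df)
pval <- pf(f_stat, 1, df, lower.tail = FALSE)

# Cross-check against base R's nested-model comparison
tab <- anova(fit0, fit1)
```

The hand-computed f_stat and pval should agree with the second row of tab, and df here matches the definition given under Value (non-missing individuals minus parameters of the full model).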

Usage

ligera_f_multi(
  X,
  trait,
  kinship,
  q_cut = 0.05,
  one_per_iter = FALSE,
  kinship_inv = NULL,
  covar = NULL,
  loci_on_cols = FALSE,
  mem_factor = 0.7,
  mem_lim = NA,
  m_chunk_max = 1000,
  tol = 1e-15,
  maxIter = 1e+06
)

Arguments

X

The m-by-n genotype matrix, containing dosage values in (0, 1, 2, NA) for the reference allele at each locus.

trait

The length-n trait vector, which may be real valued and contain missing values.

kinship

The n-by-n kinship matrix, estimated by other methods (e.g. the popkin package).

q_cut

The q-value threshold to admit new loci into the polygenic model.

one_per_iter

If TRUE, only the most significant locus per iteration is added to the model of the next iteration. Otherwise, all significant loci per iteration are added to the model of the next iteration.

kinship_inv

The optional matrix inverse of the kinship matrix. Setting this parameter is not recommended, because internally a conjugate gradient method (cPCG::cgsolve) is used to implicitly invert this matrix, which is much faster. However, for very large numbers of traits without missingness and the same kinship matrix, inverting once might be faster.
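The trade-off described for kinship_inv can be sketched in base R: a conjugate gradient solver obtains the product of the inverse kinship matrix with a vector without ever forming the inverse. The cg_solve function below is a hypothetical helper written for illustration only; it is not the cPCG implementation, its defaults differ from the tol and maxIter defaults listed here, and the toy matrix merely stands in for a kinship matrix.

```r
# Minimal conjugate gradient solver for a symmetric positive-definite A.
# Hypothetical helper for illustration; not the cPCG::cgsolve implementation.
cg_solve <- function(A, b, tol = 1e-10, maxIter = 1000) {
    x <- rep(0, length(b))
    r <- b                  # residual b - A x, with starting guess x = 0
    p <- r                  # first search direction
    rs_old <- sum(r^2)
    for (i in seq_len(maxIter)) {
        if (sqrt(rs_old) < tol) break   # converged
        Ap <- drop(A %*% p)
        alpha <- rs_old / sum(p * Ap)   # step size along p
        x <- x + alpha * p
        r <- r - alpha * Ap
        rs_new <- sum(r^2)
        p <- r + (rs_new / rs_old) * p  # next conjugate direction
        rs_old <- rs_new
    }
    x
}

# Toy symmetric positive-definite matrix standing in for a kinship matrix
set.seed(1)
n <- 20
K <- crossprod(matrix(rnorm(n * n), n)) / n + diag(n)
y <- rnorm(n)

x_cg <- cg_solve(K, y)          # implicit inversion: solves K x = y
x_inv <- drop(solve(K) %*% y)   # explicit inversion, then multiplication
```

Both vectors should agree up to the solver tolerance. Precomputing solve(K) only pays off when the same inverse is reused many times, which is the many-traits case mentioned above.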

covar

An optional n-by-K matrix of K covariates, aligned with the individuals.

loci_on_cols

If TRUE, X has loci on columns and individuals on rows; if FALSE (the default), loci are on rows and individuals on columns. If X is a BEDMatrix object, loci_on_cols = TRUE is set automatically.
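As a small illustration of the two orientations, the sketch below builds the same toy genotype matrix with loci on rows and, transposed, with loci on columns. The calls to ligera_f_multi are guarded so the snippet also runs where the ligera package is not installed; the two calls are expected, though not verified here, to return the same statistics.

```r
set.seed(1)
n_ind <- 5
m_loci <- 100

# Default orientation: loci on rows (m-by-n)
X <- matrix(rbinom(m_loci * n_ind, 2, 0.5), nrow = m_loci)
# Transposed orientation: loci on columns (n-by-m)
X_cols <- t(X)

trait <- rnorm(n_ind) + X[1, ]
kinship <- diag(n_ind) / 2

# Only run the scans if the ligera package is available
if (requireNamespace("ligera", quietly = TRUE)) {
    tib_rows <- ligera::ligera_f_multi(X, trait, kinship)
    tib_cols <- ligera::ligera_f_multi(X_cols, trait, kinship, loci_on_cols = TRUE)
}
```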

mem_factor

Proportion of available memory to use when loading and processing genotypes. Ignored if mem_lim is not NA.

mem_lim

Memory limit in GB, used to break up genotype data into chunks for very large datasets. Note memory usage is somewhat underestimated and is not controlled strictly. Default in Linux and Windows is mem_factor times the free system memory, otherwise it is 1GB (OSX and other systems).

m_chunk_max

Sets the maximum number of loci to process at a time. The actual number of loci loaded may be lower if memory is limiting.
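The effect of m_chunk_max can be pictured as splitting the loci into consecutive blocks of at most that size. The sketch below computes such block boundaries in base R for a toy number of loci. It is an assumption about the general chunking idea only: it does not model the memory-based reduction of chunk size described under mem_lim, and it is not taken from the package source.

```r
m_loci <- 2500        # toy number of loci
m_chunk_max <- 1000   # maximum loci per chunk

# First and last locus index of each chunk
starts <- seq(1, m_loci, by = m_chunk_max)
ends <- pmin(starts + m_chunk_max - 1, m_loci)
chunks <- data.frame(start = starts, end = ends)
chunks$size <- chunks$end - chunks$start + 1
```

With these toy values there are three chunks, the last one shorter than the others.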

tol

Tolerance value passed to cPCG::cgsolve.

maxIter

Maximum number of iterations passed to cPCG::cgsolve.

Details

Suppose there are n individuals and m loci.

Value

A tibble containing the following association statistics from the last scan for non-selected loci. For selected loci, these are the values from the scan before each was added to the model (as after addition they get beta ~= 0 and pval ~= 1).

  • pval: The p-value of the last association scan.

  • beta: The estimated effect size coefficient for the trait vector at this locus.

  • f_stat: The F statistic.

  • df: The degrees of freedom: the number of non-missing individuals minus the number of parameters of the full model.

  • qval: The q-value of the last association scan.

  • sel: The order in which loci were selected, or zero if they were not selected.
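The role of q_cut and one_per_iter in a single iteration can be sketched as follows. The select_loci function is a hypothetical helper, not part of the package. It uses Benjamini-Hochberg adjusted p-values from base R as a stand-in for q-values and a strict "less than" comparison against q_cut; both are assumptions, and the package may compute q-values and apply the threshold differently.

```r
# Hypothetical helper: pick the loci to add to the model after one scan
select_loci <- function(pval, q_cut = 0.05, one_per_iter = FALSE) {
    # Stand-in for q-values (assumption): Benjamini-Hochberg adjustment
    qval <- p.adjust(pval, method = "BH")
    sig <- which(qval < q_cut)
    # Optionally keep only the most significant locus
    if (one_per_iter && length(sig) > 0)
        sig <- sig[which.min(pval[sig])]
    sig
}

# Toy p-values for six loci
pval <- c(1e-8, 0.2, 3e-5, 0.6, 0.04, 0.9)

sel_all <- select_loci(pval)                       # all significant loci
sel_one <- select_loci(pval, one_per_iter = TRUE)  # only the top locus
```

With these toy p-values, loci 1 and 3 pass the threshold, and only locus 1 is kept when one_per_iter = TRUE.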

See Also

The popkin and cPCG packages.

Examples

# Construct random data
# number of individuals we want
n_ind <- 5
# number of loci we want
m_loci <- 100
# a not so small random genotype matrix
X <- matrix(
    rbinom( m_loci * n_ind, 2, 0.5 ),
    nrow = m_loci
)
# random trait
trait <- rnorm( n_ind )
# add a genetic effect from first locus
trait <- trait + X[ 1, ]
# kinship matrix
kinship <- diag( n_ind ) / 2 # unstructured case

tib <- ligera_f_multi( X, trait, kinship )
tib


OchoaLab/ligera documentation built on Jan. 5, 2023, 8:29 p.m.