ligera2_bed_multi: LIGERA2 BED multi: LIght GEnetic Robust Association multiscan...

View source: R/ligera2_bed_multi.R

ligera2_bed_multi    R Documentation

LIGERA2 BED multi: LIght GEnetic Robust Association multiscan function

Description

This function performs multiple genetic association scans, adding significant loci to the model (as covariates) at each iteration to increase power in the final model. Depending on one_per_iter, either all significant loci or only the most significant one are added per iteration. The function returns a tibble containing association statistics and several intermediates. This optimized version requires the genotypes to be in a file in BED format.

Usage

ligera2_bed_multi(
  file,
  trait,
  mean_kinship,
  q_cut = 0.05,
  one_per_iter = FALSE,
  covar = NULL,
  mem_factor = 0.7,
  mem_lim = NA,
  m_chunk_max = 1000,
  tol = 1e-15,
  prune_ld = FALSE,
  r2_max = 0.3,
  pos_window = 10000000L
)

Arguments

file

The path to the BED file containing the genotypes, potentially excluding the BED extension.

trait

The length-n trait vector, which may be real valued and contain missing values.

mean_kinship

An estimate of the mean kinship produced externally, to ensure internal estimates of kinship and inbreeding are unbiased.

q_cut

The q-value threshold to admit new loci into the polygenic model.

one_per_iter

If TRUE, only the most significant locus per iteration is added to the model of the next iteration. Otherwise (default), all significant loci per iteration are added to the model of the next iteration.
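The interplay of q_cut and one_per_iter can be sketched as a forward-selection loop. This is only an illustration under stated assumptions, not the LIGERA model: each scan uses ordinary least squares via lm(), q-values are approximated by Benjamini-Hochberg adjusted p-values from p.adjust(), and the function name multiscan_sketch, the ordering of loci within an iteration and the simulated data are all hypothetical.

```r
# Toy forward-selection multiscan (NOT the LIGERA model; see assumptions above).
multiscan_sketch <- function(X, trait, q_cut = 0.05, one_per_iter = FALSE) {
  m <- ncol(X)
  selected <- integer(0)     # loci already in the model, in order of selection
  pval <- rep(NA_real_, m)   # p-value from the last scan that tested each locus
  repeat {
    candidates <- setdiff(seq_len(m), selected)
    if (length(candidates) == 0) break
    # One scan: test every candidate, with selected loci as covariates
    for (j in candidates) {
      design <- cbind(x = X[, j], X[, selected, drop = FALSE])
      fit <- summary(lm(trait ~ design))
      # Row 2 is the tested locus (row 1 is the intercept); this assumes the
      # locus is not collinear with the loci already selected
      pval[j] <- fit$coefficients[2, 4]
    }
    # Stand-in for q-values: Benjamini-Hochberg adjusted p-values
    qval <- p.adjust(pval[candidates], method = "BH")
    new <- candidates[qval < q_cut]
    if (length(new) == 0) break   # nothing significant: stop
    if (one_per_iter) {
      new <- candidates[which.min(pval[candidates])]
    } else {
      new <- new[order(pval[new])]  # arbitrary choice: most significant first
    }
    selected <- c(selected, new)
  }
  sel <- integer(m)
  sel[selected] <- seq_along(selected)
  data.frame(locus = seq_len(m), pval = pval, sel = sel)
}

# Simulated data: locus 3 has a strong effect on the trait
set.seed(1)
n <- 200
m <- 20
X <- matrix(rbinom(n * m, 2, 0.3), nrow = n, ncol = m)
trait <- 2 * X[, 3] + rnorm(n)
res <- multiscan_sketch(X, trait, one_per_iter = TRUE)
```

In this sketch, a selected locus keeps the p-value from the scan before it entered the model, because it is no longer tested afterwards.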

covar

An optional n-by-K matrix of K covariates, aligned with the individuals.

mem_factor

Proportion of available memory to use when loading and processing genotypes. Ignored if mem_lim is not NA.

mem_lim

Memory limit in GB, used to break up genotype data into chunks for very large datasets. Note that memory usage is somewhat underestimated and is not controlled strictly. If NA (default), the limit in Linux and Windows is mem_factor times the free system memory; otherwise it is 1GB (OSX and other systems).

m_chunk_max

Sets the maximum number of loci to process at a time. The actual number of loci loaded may be lower if memory is limiting.
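As a rough illustration of how a memory limit and m_chunk_max might combine into a chunk size, consider the sketch below. It is not the package's internal calculation: the helper name chunk_size_sketch, the 8 bytes per genotype, the number of simultaneous copies and the use of 1 GB = 1024^3 bytes are all assumptions made for the example.

```r
# Hypothetical helper: how many loci fit in a memory budget?
# Assumes genotypes are held as 8-byte doubles and that `copies` matrices
# of the same size are alive at once (both are assumptions).
chunk_size_sketch <- function(n_ind, mem_lim_gb, m_chunk_max = 1000, copies = 2) {
  bytes_per_locus <- 8 * n_ind * copies
  m_fit <- floor(mem_lim_gb * 1024^3 / bytes_per_locus)
  # Never exceed m_chunk_max, and always process at least one locus
  max(1, min(m_chunk_max, m_fit))
}

# Small sample: memory is not limiting, so m_chunk_max applies
chunk_small <- chunk_size_sketch(n_ind = 1000, mem_lim_gb = 1)
# Very large sample: memory limits the chunk below m_chunk_max
chunk_large <- chunk_size_sketch(n_ind = 1e6, mem_lim_gb = 1)
```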

tol

Tolerance value passed to the conjugate gradient solver.

prune_ld

If TRUE (default FALSE), at every iteration (including the first one), if more than one new locus is discovered, the set of new loci is pruned by removing correlated loci, as controlled by the r2_max and pos_window parameters.

r2_max

Maximum squared correlation coefficient between loci (ignored if prune_ld = FALSE).

pos_window

Window size, in basepairs, within which to apply pruning (ignored if prune_ld = FALSE). If pos_window > 0, only pairs of loci on the same chromosome and with positions less than pos_window apart are considered for pruning. However, if pos_window == 0L, all pairs of loci are considered for pruning regardless of chr/pos values.
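The roles of r2_max and pos_window can be illustrated with a greedy pruning sketch. This is an assumption about how such pruning could work, not the package's implementation: the function prune_ld_sketch is hypothetical, loci are assumed to be ordered from most to least significant, and earlier loci are kept in preference to later ones.

```r
# Greedy LD pruning sketch (hypothetical; not ligera's internal code).
# X: n-by-k genotype matrix of newly discovered loci, most significant first
# chr, pos: chromosome and basepair position of each column of X
prune_ld_sketch <- function(X, chr, pos, r2_max = 0.3, pos_window = 10000000L) {
  k <- ncol(X)
  keep <- rep(TRUE, k)
  for (i in seq_len(k)) {
    if (!keep[i]) next
    for (j in seq_len(k)) {
      if (j <= i || !keep[j]) next
      # pos_window == 0 compares every pair; otherwise only nearby pairs
      # on the same chromosome are compared
      in_window <- pos_window == 0L ||
        (chr[i] == chr[j] && abs(pos[i] - pos[j]) < pos_window)
      if (!in_window) next
      r2 <- cor(X[, i], X[, j], use = "pairwise.complete.obs")^2
      if (r2 > r2_max) keep[j] <- FALSE
    }
  }
  which(keep)  # indexes of loci that survive pruning
}

# Three identical loci: the third is on a different chromosome
x1 <- rep(c(0, 1, 2, 1), 25)
X <- cbind(x1, x1, x1)
chr <- c(1, 1, 2)
pos <- c(100L, 200L, 300L)
kept_window <- prune_ld_sketch(X, chr, pos)                   # default window
kept_all <- prune_ld_sketch(X, chr, pos, pos_window = 0L)     # all pairs
```

With the default window, the locus on the other chromosome is never compared and so survives; with pos_window = 0L it is compared and removed.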

Details

Suppose there are n individuals and m loci.

Value

A tibble containing the following association statistics from the last scan for non-selected loci. For selected loci, these are the values from the scan before each was added to the model (as after addition they get beta ~= 0 and pval ~= 1).

  • chr: Chromosome of locus.

  • id: Locus ID.

  • posg: Position in genetic distance.

  • pos: Position in basepairs.

  • alt: Alternative allele.

  • ref: Reference allele (counted).

  • pval: The p-value of the last association scan.

  • beta: The estimated effect size coefficient for the trait vector at this locus.

  • beta_std_dev: The estimated standard deviation of the coefficient at this locus (varies due to dependence on minor allele frequency).

  • p_q: The allele variance estimate (estimate of p*(1-p)). The number of heterozygotes, weighted by inbreeding coefficient, and with pseudocounts included, is used in this estimate (in other words, it does not equal MAF * ( 1 - MAF ), where MAF is the marginal allele frequency).

  • t_stat: The test statistic, equal to beta / beta_std_dev.

  • qval: The q-value of the last association scan.

  • sel: The order in which loci were selected, or zero if they were not selected.
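A short sketch of how the columns above relate and how the selected loci might be extracted from the result. The table here is a toy stand-in built with made-up numbers purely for illustration (a plain data.frame rather than the tibble the function returns); only the column names come from the list above.

```r
# Toy stand-in for the returned table (made-up values, illustration only)
tib <- data.frame(
  id = c("rs1", "rs2", "rs3", "rs4"),
  beta = c(0.80, -0.05, 0.40, 0.02),
  beta_std_dev = c(0.10, 0.10, 0.08, 0.10),
  sel = c(2L, 0L, 1L, 0L)
)

# The test statistic is the effect size divided by its standard deviation
tib$t_stat <- tib$beta / tib$beta_std_dev

# Selected loci have sel > 0; sort them by order of selection
selected <- tib[tib$sel > 0, ]
selected <- selected[order(selected$sel), ]
selected_ids <- selected$id
```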

See Also

The popkin package.

Examples

# MISSING SAMPLE BED FILE


OchoaLab/ligera documentation built on Jan. 5, 2023, 8:29 p.m.