LD_pruning: Linkage disequilibrium (LD) pruning procedure


LD_pruning R Documentation

Linkage disequilibrium (LD) pruning procedure

Description

The LD_pruning function takes a chromosome, an opened annotated GDS (aGDS) file object, a fitted null model object, and a list of variants, and performs LD pruning among these variants through sequential conditional analysis using the score test. For multiple-phenotype analysis (obj_nullmodel$n.pheno > 1), the results correspond to multi-trait sequential conditional analysis, which leverages the correlation structure between the phenotypes.
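
The sequential conditional idea can be illustrated on simulated data. The following is a toy sketch in base R, not the package's implementation: it assumes unrelated samples, a single quantitative trait, no covariates, and a simple score-type test in an ordinary linear model. The helper names toy_cond_pvalue and toy_ld_prune are hypothetical, and the stopping rule (add the most significant remaining variant while its conditional p-value is below the threshold) is an assumed reading of the description above.

```r
# Score-type p-value for candidate genotype g, conditional on the
# already-selected genotypes in Z (a matrix that may have zero columns).
toy_cond_pvalue <- function(y, g, Z) {
  if (ncol(Z) > 0) {
    r_y <- resid(lm(y ~ Z))   # trait residuals under the conditional null
    r_g <- resid(lm(g ~ Z))   # candidate genotype adjusted for selected ones
  } else {
    r_y <- y - mean(y)
    r_g <- g - mean(g)
  }
  # A candidate fully explained by the selected set carries no new signal.
  if (sum(r_g^2) < 1e-10) return(1)
  sigma2 <- sum(r_y^2) / (length(y) - 1 - ncol(Z))  # null-model variance
  stat <- sum(r_g * r_y)^2 / (sigma2 * sum(r_g^2))
  pchisq(stat, df = 1, lower.tail = FALSE)
}

# Add variants one at a time until no remaining variant has a
# conditional p-value below cond_p_thresh.
toy_ld_prune <- function(y, G, cond_p_thresh = 1e-04) {
  selected <- integer(0)
  remaining <- seq_len(ncol(G))
  while (length(remaining) > 0) {
    Z <- G[, selected, drop = FALSE]
    pvals <- vapply(remaining,
                    function(j) toy_cond_pvalue(y, G[, j], Z),
                    numeric(1))
    best <- which.min(pvals)
    if (pvals[best] >= cond_p_thresh) break
    selected <- c(selected, remaining[best])
    remaining <- remaining[-best]
  }
  selected
}

set.seed(2025)
n <- 500
g1 <- rbinom(n, 2, 0.3)   # simulated causal variant
g2 <- g1                  # exact copy of g1 (perfect LD)
g3 <- rbinom(n, 2, 0.2)   # independent simulated causal variant
g4 <- rbinom(n, 2, 0.4)   # simulated null variant
G <- cbind(g1, g2, g3, g4)
y <- 0.8 * g1 + 0.5 * g3 + rnorm(n)

pruned <- toy_ld_prune(y, G)
print(colnames(G)[pruned])
```

In this toy data g2 is an exact copy of g1, so it has no conditional signal once g1 is in the list and is never retained. This is the behavior LD pruning is meant to deliver: one representative per set of variants in strong LD.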

Usage

LD_pruning(
  chr,
  genofile,
  obj_nullmodel,
  variants_list,
  maf_cutoff = 0.01,
  cond_p_thresh = 1e-04,
  method_cond = c("optimal", "naive"),
  QC_label = "annotation/filter",
  variant_type = c("variant", "SNV", "Indel"),
  geno_missing_imputation = c("mean", "minor"),
  geno_position_ascending = TRUE
)
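
A call might be organized as in the sketch below. It assumes the SeqArray and STAARpipeline packages are installed and that a null model has already been fitted. The wrapper name run_ld_pruning, the agds_path argument, and the variant positions and alleles are hypothetical placeholders, not values from the package.

```r
# Hypothetical variant list: four columns in the order CHR, POS, REF, ALT.
# Positions and alleles are placeholders for illustration only.
variants_list <- data.frame(
  CHR = c(1, 1),
  POS = c(100000, 200000),
  REF = c("G", "C"),
  ALT = c("T", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: opens the aGDS file, runs LD pruning with the
# documented defaults spelled out, and closes the file on exit.
# obj_nullmodel is assumed to come from fit_nullmodel (or a converted
# GENESIS null model), as described under Arguments.
run_ld_pruning <- function(agds_path, obj_nullmodel, variants_list, chr) {
  genofile <- SeqArray::seqOpen(agds_path)
  on.exit(SeqArray::seqClose(genofile), add = TRUE)
  STAARpipeline::LD_pruning(
    chr = chr,
    genofile = genofile,
    obj_nullmodel = obj_nullmodel,
    variants_list = variants_list,
    maf_cutoff = 0.01,
    cond_p_thresh = 1e-04,
    method_cond = "optimal",
    QC_label = "annotation/filter",
    variant_type = "variant",
    geno_missing_imputation = "mean",
    geno_position_ascending = TRUE
  )
}
```

Closing the file through on.exit means the GDS handle is released even if the pruning step stops with an error.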

Arguments

chr

chromosome.

genofile

an opened annotated GDS (aGDS) file object.

obj_nullmodel

an object from fitting the null model, which is either the output of the fit_nullmodel function, or the output of the fitNullModel function in the GENESIS package transformed using the genesis2staar_nullmodel function.

variants_list

a data frame of variants to be LD-pruned in sequential conditional analysis. It should contain 4 columns in the following order: chromosome (CHR), position (POS), reference allele (REF), and alternative allele (ALT).

maf_cutoff

the minimum minor allele frequency (MAF) cutoff used to define the individual variants to be LD-pruned (default = 0.01).

cond_p_thresh

the maximum conditional p-value allowed for a variant to be kept in the LD-pruned list of variants (default = 1e-04).

method_cond

a character value indicating the method for conditional analysis. "optimal" regresses the residuals from the null model on known_loci (the variants being conditioned on) together with all covariates used in fitting the null model (fully adjusted) and takes the residuals; "naive" regresses the residuals from the null model on known_loci only and takes the residuals (default = "optimal").

QC_label

channel name of the QC label in the GDS/aGDS file (default = "annotation/filter").

variant_type

type of variant included in the analysis. Choices include "variant", "SNV", or "Indel" (default = "variant").

geno_missing_imputation

method of handling missing genotypes. Either "mean" or "minor" (default = "mean").

geno_position_ascending

logical: whether the variant positions are in ascending order in the GDS/aGDS file (default = TRUE).
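
The difference between the two method_cond options can be illustrated with a toy example. The sketch below uses base R lm on simulated data and assumes unrelated samples, one covariate, and one variant to condition on. It is not the package's code; the variable names are hypothetical.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                    # covariate used in the null model
g_known <- rbinom(n, 2, 0.3)     # genotype of the variant being conditioned on
y <- 0.5 * x + 0.4 * g_known + rnorm(n)

# Null model: trait regressed on covariates only.
r0 <- resid(lm(y ~ x))

# "naive": regress null-model residuals on the conditioning variant only.
r_naive <- resid(lm(r0 ~ g_known))

# "optimal": regress null-model residuals on the conditioning variant
# together with all covariates from the null model (fully adjusted).
r_optimal <- resid(lm(r0 ~ g_known + x))

# The fully adjusted residuals are orthogonal to both the covariate and
# the conditioning variant, up to floating-point error.
print(c(sum(r_optimal * x), sum(r_optimal * g_known)))
```

The naive residuals are only guaranteed to be orthogonal to the conditioning variant. When that variant is correlated with the covariates, the two sets of residuals differ, which is why the fully adjusted version is the default.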

Value

A data frame containing the list of LD-pruned variants in the given chromosome.

References

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.


xihaoli/STAARpipeline documentation built on Feb. 9, 2025, 12:39 a.m.