Individual_Analysis: Individual-variant analysis using score test

View source: R/Individual_Analysis.R

Individual_Analysis    R Documentation

Individual-variant analysis using score test

Description

The Individual_Analysis function takes the chromosome, the starting and ending locations, an opened annotated GDS file object, and a fitted null model object, and uses a score test to analyze the association between a quantitative or dichotomous phenotype (including imbalanced case-control designs) and each individual variant in a genetic region. For multiple-phenotype analysis (obj_nullmodel$n.pheno > 1), the results are multi-trait score test p-values that leverage the correlation structure among the phenotypes.

Usage

Individual_Analysis(
  chr,
  start_loc,
  end_loc,
  genofile,
  obj_nullmodel,
  mac_cutoff = 20,
  subset_variants_num = 5000,
  QC_label = "annotation/filter",
  variant_type = c("variant", "SNV", "Indel"),
  geno_missing_imputation = c("mean", "minor"),
  tol = .Machine$double.eps^0.25,
  max_iter = 1000,
  SPA_p_filter = TRUE,
  p_filter_cutoff = 0.05
)

Arguments

chr

chromosome.

start_loc

starting location (position) of the genetic region in which each individual variant is analyzed using the score test.

end_loc

ending location (position) of the genetic region in which each individual variant is analyzed using the score test.

genofile

an object of opened annotated GDS (aGDS) file.

obj_nullmodel

an object from fitting the null model, which is either the output of the fit_nullmodel function, or the output of the fitNullModel function in the GENESIS package after transformation with the genesis2staar_nullmodel function.

mac_cutoff

the minimum minor allele count (MAC) cutoff used to define individual variants (default = 20).
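
As a rough illustration of what a minor allele count cutoff does, the sketch below computes the MAC per variant from a toy dosage matrix and flags the variants that pass. The matrix, the 0/1/2 alternate-allele coding, and the use of `>=` for the comparison are assumptions made for this example; this page does not state how the package applies the cutoff internally.

```r
# Toy genotype matrix (assumed coding: alternate-allele dosage 0/1/2, no missing values).
# Rows are samples, columns are variants.
n_samples <- 100
geno <- cbind(
  rare    = c(rep(1, 5), rep(0, 95)),
  common  = c(rep(2, 20), rep(1, 30), rep(0, 50)),
  flipped = c(rep(0, 3), rep(2, 97))   # alternate allele is the major allele here
)

# The minor allele count is the smaller of the alternate and reference allele counts.
alt_count <- colSums(geno)
mac <- pmin(alt_count, 2 * n_samples - alt_count)

# Assumed comparison: keep variants whose MAC reaches the cutoff.
mac_cutoff <- 20
keep <- mac >= mac_cutoff
names(mac)[keep]
```

In this toy data only the "common" variant passes; "flipped" is excluded even though its alternate allele is frequent, because its reference allele is rare.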

subset_variants_num

the number of variants analyzed per subset, so that a large region is processed one subset at a time (default = 5000).
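
A minimal sketch of this kind of batching, assuming variants are simply split into consecutive index blocks of at most subset_variants_num. The variant count is made up, and the package's internal batching may be organized differently.

```r
# Hypothetical number of variants in the region.
n_variants <- 12345
subset_variants_num <- 5000

# Assign each variant index to a consecutive batch of at most subset_variants_num.
variant_index <- seq_len(n_variants)
batches <- split(variant_index, ceiling(variant_index / subset_variants_num))

# Batch sizes: two full batches and one partial batch.
lengths(batches)
```

A smaller value lowers peak memory use per batch at the cost of more passes over the genotype file.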

QC_label

channel name of the QC label in the GDS/aGDS file (default = "annotation/filter").

variant_type

type of variant included in the analysis. Choices include "variant", "SNV", or "Indel" (default = "variant").

geno_missing_imputation

method of handling missing genotypes. Either "mean" or "minor" (default = "mean").
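
This page does not define the two imputation methods. The sketch below shows one plausible reading on a single variant: "mean" replaces a missing genotype with the mean observed minor-allele dosage, and "minor" replaces it with zero copies of the minor allele. Both the helper `impute_missing` and the reading of "minor" are assumptions for illustration, not the package's code.

```r
# Hypothetical helper: impute missing dosages for one variant.
# Requires at least one observed (non-NA) genotype.
impute_missing <- function(dosage, method = c("mean", "minor")) {
  method <- match.arg(method)
  observed <- dosage[!is.na(dosage)]
  alt_freq <- sum(observed) / (2 * length(observed))
  # Recode so that the dosage counts the minor allele.
  if (alt_freq > 0.5) dosage <- 2 - dosage
  # Assumed meaning: "mean" fills with the mean dosage, "minor" fills with 0.
  fill <- if (method == "mean") mean(dosage, na.rm = TRUE) else 0
  dosage[is.na(dosage)] <- fill
  dosage
}

g <- c(0, 1, NA, 0, 2, 0, NA, 0)
impute_missing(g, "mean")   # missing entries become 0.5
impute_missing(g, "minor")  # missing entries become 0
```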

tol

a positive number specifying the tolerance: the threshold on the difference between successive parameter estimates in the saddlepoint approximation algorithm, below which iterations stop (default = .Machine$double.eps^0.25).

max_iter

a positive integer specifying the maximum number of iterations for the saddlepoint approximation algorithm (default = 1000).
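
To show how tol and max_iter typically act as stopping rules, here is a generic Newton iteration that solves the saddlepoint equation K'(zeta) = q for a sum of weighted Bernoulli variables. The toy inputs `g` and `mu`, the helper names `K1`, `K2`, and `solve_saddlepoint`, and the choice of Newton's method are all assumptions for this sketch; it is not taken from the STAARpipeline source.

```r
# Toy inputs (illustrative only): minor-allele counts and fitted case probabilities.
g  <- c(rep(1, 20), rep(2, 5), rep(0, 25))
mu <- rep(0.1, length(g))

# First and second derivatives of the cumulant generating function of
# sum(g * Y), where Y_i ~ Bernoulli(mu_i) independently.
K1 <- function(zeta) sum(g * mu * exp(g * zeta) / (1 - mu + mu * exp(g * zeta)))
K2 <- function(zeta) {
  sum(g^2 * mu * (1 - mu) * exp(g * zeta) / (1 - mu + mu * exp(g * zeta))^2)
}

# Newton iteration: stop when successive estimates differ by less than tol,
# or give up after max_iter iterations.
solve_saddlepoint <- function(q, tol = .Machine$double.eps^0.25, max_iter = 1000) {
  zeta <- 0
  for (iter in seq_len(max_iter)) {
    zeta_new <- zeta - (K1(zeta) - q) / K2(zeta)
    if (abs(zeta_new - zeta) < tol) {
      return(list(zeta = zeta_new, iterations = iter, converged = TRUE))
    }
    zeta <- zeta_new
  }
  list(zeta = zeta, iterations = max_iter, converged = FALSE)
}

fit <- solve_saddlepoint(q = 5)
fit$converged
```

A looser tol stops earlier with a less precise root, while max_iter only matters when the iteration fails to settle.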

SPA_p_filter

logical: whether the SPA method is used to recalculate the p-value only for variants whose score-test-based p-value is smaller than a pre-specified threshold; only used in the imbalanced case-control setting (default = TRUE).

p_filter_cutoff

threshold on the score-test-based p-value below which the p-value is recalculated using the SPA method; only used in the imbalanced case-control setting (default = 0.05).
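
The interaction between SPA_p_filter and p_filter_cutoff can be sketched as a simple two-stage selection: compute score-test p-values for every variant, then mark for SPA recalculation only those below the cutoff. The p-values and variant names below are made up, and treating SPA_p_filter = FALSE as "recalculate every variant" is an assumption of this sketch.

```r
# Made-up first-stage score-test p-values for four variants.
p_score <- c(v1 = 0.40, v2 = 0.03, v3 = 0.0004, v4 = 0.75)

SPA_p_filter <- TRUE
p_filter_cutoff <- 0.05

# With the filter on, only variants below the cutoff are marked for recalculation;
# with it off, this sketch marks every variant.
needs_spa <- if (SPA_p_filter) {
  p_score < p_filter_cutoff
} else {
  rep(TRUE, length(p_score))
}

names(p_score)[needs_spa]
```

Filtering this way limits the more expensive SPA computation to variants that already show some evidence of association.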

Value

A data frame containing the score test p-value and the estimated effect size of the minor allele for each individual variant in the given genetic region. The first 4 columns correspond to chromosome (CHR), position (POS), reference allele (REF), and alternative allele (ALT).
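
A hedged sketch of a call, wrapped so that it does nothing when the inputs are unavailable. The wrapper `run_region`, the file names, the region coordinates, and the assumption that the null model was saved with saveRDS() are all hypothetical; only the Individual_Analysis arguments come from this page, and SeqArray's seqOpen and seqClose are used to open and close the aGDS file.

```r
# Hypothetical wrapper: returns NULL unless the packages and both files exist.
run_region <- function(agds_path, nullmodel_path, chr, start_loc, end_loc) {
  have_pkgs <- requireNamespace("SeqArray", quietly = TRUE) &&
    requireNamespace("STAARpipeline", quietly = TRUE)
  if (!have_pkgs || !file.exists(agds_path) || !file.exists(nullmodel_path)) {
    return(NULL)
  }

  # Open the annotated GDS file and make sure it is closed on exit.
  genofile <- SeqArray::seqOpen(agds_path)
  on.exit(SeqArray::seqClose(genofile), add = TRUE)

  # Assumption: the fitted null model object was stored with saveRDS().
  obj_nullmodel <- readRDS(nullmodel_path)

  STAARpipeline::Individual_Analysis(
    chr = chr,
    start_loc = start_loc,
    end_loc = end_loc,
    genofile = genofile,
    obj_nullmodel = obj_nullmodel,
    mac_cutoff = 20,
    variant_type = "variant",
    geno_missing_imputation = "mean"
  )
}

# Placeholder paths and coordinates; replace with real inputs.
res <- run_region("example.chr1.gds", "obj_nullmodel.rds",
                  chr = 1, start_loc = 1, end_loc = 1e6)

# The first four columns are documented as CHR, POS, REF, and ALT.
if (!is.null(res)) head(res[, 1:4])
```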

References

Chen, H., et al. (2016). Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. The American Journal of Human Genetics, 98(4), 653-666.

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.


xihaoli/STAARpipeline documentation built on Feb. 9, 2025, 12:39 a.m.