Gene_Centric_Noncoding: Gene-centric analysis of noncoding functional categories...

View source: R/Gene_Centric_Noncoding.R

Gene_Centric_Noncoding    R Documentation

Gene-centric analysis of noncoding functional categories using STAAR procedure

Description

The Gene_Centric_Noncoding function takes a chromosome, a gene name, a functional category, an opened annotated GDS (aGDS) file object, and the object obtained from fitting the null model. It then uses the STAAR procedure to analyze the association between a quantitative or dichotomous phenotype (including imbalanced case-control designs) and the noncoding functional categories of the gene.

For each noncoding functional category, the STAAR-O p-value is the p-value of an omnibus test that uses the Cauchy combination method to aggregate SKAT(1,25), SKAT(1,1), Burden(1,25), Burden(1,1), ACAT-V(1,25), and ACAT-V(1,1), together with the p-values of each test weighted by each annotation. In the imbalanced case-control setting, the results correspond to the STAAR-B p-value, the p-value of an omnibus test that uses the Cauchy combination method to aggregate Burden(1,25) and Burden(1,1), together with the p-values of each test weighted by each annotation. For multiple-phenotype analysis (obj_nullmodel$n.pheno > 1), the results correspond to multi-trait association p-values (e.g., MultiSTAAR-O) that leverage the correlation structure among the phenotypes.
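The Cauchy combination step behind STAAR-O and STAAR-B can be illustrated with a minimal base-R sketch. The helper name `cauchy_combine` is hypothetical, equal weights are assumed by default, and the shortcut for very small p-values (using 1 / (p * pi) when p < 1e-16) is an assumption about numerical handling; none of this is taken from the package source.

```r
# Minimal sketch of the Cauchy combination of p-values (hypothetical helper,
# not the STAARpipeline implementation).
cauchy_combine <- function(p, w = rep(1, length(p))) {
  stopifnot(length(p) == length(w), all(p > 0 & p < 1), all(w >= 0))
  w <- w / sum(w)                          # normalize weights to sum to 1
  # Each p-value maps to a standard Cauchy quantile, tan((0.5 - p) * pi).
  # For very small p that quantile is close to 1 / (p * pi); using the
  # approximation there avoids loss of precision in tan().
  small <- p < 1e-16
  stat <- sum(w[!small] * tan((0.5 - p[!small]) * pi)) +
    sum(w[small] / (p[small] * pi))
  # The upper tail of the standard Cauchy distribution is the combined p-value.
  pcauchy(stat, lower.tail = FALSE)
}
```

For example, `cauchy_combine(c(0.01, 0.5))` gives roughly 0.02: one small p-value can still drive the combined result, and the combination is known to stay approximately valid when the input p-values are correlated, which suits aggregating many related tests.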

Usage

Gene_Centric_Noncoding(
  chr,
  gene_name,
  category = c("all_categories", "downstream", "upstream", "UTR", "promoter_CAGE",
    "promoter_DHS", "enhancer_CAGE", "enhancer_DHS"),
  genofile,
  obj_nullmodel,
  rare_maf_cutoff = 0.01,
  rv_num_cutoff = 2,
  rv_num_cutoff_max = 1e+09,
  rv_num_cutoff_max_prefilter = 1e+09,
  QC_label = "annotation/filter",
  variant_type = c("SNV", "Indel", "variant"),
  geno_missing_imputation = c("mean", "minor"),
  Annotation_dir = "annotation/info/FunctionalAnnotation",
  Annotation_name_catalog,
  Use_annotation_weights = c(TRUE, FALSE),
  Annotation_name = NULL,
  SPA_p_filter = TRUE,
  p_filter_cutoff = 0.05,
  silent = FALSE
)

Arguments

chr

chromosome.

gene_name

name of the gene to be analyzed using the STAAR procedure.

category

the noncoding functional category to be analyzed using the STAAR procedure. Choices include "all_categories", "downstream", "upstream", "UTR", "promoter_CAGE", "promoter_DHS", "enhancer_CAGE", and "enhancer_DHS" (default = "all_categories").

genofile

an object of opened annotated GDS (aGDS) file.

obj_nullmodel

the object from fitting the null model: either the output of the fit_nullmodel function, or the output of the fitNullModel function in the GENESIS package after transformation with the genesis2staar_nullmodel function.

rare_maf_cutoff

the maximum minor allele frequency (MAF) for a variant to be defined as rare (default = 0.01).

rv_num_cutoff

the minimum number of variants required to analyze a given variant set (default = 2).

rv_num_cutoff_max

the maximum number of variants allowed when analyzing a given variant set (default = 1e+09).

rv_num_cutoff_max_prefilter

the maximum number of variants allowed before the genotype matrix is extracted (default = 1e+09).
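How rare_maf_cutoff, rv_num_cutoff, and rv_num_cutoff_max gate a variant set can be sketched in base R. The helper `select_rare_variants` is hypothetical; it assumes dosages coded 0/1/2 with NA for missing, treats the MAF cutoff as a strict upper bound that also excludes monomorphic variants, and does not model rv_num_cutoff_max_prefilter, which acts before the genotype matrix is extracted.

```r
# Hypothetical helper illustrating the variant-set cutoffs; not the
# package's internal code.
select_rare_variants <- function(geno, rare_maf_cutoff = 0.01,
                                 rv_num_cutoff = 2, rv_num_cutoff_max = 1e9) {
  # geno: n x p matrix of alternate-allele dosages (0/1/2), NA if missing.
  af  <- colMeans(geno, na.rm = TRUE) / 2   # alternate allele frequency
  maf <- pmin(af, 1 - af)                   # minor allele frequency
  keep <- which(maf > 0 & maf < rare_maf_cutoff)
  n_rare <- length(keep)
  # Skip the variant set if it has too few or too many rare variants.
  if (n_rare < rv_num_cutoff || n_rare > rv_num_cutoff_max) {
    return(NULL)
  }
  geno[, keep, drop = FALSE]
}
```

Returning NULL for a skipped set keeps the sketch simple; a real pipeline would more likely record why the set was skipped.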

QC_label

channel name of the QC label in the GDS/aGDS file (default = "annotation/filter").

variant_type

type of variant included in the analysis. Choices include "SNV", "Indel", or "variant" (default = "SNV").

geno_missing_imputation

method of handling missing genotypes. Either "mean" or "minor" (default = "mean").
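A minimal sketch of "mean" imputation, assuming it follows the usual convention of replacing each missing dosage with that variant's mean observed dosage (2 times the allele frequency). The helper `impute_mean` is hypothetical, and the "minor" option is not sketched here.

```r
# Hypothetical helper: column-mean imputation of missing genotype dosages.
impute_mean <- function(geno) {
  col_means <- colMeans(geno, na.rm = TRUE)      # mean dosage per variant
  miss <- which(is.na(geno), arr.ind = TRUE)     # (row, col) of each NA
  geno[miss] <- col_means[miss[, "col"]]         # fill with that column's mean
  geno
}
```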

Annotation_dir

channel name of the annotations in the aGDS file
(default = "annotation/info/FunctionalAnnotation").

Annotation_name_catalog

a data frame containing the annotation names and their corresponding channel names in the aGDS file.

Use_annotation_weights

logical: whether to use annotations as weights (default = TRUE).
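The (1,25) and (1,1) labels in test names such as Burden(1,25) refer to Beta(1,25) and Beta(1,1) weights evaluated at each variant's MAF. The sketch below pairs those MAF weights with a single annotation. It assumes, following the STAAR paper, that a PHRED-scaled score is mapped to 1 - 10^(-PHRED/10); the helper `staar_style_weights` is hypothetical, and the package's actual per-test weighting may differ in detail.

```r
# Hypothetical sketch: MAF-based Beta weights, optionally scaled by one
# PHRED-scaled annotation (mapping assumed, see lead-in).
staar_style_weights <- function(maf, phred = NULL) {
  w <- cbind(beta_1_25 = dbeta(maf, 1, 25),  # upweights rarer variants
             beta_1_1  = dbeta(maf, 1, 1))   # flat weight of 1
  if (!is.null(phred)) {
    anno <- 1 - 10^(-phred / 10)             # PHRED score -> weight in [0, 1)
    w <- w * anno                            # scales row i by anno[i]
  }
  w
}
```

The multiplication relies on R recycling a length-p vector down the columns of a p x 2 matrix, so each variant's row is scaled by its own annotation weight.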

Annotation_name

a vector of annotation names used in STAAR (default = NULL).

SPA_p_filter

logical: should the SPA (saddlepoint approximation) method be used to recalculate p-values only for variants whose normal-approximation-based p-value is smaller than a pre-specified threshold? Only used in the imbalanced case-control setting (default = TRUE).

p_filter_cutoff

the p-value threshold below which p-values are recalculated using the SPA method; only used in the imbalanced case-control setting (default = 0.05).
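The filtering idea behind SPA_p_filter and p_filter_cutoff can be sketched as "compute a cheap normal-approximation p-value first, then recompute only the small ones with SPA". The function `filter_then_recalc` and its `spa_recalc` argument are hypothetical stand-ins, not STAARpipeline functions, and treating SPA_p_filter = FALSE as "recompute everything" is an assumption.

```r
# Hypothetical sketch of SPA p-value filtering. spa_recalc(i) is assumed to
# return the SPA-based p-value for the i-th test.
filter_then_recalc <- function(p_normal, spa_recalc, p_filter_cutoff = 0.05,
                               SPA_p_filter = TRUE) {
  # With filtering on, only p-values below the cutoff are recomputed.
  idx <- if (SPA_p_filter) which(p_normal < p_filter_cutoff) else seq_along(p_normal)
  p <- p_normal
  if (length(idx) > 0) p[idx] <- vapply(idx, spa_recalc, numeric(1))
  p
}
```

Because large p-values are rarely of interest, skipping SPA for them saves computation without affecting the small p-values that matter for discovery.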

silent

logical: should error messages be suppressed (default = FALSE).

Value

A list of data frames containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) for each noncoding functional category of the given gene.

References

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.

Li, X., Li, Z., et al. (2020). Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983.


xihaoli/STAARpipeline documentation built on Feb. 9, 2025, 12:39 a.m.