STAAR_Binary_SPA: STAAR-SPA procedure using omnibus test

View source: R/STAAR_Binary_SPA.R

STAAR_Binary_SPA R Documentation

STAAR-SPA procedure using omnibus test

Description

The STAAR_Binary_SPA function takes in genotype, the object from fitting the null model, and functional annotation data to analyze the association between an imbalanced case-control phenotype and a variant-set using the STAAR-SPA procedure. For each variant-set, the STAAR-B p-value is a p-value from an omnibus test that aggregates the Burden(1,25) and Burden(1,1) tests, with the p-values of each test weighted by each annotation, using the Cauchy method.
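The Cauchy aggregation step mentioned above can be illustrated in base R. This is a minimal sketch of the general Cauchy combination idea (Liu et al., 2019), not the package's internal implementation: the helper name cauchy_combine, the equal default weights, and the example p-values are all assumptions made for illustration, and the special handling of extremely small p-values used in production code is omitted.

```r
# Minimal sketch of Cauchy (ACAT-style) p-value combination.
# `cauchy_combine` is a hypothetical helper, not a STAAR function.
cauchy_combine <- function(pvals, weights = rep(1 / length(pvals), length(pvals))) {
  stopifnot(length(pvals) == length(weights), all(pvals > 0 & pvals < 1))
  weights <- weights / sum(weights)
  # Map each p-value to a standard Cauchy quantile and take the weighted sum.
  stat <- sum(weights * tan((0.5 - pvals) * pi))
  # The upper tail of the standard Cauchy distribution gives the combined p-value.
  pcauchy(stat, lower.tail = FALSE)
}

# Hypothetical p-values, e.g. one burden test under several annotation weights.
p_example <- c(0.01, 0.20, 0.50, 0.03)
p_combined <- cauchy_combine(p_example)
```

A useful sanity check of the construction is that combining a single p-value, or several identical ones, returns that same p-value.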

Usage

STAAR_Binary_SPA(
  genotype,
  obj_nullmodel,
  annotation_phred = NULL,
  rare_maf_cutoff = 0.01,
  rv_num_cutoff = 2,
  tol = .Machine$double.eps^0.25,
  max_iter = 1000,
  SPA_p_filter = FALSE,
  p_filter_cutoff = 0.05
)

Arguments

genotype

an n*p genotype matrix (dosage matrix) of the target sequence, where n is the sample size and p is the number of genetic variants.

obj_nullmodel

an object from fitting the null model, which is the output from either fit_null_glm function for unrelated samples or fit_null_glmmkin function for related samples. Note that fit_null_glmmkin is a wrapper of the glmmkin function from the GMMAT package.

annotation_phred

a data frame or matrix of functional annotation data of dimension p*q (or a vector of a single annotation score with length p). Continuous scores should be given on the PHRED score scale, where the PHRED score of the j-th variant is defined as -10*log10(rank(-score_j)/total) across the genome. (Binary) categorical scores should take values 0 or 1, where 1 is functional and 0 is non-functional. If not provided, STAAR will perform the Burden(1,25) and Burden(1,1) tests (default = NULL).

rare_maf_cutoff

the cutoff of maximum minor allele frequency in defining rare variants (default = 0.01).

rv_num_cutoff

the minimum number of variants required to analyze a given variant-set (default = 2).

tol

a positive number specifying the tolerance, i.e. the difference threshold for parameter estimates in the saddlepoint approximation algorithm below which iterations are stopped (default = .Machine$double.eps^0.25, about 1.22e-4).

max_iter

a positive integer specifying the maximum number of iterations for applying the saddlepoint approximation algorithm (default = 1000).

SPA_p_filter

logical: if TRUE, only variants whose normal-approximation p-value is smaller than a pre-specified threshold have their p-value recalculated using the SPA method; only used in the imbalanced case-control setting (default = FALSE).

p_filter_cutoff

threshold for recalculating the p-value using the SPA method; only used in the imbalanced case-control setting (default = 0.05).
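The PHRED scaling described for annotation_phred can be sketched in base R as follows. The helper name to_phred and the raw scores are hypothetical. The sketch also assumes that larger raw scores indicate greater functionality and that the input vector stands in for the genome-wide set of variants, whereas in practice the rank is taken across the genome.

```r
# Minimal sketch of the PHRED transformation -10*log10(rank(-score)/total).
# `to_phred` is a hypothetical helper, not a STAAR function.
to_phred <- function(score) {
  total <- length(score)
  # rank(-score) gives rank 1 to the largest raw score (ties get average ranks).
  -10 * log10(rank(-score) / total)
}

raw_score <- c(0.9, 0.1, 0.5, 0.7, 0.3)  # hypothetical raw annotation scores
phred <- to_phred(raw_score)
```

Under this definition the lowest-ranked variant receives a PHRED score of 0, and the top-ranked variant among `total` variants receives 10*log10(total).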

Value

A list with the following members:

num_variant: the number of variants with minor allele frequency > 0 and less than rare_maf_cutoff in the given variant-set that are used for the variant-set analysis using STAAR.

cMAC: the cumulative minor allele count of variants with minor allele frequency > 0 and less than rare_maf_cutoff in the given variant-set.

RV_label: the boolean vector indicating whether each variant in the given variant-set has minor allele frequency > 0 and less than rare_maf_cutoff.

results_STAAR_B: the STAAR-B p-value that aggregates the Burden(1,25) and Burden(1,1) tests, with the p-values of each test weighted by each annotation, using the Cauchy method.

results_STAAR_B_1_25: a vector of STAAR-B(1,25) p-values, including the Burden(1,25) p-value weighted by MAF, the Burden(1,25) p-values weighted by each annotation, and a STAAR-B(1,25) p-value obtained by aggregating these p-values using the Cauchy method.

results_STAAR_B_1_1: a vector of STAAR-B(1,1) p-values, including the Burden(1,1) p-value weighted by MAF, the Burden(1,1) p-values weighted by each annotation, and a STAAR-B(1,1) p-value obtained by aggregating these p-values using the Cauchy method.
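To make the num_variant, cMAC and RV_label members concrete, the following base R sketch derives analogous quantities from a small hand-built dosage matrix. It is an illustration under stated assumptions, not the package's code: the toy matrix is hypothetical, no missing genotypes are handled, rv_num_cutoff is treated as an inclusive minimum, and the (1,25) and (1,1) labels are read as Beta density weights on the MAF, following the notation of the STAAR papers.

```r
# Hypothetical toy dosage matrix: 200 samples, 4 variants, no missing values.
n <- 200
genotype <- matrix(0, nrow = n, ncol = 4)
genotype[1, 1]    <- 1   # variant 1: 1 minor allele   (MAF = 0.0025)
genotype[2:3, 2]  <- 1   # variant 2: 2 minor alleles  (MAF = 0.005)
genotype[1:40, 3] <- 1   # variant 3: 40 minor alleles (MAF = 0.1, common)
                         # variant 4: monomorphic      (MAF = 0)

rare_maf_cutoff <- 0.01
rv_num_cutoff   <- 2

# Minor allele frequency: fold the allele frequency at 0.5.
allele_freq <- colMeans(genotype) / 2
maf <- pmin(allele_freq, 1 - allele_freq)

# Analogues of the returned members.
RV_label    <- maf > 0 & maf < rare_maf_cutoff
num_variant <- sum(RV_label)
minor_count <- ifelse(allele_freq > 0.5, 2 * n - colSums(genotype), colSums(genotype))
cMAC        <- sum(minor_count[RV_label])

# Assumed inclusive check: the set is analyzed only if enough rare variants remain.
analyze_set <- num_variant >= rv_num_cutoff

# Assumed MAF-based burden weights: Beta(1,25) up-weights rarer variants,
# while Beta(1,1) weights all rare variants equally.
w_1_25 <- dbeta(maf[RV_label], 1, 25)
w_1_1  <- dbeta(maf[RV_label], 1, 1)
```

In this toy set only the first two variants are labeled rare, so the set just meets the default rv_num_cutoff of 2.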

References

Li, X., Li, Z., et al. (2020). Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983.

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.

Liu, Y., et al. (2019). ACAT: A fast and powerful p value combination method for rare-variant analysis in sequencing studies. The American Journal of Human Genetics, 104(3), 410-421.

Li, Z., Li, X., et al. (2019). Dynamic scan procedure for detecting rare-variant association regions in whole-genome sequencing studies. The American Journal of Human Genetics, 104(5), 802-814.


xihaoli/STAAR documentation built on Nov. 3, 2024, 9:34 p.m.