Gene_Centric_Coding_Results_Summary: Summarize gene-centric coding analysis results generated by...

View source: R/Gene_Centric_Coding_Results_Summary.R

Gene_Centric_Coding_Results_Summary    R Documentation

Summarize gene-centric coding analysis results generated by the STAARpipeline package and perform conditional analysis for (unconditionally) significant coding masks by adjusting for a given list of known variants

Description

The Gene_Centric_Coding_Results_Summary function takes the gene-centric coding analysis results generated by the STAARpipeline package, the object from fitting the null model, and the set of known variants to adjust for in conditional analysis. It summarizes the gene-centric coding analysis results and analyzes the conditional association between a quantitative or dichotomous phenotype (including imbalanced case-control designs) and the rare variants in the unconditionally significant coding masks.

Usage

Gene_Centric_Coding_Results_Summary(
  agds_dir,
  gene_centric_coding_jobs_num,
  input_path,
  output_path,
  gene_centric_results_name,
  obj_nullmodel,
  known_loci = NULL,
  cMAC_cutoff = 0,
  method_cond = c("optimal", "naive"),
  rare_maf_cutoff = 0.01,
  QC_label = "annotation/filter",
  variant_type = c("SNV", "Indel", "variant"),
  geno_missing_imputation = c("mean", "minor"),
  Annotation_dir = "annotation/info/FunctionalAnnotation",
  Annotation_name_catalog,
  Use_annotation_weights = FALSE,
  Annotation_name = NULL,
  alpha = 2.5e-06,
  manhattan_plot = FALSE,
  QQ_plot = FALSE,
  cond_null_model_name = NULL,
  cond_null_model_dir = NULL,
  SPA_p_filter = FALSE,
  p_filter_cutoff = 0.05
)
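A minimal call sketch built only from the arguments documented here. It assumes the aGDS file list, the null model and the annotation catalog were saved by an earlier STAARpipeline run; every path, the results file name "coding" and the job count are hypothetical placeholders. The guard skips the call unless the package is installed and the files exist.

```r
# Hypothetical locations; replace with your own files.
agds_dir_file  <- "/path/to/agds_dir.Rdata"        # vector of aGDS file paths (chr 1-22)
nullmodel_file <- "/path/to/obj_nullmodel.Rdata"   # e.g. output of fit_nullmodel()
catalog_file   <- "/path/to/Annotation_name_catalog.csv"
input_path     <- "/path/to/gene_centric_coding/"  # results written by STAARpipeline
output_path    <- "/path/to/summary/"

# Return the first object stored in an .Rdata file.
load_rdata <- function(file) {
  env <- new.env()
  obj_names <- load(file, envir = env)
  env[[obj_names[1]]]
}

inputs_ready <- requireNamespace("STAARpipelineSummary", quietly = TRUE) &&
  all(file.exists(c(agds_dir_file, nullmodel_file, catalog_file, input_path)))

if (inputs_ready) {
  library(STAARpipelineSummary)
  Gene_Centric_Coding_Results_Summary(
    agds_dir = load_rdata(agds_dir_file),
    gene_centric_coding_jobs_num = 100,     # hypothetical number of jobs
    input_path = input_path,
    output_path = output_path,
    gene_centric_results_name = "coding",   # hypothetical file name
    obj_nullmodel = load_rdata(nullmodel_file),
    known_loci = NULL,                      # unconditional summary only
    method_cond = "optimal",
    Annotation_name_catalog = read.csv(catalog_file),
    manhattan_plot = TRUE,
    QQ_plot = TRUE
  )
}
```

Passing a data frame to known_loci instead of NULL additionally produces the *_sig_cond.csv files described under Value.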

Arguments

agds_dir

file directory of annotated GDS (aGDS) files for all chromosomes (1-22)

gene_centric_coding_jobs_num

the number of gene-centric coding analysis jobs (result files) generated by the STAARpipeline package.

input_path

the directory of the gene-centric coding analysis results generated by the STAARpipeline package.

output_path

the directory for the output files.

gene_centric_results_name

the file name of the gene-centric coding analysis results generated by the STAARpipeline package.

obj_nullmodel

an object from fitting the null model, which is either the output from the fit_nullmodel function in the STAARpipeline package, or the output from the fitNullModel function in the GENESIS package transformed using the genesis2staar_nullmodel function in the STAARpipeline package.

known_loci

a data frame of variants to be adjusted for in conditional analysis. It should contain 4 columns in the following order: chromosome (CHR), position (POS), reference allele (REF), and alternative allele (ALT) (default = NULL).
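A small sketch of a known_loci data frame with the documented column order. The coordinates and alleles are placeholders, not real findings, and the validator is a hypothetical helper; requiring these exact column names (not only their order) and restricting CHR to autosomes 1-22 are stricter assumptions than the text states.

```r
# Hypothetical variants to condition on (placeholder coordinates).
known_loci <- data.frame(
  CHR = c(1L, 2L),
  POS = c(1000000L, 2000000L),
  REF = c("A", "G"),
  ALT = c("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical check: four columns in the order CHR, POS, REF, ALT.
is_valid_known_loci <- function(x) {
  is.data.frame(x) &&
    identical(colnames(x), c("CHR", "POS", "REF", "ALT")) &&
    all(x$CHR %in% 1:22) &&          # autosomes, matching agds_dir (assumption)
    all(x$POS > 0) &&
    all(nzchar(x$REF)) && all(nzchar(x$ALT))
}

is_valid_known_loci(known_loci)
```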

cMAC_cutoff

the cutoff on the cumulative minor allele count (cMAC) of the variants in a mask, i.e., the minimum cMAC a mask must have to be included when summarizing the results (default = 0).
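To make "cumulative minor allele count" concrete, here is a sketch that sums minor allele counts over all variants in one mask. It assumes a complete (no missing values) matrix of alternate-allele dosages; whether the package keeps masks with cMAC strictly greater than cMAC_cutoff is an assumption, not something this page states.

```r
# Cumulative minor allele count (cMAC) of one mask.
# G: n x p matrix of alternate-allele dosages (0/1/2) for the p variants in
# the mask, assumed to have no missing values.
mask_cmac <- function(G) {
  G <- as.matrix(G)
  alt_af <- colMeans(G) / 2
  flip <- alt_af > 0.5            # alternate allele is the major allele
  G[, flip] <- 2 - G[, flip]      # recode those columns as minor-allele counts
  sum(G)
}

cMAC_cutoff <- 0
G <- matrix(c(0, 1, 0, 0,
              2, 2, 1, 2), ncol = 2)
mask_cmac(G)                      # 1 + 1 = 2
mask_cmac(G) > cMAC_cutoff        # assumed comparison rule
```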

method_cond

a character value indicating the method for conditional analysis. "optimal" regresses the residuals from the null model on known_loci together with all covariates used in fitting the null model (fully adjusted) and takes the residuals; "naive" regresses the residuals from the null model on known_loci only and takes the residuals (default = "optimal").
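The difference between the two options can be seen with ordinary least squares on simulated data. This is a conceptual sketch for a continuous trait, following the description above literally; it is not the package's implementation. Because the simulated known variant is correlated with age, the "naive" residuals are orthogonal to the known variant but generally not to age, while the "optimal" residuals are orthogonal to both.

```r
set.seed(1)
n <- 500
age <- rnorm(n, mean = 50, sd = 10)
sex <- rbinom(n, size = 1, prob = 0.5)
# Known variant whose dosage depends on age, so the two methods differ.
g_known <- rbinom(n, size = 2, prob = plogis(-2 + 0.03 * age))
y <- 0.02 * age + 0.5 * sex + 0.3 * g_known + rnorm(n)

# Null model without the known variant; keep its residuals.
r_null <- resid(lm(y ~ age + sex))

# "naive": regress the null-model residuals on the known variant only.
r_naive <- resid(lm(r_null ~ g_known))

# "optimal": regress the residuals on the known variant plus all covariates.
r_optimal <- resid(lm(r_null ~ g_known + age + sex))

# Inner products with each variable; zero (up to rounding) means "adjusted for".
round(c(naive_g = sum(r_naive * g_known), naive_age = sum(r_naive * age),
        optimal_g = sum(r_optimal * g_known), optimal_age = sum(r_optimal * age)), 6)
```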

rare_maf_cutoff

the maximum minor allele frequency (MAF) used to define rare variants (default = 0.01).
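A sketch of how a MAF cutoff selects rare variants from a dosage matrix. The strict inequalities (polymorphic and MAF below the cutoff) are an assumption about the exact rule, and the genotype matrix is toy data.

```r
# Indices of rare variants: 0 < MAF < rare_maf_cutoff (assumed rule).
select_rare <- function(G, rare_maf_cutoff = 0.01) {
  af  <- colMeans(G, na.rm = TRUE) / 2   # alternate allele frequency
  maf <- pmin(af, 1 - af)                # minor allele frequency
  which(maf > 0 & maf < rare_maf_cutoff)
}

# 200 samples, 4 variants (toy data)
G <- matrix(0, nrow = 200, ncol = 4)
G[1, 1] <- 1                 # MAF = 1/400: rare
                             # column 2 stays monomorphic: excluded
G[1:40, 3] <- 1              # MAF = 40/400 = 0.1: common
G[, 4] <- 2; G[1, 4] <- 1    # alternate allele is major; MAF = 1/400: rare
select_rare(G)               # returns c(1, 4)
```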

QC_label

channel name of the QC label in the GDS/aGDS file (default = "annotation/filter").

variant_type

type of variants included in the analysis. Choices include "SNV", "Indel", or "variant" (default = "SNV").

geno_missing_imputation

method of handling missing genotypes. Either "mean" or "minor" (default = "mean").
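A sketch of mean imputation as commonly done for dosage matrices: each missing genotype is replaced by the mean observed dosage of that variant (twice its estimated allele frequency). Only "mean" is illustrated; how the "minor" option fills missing values is not described on this page, so it is not sketched.

```r
# Mean imputation, column by column.
impute_mean <- function(G) {
  for (j in seq_len(ncol(G))) {
    is_miss <- is.na(G[, j])
    if (any(is_miss)) G[is_miss, j] <- mean(G[, j], na.rm = TRUE)
  }
  G
}

G <- matrix(c(0, NA, 2,
              1, 1, NA), ncol = 2)
impute_mean(G)   # column 1: NA -> 1; column 2: NA -> 1
```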

Annotation_dir

channel name of the annotations in the aGDS file (default = "annotation/info/FunctionalAnnotation").

Annotation_name_catalog

a data frame containing the annotation names and their corresponding channel names in the aGDS file.
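A sketch of how a catalog like this maps an annotation name to a full aGDS channel name by prefixing Annotation_dir. The column names "name" and "dir" and the two entries are assumptions for illustration; use the catalog that matches your aGDS files. The resulting string is the kind of variable name one would pass to SeqArray's seqGetData(gdsfile, var.name).

```r
# Hypothetical catalog: column names and entries are illustrative assumptions.
Annotation_name_catalog <- data.frame(
  name = c("CADD", "LINSIGHT"),
  dir  = c("/cadd_phred", "/linsight"),
  stringsAsFactors = FALSE
)

# Build the full aGDS channel name for one or more annotations.
annotation_channel <- function(annotation, catalog,
                               Annotation_dir = "annotation/info/FunctionalAnnotation") {
  idx <- match(annotation, catalog$name)
  if (anyNA(idx)) {
    stop("not in catalog: ", paste(annotation[is.na(idx)], collapse = ", "))
  }
  paste0(Annotation_dir, catalog$dir[idx])
}

annotation_channel("CADD", Annotation_name_catalog)
# "annotation/info/FunctionalAnnotation/cadd_phred"
```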

Use_annotation_weights

use annotations as weights or not (default = FALSE).

Annotation_name

a vector of annotation names used in STAAR (default = NULL).

alpha

p-value threshold of significant results (default = 2.5E-06).
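The default equals 0.05 / 20,000, i.e., a Bonferroni correction for roughly 20,000 genes, which is one common reading of this value. The sketch below applies the documented rule (STAAR-O p-value smaller than alpha) to a toy table whose genes and p-values are made up.

```r
alpha <- 0.05 / 20000
stopifnot(isTRUE(all.equal(alpha, 2.5e-06)))

# Toy results, for illustration only.
results <- data.frame(
  "Gene name" = c("GENE_A", "GENE_B", "GENE_C"),
  Category    = c("plof", "missense", "synonymous"),
  "STAAR-O"   = c(1e-8, 3e-6, 0.2),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Keep masks whose STAAR-O p-value is smaller than alpha.
sig <- results[results[["STAAR-O"]] < alpha, ]
sig
```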

manhattan_plot

output a Manhattan plot or not (default = FALSE).

QQ_plot

output Q-Q plot or not (default = FALSE).

cond_null_model_name

the name of the null model for conditional analysis in the SPA setting; only used in the imbalanced case-control setting (default = NULL).

cond_null_model_dir

the directory storing the null model for conditional analysis in the SPA setting; only used in the imbalanced case-control setting (default = NULL).

SPA_p_filter

logical: whether only the variants with a normal-approximation-based p-value smaller than a pre-specified threshold (p_filter_cutoff) have their p-values recalculated using the SPA method; only used in the imbalanced case-control setting (default = FALSE).

p_filter_cutoff

the p-value threshold below which p-values are recalculated using the SPA method when SPA_p_filter = TRUE; only used in the imbalanced case-control setting (default = 0.05).
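The filtering logic of SPA_p_filter and p_filter_cutoff can be sketched as follows. The likely motivation is cost: the saddlepoint approximation (SPA) is more expensive than the normal approximation and matters most for small p-values. Here spa_fun is a hypothetical stand-in supplied by the caller; the toy version below just halves p-values and is not a real SPA.

```r
# p_normal: normal-approximation p-values; spa_fun(i): SPA p-value for element i
# (hypothetical stand-in for the real SPA computation).
recompute_with_spa <- function(p_normal, spa_fun,
                               SPA_p_filter = TRUE, p_filter_cutoff = 0.05) {
  idx <- if (SPA_p_filter) {
    which(p_normal < p_filter_cutoff)   # only small p-values are recomputed
  } else {
    seq_along(p_normal)                 # every p-value is recomputed
  }
  p_final <- p_normal
  p_final[idx] <- vapply(idx, spa_fun, numeric(1))
  p_final
}

p_normal <- c(0.5, 0.01, 0.2, 0.001)
toy_spa <- function(i) p_normal[i] / 2   # toy stand-in, not a real SPA
recompute_with_spa(p_normal, toy_spa)    # 0.5 0.005 0.2 0.0005
```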

Value

The function writes the following analysis results to output_path:

coding_sig.csv: a matrix summarizing the unconditionally significant coding masks, i.e., masks whose STAAR-O p-value (STAAR-B in the imbalanced case-control setting) is smaller than the threshold alpha. Columns include gene name ("Gene name"), chromosome ("chr"), coding functional category ("Category"), number of variants ("#SNV"), and the unconditional p-values of the set-based tests SKAT ("SKAT(1,25)"), Burden ("Burden(1,1)"), ACAT-V ("ACAT-V(1,25)") and STAAR-O ("STAAR-O"); in the imbalanced case-control setting, the unconditional p-values of Burden ("Burden(1,1)") and STAAR-B ("STAAR-B") are reported instead.

coding_sig_cond.csv: a matrix summarizing the conditional analysis results of the unconditionally significant coding masks detected by STAAR-O (STAAR-B in the imbalanced case-control setting); available if known_loci is not NULL. Columns include gene name ("Gene name"), chromosome ("chr"), coding functional category ("Category"), number of variants ("#SNV"), and the conditional p-values of the set-based tests SKAT ("SKAT(1,25)"), Burden ("Burden(1,1)"), ACAT-V ("ACAT-V(1,25)") and STAAR-O ("STAAR-O"); in the imbalanced case-control setting, the conditional p-values of Burden ("Burden(1,1)") and STAAR-B ("STAAR-B") are reported instead.
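To see how each unconditionally significant mask changes after conditioning, the two tables can be joined on gene and category. This sketch assumes the CSV headers match the column names listed above; reading with check.names = FALSE keeps names such as "Gene name" and "STAAR-O" intact. The toy tables are made up; pass test = "STAAR-B" for the imbalanced case-control setting.

```r
# Join unconditional and conditional results for the same masks.
compare_uncond_cond <- function(uncond, cond, test = "STAAR-O") {
  keys <- c("Gene name", "Category")
  merge(uncond[, c(keys, test)], cond[, c(keys, test)],
        by = keys, suffixes = c(" (uncond)", " (cond)"))
}

# With real output, for example:
# uncond <- read.csv(file.path(output_path, "coding_sig.csv"), check.names = FALSE)
# cond   <- read.csv(file.path(output_path, "coding_sig_cond.csv"), check.names = FALSE)
uncond <- data.frame("Gene name" = c("GENE_A", "GENE_B"),
                     Category = c("plof", "missense"),
                     "STAAR-O" = c(1e-8, 1e-7),
                     check.names = FALSE, stringsAsFactors = FALSE)
cond <- data.frame("Gene name" = c("GENE_A", "GENE_B"),
                   Category = c("plof", "missense"),
                   "STAAR-O" = c(0.3, 2e-7),
                   check.names = FALSE, stringsAsFactors = FALSE)
compare_uncond_cond(uncond, cond)
```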

results_plof_genome.Rdata: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by putative loss-of-function variants (plof) for all protein-coding genes across the genome.

plof_sig.csv: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof masks.

plof_sig_cond.csv: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof masks (available if known_loci is not NULL).

results_plof_ds_genome.Rdata: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by putative loss-of-function variants and disruptive missense variants (plof_ds) for all protein-coding genes across the genome.

plof_ds_sig.csv: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof_ds masks.

plof_ds_sig_cond.csv: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant plof_ds masks (available if known_loci is not NULL).

results_disruptive_missense_genome.Rdata: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by disruptive missense variants (disruptive_missense) for all protein-coding genes across the genome.

disruptive_missense_sig.csv: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant disruptive_missense masks.

disruptive_missense_sig_cond.csv: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant disruptive_missense masks (available if known_loci is not NULL).

results_missense_genome.Rdata: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by missense variants (missense) for all protein-coding genes across the genome.

missense_sig.csv: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant missense masks.

missense_sig_cond.csv: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant missense masks (available if known_loci is not NULL).

results_synonymous_genome.Rdata: a matrix containing the STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the coding mask defined by synonymous variants (synonymous) for all protein-coding genes across the genome.

synonymous_sig.csv: a matrix containing the unconditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant synonymous masks.

synonymous_sig_cond.csv: a matrix containing the conditional STAAR p-values (including STAAR-O, or STAAR-B in the imbalanced case-control setting) of the unconditionally significant synonymous masks (available if known_loci is not NULL).
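The per-mask files above follow a regular naming pattern, so they can be enumerated and loaded in a loop. A sketch assuming the files sit directly in output_path under exactly these names; the name of the object stored inside each .Rdata file is not documented here, so the hypothetical loader simply returns the first object it finds.

```r
# Mask types, in the order listed above.
mask_types <- c("plof", "plof_ds", "disruptive_missense", "missense", "synonymous")

# Expected output file names for each mask type.
output_files <- data.frame(
  mask   = mask_types,
  genome = paste0("results_", mask_types, "_genome.Rdata"),
  sig    = paste0(mask_types, "_sig.csv"),
  cond   = paste0(mask_types, "_sig_cond.csv"),
  stringsAsFactors = FALSE
)

# Load every genome-wide results matrix present in output_path,
# returning a list named by mask type.
load_genome_results <- function(output_path, files_table = output_files) {
  files <- file.path(output_path, files_table$genome)
  found <- file.exists(files)
  results <- lapply(files[found], function(f) {
    env <- new.env()
    obj_names <- load(f, envir = env)
    env[[obj_names[1]]]
  })
  names(results) <- files_table$mask[found]
  results
}
```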

Manhattan plot (optional, when manhattan_plot = TRUE) and Q-Q plot (optional, when QQ_plot = TRUE) of the gene-centric coding analysis results.

References

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.


xihaoli/STAARpipelineSummary documentation built on Oct. 20, 2024, 9:35 p.m.