Dynamic_Window_Results_Summary: Summarize the results of dynamic window analysis generated by...

Dynamic_Window_Results_Summary    R Documentation

Summarize the results of dynamic window analysis generated by the STAARpipeline package and perform conditional analysis for (unconditionally) significant genetic regions by adjusting for a given list of known variants

Description

The Dynamic_Window_Results_Summary function takes the results of dynamic window analysis generated by the STAARpipeline package, the object from fitting the null model, and the set of known variants to be adjusted for in conditional analysis. It summarizes the dynamic window analysis results and analyzes the conditional association between a quantitative/dichotomous phenotype and the rare variants in the unconditionally significant genetic regions.
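
As a rough illustration of what adjusting for known variants means, the following base-R sketch residualizes a simulated quantitative phenotype in the two ways documented for the method_cond argument. It is a simplified linear-model analogue under stated assumptions, not the STAARpipeline implementation: the covariate age, the genotype vector g_known, and all simulated values are hypothetical.

```r
# Simplified linear-model analogue of the two conditional strategies.
# All names and values here are hypothetical, chosen only for illustration.
set.seed(1)
n <- 200
age     <- rnorm(n)                # hypothetical covariate
g_known <- rbinom(n, 2, 0.3)       # hypothetical known-variant genotype (0/1/2)
y       <- 0.5 * age + 0.4 * g_known + rnorm(n)

# Stand-in for the null model: phenotype regressed on covariates only.
null_fit <- lm(y ~ age)
r0 <- resid(null_fit)

# "naive": regress null-model residuals on the known variant only.
r_naive <- resid(lm(r0 ~ g_known))

# "optimal": regress null-model residuals on the known variant
# and on the covariates used in the null model (fully adjusted).
r_optimal <- resid(lm(r0 ~ g_known + age))
```

In this analogue both residual vectors are orthogonal to g_known, but only the fully adjusted one is also guaranteed to be orthogonal to age.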

Usage

Dynamic_Window_Results_Summary(
  agds_dir,
  jobs_num,
  input_path,
  output_path,
  dynamic_window_results_name,
  obj_nullmodel,
  known_loci = NULL,
  method_cond = c("optimal", "naive"),
  QC_label = "annotation/filter",
  geno_missing_imputation = c("mean", "minor"),
  variant_type = c("SNV", "Indel", "variant"),
  Annotation_dir = "annotation/info/FunctionalAnnotation",
  Annotation_name_catalog,
  Use_annotation_weights = FALSE,
  Annotation_name = NULL,
  alpha = 0.05
)

Arguments

agds_dir

a vector containing the file directories of the annotated GDS (aGDS) files for all chromosomes (1-22).

jobs_num

a data frame containing the number of jobs for association analysis. The data frame must include a column named "scang_num".

input_path

file directory of the input dynamic window analysis results.

output_path

file directory of the output summary results.

dynamic_window_results_name

file names of the input dynamic window analysis results.

obj_nullmodel

an object from fitting the null model, which is either the output from the fit_nullmodel function in the STAARpipeline package, or the output from the fitNullModel function in the GENESIS package, transformed using the genesis2staar_nullmodel function in the STAARpipeline package.

known_loci

a data frame of variants to be adjusted for in conditional analysis. It should contain 4 columns in the following order: chromosome (CHR), position (POS), reference allele (REF), and alternative allele (ALT) (default = NULL).

method_cond

a character value indicating the method for conditional analysis. "optimal" refers to regressing residuals from the null model on known_loci as well as all covariates used in fitting the null model (fully adjusted) and taking the residuals; "naive" refers to regressing residuals from the null model on known_loci and taking the residuals (default = "optimal").

QC_label

channel name of the QC label in the GDS/aGDS file (default = "annotation/filter").

geno_missing_imputation

method of handling missing genotypes. Either "mean" or "minor" (default = "mean").

variant_type

type of variant included in the conditional analysis. Choice includes "SNV", "Indel", or "variant" (default = "SNV").

Annotation_dir

channel name of the annotations in the aGDS file
(default = "annotation/info/FunctionalAnnotation").

Annotation_name_catalog

a data frame containing the annotation names and the corresponding channel names in the aGDS file.

Use_annotation_weights

use annotations as weights or not (default = FALSE).

Annotation_name

a vector of annotation names used in SCANG-STAAR (default = NULL).

alpha

threshold to control the genome-wise (family-wise) error rate (default = 0.05).
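
The two data-frame arguments with documented column requirements, jobs_num and known_loci, can be built in base R. The sketch below is illustrative only: all values are made up, the chr column in jobs_num is an assumption (the documentation only requires "scang_num"), and the two variants in known_loci are hypothetical placeholders rather than real loci.

```r
# jobs_num: must contain a column named "scang_num".
jobs_num <- data.frame(
  chr       = 1:22,           # assumed extra column, not required by the docs
  scang_num = rep(5L, 22)     # hypothetical number of jobs per chromosome
)

# known_loci: four columns in the order CHR, POS, REF, ALT.
known_loci <- data.frame(
  CHR = c(1L, 19L),              # hypothetical chromosomes
  POS = c(1000000L, 2000000L),   # hypothetical positions
  REF = c("A", "C"),
  ALT = c("G", "T"),
  stringsAsFactors = FALSE
)

# Basic shape checks before passing the objects on.
stopifnot("scang_num" %in% colnames(jobs_num))
stopifnot(identical(colnames(known_loci), c("CHR", "POS", "REF", "ALT")))
```

Checking the column names up front is a cheap way to catch a misordered known_loci table, since the documentation specifies the columns by position.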

Value

The function returns the following analysis results:

SCANG_S_res_uncond_cond.Rdata and SCANG_S_res_uncond_cond.csv: A matrix that summarizes the unconditional and conditional results of the significant regions (GWER<alpha) detected by the SCANG-STAAR-S procedure (conditional results are available if known_loci is not NULL), including chromosome ("chr"), start position ("start_pos"), end position ("end_pos"), number of variants ("SNV_nos"), family-wise/genome-wide error rate (GWER), unconditional STAAR-S p-value ("STAAR_S"), conditional STAAR-S p-value ("STAAR_S_cond"), conditional ACAT-V p-value ("ACAT_V_cond"), conditional Burden p-value ("Burden_cond"), conditional SKAT p-value ("SKAT_cond"), and conditional STAAR-O p-value ("STAAR_O_cond").

SCANG_B_res_uncond_cond.Rdata and SCANG_B_res_uncond_cond.csv: A matrix that summarizes the unconditional and conditional results of the significant regions detected by the SCANG-STAAR-B procedure (conditional results are available if known_loci is not NULL). For details, see SCANG-STAAR-S.

SCANG_O_res_uncond_cond.Rdata and SCANG_O_res_uncond_cond.csv: A matrix that summarizes the unconditional and conditional results of the significant regions detected by the SCANG-STAAR-O procedure (conditional results are available if known_loci is not NULL). For details, see SCANG-STAAR-S.

results_dynamic_window.Rdata: An Rdata file that summarizes the significant regions detected by the SCANG-STAAR procedure.

SCANG_S_top1.Rdata and SCANG_S_top1.csv: A matrix that summarizes the top 1 unconditional region detected by SCANG-STAAR-S, including the STAAR-S p-value ("STAAR_S"), chromosome ("chr"), start position ("start_pos"), end position ("end_pos"), family-wise/genome-wide error rate (GWER), and the number of variants ("SNV_nos").

SCANG_B_top1.Rdata and SCANG_B_top1.csv: A matrix that summarizes the top 1 unconditional region detected by SCANG-STAAR-B. For details, see SCANG-STAAR-S.

SCANG_O_top1.Rdata and SCANG_O_top1.csv: A matrix that summarizes the top 1 unconditional region detected by SCANG-STAAR-O. For details, see SCANG-STAAR-S.

SCANG_S_res.Rdata and SCANG_S_res.csv: A matrix that summarizes the significant regions (GWER<alpha) detected by SCANG-STAAR-S, including the negative log transformation of the STAAR-S p-value ("-logp"), chromosome ("chr"), start position ("start_pos"), end position ("end_pos"), family-wise/genome-wide error rate (GWER), and the number of variants ("SNV_num").

SCANG_B_res.Rdata and SCANG_B_res.csv: A matrix that summarizes the significant regions detected by SCANG-STAAR-B. For details, see SCANG-STAAR-S.

SCANG_O_res.Rdata and SCANG_O_res.csv: A matrix that summarizes the significant regions detected by SCANG-STAAR-O. For details, see SCANG-STAAR-S.

SCANG_S_res_cond.Rdata and SCANG_S_res_cond.csv: A matrix that summarizes the conditional p-values of the significant regions (GWER<alpha) detected by SCANG-STAAR-S, including chromosome ("chr"), start position ("Start Loc"), end position ("End Loc"), the number of variants ("#SNV"), annotation-weighted ACAT-V, Burden and SKAT conditional p-values, and STAAR conditional p-values of the regions with GWER smaller than the threshold alpha (available if known_loci is not NULL).

SCANG_B_res_cond.Rdata and SCANG_B_res_cond.csv: A matrix that summarizes the conditional p-values of the significant regions (GWER<alpha) detected by SCANG-STAAR-B (available if known_loci is not NULL). For details, see SCANG-STAAR-S.

SCANG_O_res_cond.Rdata and SCANG_O_res_cond.csv: A matrix that summarizes the conditional p-values of the significant regions (GWER<alpha) detected by SCANG-STAAR-O (available if known_loci is not NULL). For details, see SCANG-STAAR-S.
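
The significance rule used throughout the outputs above (GWER<alpha) amounts to a simple row filter. The following is a minimal sketch on a mock table that reuses column names listed above ("chr", "start_pos", "end_pos", "GWER"); the rows are invented for illustration and are not package output.

```r
# Mock region table; every value below is hypothetical.
alpha <- 0.05
regions <- data.frame(
  chr       = c(1, 1, 7),
  start_pos = c(10500, 88000, 430000),
  end_pos   = c(12900, 91500, 436200),
  GWER      = c(0.01, 0.20, 0.049)
)

# Keep only regions whose GWER is strictly below the threshold alpha.
significant <- regions[regions$GWER < alpha, , drop = FALSE]
print(significant)
```

The comparison is strict, matching the GWER<alpha notation used in the descriptions above.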

References

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.


xihaoli/STAARpipelineSummary documentation built on July 27, 2024, 4:30 p.m.