Individual_Analysis_Results_Summary: Summarize individual-variant analysis results generated by...

Individual_Analysis_Results_Summary R Documentation

Summarize individual-variant analysis results generated by the STAARpipeline package

Description

The Individual_Analysis_Results_Summary function takes the individual analysis results generated by the STAARpipeline package, the object from fitting the null model, and the set of known variants to be adjusted for in conditional analysis. It summarizes the individual analysis results and analyzes the conditional association between a quantitative/dichotomous phenotype and the unconditionally significant single variants.

Usage

Individual_Analysis_Results_Summary(
  agds_dir,
  jobs_num,
  input_path,
  output_path,
  individual_results_name,
  obj_nullmodel,
  known_loci = NULL,
  method_cond = c("optimal", "naive"),
  QC_label = "annotation/filter",
  variant_type = c("variant", "SNV", "Indel"),
  geno_missing_imputation = c("mean", "minor"),
  alpha = 5e-09,
  manhattan_plot = FALSE,
  QQ_plot = FALSE,
  SPA_p_filter = FALSE,
  p_filter_cutoff = 0.05,
  cond_null_model_name = NULL,
  cond_null_model_dir = NULL
)
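
A minimal sketch of a call, using only the arguments listed in the signature above. All file paths, the result file name, and the helper load_rdata_object are hypothetical placeholders, and the sketch assumes each .Rdata file stores exactly one object. The call is skipped unless the STAARpipelineSummary package is installed and the input files exist.

```r
# Hypothetical helper: load the single object stored in an .Rdata file and return it.
load_rdata_object <- function(path) get(load(path))

# Hypothetical file locations; replace with your own.
agds_dir_file  <- "path/to/agds_dir.Rdata"
jobs_num_file  <- "path/to/jobs_num.Rdata"
nullmodel_file <- "path/to/obj_nullmodel.Rdata"
inputs <- c(agds_dir_file, jobs_num_file, nullmodel_file)

# Run only when the package is installed and all (placeholder) inputs exist.
if (requireNamespace("STAARpipelineSummary", quietly = TRUE) &&
    all(file.exists(inputs))) {
  STAARpipelineSummary::Individual_Analysis_Results_Summary(
    agds_dir = load_rdata_object(agds_dir_file),
    jobs_num = load_rdata_object(jobs_num_file),
    input_path = "path/to/individual_results/",     # hypothetical directory
    output_path = "path/to/summary_output/",        # hypothetical directory
    individual_results_name = "individual_results", # hypothetical file name
    obj_nullmodel = load_rdata_object(nullmodel_file),
    known_loci = NULL,          # no conditional analysis
    alpha = 5e-09,
    manhattan_plot = TRUE,
    QQ_plot = TRUE
  )
}
```

The remaining arguments are left at their defaults here. Setting known_loci to a data frame of known variants would additionally request the conditional analysis.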

Arguments

agds_dir

a data frame containing the directory of GDS/aGDS files.

jobs_num

a data frame containing the number of analysis results, including the number of individual analysis results, the number of sliding window analysis results, and the number of dynamic window analysis results.

input_path

the directory of individual analysis results generated by the STAARpipeline package.

output_path

the directory for the output files.

individual_results_name

the file name of individual analysis results generated by the STAARpipeline package.

obj_nullmodel

an object from fitting the null model, which is either the output from the fit_nullmodel function in the STAARpipeline package, or the output from the fitNullModel function in the GENESIS package transformed using the genesis2staar_nullmodel function in the STAARpipeline package.

known_loci

the data frame of variants to be adjusted for in conditional analysis. It should contain 4 columns in the following order: chromosome (CHR), position (POS), reference allele (REF), and alternative allele (ALT) (default = NULL).
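
As an illustration of the layout described above, a known_loci data frame could be built as follows. The chromosomes, positions, and alleles are placeholder values chosen for the example, not real variants.

```r
# Placeholder variants for illustration only; substitute the known loci
# relevant to your phenotype.
known_loci <- data.frame(
  CHR = c(1, 2),
  POS = c(100000, 200000),
  REF = c("A", "G"),
  ALT = c("C", "T"),
  stringsAsFactors = FALSE
)

# The documentation requires the four columns in this order.
stopifnot(identical(colnames(known_loci), c("CHR", "POS", "REF", "ALT")))
```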

method_cond

a character value indicating the method for conditional analysis. "optimal" refers to regressing the residuals from the null model on known_loci as well as all covariates used in fitting the null model (fully adjusted) and taking the residuals; "naive" refers to regressing the residuals from the null model on known_loci only and taking the residuals (default = "optimal").
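
The difference between the two methods can be sketched in base R on simulated data. This is a simplified illustration that assumes unrelated samples, a quantitative phenotype, and an ordinary linear null model; it is not the package's implementation, and every variable name is made up for the example.

```r
set.seed(1)
n <- 200

# Simulated covariates and the genotype dosage of one known variant.
covar <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))
g_known <- rbinom(n, 2, 0.3)
y <- 0.5 * covar[, "age"] + 0.3 * g_known + rnorm(n)

# Residuals from the null model (phenotype regressed on covariates only).
res_null <- resid(lm(y ~ covar))

# "naive": regress the null-model residuals on the known variant only.
res_naive <- resid(lm(res_null ~ g_known))

# "optimal": regress the null-model residuals on the known variant
# and on all covariates used in the null model.
res_optimal <- resid(lm(res_null ~ g_known + covar))
```

In this sketch res_optimal is orthogonal to both the known variant and the covariates, whereas res_naive is only guaranteed to be orthogonal to the known variant.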

QC_label

channel name of the QC label in the GDS/aGDS file.

variant_type

type of variant included in the analysis. Choices include "variant", "SNV", or "Indel" (default = "variant").

geno_missing_imputation

method of handling missing genotypes. Either "mean" or "minor" (default = "mean").

alpha

p-value threshold of significant results (default = 5E-09).

manhattan_plot

whether to output a Manhattan plot (default = FALSE).

QQ_plot

whether to output a Q-Q plot (default = FALSE).

SPA_p_filter

logical: whether only the variants with a score-test-based p-value smaller than a pre-specified threshold use the SPA method to recalculate the p-value; only used in the imbalanced case-control setting (default = FALSE).

p_filter_cutoff

threshold for the p-value recalculation using the SPA method; only used in the imbalanced case-control setting (default = 0.05).
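
The interplay of SPA_p_filter and p_filter_cutoff can be pictured as a two-stage filter. The sketch below is an assumption about the control flow, not the package's code: two_stage_p and recalc_fn are hypothetical names, recalc_fn stands in for an SPA-based recalculation, and the sketch assumes that with SPA_p_filter = FALSE every variant is recalculated.

```r
# Hypothetical two-stage filter: recalc_fn(idx) returns recalculated
# p-values for the variants at positions idx.
two_stage_p <- function(p_score, recalc_fn,
                        SPA_p_filter = FALSE, p_filter_cutoff = 0.05) {
  if (SPA_p_filter) {
    # Recalculate only variants passing the score-test screen.
    idx <- which(p_score < p_filter_cutoff)
  } else {
    # Assumed behavior: recalculate every variant.
    idx <- seq_along(p_score)
  }
  p_out <- p_score
  p_out[idx] <- recalc_fn(idx)
  p_out
}

# Toy usage with a stand-in recalculation that returns a fixed value.
p_score <- c(0.5, 0.01, 0.2, 0.04)
p_final <- two_stage_p(p_score, function(idx) rep(0.001, length(idx)),
                       SPA_p_filter = TRUE, p_filter_cutoff = 0.05)
```

Screening first keeps the more expensive recalculation confined to the variants whose score-test p-value is already small.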

cond_null_model_name

the null model name for conditional analysis in the SPA setting; only used in the imbalanced case-control setting (default = NULL).

cond_null_model_dir

the directory storing the null model for conditional analysis in the SPA setting; only used in the imbalanced case-control setting (default = NULL).

Value

The function returns the following analysis results:

results_individual_analysis_genome.Rdata: a matrix containing the score test p-value and effect size estimate of each variant across the genome.

results_individual_analysis_sig.Rdata and results_individual_analysis_sig.csv: a matrix containing the score test p-values and effect size estimates of the significant results (p-value < alpha).

results_sig_cond.Rdata and results_sig_cond.csv: a matrix containing the conditional score test p-values for each significant variant (available if known_loci is not NULL).

Manhattan plot (optional) and Q-Q plot (optional) of the individual analysis results.
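
The significance filter (p-value < alpha) behind the "sig" outputs can be illustrated on a toy table. The column names and values below are made up for the example and may differ from those in the actual output files.

```r
# Toy genome-wide results; column names are hypothetical.
results <- data.frame(
  CHR = c(1, 1, 2, 3),
  POS = c(1000, 2000, 1500, 3000),
  pvalue = c(1e-10, 0.3, 4e-09, 6e-09)
)

alpha <- 5e-09

# Keep variants below the threshold and sort by increasing p-value.
results_sig <- results[results$pvalue < alpha, , drop = FALSE]
results_sig <- results_sig[order(results_sig$pvalue), , drop = FALSE]
```

Note that the comparison is strict, so a variant with a p-value exactly equal to alpha is not retained in this sketch.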

References

Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.


xihaoli/STAARpipelineSummary documentation built on Oct. 20, 2024, 9:35 p.m.