View source: R/Gene_Centric_Noncoding_Results_Summary.R
Gene_Centric_Noncoding_Results_Summary | R Documentation |
The Gene_Centric_Noncoding_Results_Summary function takes the gene-centric noncoding analysis results generated by the STAARpipeline package, the object from fitting the null model, and the set of known variants to be adjusted for in conditional analysis. It summarizes the gene-centric noncoding analysis results and analyzes the conditional association between a quantitative or dichotomous phenotype (including imbalanced case-control designs) and the rare variants in the unconditionally significant noncoding masks.
Gene_Centric_Noncoding_Results_Summary(
  agds_dir,
  gene_centric_noncoding_jobs_num,
  input_path,
  output_path,
  gene_centric_results_name,
  ncRNA_jobs_num,
  ncRNA_input_path,
  ncRNA_output_path,
  ncRNA_results_name,
  obj_nullmodel,
  known_loci = NULL,
  cMAC_cutoff = 0,
  method_cond = c("optimal", "naive"),
  rare_maf_cutoff = 0.01,
  QC_label = "annotation/filter",
  variant_type = c("SNV", "Indel", "variant"),
  geno_missing_imputation = c("mean", "minor"),
  Annotation_dir = "annotation/info/FunctionalAnnotation",
  Annotation_name_catalog,
  Use_annotation_weights = FALSE,
  Annotation_name = NULL,
  alpha = 2.5e-06,
  alpha_ncRNA = 2.5e-06,
  ncRNA_pos = NULL,
  manhattan_plot = FALSE,
  QQ_plot = FALSE,
  cond_null_model_name = NULL,
  cond_null_model_dir = NULL,
  SPA_p_filter = FALSE,
  p_filter_cutoff = 0.05
)
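Several arguments in the usage above take a character vector of choices as their default, for example method_cond = c("optimal", "naive"). A common R idiom for such arguments is match.arg(), under which the first element is used when the caller supplies nothing. The sketch below illustrates that idiom with a hypothetical helper, pick_choice(); it is an assumption about how such defaults typically resolve, not a copy of the package's code.

```r
# Hypothetical helper illustrating the match.arg() idiom.
# It is not part of the STAARpipeline package.
pick_choice <- function(method_cond = c("optimal", "naive")) {
  # When no value is supplied, match.arg() returns the first choice.
  match.arg(method_cond)
}

default_choice <- pick_choice()          # nothing supplied
explicit_choice <- pick_choice("naive")  # explicit selection

print(default_choice)
print(explicit_choice)
```

Under this idiom, a value outside the listed choices raises an error rather than being silently accepted.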
agds_dir |
a data frame containing the directory of GDS/aGDS files. |
gene_centric_noncoding_jobs_num |
the number of results for gene-centric noncoding analysis of protein-coding genes generated by |
input_path |
the directory of gene-centric noncoding analysis results for protein-coding genes generated by |
output_path |
the directory for the output files of the summary of gene-centric noncoding analysis results for protein-coding genes. |
gene_centric_results_name |
the file name of gene-centric noncoding analysis results for protein-coding genes generated by |
ncRNA_jobs_num |
the number of results for gene-centric noncoding analysis of ncRNA genes generated by |
ncRNA_input_path |
the directory of gene-centric noncoding analysis results for ncRNA genes generated by |
ncRNA_output_path |
the directory for the output files of the summary of gene-centric noncoding analysis results for ncRNA genes. |
ncRNA_results_name |
the file name of gene-centric noncoding analysis results for ncRNA genes generated by |
obj_nullmodel |
an object from fitting the null model, which is either the output from |
known_loci |
a data frame of variants to be adjusted for in conditional analysis; it should contain 4 columns in the following order: chromosome (CHR), position (POS), reference allele (REF), and alternative allele (ALT) (default = NULL). |
cMAC_cutoff |
the minimum cumulative minor allele count of the variants in a mask for the mask to be included when summarizing the results (default = 0). |
method_cond |
a character value indicating the method for conditional analysis. Either "optimal" or "naive" (default = "optimal"). |
rare_maf_cutoff |
the cutoff of maximum minor allele frequency in defining rare variants (default = 0.01). |
QC_label |
channel name of the QC label in the GDS/aGDS file (default = "annotation/filter"). |
variant_type |
type of variant included in the analysis. Choices include "SNV", "Indel", or "variant" (default = "SNV"). |
geno_missing_imputation |
method of handling missing genotypes. Either "mean" or "minor" (default = "mean"). |
Annotation_dir |
channel name of the annotations in the aGDS file (default = "annotation/info/FunctionalAnnotation"). |
Annotation_name_catalog |
a data frame containing the name and the corresponding channel name in the aGDS file. |
Use_annotation_weights |
use annotations as weights or not (default = FALSE). |
Annotation_name |
a vector of annotation names used in STAAR (default = NULL). |
alpha |
p-value threshold for significant results of protein-coding genes (default = 2.5E-06). |
alpha_ncRNA |
p-value threshold of significant results of ncRNA genes (default = 2.5E-06). |
ncRNA_pos |
positions of ncRNA genes, required for generating the Manhattan plot and Q-Q plot of the results of ncRNA genes (default=NULL). |
manhattan_plot |
output Manhattan plot or not (default = FALSE). |
QQ_plot |
output Q-Q plot or not (default = FALSE). |
cond_null_model_name |
the null model name for conditional analysis in the SPA setting, only used for imbalanced case-control setting (default = NULL). |
cond_null_model_dir |
the directory of storing the null model for conditional analysis in the SPA setting, only used for imbalanced case-control setting (default = NULL). |
SPA_p_filter |
logical: should the SPA method be used to recalculate the p-value only for variants whose normal-approximation-based p-value is smaller than a pre-specified threshold? Only used for imbalanced case-control setting (default = FALSE). |
p_filter_cutoff |
threshold for the p-value recalculation using the SPA method, only used for imbalanced case-control setting (default = 0.05). |
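As an illustration of the known_loci argument described above, the following minimal sketch builds a data frame with the four columns in the stated order (CHR, POS, REF, ALT) and checks that layout before it would be passed on. The variants listed are placeholder values made up for this example, not real known loci.

```r
# Placeholder variants for illustration only; replace with the real
# known loci to be adjusted for in conditional analysis.
known_loci <- data.frame(
  CHR = c(1, 2),
  POS = c(100000, 200000),
  REF = c("A", "G"),
  ALT = c("C", "T"),
  stringsAsFactors = FALSE
)

# Confirm the column order described for known_loci.
expected_cols <- c("CHR", "POS", "REF", "ALT")
stopifnot(identical(colnames(known_loci), expected_cols))

str(known_loci)
```

Leaving known_loci at its default of NULL corresponds to the case in which the conditional result files (the *_sig_cond.csv outputs) are not available.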
The function returns the following analysis results:
noncoding_sig.csv
: a matrix that summarizes the unconditionally significant regions detected by STAAR-O, or by STAAR-B in the imbalanced case-control setting (STAAR-O/-B p-value smaller than the threshold alpha),
including gene name ("Gene name"), chromosome ("chr"), noncoding functional category ("Category"), number of variants ("#SNV"),
and the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting).
noncoding_sig_cond.csv
: a matrix that summarizes the conditional analysis results of the unconditionally significant regions detected by STAAR-O, or by STAAR-B in the imbalanced case-control setting (available if known_loci is not NULL),
including gene name ("Gene name"), chromosome ("chr"), noncoding functional category ("Category"), number of variants ("#SNV"),
and the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting).
results_UTR_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by UTR variants (UTR) for all protein-coding genes across the genome.
UTR_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant UTR masks.
UTR_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant UTR masks (available if known_loci is not NULL).
results_upstream_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by upstream variants (upstream) for all protein-coding genes across the genome.
upstream_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant upstream masks.
upstream_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant upstream masks (available if known_loci is not NULL).
results_downstream_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by downstream variants (downstream) for all protein-coding genes across the genome.
downstream_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant downstream masks.
downstream_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant downstream masks (available if known_loci is not NULL).
results_promoter_CAGE_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by variants overlaid with CAGE sites in the promoter (promoter_CAGE) for all protein-coding genes across the genome.
promoter_CAGE_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant promoter_CAGE masks.
promoter_CAGE_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant promoter_CAGE masks (available if known_loci is not NULL).
results_promoter_DHS_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by variants overlaid with DHS sites in the promoter (promoter_DHS) for all protein-coding genes across the genome.
promoter_DHS_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant promoter_DHS masks.
promoter_DHS_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant promoter_DHS masks (available if known_loci is not NULL).
results_enhancer_CAGE_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by variants overlaid with CAGE sites in the enhancer (enhancer_CAGE) for all protein-coding genes across the genome.
enhancer_CAGE_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant enhancer_CAGE masks.
enhancer_CAGE_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant enhancer_CAGE masks (available if known_loci is not NULL).
results_enhancer_DHS_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by variants overlaid with DHS sites in the enhancer (enhancer_DHS) for all protein-coding genes across the genome.
enhancer_DHS_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant enhancer_DHS masks.
enhancer_DHS_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant enhancer_DHS masks (available if known_loci is not NULL).
results_ncRNA_genome
: a matrix containing the STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the noncoding masks defined by exonic and splicing ncRNA variants (ncRNA) for all ncRNA genes across the genome.
ncRNA_sig.csv
: a matrix containing the unconditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant ncRNA masks.
ncRNA_sig_cond.csv
: a matrix containing the conditional STAAR p-values (including STAAR-O or STAAR-B in imbalanced case-control setting) of the unconditionally significant ncRNA masks (available if known_loci is not NULL).
Manhattan plot (optional) and Q-Q plot (optional) of the gene-centric noncoding analysis results.
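The summary files listed above use column headers such as "Gene name" and "#SNV", which R would rename on import by default. The sketch below shows one way to read such a file with the headers preserved and to order the masks by p-value. It writes a mock stand-in for noncoding_sig.csv to a temporary directory so that it runs on its own; the rows, the gene names, and the "STAAR-O" column name are placeholders assumed for illustration, not real output of the function.

```r
# Mock stand-in for noncoding_sig.csv. All values and the "STAAR-O"
# column name are placeholders, not real results.
mock_sig <- data.frame(
  "Gene name" = c("GENE_A", "GENE_B"),
  "chr" = c(1, 2),
  "Category" = c("UTR", "promoter_CAGE"),
  "#SNV" = c(40, 25),
  "STAAR-O" = c(2e-06, 1e-07),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
csv_path <- file.path(tempdir(), "noncoding_sig.csv")
write.csv(mock_sig, csv_path, row.names = FALSE)

# check.names = FALSE keeps headers such as "Gene name" and "#SNV" unchanged.
sig <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Order the masks from the smallest to the largest p-value.
sig_sorted <- sig[order(sig[["STAAR-O"]]), ]
print(sig_sorted)
```

With real output, csv_path would instead point to the file under the output_path supplied to the function.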
Li, Z., Li, X., et al. (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.