predict_target_genes: Predict target genes of fine-mapped variants for a trait

View source: R/predict_target_genes.R

predict_target_genes    R Documentation

Predict target genes of fine-mapped variants for a trait

Description

The master, user-facing function of this package.

Usage

predict_target_genes(
  trait = NULL,
  out_dir = NULL,
  variants_file = NULL,
  known_genes_file = NULL,
  reference_panels_dir = NULL,
  celltype_of_interest = NULL,
  tissue_of_interest = NULL,
  celltypes = "enriched_tissues",
  variant_to_gene_max_distance = 2e+06,
  max_n_known_genes_per_CS = Inf,
  do_scoring = T,
  do_performance = T,
  do_XGBoost = F,
  do_timestamp = F,
  HiChIP = NULL,
  H3K27ac = NULL
)

Arguments

trait

Optional. The name of the trait of interest.

out_dir

The output directory in which to save the predictions. Default is "./out/trait/celltypes/".

variants_file

A BED file of trait-associated variants grouped by association signal, for example SNPs correlated with an index variant, or credible sets of fine-mapped variants.

known_genes_file

Optional. A file listing the symbols of genes known to be involved in the trait. Required if do_performance is TRUE.
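
The dependency between do_performance and known_genes_file can be checked before starting a long run. The following is a minimal sketch, not the package's own code: check_known_genes() is a hypothetical helper, and the one-symbol-per-line file layout is an assumption, since the documentation does not state the file format.

```r
# Hypothetical pre-flight check; not part of the package.
check_known_genes <- function(known_genes_file = NULL, do_performance = TRUE) {
  # Nothing to check when the performance chunk is switched off.
  if (!do_performance) return(invisible(NULL))
  if (is.null(known_genes_file)) {
    stop("do_performance = TRUE requires a known_genes_file")
  }
  if (!file.exists(known_genes_file)) {
    stop("known_genes_file not found: ", known_genes_file)
  }
  # Assumption: the file holds one gene symbol per line.
  genes <- unique(trimws(readLines(known_genes_file)))
  # Drop blank lines.
  genes[nzchar(genes)]
}
```

Calling the helper first means a missing file is reported immediately rather than after the annotations have been loaded.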

reference_panels_dir

The directory containing the external, accompanying reference panels data.

celltype_of_interest

Optional. The celltype(s) of interest for the trait. Only annotations in these celltypes will be used to make predictions. Argument(s) must match the names of celltypes in the metadata. Make sure the celltype of interest has coverage across all annotations (TADs, HiChIP, expression, H3K27ac) in the metadata table.

tissue_of_interest

Optional. The tissue(s) of interest for the trait. Only annotations in these tissues will be used to make predictions. Argument(s) must match the names of tissues in the metadata.

celltypes

Dictates which celltypes' annotations are used. Must be one of c("enriched_celltypes", "enriched_tissues", "all_celltypes"). If "enriched_celltypes", annotations from only the enriched celltype(s) will be used. The enriched celltype(s) must have coverage across all annotations (TADs, HiChIP, expression, H3K27ac) in the metadata table for this to work. If "enriched_tissues", all annotations from the tissue of the enriched celltype(s) will be used. If "all_celltypes", the enrichment analysis is skipped and annotations from all available cell types will be used. Default is "enriched_tissues".
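
The three documented options can be validated with base R's match.arg(). This is an illustrative sketch only: resolve_celltypes() and its run_enrichment field are hypothetical names, not part of the package. The only behaviour taken from the documentation is that "all_celltypes" skips the enrichment analysis.

```r
# Hypothetical helper illustrating the three documented options.
resolve_celltypes <- function(celltypes = c("enriched_tissues",
                                            "enriched_celltypes",
                                            "all_celltypes")) {
  # match.arg() returns the first choice when the argument is left at its
  # default and raises an error for any value outside the allowed set.
  celltypes <- match.arg(celltypes)
  list(
    celltypes = celltypes,
    # Per the documentation, enrichment is skipped only for "all_celltypes".
    run_enrichment = celltypes != "all_celltypes"
  )
}
```

Listing "enriched_tissues" first mirrors the documented default.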

variant_to_gene_max_distance

The maximum absolute distance (bp) across which variant-gene pairs are considered. Default is 2e+06 (2 Mb). The HiChIP data are already filtered to 2 Mb.
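
To make the distance cut-off concrete, the sketch below filters a toy table of variant-gene pairs. All names, positions and column names are invented for illustration. Measuring the distance between two single coordinates and treating the threshold as inclusive are both assumptions, so the package may define the distance differently (for example, relative to a transcription start site).

```r
# Toy variant-gene pairs; identifiers and positions are invented.
pairs <- data.frame(
  variant = c("v1", "v1", "v2"),
  gene = c("geneA", "geneB", "geneC"),
  variant_pos = c(1000000, 1000000, 5000000),
  gene_pos = c(2500000, 3500000, 4000000),
  stringsAsFactors = FALSE
)

variant_to_gene_max_distance <- 2e6

# Absolute distance in bp between the variant and the gene coordinate.
pairs$distance <- abs(pairs$variant_pos - pairs$gene_pos)

# Keep pairs within the threshold (inclusive, by assumption).
kept <- pairs[pairs$distance <= variant_to_gene_max_distance, ]
```

Here the v1-geneB pair is 2.5 Mb apart and is dropped, while the other two pairs are retained.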

max_n_known_genes_per_CS

In the performance analysis, the maximum number of known genes within variant_to_gene_max_distance of a credible set. Default is Inf (no limit).

do_scoring

If TRUE, runs the scoring chunk of the script, which combines all of the constituent MultiAssayExperiment (MAE) annotations into one score per transcript-variant pair. Default is TRUE.

do_performance

If TRUE, runs the performance chunk of the script, which measures the performance of the score and each of its constituent annotations in predicting known genes as the targets of nearby variants. Default is TRUE.

do_XGBoost

If TRUE, runs the XGBoost chunk of the script, which generates a model to predict the targets of variants from all available annotations and rates the importance of each annotation. Default is FALSE.

do_timestamp

If TRUE, saves the output into a subdirectory timestamped with the date and time of the run. Default is FALSE.
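
One way to picture how out_dir and do_timestamp interact is the sketch below. It is an assumption-laden illustration rather than the package's implementation: build_out_dir() is a hypothetical helper, the timestamp format is invented, and it assumes that the "trait" and "celltypes" components of the documented default path are filled in from the corresponding arguments.

```r
# Hypothetical helper; the timestamp format is an assumption.
build_out_dir <- function(trait, celltypes, do_timestamp = FALSE,
                          now = Sys.time()) {
  # Mirrors the documented default layout "./out/trait/celltypes/".
  out_dir <- file.path(".", "out", trait, celltypes)
  if (do_timestamp) {
    # Append a per-run subdirectory so repeated runs do not overwrite
    # each other.
    out_dir <- file.path(out_dir, format(now, "%Y%m%d_%H%M%S"))
  }
  out_dir
}
```

For example, build_out_dir("BC", "enriched_tissues") gives "./out/BC/enriched_tissues", and setting do_timestamp = TRUE adds one further directory level named after the run time.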

HiChIP

If you are repeatedly running predict_target_genes, you can load the HiChIP object from the reference_panels_dir into the global environment and pass it to the function to prevent redundant re-loading with each call to predict_target_genes.

H3K27ac

If you are repeatedly running predict_target_genes, you can load the H3K27ac object from the reference_panels_dir into the global environment and pass it to the function to prevent redundant re-loading with each call to predict_target_genes.

Value

A MultiAssayExperiment object with one assay object per annotation, one row per variant-transcript pair and one column per cell type (or 'value' if it is a non-cell-type-specific annotation).


alextidd/tgp documentation built on June 1, 2022, 9:25 a.m.