In jeinson/ExOutBench: [Ex]pression [Out]lier [Bench]marking tool

library(tidyverse)
library(ExOutBench)

ExOutBench is a tool used to benchmark the performance expression outlier calling methods in prioritizing genes whose expression is associated with rare genetic variation. For each function in the package the input is:

A file with rare variant information. Which individual-gene pairs have a rare variant? This is provided, or pre-compiled beforehand.
A file with the expresion outlier status of the same individual-gene pair. This is the thing you're trying to test. The file with variant information (called outlier.calls) is formatted like this:

| SampleName | GeneID | chr | start | end | consdetail | |------------|-----------------|------|--------|--------|-------------------| | IndvA | ENSG00000186092 | chr1 | 64904 | 64905 | upstream | | IndvA | ENSG00000230021 | chr1 | 631495 | 631496 | upstream | | IndvA | ENSG00000230021 | chr1 | 666203 | 666204 | intron,non_coding | | IndvA | ENSG00000230021 | chr1 | 666398 | 666399 | intron,non_coding | | IndvB | ENSG00000225972 | chr1 | 631495 | 631496 | downstream | | IndvB | ENSG00000225630 | chr1 | 631495 | 631496 | downstream | | IndvB | ENSG00000237973 | chr1 | 631495 | 631496 | non_coding_exon | | IndvB | ENSG00000229344 | chr1 | 631495 | 631496 | upstream |

Every line is an individual-gene pair and a rare variant associated with that gene. There can be multiple rare variants associated with one individual-gene pair, but if an individual-gene pair does not carry any rare variants within the specified window, it will be excluded from the table.

I have compiled this file for European individuals in the GTEx v8 cohort. It lists all rare variants (MAF < 1% in GnomAD and in GTEx) within 10 kb upstream of the gene and within the gene body. The file will have to be shared internally, since it contains personal genotype data.

rare.variants <- read_tsv(
  "/gpfs/commons/groups/lappalainen_lab/jeinson/data/ExOutBench_data/all_rare_variants_SNPs_10kb_genebody_w_consdetail_no_NA.tsv", 
  progress = F)

The test also takes in expression outlier calls from any method you might want to benchmark. The formatted is as such, where every row is an individual-gene pair with a corresponding "outlier score":

|GeneID |SampleName | outlier.score| |:---------------|:----------|-------------:| |ENSG00000000971 |GTEX-1117F | 0.9098267| |ENSG00000001561 |GTEX-1117F | 0.9605523| |ENSG00000001617 |GTEX-1117F | 0.9692674| |ENSG00000002549 |GTEX-1117F | 0.9269707| |ENSG00000002822 |GTEX-1117F | 0.8792655| |ENSG00000002834 |GTEX-1117F | 0.9964071|

tiss.outlier.scores <-
  read_csv("/gpfs/commons/groups/lappalainen_lab/jeinson/projects/aneva-dot/analysis_v8/ANEVA-DOT-pipeline/Filtering/combined.ad.scores.in.MSCLSK.tsv") %>%
  rename("GeneID" = "X1") %>%
  gather(SampleName, DOT.score, -GeneID) %>%
  filter(complete.cases(.)) %>%
  rename("outlier.score" = "DOT.score")

!! The column names in the outlier call file and the rare variant call file must match! The package joins them together then performs enrichment analyses.

Right now, this package is able to calculate enrichment by

Outlier score threshold: This indicates how likely it is to observe a rare variant given outlier status, increasingly stringent outlier score thresholds.
Variant annotation: This shows the relative risk of a certain type of rare variant causing an expression outlier to occur, under a fixed threshold.

In the future, I will add support for showing enrichment of rare variants based on their distance to outlier genes, and enrichment of rare variants given their population frequency.

By default, each function removes all individual-gene pairs from genes that never occur as outliers. This functionality can be turned off by setting limit.to.genes.w.outliers to FALSE. The functions produce a data frame, and plots the results automatically by default. See the package documentation for more options and descriptions.

Run the enrichment by significance pipeline!

enrichment.by.significance.output <- 
  enrichment_by_significance(outlier.calls = tiss.outlier.scores, rare.variants = rare.variants, draw.plot = T)

enrichment.by.significance.output

Run the enrichment by annotation pipeline!

enrichment.by.annotation.out <- 
  enrichment_by_annotation(outlier.calls = tiss.outlier.scores, rare.variants = rare.variants, draw.plot = T)

enrichment.by.annotation.out

jeinson/ExOutBench documentation built on Nov. 4, 2019, 2:38 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jeinson/ExOutBench
[Ex]pression [Out]lier [Bench]marking tool

In jeinson/ExOutBench: [Ex]pression [Out]lier [Bench]marking tool

R Package Documentation

Browse R Packages

We want your feedback!

jeinson/ExOutBench [Ex]pression [Out]lier [Bench]marking tool

In jeinson/ExOutBench: [Ex]pression [Out]lier [Bench]marking tool

R Package Documentation

Browse R Packages

We want your feedback!

jeinson/ExOutBench
[Ex]pression [Out]lier [Bench]marking tool