knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(LocusXcanR)

Introduction

The interpretation of TWAS results is not always straightforward and can be quite challenging, especially for large-scale analyses, such as the Genetic Epidemiology on Adult Health and Aging (GERA) cohort (N ~ 55,000) and UK Biobank (N ~ 500,000). Care must be taken to interpret TWAS results in context (e.g. alongside information known from other 'omics studies such as GWAS), understand how multiple TWAS signals at a given locus may be affected by correlated gene expression and/or the sharing of model variants, and evaluate how fine-mapping results either clarify the most likely causal genes, or may be confounded by some of the same challenges as the original TWAS analysis.

Thus, in order to facilitate the interpretation of TWAS findings, we present an R package which creates an R Shiny application designed to interactively visualize TWAS results. We discuss, in detail, the features of the R package and R Shiny application and show how it can be used to quickly interpret very complex TWAS loci in the context of prior knowledge from GWAS.

dir <- system.file("extdata", package="LocusXcanR")

twas_ds <- read.table(file.path(dir, "twas_ds.txt"), header=T, stringsAsFactors = F)

ld_gwas <- read.table(file.path(dir, "ld_gwas.txt"), header=T, stringsAsFactors = F)

load(file.path(dir,"pred_exp_corr.Rda"))

db_genes <- read.table(file.path(dir, "db_genes.txt"), header=T, stringsAsFactors = F)

all_gwas <- read.table(file.path(dir, "all_gwas.txt"), header=T, stringsAsFactors = F)

weight_tbl <- read.table(file.path(dir, "weight_tbl.txt"), header=T, stringsAsFactors = F)

known_variants <- read.table(file.path(dir, "gwas_sentinel.txt"), header=T, stringsAsFactors = F)

known_gwas <- read.table(file.path(dir, "known_gwas.txt"), header=T, stringsAsFactors = F)

cytoband_ds <- read.table(file.path(dir, "cytoBandIdeo.txt.gz"), header = F, sep = "\t")

LocusXcanR Function

The LocusXcanR function takes several datasets as input, including TWAS results, GWAS results, gene expression reference panel weights, GWAS LD, and predicted expression correlation. These results are then integrated into a single HTML webpage via R Shiny, and the interactive visualizations and data tables can then be used to facilitate TWAS results interpretation.

LocusXcanR(twas_result, weight_tbl, study_name = "", pred_exp_corr, conditional_present = FALSE, multiple_tissues = FALSE, known_variants, known_gwas, db_genes, all_gwas, ld_gwas, ref_expr_name = "", head_details = "", method_details = "", primary_tissue, meta_present = FALSE, ideogram_present = FALSE, genome_build, cytoband_ds, add_UI = "", add_server = "")

Required parameters

The LocusXcanR function contains the following required parameters and their definitions:

Optional parameters

The LocusXcanR function contains the following optional parameters and their definitions:

Data Pre-processing

TWAS results dataset (twas_ds)

The TWAS results dataset includes, at the very minimum, TWAS results (in the form of a p-value) from a primary TWAS analysis. P-values from a primary TWAS analysis may come from a single-tissue TWAS analysis, a multi-tissue TWAS analysis, or a TWAS meta-analysis. The TWAS results dataset may optionally include results from a TWAS meta-analysis (e.g. for replication), TWAS conditional analysis, and/or TWAS results from secondary tissue gene expression reference panels for comparison.

The dataset twas_ds requires the following variables:

Optional variables may also be included in the twas_ds dataset. These include the following:

Additional variables (beyond those that are required or optional) may also be included in the twas_ds dataset. Any additional variables will be ignored when plotting figures but will be printed in data tables within the application.

Here is an example of twas_ds:

knitr::kable(twas_ds[79:83,])

GWAS results dataset (all_gwas)

The GWAS results dataset all_gwas contains all of the analytic cohort GWAS p-values that you wish to plot. It is recommended that you subset the GWAS results dataset based on a p-value threshold (e.g. only include variants with GWAS p-value < 0.05). The dataset contains variant information, the GWAS p-value, and the locus in which the variant needs to be plotted.

The dataset all_gwas requires the following variables:

Here is an example of all_gwas:

knitr::kable(all_gwas[c(1:5),])

Weights table (weight_tbl)

The weight_tbl dataset contains a set of genes from the primary gene expression reference panel along with their corresponding predictive model variant weights. This data set is used to connect TWAS genes to their respective GWAS model variants and report the weight and direction of the effect.

The dataset weight_tbl requires the following variables:

Here is an example of weight_tbl:

knitr::kable(weight_tbl[c(1:5),])

Known variants dataset (known_variants)

The known_variants dataset contains a list of GWAS sentinel variants within the region. GWAS sentinel variants may come from GWAS Catalog or some other curated list (e.g. from the most recent and/or comprehensive GWAS for your trait of interest). This dataset is used to identify TWAS genes which have already been assigned to GWAS variants and/or identify within-cohort GWAS variants which have already been identified by previous GWAS studies.

The dataset known_variants requires the following variables:

Additional fields may be included in the dataset. Any additional fields will be printed in the data table reported in the application.

Here is an example of known_variants:

knitr::kable(known_variants[c(1:5),])

Known GWAS dataset (known_gwas)

The known_gwas dataset contains the set of variants that are included in both the analytic GWAS cohort and are known sentinel GWAS variants from other GWAS studies. The dataset includes a variant identifier along with the GWAS p-value for the analytic cohort and the locus number in which you wish to plot the variant.

The dataset known_gwas requires the following variables:

Additional fields (beyond those that are required) may also be included in the dataset. Any additional variables will be printed in the data table reported in the application.

Here is an example of known_gwas:

knitr::kable(known_gwas[c(1:5),])

Predicted expression correlation dataset (pred_exp_corr)

The predicted expression correlation dataset is an M X M matrix of M genes required for plotting, which contains pairwise correlations between each gene in the matrix. You may include all genes from your TWAS analysis; however, for computational efficiency, it is recommended that you only include those genes that you wish to plot (e.g. only the statistically significant genes and those previously known from GWAS). This correlation matrix must be in .Rdata format, and the name of the correlation matrix object must be saved as capital M.

Here is an example of the M correlation matrix object loaded by the pred_exp_corr parameter.

knitr::kable(M[1:5,1:5])

GWAS LD (ld_gwas)

The dataset containing pair-wise LD values among the predictive model variants at the locus is specified by the ld_gwas parameter. The dataset includes chromosome and genomic position for each variant, and each row in the table additionally contains the LD between the pair-wise variants and the locus in which the variants should be plotted. Chromosome and genomic position in this table should match the chromosome and genomic position of the model variants in the weight_tbl dataset. The variables chra and posa at each locus represent the chromosome and position for the index SNP at the locus.

The dataset ld_gwas requires the following variables:

Here is an example of ld_gwas:

knitr::kable(ld_gwas[1:5,])

Reference panel genes (db_genes)

The db_genes dataset contains a list of genes from the primary gene expression reference panel that were either not predicted in the analytic cohort or no gene expression model was fit for the gene. This data set is used to determine whether there are any genes from the reference panel which were not predicted in the analytic cohort or no gene expression model was fit for the gene, but one or more GWAS sentinel variants implicate the gene as potentially causal.

The dataset db_genes requires the following variables:

Here is an example of db_genes:

knitr::kable(db_genes[c(19:23),])

Ideogram plot (cytoband_ds)

A note on cytogenetic bands:

Each human chromosome has a short arm ("p" for "petit") and long arm ("q" for "queue"), separated by a centromere. The ends of the chromosome are called telomeres. Each chromosome arm is divided into regions, called cytogenetic bands, that can be seen using a microscope and special stains. The cytogenetic bands are labeled p1, p2, p3, q1, q2, q3, etc., counting from the centromere out toward the telomeres. At higher resolutions, sub-bands can be seen within the bands. The sub-bands are also numbered from the centromere out toward the telomere.

The dataset (cytoband_ds) used to identify the cytogenetic bands in IdeogramTrack() is typically downloaded directly from UCSC. For example, the example data used here was obtained from the UCSC genome annotation database for the Feb. 2009 assembly of the human genome (hg19, GRCh37 Genome Reference Consortium Human Reference 37 (GCA_000001405.1)), downloaded from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/.

Here is an example of cytoband_ds used in this example:

colnames(cytoband_ds) <- c("chrom", "chromStart", "chromEnd", "name", "gieStain")
knitr::kable(cytoband_ds[1:5,])

Here is an example of the output from IdeogramTrack() which uses the cytogenetic band dataset (cytoband_ds):

genome_build = "hg19"
itrack = Gviz::IdeogramTrack(genome = genome_build, chromosome = "chr3", bands = cytoband_ds)
Gviz::plotTracks(c(itrack), from = 51744800, to=53804965, cex=1.5)

R shiny example function

An example of the R shiny application used on a real data analysis (i.e. results from TWAS and GWAS) is available in the R script R/example/example_script.R.

Publishing your Shiny app

R Shiny applications may be shared via several different formats. Shiny applications are fundamentally a web page, which subsequently allows them to be shared over the web. Essentially, in order for the application to be shared over the web, you must arrange to replace your own computer, which shares your app locally, with a web server that can share your app over the cloud.

Files must be saved in a standardized format and directory structure in order for any Shiny server to build your app. Once your app is appropriately packaged for a Shiny server, RStudio recommends starting by launching your app on shinyapps.io. Shinyapps.io is a service run by RStudio that allows you to share your apps online; the free server is built by RStudio and is easy to use, secure, and scalable to multiple users.

For additional details and recommendations about publishing your application online, see RStudio's R Shiny tutorial: https://vimeo.com/rstudioinc/review/131218530/212d8a5a7a/#t=25m49s

Frequently Asked Questions

Where can I learn more about getting started with R Shiny?

RStudio's R Shiny video tutorials provide a great beginner's guide to the R Shiny framework: https://shiny.rstudio.com/tutorial/. Links to more advanced Shiny topics are available from this web page as well.

What can I include in my header details?

The header_details parameter is a character string parameter which may include simple text characters and/or HTML formatting commands. For example, simple text characters may include the following:

header_details = "All genomic positions are from GRCh37. TWAS results for 10 traits are presented in the figures and tables below."

HTML commands for formatting may be included as in the following example:

header_details = "All genomic positions are from GRCh37.<p>TWAS results for 10 traits are presented in the figures and tables below. Traits and trait categories are listed and defined as follows:<p><ul><li>Platelet count (PLT)</li><li>Red blood cell indices (RBC): red blood cell count (RBC), hematocrit (HCT), hemoglobin (HGB), mean corpuscular volume (MCV), and red cell distribution width (RDW)</li><li>White blood cell indices (WBC): white blood cell count (WBC), monocyte (MONO), neutrophil (NEUTRO), and lymphocyte (LYMPH)</li></ul>"

What can I include in my method details?

Similar to header_details, the method_details parameter is a character string parameter which may include simple text characters and/or HTML formatting commands. Additionally, method_details may include R shiny functions as follows.

For example, you could save the following code in a script called methods.R:

shiny::tabPanel("Methods",
  shiny::h3(shiny::strong("Methods")),
  "A current draft of the manuscript describing these results and detailing the methods can be found here:",
  shiny::tags$a(href="https://docs.google.com/document/d/1TfqzBPGhL6TEqUuxN5lpy7cJVZcZhyiy/edit", "https://docs.google.com/document/d/1TfqzBPGhL6TEqUuxN5lpy7cJVZcZhyiy/edit",target="_blank"),
  ". All cohorts included in the analysis are described individually below. We analyze 10 hematological phenotypes (platelet count, red blood cell count, hematocrit, hemoglobin, mean corpuscular volume, red cell distribution width, white blood cell count, monocyte count, neutrophil count, and lymphocyte count) across all cohorts.",

  shiny::hr(),
  shiny::h4(shiny::strong("PrediXcan method:")),

  shiny::br()
)

Then load your methods.R script into your working script, using for example:

method <- source("methods.R")

Finally, use the method variable as the value of the method_details parameter within the LocusXcanR function, as in:

LocusXcanR(method_details = method)

For a more detailed example of this procedure, see the example_script.R and the methods.R scripts in R/example/.

Does the order of the column names in any of the datasets matter?

The order of the column names does not matter. However, the names of the columns in your data must match the column names (required or optional) listed for each dataset in the Data Pre-processing sections. Any additional variables included in your datasets which do not match those listed in the Data Pre-processing sections will also be printed in the data tables within the application.

Is the methods.R example file necessary for the shiny app?

methods.R isn't required for the app. If you don't give anything to the method_details parameter then it defaults to method_details="".

What is the difference between gwas_sentinel.txt and known_gwas.txt example files?

The gwas_sentinel.txt example data file contains the data that are loaded into the known_variants parameter. These data will represent your list of known GWAS sentinel variants from, for example, the latest GWAS analysis on your phenotype of interest. The known_gwas.txt example file contains the analytic cohort GWAS p-values that correspond to the known variants from the latest GWAS analysis. So the variants in the known_gwas.txt file match the variants in the gwas_sentinel.txt file (where there are matches), but the p-value in the known_gwas.txt file should come from GWAS in your analytic cohort.

Acknowledgements

Thank you to the TWAS working group in Dr. Yun Li's lab at the University of North Carolina at Chapel Hill. Special thanks go out to Bryce Rowland, Jia Wen, and Laura Raffield.



amanda-tapia/LocusXcanR documentation built on March 9, 2021, 7:36 p.m.