In amanda-tapia/LocusXcanR: What The Package Does (one line, title case required)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(LocusXcanR)

Introduction

The interpretation of TWAS results is not always straightforward and can be quite challenging, especially for large-scale analyses, such as the Genetic Epidemiology on Adult Health and Aging (GERA) cohort (N ~ 55,000) and UK Biobank (N ~ 500,000). Care must be taken to interpret TWAS results in context (e.g. alongside information known from other 'omics studies such as GWAS), understand how multiple TWAS signals at a given locus may be affected by correlated gene expression and/or the sharing of model variants, and evaluate how fine-mapping results either clarify the most likely causal genes, or may be confounded by some of the same challenges as the original TWAS analysis.

Thus, in order to facilitate the interpretation of TWAS findings, we present an R package which creates an R Shiny application designed to interactively visualize TWAS results. We discuss, in detail, the features of the R package and R Shiny application and show how it can be used to quickly interpret very complex TWAS loci in the context of prior knowledge from GWAS.

dir <- system.file("extdata", package="LocusXcanR")

twas_ds <- read.table(file.path(dir, "twas_ds.txt"), header=T, stringsAsFactors = F)

ld_gwas <- read.table(file.path(dir, "ld_gwas.txt"), header=T, stringsAsFactors = F)

load(file.path(dir,"pred_exp_corr.Rda"))

db_genes <- read.table(file.path(dir, "db_genes.txt"), header=T, stringsAsFactors = F)

all_gwas <- read.table(file.path(dir, "all_gwas.txt"), header=T, stringsAsFactors = F)

weight_tbl <- read.table(file.path(dir, "weight_tbl.txt"), header=T, stringsAsFactors = F)

known_variants <- read.table(file.path(dir, "gwas_sentinel.txt"), header=T, stringsAsFactors = F)

known_gwas <- read.table(file.path(dir, "known_gwas.txt"), header=T, stringsAsFactors = F)

cytoband_ds <- read.table(file.path(dir, "cytoBandIdeo.txt.gz"), header = F, sep = "\t")

LocusXcanR Function

The LocusXcanR function takes several datasets as input, including TWAS results, GWAS results, gene expression reference panel weights, GWAS LD, and predicted expression correlation. These results are then integrated into a single HTML webpage via R Shiny, and the interactive visualizations and data tables can then be used to facilitate TWAS results interpretation.

LocusXcanR(twas_result, weight_tbl, study_name = "", pred_exp_corr, conditional_present = FALSE, multiple_tissues = FALSE, known_variants, known_gwas, db_genes, all_gwas, ld_gwas, ref_expr_name = "", head_details = "", method_details = "", primary_tissue, meta_present = FALSE, ideogram_present = FALSE, genome_build, cytoband_ds, add_UI = "", add_server = "")

Required parameters

The LocusXcanR function contains the following required parameters and their definitions:

twas_result: character, file path to TWAS results (required). See TWAS results dataset section below for more details.

pvalthresh: numeric, -log10 p-value threshold for TWAS results.

weight_tbl: character, file path to TWAS weights database (required). See Weights table section below for more details.

known_variants: character, file path to known GWAS variants (required). See Known variants dataset section below for more details.

pred_exp_corr: character, file path to predicted expression correlation (required). See Predicted expression correlation dataset section for more details.

known_gwas: character, file path to study GWAS data matching known variants (required). See Known GWAS dataset section below for more details.

db_genes: character, file path to a list of genes in the database (required). See Reference panel genes for more details.

all_gwas: character, file path to study GWAS data (required). See GWAS results dataset for more details.

ld_gwas: character, file path to the LD among study variants or an LD reference panel (required). See LD GWAS section for more details.

Optional parameters

The LocusXcanR function contains the following optional parameters and their definitions:

study_name: character, the name of the study/cohort (optional, default is missing). For example study_name = "Genetic Epidemiology on Adult Health and Aging" or study_name = "GERA".

conditional_present: logical, TRUE if conditional analysis results are available for plotting, FALSE otherwise (default is FALSE). See TWAS results dataset section for more details on including conditional analysis results.

multiple_tissues: logical, TRUE if TWAS results are available from more than one tissue, FALSE if TWAS results are only available from a single tissue or a multi-tissue analysis (default is FALSE). This parameter can be used to compare, for example, results from a primary tissue gene expression reference panel (e.g. Depression Genes and Networks whole blood gene expression) to one or more secondary tissue gene expression reference panel(s) (e.g. Genotype-tissue Expression Project whole blood gene expression), when multiple_tissues = TRUE. See TWAS results dataset section for more details on including secondary tissue gene expression reference panels. This functionality is not currently available. The application only displays results for a single-tissue TWAS analysis or a multi-tissue analysis, but cannot be used to compare TWAS results from multiple tissues at this time.

ref_expr_name: character, name of the reference expression data set used in the analysis (optional, default is missing). For example ref_expr_name = "Depression Genes and Networks whole blood" or ref_expr_name = "DGN".

head_details: character, any additional header details to be included in the app (optional, default is missing). HTML formatting commands may be used. See Frequently Asked Questions section for header_details examples.

method_details: character, detailed methods section (optional, default is missing). HTML formatting commands may be used. See Frequently Asked Questions for method_details examples.

primary_tissue: character, if multiple tissues are available for analysis, list the name of the primary tissue (required if multiple_tissues = TRUE. For example, if you have results to compare from DGN and GTEx, and DGN is your primary tissue reference panel, then primary_tissue = "DGN". The character string given to primary_tissue must match a tissue name in the dataset loaded by twas_result. See TWAS results dataset section for more details.

meta_present: logical, TRUE if results from TWAS meta-analysis are present for comparison, FALSE otherwise (optional, default is FALSE).

meta_thresh: numeric, p-value threshold for meta-analysis results (required if meta_present = TRUE).
ideogram_present logical, TRUE if an IdeogramTrack should be plotted, FALSE otherwise (optional, default is FALSE)
genome_build: character, the genome build for the IdeogramTrack data (required if ideogram_present = TRUE). For example, genome_build = "hg19" for Genome Reference Consortium Human Build 37 (GRCh37).

cytoband_ds: character, file path to the cytogenetic bands data set (required if ideogram_present = TRUE). See Ideogram plot section for more details on including an IdeogramTrack in the app.

add_UI: character, additional elements to add to UI (optional, default is missing). This parameter can be used to include your own additional items at the end of the UI.

add_server: character, additional elements to add to server (optional, default is missing). This parameter can be used to include your own additional items in the server.

Data Pre-processing

TWAS results dataset (`twas_ds`)

The TWAS results dataset includes, at the very minimum, TWAS results (in the form of a p-value) from a primary TWAS analysis. P-values from a primary TWAS analysis may come from a single-tissue TWAS analysis, a multi-tissue TWAS analysis, or a TWAS meta-analysis. The TWAS results dataset may optionally include results from a TWAS meta-analysis (e.g. for replication), TWAS conditional analysis, and/or TWAS results from secondary tissue gene expression reference panels for comparison.

The dataset twas_ds requires the following variables:

tissue: character, name (or an abbreviated name) of the tissue reference panel

locus: integer, value indicating the locus number

locstart: integer, value indicating the genomic start position of the locus

locstop: integer, value indicating the genomic stop position of the locus

chr: integer, chromosome number

index: character, the index (most significant) gene within the locus

pheno: character, the name of the phenotype being analyzed (may be one or more phenotypes)

genestart: integer, value indicating the genomic start position of the gene (may be the gene start position or transcription start site or whatever starting position you would like to plot for the gene)

genestop: integer, value indicating the genomic stop position of the gene (may be the gene stop position or transcription stop site or whatever stopping position you would like to plot for the gene)

p: numeric, p-value of the TWAS association test between phenotype and predicted gene expression

genename: character, the name of the gene (may be the gene name or the ENSMBL ID or any other name you use to identify the gene)

SignifGene: boolean, indicator of whether the gene-trait association is statistically significant (1 if significant, 0 otherwise)

Optional variables may also be included in the twas_ds dataset. These include the following:

p_meta: numeric, p-value from a TWAS meta-analysis used for replication, if available (required if meta_present = TRUE)

p_conditional: numeric, p-value from a TWAS conditional analysis, if available (required if conditional_present = TRUE

Additional variables (beyond those that are required or optional) may also be included in the twas_ds dataset. Any additional variables will be ignored when plotting figures but will be printed in data tables within the application.

Here is an example of twas_ds:

knitr::kable(twas_ds[79:83,])

GWAS results dataset (`all_gwas`)

The GWAS results dataset all_gwas contains all of the analytic cohort GWAS p-values that you wish to plot. It is recommended that you subset the GWAS results dataset based on a p-value threshold (e.g. only include variants with GWAS p-value < 0.05). The dataset contains variant information, the GWAS p-value, and the locus in which the variant needs to be plotted.

The dataset all_gwas requires the following variables:

`SNP`: character, indicating chromosome : position : allele1 : allele2
`PVAL`: numeric, the GWAS p-value for the variant in the analytic cohort
`LOCUS`: numeric, the locus number in which you wish to plot the variant

Here is an example of all_gwas:

knitr::kable(all_gwas[c(1:5),])

Weights table (`weight_tbl`)

The weight_tbl dataset contains a set of genes from the primary gene expression reference panel along with their corresponding predictive model variant weights. This data set is used to connect TWAS genes to their respective GWAS model variants and report the weight and direction of the effect.

The dataset weight_tbl requires the following variables:

`rsid`: character, the RSID of the model variant
`genename`: character, the name of the gene
`weight`: numeric, the weight corresponding to the effect of the model variant on the gene
`ref_allele`: character, the reference allele
`eff_allele`: character, the effect allele
`chr`: numeric, chromosome number
`position`: numeric, genomic position of the model variant

Here is an example of weight_tbl:

knitr::kable(weight_tbl[c(1:5),])

Known variants dataset (`known_variants`)

The known_variants dataset contains a list of GWAS sentinel variants within the region. GWAS sentinel variants may come from GWAS Catalog or some other curated list (e.g. from the most recent and/or comprehensive GWAS for your trait of interest). This dataset is used to identify TWAS genes which have already been assigned to GWAS variants and/or identify within-cohort GWAS variants which have already been identified by previous GWAS studies.

The dataset known_variants requires the following variables:

`chr`: numeric, the chromosome number for the sentinel variant
`position`: numeric, the genomic position of the sentinel variant
`genename`: character, the name of the gene that has been assigned by the sentinel variant
`incohort`: boolean, indicator identifying whether the sentinel variant is also well-imputed in the analytic cohort (0 = sentinel variant is not in the analytic cohort, 1 = sentinel variant is in the analytic cohort)
`effectallele`: character, the effect allele of the sentinel variant
`otherallele`: character, the alternate allele of the sentinel variant

Additional fields may be included in the dataset. Any additional fields will be printed in the data table reported in the application.

Here is an example of known_variants:

knitr::kable(known_variants[c(1:5),])

Known GWAS dataset (`known_gwas`)

The known_gwas dataset contains the set of variants that are included in both the analytic GWAS cohort and are known sentinel GWAS variants from other GWAS studies. The dataset includes a variant identifier along with the GWAS p-value for the analytic cohort and the locus number in which you wish to plot the variant.

The dataset known_gwas requires the following variables:

`SNP`: character, an identifier for the GWAS variant in the analytic cohort given by chr : pos : allele1 : allele2
`PVAL`: numeric, the GWAS p-value for the variant in the analytic cohort
`Locus`: numeric, the locus number in which you wish to plot the variant

Additional fields (beyond those that are required) may also be included in the dataset. Any additional variables will be printed in the data table reported in the application.

Here is an example of known_gwas:

knitr::kable(known_gwas[c(1:5),])

Predicted expression correlation dataset (`pred_exp_corr`)

The predicted expression correlation dataset is an M X M matrix of M genes required for plotting, which contains pairwise correlations between each gene in the matrix. You may include all genes from your TWAS analysis; however, for computational efficiency, it is recommended that you only include those genes that you wish to plot (e.g. only the statistically significant genes and those previously known from GWAS). This correlation matrix must be in .Rdata format, and the name of the correlation matrix object must be saved as capital M.

Here is an example of the M correlation matrix object loaded by the pred_exp_corr parameter.

knitr::kable(M[1:5,1:5])

GWAS LD (`ld_gwas`)

The dataset containing pair-wise LD values among the predictive model variants at the locus is specified by the ld_gwas parameter. The dataset includes chromosome and genomic position for each variant, and each row in the table additionally contains the LD between the pair-wise variants and the locus in which the variants should be plotted. Chromosome and genomic position in this table should match the chromosome and genomic position of the model variants in the weight_tbl dataset. The variables chra and posa at each locus represent the chromosome and position for the index SNP at the locus.

The dataset ld_gwas requires the following variables:

chra: numeric, the chromosome number for variant (a). This is for the index variant at the locus.

posa: numeric, the genomic position for variant (a). This is for the index variant at the locus.

chrb: numeric, the chromosome number for variant (b)

posb: numeric, the genomic position for variant (b)

corrab: numeric, LD between variant (a) and variant (b)

locus: numeric, the locus number in which these variants will be plotted

Here is an example of ld_gwas:

knitr::kable(ld_gwas[1:5,])

Reference panel genes (`db_genes`)

The db_genes dataset contains a list of genes from the primary gene expression reference panel that were either not predicted in the analytic cohort or no gene expression model was fit for the gene. This data set is used to determine whether there are any genes from the reference panel which were not predicted in the analytic cohort or no gene expression model was fit for the gene, but one or more GWAS sentinel variants implicate the gene as potentially causal.

The dataset db_genes requires the following variables:

`genename`: character, the name of the gene
`genestatus`: boolean, status indicator for the gene. 0 = gene was not predicted in the analytic cohort, 1 = no gene expression model was fit for the gene

Here is an example of db_genes:

knitr::kable(db_genes[c(19:23),])

Ideogram plot (`cytoband_ds`)

A note on cytogenetic bands:

Each human chromosome has a short arm ("p" for "petit") and long arm ("q" for "queue"), separated by a centromere. The ends of the chromosome are called telomeres. Each chromosome arm is divided into regions, called cytogenetic bands, that can be seen using a microscope and special stains. The cytogenetic bands are labeled p1, p2, p3, q1, q2, q3, etc., counting from the centromere out toward the telomeres. At higher resolutions, sub-bands can be seen within the bands. The sub-bands are also numbered from the centromere out toward the telomere.

The dataset (cytoband_ds) used to identify the cytogenetic bands in IdeogramTrack() is typically downloaded directly from UCSC. For example, the example data used here was obtained from the UCSC genome annotation database for the Feb. 2009 assembly of the human genome (hg19, GRCh37 Genome Reference Consortium Human Reference 37 (GCA_000001405.1)), downloaded from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/.

Here is an example of cytoband_ds used in this example:

colnames(cytoband_ds) <- c("chrom", "chromStart", "chromEnd", "name", "gieStain")
knitr::kable(cytoband_ds[1:5,])

Here is an example of the output from IdeogramTrack() which uses the cytogenetic band dataset (cytoband_ds):

genome_build = "hg19"
itrack = Gviz::IdeogramTrack(genome = genome_build, chromosome = "chr3", bands = cytoband_ds)
Gviz::plotTracks(c(itrack), from = 51744800, to=53804965, cex=1.5)

R shiny example function

An example of the R shiny application used on a real data analysis (i.e. results from TWAS and GWAS) is available in the R script R/example/example_script.R.

Publishing your Shiny app

R Shiny applications may be shared via several different formats. Shiny applications are fundamentally a web page, which subsequently allows them to be shared over the web. Essentially, in order for the application to be shared over the web, you must arrange to replace your own computer, which shares your app locally, with a web server that can share your app over the cloud.

Files must be saved in a standardized format and directory structure in order for any Shiny server to build your app. Once your app is appropriately packaged for a Shiny server, RStudio recommends starting by launching your app on shinyapps.io. Shinyapps.io is a service run by RStudio that allows you to share your apps online; the free server is built by RStudio and is easy to use, secure, and scalable to multiple users.

For additional details and recommendations about publishing your application online, see RStudio's R Shiny tutorial: https://vimeo.com/rstudioinc/review/131218530/212d8a5a7a/#t=25m49s

Frequently Asked Questions

Where can I learn more about getting started with R Shiny?

RStudio's R Shiny video tutorials provide a great beginner's guide to the R Shiny framework: https://shiny.rstudio.com/tutorial/. Links to more advanced Shiny topics are available from this web page as well.

What can I include in my header details?

The header_details parameter is a character string parameter which may include simple text characters and/or HTML formatting commands. For example, simple text characters may include the following:

header_details = "All genomic positions are from GRCh37. TWAS results for 10 traits are presented in the figures and tables below."

HTML commands for formatting may be included as in the following example:

header_details = "All genomic positions are from GRCh37.<p>TWAS results for 10 traits are presented in the figures and tables below. Traits and trait categories are listed and defined as follows:<p><ul><li>Platelet count (PLT)</li><li>Red blood cell indices (RBC): red blood cell count (RBC), hematocrit (HCT), hemoglobin (HGB), mean corpuscular volume (MCV), and red cell distribution width (RDW)</li><li>White blood cell indices (WBC): white blood cell count (WBC), monocyte (MONO), neutrophil (NEUTRO), and lymphocyte (LYMPH)</li></ul>"

What can I include in my method details?

Similar to header_details, the method_details parameter is a character string parameter which may include simple text characters and/or HTML formatting commands. Additionally, method_details may include R shiny functions as follows.

For example, you could save the following code in a script called methods.R:

shiny::tabPanel("Methods",
  shiny::h3(shiny::strong("Methods")),
  "A current draft of the manuscript describing these results and detailing the methods can be found here:",
  shiny::tags$a(href="https://docs.google.com/document/d/1TfqzBPGhL6TEqUuxN5lpy7cJVZcZhyiy/edit", "https://docs.google.com/document/d/1TfqzBPGhL6TEqUuxN5lpy7cJVZcZhyiy/edit",target="_blank"),
  ". All cohorts included in the analysis are described individually below. We analyze 10 hematological phenotypes (platelet count, red blood cell count, hematocrit, hemoglobin, mean corpuscular volume, red cell distribution width, white blood cell count, monocyte count, neutrophil count, and lymphocyte count) across all cohorts.",

  shiny::hr(),
  shiny::h4(shiny::strong("PrediXcan method:")),

  shiny::br()
)

Then load your methods.R script into your working script, using for example:

method <- source("methods.R")

Finally, use the method variable as the value of the method_details parameter within the LocusXcanR function, as in:

LocusXcanR(method_details = method)

For a more detailed example of this procedure, see the example_script.R and the methods.R scripts in R/example/.

Does the order of the column names in any of the datasets matter?

The order of the column names does not matter. However, the names of the columns in your data must match the column names (required or optional) listed for each dataset in the Data Pre-processing sections. Any additional variables included in your datasets which do not match those listed in the Data Pre-processing sections will also be printed in the data tables within the application.

Is the `methods.R` example file necessary for the shiny app?

methods.R isn't required for the app. If you don't give anything to the method_details parameter then it defaults to method_details="".

What is the difference between `gwas_sentinel.txt` and `known_gwas.txt` example files?

The gwas_sentinel.txt example data file contains the data that are loaded into the known_variants parameter. These data will represent your list of known GWAS sentinel variants from, for example, the latest GWAS analysis on your phenotype of interest. The known_gwas.txt example file contains the analytic cohort GWAS p-values that correspond to the known variants from the latest GWAS analysis. So the variants in the known_gwas.txt file match the variants in the gwas_sentinel.txt file (where there are matches), but the p-value in the known_gwas.txt file should come from GWAS in your analytic cohort.

Acknowledgements

Thank you to the TWAS working group in Dr. Yun Li's lab at the University of North Carolina at Chapel Hill. Special thanks go out to Bryce Rowland, Jia Wen, and Laura Raffield.

amanda-tapia/LocusXcanR documentation built on March 9, 2021, 7:36 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

amanda-tapia/LocusXcanR
What The Package Does (one line, title case required)

In amanda-tapia/LocusXcanR: What The Package Does (one line, title case required)

Introduction

LocusXcanR Function

Required parameters

Optional parameters

Data Pre-processing

TWAS results dataset (`twas_ds`)

GWAS results dataset (`all_gwas`)

Weights table (`weight_tbl`)

Known variants dataset (`known_variants`)

Known GWAS dataset (`known_gwas`)

Predicted expression correlation dataset (`pred_exp_corr`)

GWAS LD (`ld_gwas`)

Reference panel genes (`db_genes`)

Ideogram plot (`cytoband_ds`)

R shiny example function

Publishing your Shiny app

Frequently Asked Questions

Where can I learn more about getting started with R Shiny?

What can I include in my header details?

What can I include in my method details?

Does the order of the column names in any of the datasets matter?

Is the `methods.R` example file necessary for the shiny app?

What is the difference between `gwas_sentinel.txt` and `known_gwas.txt` example files?

Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

amanda-tapia/LocusXcanR What The Package Does (one line, title case required)

In amanda-tapia/LocusXcanR: What The Package Does (one line, title case required)

Introduction

LocusXcanR Function

Required parameters

Optional parameters

Data Pre-processing

TWAS results dataset (twas_ds)

GWAS results dataset (all_gwas)

Weights table (weight_tbl)

Known variants dataset (known_variants)

Known GWAS dataset (known_gwas)

Predicted expression correlation dataset (pred_exp_corr)

GWAS LD (ld_gwas)

Reference panel genes (db_genes)

Ideogram plot (cytoband_ds)

R shiny example function

Publishing your Shiny app

Frequently Asked Questions

Where can I learn more about getting started with R Shiny?

What can I include in my header details?

What can I include in my method details?

Does the order of the column names in any of the datasets matter?

Is the methods.R example file necessary for the shiny app?

What is the difference between gwas_sentinel.txt and known_gwas.txt example files?

Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

amanda-tapia/LocusXcanR
What The Package Does (one line, title case required)

TWAS results dataset (`twas_ds`)

GWAS results dataset (`all_gwas`)

Weights table (`weight_tbl`)

Known variants dataset (`known_variants`)

Known GWAS dataset (`known_gwas`)

Predicted expression correlation dataset (`pred_exp_corr`)

GWAS LD (`ld_gwas`)

Reference panel genes (`db_genes`)

Ideogram plot (`cytoband_ds`)

Is the `methods.R` example file necessary for the shiny app?

What is the difference between `gwas_sentinel.txt` and `known_gwas.txt` example files?