README.md

License: GPL-3

Authors: Brian Schilder, Jack Humphrey, Towfique Raj

README updated: Jan-20-2023

echolocatoR: Automated statistical and functional fine-mapping with extensive access to genome-wide datasets.

The echoverse

echolocatoR is part of the echoverse, a suite of R packages designed to facilitate different steps in genetic fine-mapping.

echolocatoR calls each of these other packages (i.e. “modules”) internally to create a unified pipeline. However, you can also use each module independently to create your own custom workflows.

echoverse dependency graph

Made with echodeps, yet another echoverse module. See here for the interactive version with package descriptions and links to each GitHub repo.

Citation

If you use echolocatoR, or any of the echoverse modules, please cite:

Brian M Schilder, Jack Humphrey, Towfique Raj (2021) echolocatoR: an automated end-to-end statistical and functional genomic fine-mapping pipeline, Bioinformatics; btab658, https://doi.org/10.1093/bioinformatics/btab658

Installation

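# Install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

# Install echolocatoR from GitHub, then load it
remotes::install_github("RajLabMSSM/echolocatoR")
library(echolocatoR)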

Installation troubleshooting

[Optional] Docker/Singularity

echolocatoR now has its own dedicated Docker/Singularity container! This greatly reduces issues related to system dependency conflicts and provides a containerized interface for RStudio through your web browser. See here for installation instructions.

Documentation

Website

Get started

Bugs/requests

Please report any bugs/requests on GitHub Issues.

Contributions are welcome!

All echoverse vignettes

echoverse <- c('echolocatoR','echodata','echotabix',
               'echoannot','echoconda','echoLD',
               'echoplot','catalogueR','downloadR',
               'echofinemap','echodeps', # under construction
               'echogithub')
toc <- echogithub::github_pages_vignettes(owner = "RajLabMSSM",
                                          repo = echoverse,
                                          as_toc = TRUE,
                                          verbose = FALSE)
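
Before browsing the vignettes, it can help to see which of these modules are already installed locally. The following is a minimal base R sketch that reuses the package names from the `echoverse` vector above. The helper name `installed_status` is hypothetical and is not part of any echoverse package.

```r
# Package names copied from the echoverse vector above.
echoverse <- c("echolocatoR", "echodata", "echotabix",
               "echoannot", "echoconda", "echoLD",
               "echoplot", "catalogueR", "downloadR",
               "echofinemap", "echodeps", "echogithub")

# Hypothetical helper: TRUE for each package found in the local library paths.
# system.file() returns an empty string when a package is not installed.
installed_status <- function(pkgs) {
  vapply(pkgs,
         function(p) nzchar(system.file(package = p)),
         logical(1))
}

status <- installed_status(echoverse)
print(status)

# Names of modules that still need to be installed (may be empty).
missing_pkgs <- names(status)[!status]
print(missing_pkgs)
```

This check only inspects the local library and installs nothing, so it is safe to run at any time.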

Introduction

Fine-mapping methods are a powerful means of identifying causal variants underlying a given phenotype, but are underutilized due to the technical challenges of implementation. echolocatoR is an R package that automates end-to-end genomics fine-mapping, annotation, and plotting in order to identify the most probable causal variants associated with a given phenotype.

It requires minimal input from users (a GWAS or QTL summary statistics file), and includes a suite of statistical and functional fine-mapping tools. It also includes extensive access to datasets (linkage disequilibrium panels, epigenomic and genome-wide annotations, QTL).

The elimination of data gathering and preprocessing steps enables rapid fine-mapping of many loci in any phenotype, complete with locus-specific publication-ready figure generation. All results are merged into a single per-SNP summary file for additional downstream analysis and results sharing. Therefore echolocatoR drastically reduces the barriers to identifying causal variants by making the entire fine-mapping pipeline rapid, robust and scalable.

Literature

For applications of echolocatoR in the literature, please see:

  1. E Navarro, E Udine, K de Paiva Lopes, M Parks, G Riboldi, BM Schilder…T Raj (2020) Dysregulation of mitochondrial and proteo-lysosomal genes in Parkinson’s disease myeloid cells. Nature Genetics. https://doi.org/10.1101/2020.07.20.212407
  2. BM Schilder, T Raj (2021) Fine-Mapping of Parkinson’s Disease Susceptibility Loci Identifies Putative Causal Variants. Human Molecular Genetics, ddab294, https://doi.org/10.1093/hmg/ddab294
  3. K de Paiva Lopes, G JL Snijders, J Humphrey, A Allan, M Sneeboer, E Navarro, BM Schilder…T Raj (2022) Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature Genetics, https://doi.org/10.1038/s41588-021-00976-y

echolocatoR v1.0 vs. v2.0

There have been a series of major updates between echolocatoR v1.0 and v2.0. Here are some of the most notable ones (see Details):

Output descriptions

By default, echolocatoR::finemap_loci() returns a nested list of results grouped by locus name (e.g. $BST1, $MEX3C). The results for each locus contain the following elements:

In addition, the following object summarizes the results from the locus-specific elements:

- merged_dat: A merged data.table with all fine-mapping results from all loci.
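
The layout described above (one entry per locus plus a `merged_dat` summary) can be sketched with toy data in base R. This is an illustration under stated assumptions, not real `finemap_loci()` output: each locus entry is reduced to a single small data frame, the SNP IDs and P-values are made up, and a plain data.frame stands in for the data.table.

```r
# Toy stand-in for the per-locus results (hypothetical values, not real output).
per_locus <- list(
  BST1  = data.frame(SNP = c("rs_a", "rs_b"), P = c(1e-8, 0.02)),
  MEX3C = data.frame(SNP = "rs_c", P = 5e-9)
)

# Stack the per-locus tables, recording which locus each SNP came from.
merged_dat <- do.call(rbind, lapply(names(per_locus), function(locus) {
  cbind(Locus = locus, per_locus[[locus]])
}))
rownames(merged_dat) <- NULL

# Mimic the described layout: one entry per locus plus merged_dat.
res <- c(per_locus, list(merged_dat = merged_dat))

# Locus-level entries are everything except the summary table.
locus_names <- setdiff(names(res), "merged_dat")
print(locus_names)
print(res$merged_dat)
```

Separating the locus names from `merged_dat` this way makes it straightforward to loop over loci without accidentally treating the summary table as a locus.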

Multi-finemap results files

The main outputs of echolocatoR are the multi-finemap files (for example, echodata::BST1). They are stored in the locus-specific Multi-finemap subfolders.
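
To get a first look at a multi-finemap table, the example object `echodata::BST1` mentioned above can be inspected directly. The sketch below guards the call so it still runs when echodata is not installed; it only prints the dimensions and column names and makes no assumption about which columns are present.

```r
# Stays NULL when echodata is not installed.
dat <- NULL

if (requireNamespace("echodata", quietly = TRUE)) {
  # Example multi-finemap results bundled with echodata.
  dat <- echodata::BST1
  print(dim(dat))       # number of SNPs (rows) and columns
  print(colnames(dat))  # available per-SNP fields
} else {
  message("echodata is not installed; skipping the example.")
}
```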

Column descriptions

Notes

Fine-mapping tools

Fine-mapping functions are now implemented via echofinemap:

fm_methods <- echofinemap::required_cols(add_versions = FALSE, 
                                         embed_links = TRUE,
                                         verbose = FALSE)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
knitr::kable(x = fm_methods)

| method           | required   | suggested  | source | citation |
|:-----------------|:-----------|:-----------|:-------|:---------|
| ABF              | SNP, CHR…. |            | source | cite     |
| COJO_conditional | SNP, CHR…. | Freq, P, N | source | cite     |
| COJO_joint       | SNP, CHR…. | Freq, P, N | source | cite     |
| COJO_stepwise    | SNP, CHR…. | Freq, P, N | source | cite     |
| FINEMAP          | SNP, CHR…. | A1, A2, …. | source | cite     |
| PAINTOR          | SNP, CHR…. | MAF        | source | cite     |
| POLYFUN_FINEMAP  | SNP, CHR…. | MAF, N     | source | cite     |
| POLYFUN_SUSIE    | SNP, CHR…. | MAF, N     | source | cite     |
| SUSIE            | SNP, CHR…. | N          | source | cite     |
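
As a rough illustration of how the "suggested" column of this table might be used, the base R sketch below checks which suggested columns are absent from a summary statistics header. The suggested columns are copied from the table; FINEMAP is left out because its entry is truncated there. The header in `sumstats_cols` is hypothetical, and this is not how echofinemap itself performs the check.

```r
# Suggested columns per method, copied from the table above.
# FINEMAP is omitted because its entry is truncated in the table.
suggested <- list(
  ABF              = character(0),
  COJO_conditional = c("Freq", "P", "N"),
  COJO_joint       = c("Freq", "P", "N"),
  COJO_stepwise    = c("Freq", "P", "N"),
  PAINTOR          = "MAF",
  POLYFUN_FINEMAP  = c("MAF", "N"),
  POLYFUN_SUSIE    = c("MAF", "N"),
  SUSIE            = "N"
)

# Hypothetical summary statistics header (column names only).
sumstats_cols <- c("SNP", "CHR", "POS", "P", "Effect", "StdErr", "N")

# For each method, list the suggested columns that are absent.
missing_suggested <- lapply(suggested, setdiff, y = sumstats_cols)

# Methods whose suggested columns are all present.
fully_covered <- names(missing_suggested)[lengths(missing_suggested) == 0]
print(fully_covered)
```

With this hypothetical header, only ABF and SUSIE have all of their suggested columns available; the COJO methods lack Freq, and PAINTOR and the PolyFun methods lack MAF.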

Datasets

Datasets are now stored/retrieved via the following echoverse subpackages:

- echodata: Pre-computed fine-mapping results. Also handles the semi-automated standardization of summary statistics.
- echoannot: Annotates GWAS/QTL summary statistics using epigenomics, pre-compiled annotation matrices, and machine learning model predictions of variant-specific functional impacts.
- catalogueR: Large compendium of fully standardized e/s/t-QTL summary statistics.

For more detailed information about each dataset, use ?:

### Examples ###

library(echoannot)   
?NOTT_2019.interactome # epigenomic annotations
library(echodata) 
?BST1 # fine-mapping results 

MungeSumstats:

catalogueR: QTLs

eQTL Catalogue: catalogueR::eQTL_Catalogue.query()

echodata: fine-mapping results

echolocatoR Fine-mapping Portal: pre-computed fine-mapping results

echoannot: Epigenomic & genome-wide annotations

Nott et al. (2019): echoannot::NOTT2019_*()

Corces et al. (2020): echoannot::CORCES2020_*()

XGR: echoannot::XGR_download_and_standardize()

Roadmap: echoannot::ROADMAP_query()

biomaRt: echoannot::annotate_snps()

HaploR: echoannot::annotate_snps()

Enrichment tools

Annotation enrichment functions are now implemented via echoannot:

Implemented

XGR: echoannot::XGR_enrichment()

motifbreakR: echoannot::MOTIFBREAKR()

regioneR: echoannot::test_enrichment()

Under construction

GARFIELD

GoShifter

S-LDSC

LD reference panels

LD reference panels are now queried/processed by echoLD, specifically the function get_LD():

UK Biobank

1000 Genomes Phase 1

1000 Genomes Phase 3

Custom LD panel
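
For readers unfamiliar with what an LD matrix contains, the following base R sketch computes pairwise r² between SNPs from a small simulated genotype matrix. It is a conceptual illustration only: the genotypes and SNP IDs are made up, and this is not how echoLD retrieves or computes LD from the reference panels listed above.

```r
set.seed(1)

# Hypothetical genotype dosages (0/1/2) for 50 individuals at 4 SNPs.
n <- 50
snp1 <- rbinom(n, size = 2, prob = 0.3)
geno <- cbind(
  rs_a = snp1,
  rs_b = snp1,                             # exact copy: perfect LD with rs_a
  rs_c = rbinom(n, size = 2, prob = 0.5),  # independent SNP
  rs_d = rbinom(n, size = 2, prob = 0.2)   # independent SNP
)

# Pairwise Pearson correlation between SNP columns, squared to give r^2.
ld_r2 <- cor(geno)^2
print(round(ld_r2, 2))
```

The result is a symmetric SNP-by-SNP matrix with ones on the diagonal, which is the general shape of the matrices that LD plotting functions take as input.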

Plotting

Plotting functions are now implemented via:

- echoplot: Multi-track locus plots with GWAS, fine-mapping results, and functional annotations (plot_locus()). Can also plot multi-GWAS/QTL and multi-ancestry results (plot_locus_multi()).
- echoannot: Study-level summary plots showing aggregated info across many loci at once (super_summary_plot()).
- echoLD: Plot an LD matrix using one of several different plotting methods (plot_LD()).

Tabix queries

All queries of tabix-indexed files (for rapid data subset extraction) are implemented via echotabix.

Downloads

Single- and multi-threaded downloads are now implemented via downloadR.

Developer

Brian M. Schilder, Bioinformatician II, Raj Lab, Department of Neuroscience, Icahn School of Medicine at Mount Sinai

Session info

utils::sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                  reticulate_1.27            
##   [3] R.utils_2.12.2              tidyselect_1.2.0           
##   [5] RSQLite_2.2.20              AnnotationDbi_1.60.0       
##   [7] htmlwidgets_1.6.1           grid_4.2.1                 
##   [9] BiocParallel_1.32.5         echogithub_0.99.1          
##  [11] XGR_1.1.8                   munsell_0.5.0              
##  [13] codetools_0.2-18            interp_1.1-3               
##  [15] DT_0.27                     colorspace_2.0-3           
##  [17] OrganismDbi_1.40.0          Biobase_2.58.0             
##  [19] filelock_1.0.2              highr_0.10                 
##  [21] knitr_1.41                  supraHex_1.36.0            
##  [23] rstudioapi_0.14             stats4_4.2.1               
##  [25] DescTools_0.99.47           gitcreds_0.1.2             
##  [27] MatrixGenerics_1.10.0       GenomeInfoDbData_1.2.9     
##  [29] mixsqp_0.3-48               bit64_4.0.5                
##  [31] echoconda_0.99.9            rprojroot_2.0.3            
##  [33] basilisk_1.10.2             vctrs_0.5.1                
##  [35] generics_0.1.3              xfun_0.36                  
##  [37] biovizBase_1.46.0           BiocFileCache_2.6.0        
##  [39] R6_2.5.1                    GenomeInfoDb_1.34.6        
##  [41] AnnotationFilter_1.22.0     bitops_1.0-7               
##  [43] cachem_1.0.6                reshape_0.8.9              
##  [45] DelayedArray_0.24.0         assertthat_0.2.1           
##  [47] BiocIO_1.8.0                scales_1.2.1               
##  [49] nnet_7.3-18                 rootSolve_1.8.2.3          
##  [51] gtable_0.3.1                ggbio_1.46.0               
##  [53] lmom_2.9                    ensembldb_2.22.0           
##  [55] rlang_1.0.6                 echodata_0.99.16           
##  [57] splines_4.2.1               lazyeval_0.2.2             
##  [59] rtracklayer_1.58.0          dichromat_2.0-0.1          
##  [61] hexbin_1.28.2               checkmate_2.1.0            
##  [63] reshape2_1.4.4              BiocManager_1.30.19        
##  [65] yaml_2.3.6                  backports_1.4.1            
##  [67] snpStats_1.48.0             GenomicFeatures_1.50.3     
##  [69] ggnetwork_0.5.10            Hmisc_4.7-2                
##  [71] RBGL_1.74.0                 tools_4.2.1                
##  [73] ggplot2_3.4.0               ellipsis_0.3.2             
##  [75] RColorBrewer_1.1-3          proxy_0.4-27               
##  [77] BiocGenerics_0.44.0         coloc_5.1.0.1              
##  [79] Rcpp_1.0.9                  plyr_1.8.8                 
##  [81] base64enc_0.1-3             progress_1.2.2             
##  [83] zlibbioc_1.44.0             purrr_1.0.1                
##  [85] RCurl_1.98-1.9              basilisk.utils_1.10.0      
##  [87] prettyunits_1.1.1           rpart_4.1.19               
##  [89] deldir_1.0-6                viridis_0.6.2              
##  [91] S4Vectors_0.36.1            cluster_2.1.4              
##  [93] SummarizedExperiment_1.28.0 ggrepel_0.9.2              
##  [95] fs_1.5.2                    here_1.0.1                 
##  [97] crul_1.3                    magrittr_2.0.3             
##  [99] data.table_1.14.6           echotabix_0.99.8           
## [101] dnet_1.1.7                  openxlsx_4.2.5.1           
## [103] gh_1.3.1                    mvtnorm_1.1-3              
## [105] ProtGenerics_1.30.0         matrixStats_0.63.0         
## [107] patchwork_1.1.2             hms_1.1.2                  
## [109] evaluate_0.20               rworkflows_0.99.5          
## [111] XML_3.99-0.13               jpeg_0.1-10                
## [113] readxl_1.4.1                IRanges_2.32.0             
## [115] gridExtra_2.3               testthat_3.1.6             
## [117] compiler_4.2.1              biomaRt_2.54.0             
## [119] tibble_3.1.8                crayon_1.5.2               
## [121] R.oo_1.25.0                 htmltools_0.5.4            
## [123] echoannot_0.99.10           tzdb_0.3.0                 
## [125] Formula_1.2-4               tidyr_1.2.1                
## [127] expm_0.999-7                Exact_3.2                  
## [129] DBI_1.1.3                   dbplyr_2.3.0               
## [131] MASS_7.3-58.1               rappdirs_0.3.3             
## [133] boot_1.3-28.1               dlstats_0.1.6              
## [135] Matrix_1.5-3                badger_0.2.2               
## [137] readr_2.1.3                 piggyback_0.1.4            
## [139] brio_1.1.3                  cli_3.6.0                  
## [141] R.methodsS3_1.8.2           parallel_4.2.1             
## [143] echofinemap_0.99.4          igraph_1.3.5               
## [145] GenomicRanges_1.50.2        pkgconfig_2.0.3            
## [147] rvcheck_0.2.1               GenomicAlignments_1.34.0   
## [149] dir.expiry_1.6.0            RCircos_1.2.2              
## [151] foreign_0.8-84              osfr_0.2.9                 
## [153] xml2_1.3.3                  XVector_0.38.0             
## [155] yulab.utils_0.0.6           echoLD_0.99.8              
## [157] stringr_1.5.0               VariantAnnotation_1.44.0   
## [159] digest_0.6.31               graph_1.76.0               
## [161] httpcode_0.3.0              Biostrings_2.66.0          
## [163] rmarkdown_2.19              cellranger_1.1.0           
## [165] htmlTable_2.4.1             gld_2.6.6                  
## [167] restfulr_0.0.15             curl_5.0.0                 
## [169] Rsamtools_2.14.0            rjson_0.2.21               
## [171] lifecycle_1.0.3             nlme_3.1-161               
## [173] jsonlite_1.8.4              desc_1.4.2                 
## [175] viridisLite_0.4.1           BSgenome_1.66.2            
## [177] fansi_1.0.3                 downloadR_0.99.5           
## [179] pillar_1.8.1                susieR_0.12.27             
## [181] GGally_2.1.2                lattice_0.20-45            
## [183] KEGGREST_1.38.0             fastmap_1.1.0              
## [185] httr_1.4.4                  survival_3.5-0             
## [187] glue_1.6.2                  zip_2.2.2                  
## [189] png_0.1-8                   bit_4.0.5                  
## [191] Rgraphviz_2.42.0            class_7.3-20               
## [193] stringi_1.7.12              blob_1.2.3                 
## [195] latticeExtra_0.6-30         memoise_2.0.1              
## [197] dplyr_1.0.10                irlba_2.3.5.1              
## [199] e1071_1.7-12                ape_5.6-2



RajLabMSSM/echolocatoR documentation built on Jan. 29, 2023, 6:04 a.m.