knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
seqNdisplayR aims to provide reproducible, visually appealling genome coverage data visualization including in-built normalization and batch correction features. This vignette describes the basic features of seqNdisplayR using tracks from Lykke-Andersen et al. Mol Cell 2021.
Locate the source tar.gz or the uncompressed folder containing the source and then install from within R. You need to either set the working directory to the same location as the tar file or add the full path.
Install from tar.gz file:
install.packages('XXXXXXXXXXXXX.tar.gz', repos = NULL, type="source") #to update when package is finished
Install from source directory:
install.packages('<path>/XXXXXXXXXXXXX', repos = NULL, type="source") #to update when package is finished
Alternative is to install from Terminal (see comment about path from above): ```{bash, eval=FALSE} R CMD INSTALL XXXXXXXXXXXXXXXXXXXX.tar.gz #to update when package is finished
Using RStudio: Open Project and select the source directory. Build > Install & Restart. There are options on how to build in Build > Configure Build Tools ... The package is also available on GitHub at \link{https://rdrr.io/github/THJlab/seqNdisplayR/}. and can be installed with remotes \link{https://CRAN.R-project.org/package=remotes}. ```r install.packages("remotes") remotes::install_github("THJlab/seqNdisplayR")
seqNdisplayR basic usage assumes that sequence coverage tracks to be displayed are grouped, often containing nested subgroups.
For plotting this groups are representet in object samples which contains nested lists of sample names.
samples = list("TT-seq" = c('CTRL', 'CPSF73', 'INTS11', 'RRP40'), "RNA-seq" = c('CTRL', 'CPSF73', 'INTS11', 'RRP40'), "3'-seq" = list('total' = list('noPAP' = c('WT','RRP40'), 'EPAP' = c('WT','RRP40')), '4sU' = list('noPAP' = c('WT','RRP40'), 'EPAP' = c('WT','RRP40'))), 'ChIP-seq'=c('INTS11'))
Colors each coverage track is displayed in are in a similar nested list, except that deepest list is a named vector.
colors = list("TT-seq" = c('CTRL'='#5E5760', 'CPSF73'='#C7164C', 'INTS11'='#1989B6', 'RRP40'='#E2856A'), "RNA-seq" = c('CTRL'='#5E5760', 'CPSF73'='#C7164C', 'INTS11'='#1989B6', 'RRP40'='#E2856A'), "3'-seq" = list('total' = list('noPAP' = c('WT'='#5D5D5D', 'RRP40'='#6061A9'), 'EPAP' = c('WT'='#5D5D5D', 'RRP40'='#6061A9')), '4sU' = list('noPAP' = c('WT'='#5D5D5D', 'RRP40'='#6061A9'), 'EPAP' = c('WT'='#5D5D5D', 'RRP40'='#6061A9'))), 'ChIP-seq'=c('INTS11'='#000000'))
Common directories for each outer level of sample groups
bigwig_dirs = c("TT-seq" = 'http://mbg-ftp-ro:MBG-F-RO-17461@genome-ftp.mbg.au.dk/files/THJ/NGS/Human/THJ/2017_Molska_TT-seq_RNAseq/TT-seq/hg38/', "RNA-seq" = 'http://mbg-ftp-ro:MBG-F-RO-17461@genome-ftp.mbg.au.dk/files/THJ/NGS/Human/THJ/2017_Molska_TT-seq_RNAseq/RNA-seq/hg38/', "3'-seq" = 'http://mbg-ftp-ro:MBG-F-RO-17461@genome-ftp.mbg.au.dk/files/THJ/NGS/Human/THJ/2019_Wu_3pseq_NEXTPAXT/hg38/', "ChIP-seq" = 'http://mbg-ftp-ro:MBG-F-RO-17461@genome-ftp.mbg.au.dk/files/THJ/NGS/Human/GEO/GSE125534_Beckedorff_(Shiekhattar)_2020_nuclear-ncRNA-seq_RNA-seq_CAGE-seq_ChIP-seq_ATAC-seq/ChIP-seq/hg38/')
Individual track files in bigwig format. Multiple files are allowed per sample, these are treated internally as replicates from the same sample. Bigwigs are separated by strand, for unstranded data simply omit the '-' entry ########### Has this last point been changed?
bigwigs = list( '+' = list( "TT-seq" = list( 'CTRL' = c( 'L_EGFP_rep1_tt_corr_ff_noJncReads_plus.bw', 'L_EGFP_rep2_tt_corr_ff_noJncReads_plus.bw' ), 'CPSF73' = c( 'L_CPSF73_rep1_tt_corr_ff_noJncReads_plus.bw', 'L_CPSF73_rep2_tt_corr_ff_noJncReads_plus.bw' ), 'INTS11' = c( 'L_INTS11_rep1_tt_corr_ff_noJncReads_plus.bw', 'L_INTS11_rep2_tt_corr_ff_noJncReads_plus.bw' ), 'RRP40' = c( 'L_RRP40_rep1_tt_corr_ff_noJncReads_plus.bw', 'L_RRP40_rep2_tt_corr_ff_noJncReads_plus.bw' ) ), "RNA-seq" = list( 'CTRL' = c( 'T_EGFP_rep1_tt_corr_plus.bw', 'T_EGFP_rep2_tt_corr_plus.bw' ), 'ARS2' = c( 'T_ARS2_rep1_tt_corr_plus.bw', 'T_ARS2_rep2_tt_corr_plus.bw' ), 'CPSF73' = c( 'T_CPSF73_rep1_tt_corr_plus.bw', 'T_CPSF73_rep2_tt_corr_plus.bw' ), 'INTS11' = c( 'T_INTS11_rep1_tt_corr_plus.bw', 'T_INTS11_rep2_tt_corr_plus.bw' ), 'RRP40' = c( 'T_RRP40_rep1_tt_corr_plus.bw', 'T_RRP40_rep2_tt_corr_plus.bw' ) ), "3'-seq" = list( 'total' = list( 'noPAP' = list( 'WT' = c( 'siGFP_noPAP_in_batch1_plus.bw', 'siGFP_noPAP_in_batch2_plus.bw', 'siGFP_noPAP_in_batch3_plus.bw' ), 'RRP40' = c( 'siRRP40_noPAP_in_batch1_plus.bw', 'siRRP40_noPAP_in_batch2_plus.bw', 'siRRP40_noPAP_in_batch3_plus.bw' ) ), 'EPAP' = list( 'WT' = c( 'siGFP_xPAP_in_batch1_plus.bw', 'siGFP_xPAP_in_batch2_plus.bw', 'siGFP_xPAP_in_batch3_plus.bw' ), 'RRP40' = c( 'siRRP40_xPAP_in_batch1_plus.bw', 'siRRP40_xPAP_in_batch2_plus.bw', 'siRRP40_xPAP_in_batch3_plus.bw' ) ) ), '4sU' = list( 'noPAP' = list( 'WT' = c( 'siGFP_noPAP_ip_batch1_plus.bw', 'siGFP_noPAP_ip_batch3_plus.bw' ), 'RRP40' = c( 'siRRP40_noPAP_ip_batch1_plus.bw', 'siRRP40_noPAP_ip_batch3_plus.bw' ) ), 'EPAP' = list( 'WT' = c( 'siGFP_xPAP_ip_batch1_plus.bw', 'siGFP_xPAP_ip_batch3_plus.bw' ), 'RRP40' = c( 'siRRP40_xPAP_ip_batch1_plus.bw', 'siRRP40_xPAP_ip_batch3_plus.bw' ) ) ) ), 'ChIP-seq' = list( 'INTS11' = c( 'GSM3576716_ChIP_INTS11_abcam_shINTS11_ctrl_Hg38.bw', 'GSM3576717_ChIP_INTS11_sig_shINTS11_ctrl_r1_Hg38.bw' ) ) ), '-' = list( "TT-seq" = list( 'CTRL' = c( 'L_EGFP_rep1_tt_corr_ff_noJncReads_minus.bw', 'L_EGFP_rep2_tt_corr_ff_noJncReads_minus.bw' ), 'CPSF73' = c( 'L_CPSF73_rep1_tt_corr_ff_noJncReads_minus.bw', 'L_CPSF73_rep2_tt_corr_ff_noJncReads_minus.bw' ), 'INTS11' = c( 'L_INTS11_rep1_tt_corr_ff_noJncReads_minus.bw', 'L_INTS11_rep2_tt_corr_ff_noJncReads_minus.bw' ), 'RRP40' = c( 'L_RRP40_rep1_tt_corr_ff_noJncReads_minus.bw', 'L_RRP40_rep2_tt_corr_ff_noJncReads_minus.bw' ) ), "RNA-seq" = list( 'CTRL' = c( 'T_EGFP_rep1_tt_corr_minus.bw', 'T_EGFP_rep2_tt_corr_minus.bw' ), 'ARS2' = c( 'T_ARS2_rep1_tt_corr_minus.bw', 'T_ARS2_rep2_tt_corr_minus.bw' ), 'CPSF73' = c( 'T_CPSF73_rep1_tt_corr_minus.bw', 'T_CPSF73_rep2_tt_corr_minus.bw' ), 'INTS11' = c( 'T_INTS11_rep1_tt_corr_minus.bw', 'T_INTS11_rep2_tt_corr_minus.bw' ), 'RRP40' = c( 'T_RRP40_rep1_tt_corr_minus.bw', 'T_RRP40_rep2_tt_corr_minus.bw' ) ), "3'-seq" = list( 'total' = list( 'noPAP' = list( 'WT' = c( 'siGFP_noPAP_in_batch1_minus.bw', 'siGFP_noPAP_in_batch2_minus.bw', 'siGFP_noPAP_in_batch3_minus.bw' ), 'RRP40' = c( 'siRRP40_noPAP_in_batch1_minus.bw', 'siRRP40_noPAP_in_batch2_minus.bw', 'siRRP40_noPAP_in_batch3_minus.bw' ) ), 'EPAP' = list( 'WT' = c( 'siGFP_xPAP_in_batch1_minus.bw', 'siGFP_xPAP_in_batch2_minus.bw', 'siGFP_xPAP_in_batch3_minus.bw' ), 'RRP40' = c( 'siRRP40_xPAP_in_batch1_minus.bw', 'siRRP40_xPAP_in_batch2_minus.bw', 'siRRP40_xPAP_in_batch3_minus.bw' ) ) ), '4sU' = list( 'noPAP' = list( 'WT' = c( 'siGFP_noPAP_ip_batch1_minus.bw', 'siGFP_noPAP_ip_batch3_minus.bw' ), 'RRP40' = c( 'siRRP40_noPAP_ip_batch1_minus.bw', 'siRRP40_noPAP_ip_batch3_minus.bw' ) ), 'EPAP' = list( 'WT' = c( 'siGFP_xPAP_ip_batch1_minus.bw', 'siGFP_xPAP_ip_batch3_minus.bw' ), 'RRP40' = c( 'siRRP40_xPAP_ip_batch1_minus.bw', 'siRRP40_xPAP_ip_batch3_minus.bw' ) ) ) ) ) )
Parameters for how to transform values, treat replicates etc. Specific for each outermost group. ######### premean has been removed right? Add bin_stats, enhance_signal, force scale, groupautoscale
parameters = list( "TT-seq" = list( 'whichSamples' = NULL, 'bin_stats'=c('mean'), 'log2transform' = F, 'pseudoCount' = 1, 'batchCorrect' = T, 'batch' = c(1, 2, 1, 2, 1, 2, 1, 2, 2, 2), 'whichReps' = NULL, 'calcMean' = T, 'negValsSet0' = T, 'enhance_signals'=F, 'preMean' = F ), "RNA-seq" = list( 'whichSamples' = NULL, 'bin_stats'=c('mean'), 'log2transform' = F, 'pseudoCount' = 1, 'batchCorrect' = T, 'batch' = c(1, 2, 1, 2, 1, 2, 1, 2, 2, 2), 'whichReps' = NULL, 'calcMean' = T, 'negValsSet0' = T, 'enhance_signals'=F, 'preMean' = F ), "3'-seq" = list( 'whichSamples' = NULL, 'bin_stats'=c('mean'), 'log2transform' = F, 'pseudoCount' = 1, 'batchCorrect' = T, 'batch' = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3), 'whichReps' = NULL, 'calcMean' = T, 'negValsSet0' = T, 'enhance_signals'=T, 'preMean' = F ), "ChIP-seq" = list( 'whichSamples' = NULL, 'bin_stats'=c('mean'), 'log2transform' = F, 'pseudoCount' = 1, 'batchCorrect' = F, 'batch' = NULL, 'whichReps' = NULL, 'calcMean' = T, 'negValsSet0' = T, 'enhance_signals'=F, 'preMean' = F ) )
Each plot will usually also contain one or more types of annotations to be plotted. These can be assembled in a simple named list. For this vignette, we use a simplified annotation file containing only all isoforms of TAF1D from human Gencode v21 annotations.
bed=rtracklayer::import(system.file('extdata', 'hg38_TAF1D.bed', package='seqNdisplayR')) annotations=list('gencode v21'=bed)
seqNdisplay(samples, colors, bigwig_dirs, bigwigs, parameters, annots=annotations, feature='TAF1D')
parameters$`3'-seq`$calcMean <- FALSE parameters$`3'-seq`$batchCorrect <- FALSE
seqNdisplay(samples, colors, bigwig_dirs, bigwigs, parameters, annots=annotations, feature='TAF1D')
Given the numerous plotting arguments, it can be cumbersome to keep track of details. To facilitate this to some extent, one can also work with seqNdisplayRSession objects, which bundle all arguments in one object, essentially a named list containing all the arguments to seqNdisplayR().
session <- seqNdisplayRSession(samples=samples, colors=colors, bigwig_dirs = bigwig_dirs, bigwigs = bigwigs, parameters = parameters, annotations = annotations)
The session object contains a lot of info, simple print gives an overview of the sample contained
session
To inspect the details
print(session, verbose=T)
For convenience one can assemble track information in an Excel sheet based on the template at \link{XXXXXXXXXXXXXXupdate when finished} for an example. Information about Excel sheet template organization is in sheet 'README' in the above Excel sheet.
Usage of Excel templates with seqNdisplayR is straightforward:
xl_fname <- system.file("extdata", "example_excel_template.xls", package = "seqNdisplayR") session <- load_excel(xl_fname, load_annotations = TRUE)
The load_annotations option specifies whether annotation files are directly parsed into GRanges. If FALSE, only the path is stored and annotations are reloaded during each plotting. Recommended is to load them at this stage.
To get an overview over what was loaded:
print(session)
To create a plot using default settings this simply use:
plot(session_obj, feature='TAF1D')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.