"covNames must be colnames(GenomicRanges::values(gr))"
Consider adding a workaround for when the web security certificate fails the initial check.
A possible workaround is to update the httr option like this:
httr::set_config(httr::config(ssl_verifypeer=FALSE))
The change causes SSL verification to be skipped, which should only be done when the risks are mitigated, for example when using an internal host within a firewall.
DONE. Remove warning "setShadow is deprecated and will be removed..."
width() function not found, caused by not having proper package prefix, GenomicRanges::width() or Biobase::width().
pkgdown package documentation. Latest version of R used for testing is 4.2.3, although it should be equivalent for R-4.4.0.
Minimize R package dependencies?
plyr is not called, and is implied by dplyr already.
DONE. Stop fully importing jamba, ggplot2
DONE. Use proper package prefixing ggplot2::ggplot() to avoid needing to import the full package and all functions.
DONE. General request for more flexible splice count label positions.
Currently uses ggrepel to position labels, inconsistently placed sometimes inside or outside the junction ribbon.
Consider NCBI tracks for miscellaneous RNA-seq data tracks as a convenient default set of tracks to use when viewing "supporting evidence" for transcription of a particular gene:
"RNA-seq exon coverage, aggregate (filtered) - log base2 scaled"
review R packages that produce spliced RNA plots: similar tricks; potential for interoperability; better ideas to make the workflow "easy"
Ularcirc: https://bioconductor.org/packages/3.16/bioc/html/Ularcirc.html
plotAllelicGene()
switchPlot() and switchPlotTranscript()
plotTranscripts()
plotCoverage() which appears to show coverage by exon, and arcs per splice; also plotSpliceGraph() which plots schematic splice graph, flattened per gene.
makeTx2geneFromGtf()
optionally include other gtf/gff3 columns in the output:
"source" (column 2 in GTF/GFF3)
seqname, start, end, strand: columns 1, 4, 5, 7
optional helper function to describe the observed gene and tx attribute names, for the purpose of using them with makeTx2geneFromGtf().
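A minimal base R sketch of the two ideas above: pulling GTF columns 1, 2, 4, 5, 7 alongside the attributes, and listing the attribute names observed in column 9. The two GTF records are made up for illustration, and `gtf_attr_names()` is a hypothetical helper name, not an existing splicejam function.

```r
# Two illustrative GTF records (hypothetical, not real annotation).
gtf_lines <- c(
   'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "GeneA"; transcript_id "T1";',
   'chr1\tENSEMBL\texon\t300\t400\t.\t-\t.\tgene_id "G2"; gene_name "GeneB"; transcript_id "T2";')

# quote="" keeps the double quotes inside the attribute column intact.
gtf <- read.delim(text=gtf_lines, header=FALSE, sep="\t",
   quote="", stringsAsFactors=FALSE)

# Columns 1, 2, 4, 5, 7 are seqname, source, start, end, strand.
coords <- gtf[, c(1, 2, 4, 5, 7)]
colnames(coords) <- c("seqname", "source", "start", "end", "strand")

# Hypothetical helper: report the attribute names seen in column 9.
gtf_attr_names <- function(attr_strings) {
   fields <- unlist(strsplit(attr_strings, ";[ ]*"))
   fields <- fields[nchar(fields) > 0]
   # The attribute name is the first whitespace-delimited token.
   sort(unique(sub("^[ ]*([^ ]+) .*$", "\\1", fields)))
}
attr_names <- gtf_attr_names(gtf[[9]])
print(coords)
print(attr_names)
```

The returned names could then be reviewed before choosing which attributes to pass to makeTx2geneFromGtf().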
Suggest adding long gene name to the plot output somehow, in addition to the gene symbol.
FIXED: reported bug that upon reproducing on local machine shows R error from duplicate row.names, derived from coercing splice GRanges to data.frame.
This bug appears to be fixed, at least when testing genes Gria1 and Tnr with all four CB samples. Those two test cases reproduced the reported error, and the updated splicejam no longer shows an error.
BGA plot function bgaPlotly3d(): allow adjustment of text label color distinct from polygon color
COMPLETE: Update (minor): Import STAR "SJ.out.tab" format equivalent to importing BED12 format.
When the junctions file has 9 columns it is assumed to be STAR format.
The import now uses data.table::fread() and not rtracklayer::import.bed().
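The 9-column rule could be sketched as below. This is an illustration only, not the splicejam import code: it uses base R `read.delim()` instead of `data.table::fread()` to stay self-contained, the junction rows are made up, and `detect_junction_format()` plus the column names are hypothetical labels following the STAR SJ.out.tab column order.

```r
# Hypothetical helper: guess the junction file format from the column count.
detect_junction_format <- function(df) {
   if (ncol(df) == 9) {
      "STAR"
   } else if (ncol(df) == 12) {
      "BED12"
   } else {
      "unknown"
   }
}

# Two made-up rows in STAR SJ.out.tab layout (tab-delimited, no header).
sj_lines <- c(
   "chr1\t1001\t2000\t1\t1\t1\t25\t3\t40",
   "chr1\t3001\t4500\t2\t1\t0\t7\t0\t31")
sj <- read.delim(text=sj_lines, header=FALSE, sep="\t")

sj_format <- detect_junction_format(sj)
if (sj_format == "STAR") {
   colnames(sj) <- c("seqname", "start", "end", "strand_code", "motif",
      "annotated", "unique_reads", "multi_reads", "max_overhang")
   # STAR encodes strand as 0 (undefined), 1 (+), 2 (-).
   sj$strand <- c("*", "+", "-")[sj$strand_code + 1]
}
print(sj[, c("seqname", "start", "end", "strand", "unique_reads")])
```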
COMPLETE: Update (minor): Optional R-shiny startup without displaying a gene.
Use default_gene="blank" to start with a blank plot.
Test (minor): test against alternative, commonly used GTF sources
Use case: Most steps were designed to use Gencode GTF. The workflow should also work when using other GTF files.
makeTx2geneDFfromGtf() extend to include genome coordinates (major)
currently tx2geneDF does not return genome coordinates, in part because a given gene symbol may be located in multiple locations, multiple chromosomes, e.g. "7SK".
Consider bookdown online documentation (major)
Need model to follow for where to put documentation, how to host. Check: jokergoo/ComplexHeatmap; clusterProfiler, enrichplot.
Consider using patchwork instead of cowplot?
There is some issue with including the gene model panel twice, maybe patchwork does that step well?
Update (minor): Update Shiny progress bar after coverage is complete, before splice junctions are being loaded.
Use case: After loading coverage files, for example the label says "coverage (16 of 16)", it pauses for a long time. What is it doing? I want to know!
New feature (minor): Allow resizing the gene-transcript panel
Use case: Sometimes there are way too many transcripts, the panel should be much taller. Sometimes the exon labels are clipped off the bottom of the figure.
Note that adjusting "per panel size" affects gene-transcript panel, making it much too crowded.
New feature (minor): Allow down-sampling the coverage profile
Use case: exons are rarely displayed one base per pixel, more often an exon is roughly 100-500 bases wide, but displayed in the equivalent of 25-50 pixels. The adjustment would be to down-sample data to help speed up the rendering step.
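One way the down-sampling might look, as a rough base R sketch: collapse a per-base coverage vector into a fixed number of bins, keeping the maximum per bin so peaks are not flattened. The function name `downsample_coverage()`, the choice of `max`, and the example vector are all assumptions for illustration.

```r
# Hypothetical helper: reduce a per-base coverage vector to n_bins values.
downsample_coverage <- function(cov, n_bins, fun=max) {
   # Nothing to do when the vector is already short enough.
   if (length(cov) <= n_bins) {
      return(cov)
   }
   # Assign each base position to one of n_bins equal-width bins.
   bin <- cut(seq_along(cov), breaks=n_bins, labels=FALSE)
   as.numeric(tapply(cov, bin, fun))
}

# A made-up 300-base exon displayed in roughly 30 pixels.
exon_cov <- c(rep(10, 100), rep(50, 100), rep(20, 100))
small <- downsample_coverage(exon_cov, n_bins=30)
print(length(small))
print(range(small))
```

Using `max` preserves the tallest signal in each bin; `mean` would give a smoother profile at the cost of lowering narrow peaks.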
New feature (minor): Optionally display gene model per column for multi-column layout.
Use case: Currently with two-column display the gene model does not visually align to the coverages.
New feature (inquiry): Test porting to ggbio or Gviz for broader re-use.
Use case: To integrate other track types, coverage, peaks, genes, etc. the ggbio and Gviz R packages are more feature-rich. I think ggbio is no longer officially supported?
GdObject-class and create CustomTrack, which should mimic the workflow used by AlignmentsTrack():
Gviz::CustomTrack(plottingFunction=function(GdObject, prepare=FALSE, ...){}, variables=list(), name="CustomTrack", ...)
New feature (minor): Color picker alongside Shiny app "Sample Selection"
Use case is to display, and allow selection of colors per sample
Not sure if "easy" color picker widgets will work with the selectable table widget.
New feature (major): allow multiple genes/features in display region, not just one gene
Use case: display more than one gene; display genes and peaks.
splicejam::make_ref2compressed() with a set of GRanges.
Major: junction strandedness should not be forced to match the strand of the displayed gene. This change may cause problems using STAR junctions, since STAR junction strandedness is not always accurate.
New feature (moderate): Extend splicejam::make_ref2compressed()
Use case: If displaying RNA-seq and ChIP-seq data together, it may be useful to extend regions around the TSS before compressing gaps, to show signal near the TSS. Similar with exons, peaks, etc. Some configuration options could be useful to describe specifically in this function.
New feature (major): Display multiple genes in adjacent panels.
Use case: Given 3 genes of interest, display sashimi plots side-by-side
cowplot or patchwork to assemble them into multi-panel figure.
Conclusion: Probably will not implement. May need to add this use case to the vignette.
New feature (major): display Sashimi plots similar to ggridges ridge plots
Use case: ggplot2 facet panels take up much extra space, making figures look heavy and bulky - they take up a lot of web browser space. The idea is to plot coverage/junctions slightly overlapping so they are adjacent and perhaps more easily compared visually. Also, more samples can be plotted together without using so much figure space.
Could allow some trickery, like ordering samples by a factor column, where visual gaps could be inserted by using empty factor levels.
For example: GroupA_Veh, GroupA_Treated, GroupA_blank, GroupB_Veh, GroupB_Treated.
In this case "GroupA_blank" would be a factor level, but there would be no sashimi plot data for that factor level, so it would be drawn empty, leaving a visual gap.
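The empty-factor-level trick can be shown with base R alone. The data.frame below is made up, and the column names `sample_id` and `max_coverage` are placeholders. The point is only that an unused level keeps its slot in the ordering.

```r
# Factor levels in display order, including one level with no data.
sample_levels <- c("GroupA_Veh", "GroupA_Treated", "GroupA_blank",
   "GroupB_Veh", "GroupB_Treated")

# Made-up per-sample values; note there is no row for "GroupA_blank".
cov_df <- data.frame(
   sample_id=factor(
      c("GroupA_Veh", "GroupA_Treated", "GroupB_Veh", "GroupB_Treated"),
      levels=sample_levels),
   max_coverage=c(120, 95, 110, 80))

# The empty level is still counted as a panel, with zero rows.
panel_counts <- table(cov_df$sample_id)
print(panel_counts)

# Integer codes skip position 3, which is where the visual gap would be.
row_index <- as.integer(cov_df$sample_id)
print(row_index)
```

When faceting with ggplot2, empty levels are dropped by default, so something like `facet_wrap(~sample_id, drop=FALSE)` would likely be needed to keep the blank panel.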
New features (major): coverage and junctions from BAM files
Use case is to allow BAM input, both for coverage and for junctions.
Major enabling feature: This step is a precursor to handling scRNA-seq, which could dynamically split the BAM reads by tSNE/UMAP clusters.
New data (major): pre-defined hg19 transcriptome data required for sashimi plots
Gencode comprehensive with all genes, flat exons pre-computed.
Describe steps used so others can use their own GTF as needed.
New feature (nice to have): ability to select/deselect transcripts displayed in the R-shiny app for a given gene.
Use case is when displaying all transcripts for a given gene, but there are too many un-expressed transcripts, e.g. human ACTB. Could allow a separate table to select transcripts to include/hide. When performing "Update" the compressed exons would be re-calculated for that gene using the selected exons.
COMPLETE: Fix broken Sample Selection - change to table selection/ordering
include sessionInfo() in a hidden section on the R-shiny app to be able to confirm all packages and version numbers being used.
for interactive plots, highlighting a splice junction also highlights all splice junctions for the same exon span, which is good
for flat gene-exon models, name the gap using the format geneSymbolexon_name1-geneSymbolexon_name2 which will allow the gap to be highlighted as well.
Workflow:
Idea is to allow manually selecting subset or superset of transcripts to include in the flatExonsByGene, partly to allow removing transcripts that mess up the overall gene model, partly to allow showing a highlighted subset of transcripts for example for diffSplice/DEXseq splice hits. Sometimes a gene has 8 detected isoforms, but the predominant change involves only 2 or 3 of those isoforms. Would be great to be able to create the ideal subset.
(Note this workflow is already possible with manual steps outside the R-shiny app, but who wants to do that?)
Allow expanding the x-axis genomic region being displayed, beyond the gene body. Note this step requires defining the compression to use for upstream region -- for example should it be 10:1 compressed in visual space?
Note the genomic coordinates are not being displayed below certain genes (human GAS5) -- why not? It might be because this gene has one contiguous exon due to some noisy unspliced isoforms. The coordinate labeling function might be trying to use only the outside coordinate. Could adjust that function to use either the edge, or if two labels are separated by more than 5% of the total exon width, display that label.
Genomic axis label logic for compressed GRanges coordinates: Start by labeling outer edge of each exon, then internal exon boundaries. Add each label if it will be at least 5% distant from another label, based upon the total exon width, and total number of gaps.
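The labeling rule described above might be sketched as a greedy filter: consider outer exon edges first, then internal boundaries, and keep a label only if it sits at least 5% of the total width away from every label already kept. The function name `choose_axis_labels()`, its arguments, and the use of a single `total_width` value (ignoring the gap count) are simplifying assumptions, and positions are taken to be in compressed coordinates.

```r
# Hypothetical helper: pick axis label positions with a minimum spacing.
choose_axis_labels <- function(outer, internal, total_width, min_frac=0.05) {
   min_dist <- total_width * min_frac
   kept <- numeric(0)
   # Outer edges are tested first, so they win over nearby internal edges.
   for (pos in c(outer, internal)) {
      if (length(kept) == 0 || all(abs(pos - kept) >= min_dist)) {
         kept <- c(kept, pos)
      }
   }
   sort(kept)
}

# Made-up compressed coordinates spanning a total width of 1000.
axis_labels <- choose_axis_labels(
   outer=c(0, 1000),
   internal=c(30, 400, 420, 980),
   total_width=1000)
print(axis_labels)
```

In this example positions 30, 420 and 980 are dropped because each falls within 50 units of a label that was already kept.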
Review new ggplot2 axis labeling rules and whether we can rely upon ggplot2 to hide axis labels that would otherwise overlap.
Allow manually setting one common y-axis range. Currently only possible to adjust the y-axis by using interactive plotly, which loses some important junction labeling.
COMPLETE: Allow the gene-transcript panel to be adjusted taller.
By default, gene panel height should be auto-adjusted based upon the number of transcripts shown.
Ideally the panel height should be adjusted based upon label height.
Enable more default settings such as number of columns, panel height, font sizes, etc. The app should be fully customizable upon startup, so that new users get the intended default experience.
Available:
c("coordinates", "exon names")
c("+","-","both") strands to display
c(TRUE, FALSE) whether to use shared y-axis range
Todo:
Consider minimum junction threshold based upon % coverage range instead of absolute number. Not urgent, can be manually adjusted.
detectedTxInfo()
option to return only detected results.
change labels to be user-friendly
new function to convert detectedTxInfo() into ComplexHeatmap:
box around cells which meet the "detected" thresholds overall (cell_fun)
option to display "detected" or "all" Tx
Consider optional per-panel label to supplement/replace the text label used by ggplot2 facet wrap. Label could be positioned topright, topleft, bottomright, bottomleft, or auto.
prepareSashimi() throws an error when flatExonsByGene contains different seqlevels than the BigWigFile, usually when flatExonsByGene contains more seqlevels than present in the BigWigFile. It happens even when the exon features of interest involve seqlevels present in BigWigFile.
The fix is to reduce the seqlevels in flatExonsByGene to match those in the BigWigFile, or to reduce seqlevels to actual features in flatExonsByGene. At that point an error would accurately reflect that BigWigFile does not contain seqlevels for the requested seqlevels, and thus would be helpful.
sashimiAppConstants() to its own proper standalone function, that prepares all the dependency data objects like flatExonsByGene, flatExonsByTx, tx2geneDF.
filesDF. Essentially appends a file to an existing data.frame.
plotlySashimi() that optionally includes the gene-transcript-exon model in the visualization.
zoom_by=c("Gria1_exon11", "Gria1_exon16").
sample_id entries. Default is to overlay replicates in the same facet panel, facet by sample_group. Alternative is to offset the y-axis similar to the "ggridges" package. Unclear if the layers coverage polygons, and junction arcs, will be out of sync; that is sample_1 should draw coverage and junctions, before sample_2 is drawn.
Rsamtools package, but show recommended examples for properly paired reads.
prepareSashimi() to take GRanges range an optional coordinate range, or multiple genes (which would imply the coordinate range if on the same chromosome).
detectedTxInfo() output in the form of plots. For example, show percent max expression as a heatmap, each call labeled to show the counts, TPM, percentage, and whether it was called "detected" by the thresholds.
flattenExonsBy()
assignGRLexonNames() - dependent upon jamba::makeNames()
annotateGRfromGR() - dependent upon shrinkMatrix()
shrinkMatrix()
combineGRcoverage()
df2colorSub()
prepareSashimi(). It shrinks data volume to about 7 times smaller, reducing duplicated annotations per row for polygon coordinates. However, ggplot2 still requires unnesting the tibble into a tall format for plotting, it is unclear whether that step will incur its own performance hit.
Plotly tooltips are so finicky, but try anyway:
Make font sizes more configurable.
Transformed axis:
More Sashimi examples:
ALE-specific RNA-seq analysis workflow.