The Shiny Variant Explorer (tSVE) was primarily developed to demonstrate features implemented in the TVTB package, not as a production environment.
As a result, a few important considerations should be made to clarify what should and should not be expected from the web-application:
- `...`[^1]).
- `ggplot` objects, definition of custom genomic ranges).
- `ggplot`) are currently the only output that can be exported from the web-application (using the web browser "Download image", or equivalent context menu item). In the future, action buttons may be added to export tables (e.g. CSV format) and figures (e.g. PDF format).
- TVTB package installed, so that users may follow the instructions marked by the word Action and bulleted points in the following sections.
[^1]: The `...` argument is called "ellipsis".
The Shiny Variant Explorer suggests a few additional package dependencies compared to the package, to support certain forms of data input and display.
Input
- The ensembldb package and relevant `EnsDb`[^2] annotation packages are required if that interface is used to query genomic ranges (demonstrated in this section).
- EnsDb.Hsapiens.v75 is required to query genomic ranges associated with gene names for the demonstration data[^3].
- The rtracklayer package is required if a BED file is used to provide genomic ranges (demonstrated in this section).
[^2]: In the future, the web-application may also support `TxDb` and `OrganismDb` annotation packages.
[^3]: In the future, the web-application may also use annotation packages to facet statistics and figures by genomic range(s).
Display
- The rstudio/DT package is recommended to benefit from the latest developments (e.g. column filters inactivated if a single value exists in that column; version >= 0.2.2).
- The rstudio/shiny package is required for all Shiny web-applications.
The `TVTB::tSVE()` method launches the web-application.
Overall, the web-application is implemented as a web-page with a top level navigation bar organised from left to right to reflect progression through a typical analysis, with the exception of the last two menu items Settings and Session, which may be useful to check and update at any point.
Here is a brief overview of the menu items:
- `EnsDb` annotation package may be selected to use the associated database interface.
The Input panel controls the major input parameters of the analysis, including phenotypes (and therefore samples), genomic ranges, and fields to import from VCF file(s). Those inputs are useful to import only data of interest, as well as to limit memory usage and duration of calculations.
Phenotypes are critical to define groups of samples that may be compared
in summary statistics, tables, and plots.
Moreover, phenotypes also implicitly define the set of samples required
in the analysis (unique sample identifiers usually set as rownames
of the phenotypes).
The web-application accepts phenotypes stored in a text file, with the following requirements:
- `read.table` function).
- `rownames`.
- `colnames`.
When provided, phenotypes will be used to import from VCF file(s) only genotypes for the corresponding sample identifiers. Moreover, an error message will be displayed if any of the sample identifiers present in the phenotypes is absent from the VCF file(s).
Note that the web-application does not absolutely require phenotype information. In the absence of phenotype information, all samples are imported from VCF file(s).
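The behaviour described above can be sketched in base R. This is only an illustration, not the code of the web-application: the file content, the tab delimiter, and the `vcfSamples` vector are assumptions made for the example.

```r
# Hypothetical phenotype file, written to a temporary location for illustration
phenoFile <- tempfile(fileext = ".txt")
writeLines(c(
    "sample\tsuper_pop\tgender",
    "HG00096\tEUR\tmale",
    "HG00097\tEUR\tfemale",
    "HG00099\tEUR\tfemale"
), phenoFile)

# Sample identifiers become row names; phenotypes become column names
phenotypes <- read.table(
    phenoFile, header = TRUE, row.names = 1, sep = "\t",
    stringsAsFactors = FALSE
)

# Hypothetical sample identifiers declared in the VCF file(s)
vcfSamples <- c("HG00096", "HG00097", "HG00099", "HG00100")

# Samples listed in the phenotypes must all be present in the VCF file(s)
missingSamples <- setdiff(rownames(phenotypes), vcfSamples)
if (length(missingSamples) > 0) {
    stop("Samples absent from VCF: ", paste(missingSamples, collapse = ", "))
}

# Only samples with phenotype information would be imported
samplesToImport <- intersect(vcfSamples, rownames(phenotypes))
```

In this sketch, `HG00100` is present in the VCF sample list but has no phenotype information, so it is left out of `samplesToImport`.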
Action:
- Click on the Browse action button
- Navigate to the `extdata` folder of the TVTB installation directory
- Select the file `integrated_samples.txt`
Alternatively: click the Sample file button
Notes:
- The TVTB installation directory can be identified using the following command in an R session: `system.file("extdata", package = "TVTB")`
Genomic ranges are critical to import only variants in targeted genomic regions or features (e.g. genes, transcripts, exons), as well as to limit memory usage and duration of calculations.
The Shiny Variant Explorer currently supports three types of input to define genomic ranges:
- `EnsDb` annotation packages
Currently, the web-application uses genomic ranges solely to query the corresponding variants from VCF file(s). In the future, those genomic ranges may also be used to produce faceted summary statistics and plots.
Notes:
If a BED file is supplied, the web-application parses it using the rtracklayer `import.bed` method.
Therefore the file must respect the BED file format guidelines.
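One aspect of the BED format that is easy to overlook is its coordinate system: `chromStart` is 0-based and `chromEnd` is exclusive. The sketch below uses base R only, as a stand-in for `import.bed`, to show how a minimal four-column BED record maps to a 1-based, inclusive range. The record itself is a hypothetical example and is not taken from the `SLC24A5.bed` demonstration file.

```r
# Hypothetical BED record: chrom, chromStart (0-based), chromEnd (exclusive), name
bedFile <- tempfile(fileext = ".bed")
writeLines("15\t48413168\t48434869\tSLC24A5", bedFile)

bed <- read.table(
    bedFile, sep = "\t", header = FALSE,
    col.names = c("chrom", "chromStart", "chromEnd", "name"),
    colClasses = c("character", "numeric", "numeric", "character")
)

# Convert to 1-based, inclusive coordinates
ranges <- data.frame(
    seqnames = bed$chrom,
    start = bed$chromStart + 1,
    end = bed$chromEnd,
    name = bed$name,
    stringsAsFactors = FALSE
)
```

Under these assumptions, the record corresponds to the region `15:48413169-48434869`.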
Action:
- Click on the Browse action button
- Navigate to the `extdata` folder of the TVTB installation directory
- Select the file `SLC24A5.bed`
Alternatively: click the Sample file button
Notes:
Sequence names (i.e. chromosomes), start, and end positions of one or more genomic ranges may be defined in the text field, with individual regions separated by `";"`.
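As an illustration of this input format, the following base R sketch parses such a text input into a table of ranges. The helper `parseRegions` is hypothetical (it is not a function of the TVTB package), and the actual parsing code of the web-application may differ.

```r
# Hypothetical helper: parse "seqname:start-end" regions separated by ";"
parseRegions <- function(text) {
    # Split the input into individual regions
    regions <- trimws(strsplit(text, ";", fixed = TRUE)[[1]])
    # Separate the sequence name from the coordinates
    parts <- strsplit(regions, ":", fixed = TRUE)
    seqnames <- vapply(parts, `[`, character(1), 1)
    coords <- vapply(parts, `[`, character(1), 2)
    # Remove thousands separators before coercing positions to numeric
    coords <- gsub(",", "", coords, fixed = TRUE)
    positions <- strsplit(coords, "-", fixed = TRUE)
    data.frame(
        seqnames = seqnames,
        start = as.numeric(vapply(positions, `[`, character(1), 1)),
        end = as.numeric(vapply(positions, `[`, character(1), 2)),
        stringsAsFactors = FALSE
    )
}

single <- parseRegions("15:48,413,169-48,434,869")
multiple <- parseRegions("1:123-456;2:234-345;2:456-789")
```

Here `single` holds one range on sequence `15`, and `multiple` holds three ranges on sequences `1`, `2`, and `2`.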
Action:
- Paste `15:48,413,169-48,434,869` in the text field
Alternatively: click the Sample input button
Notes:
- `","` characters from the text input, before coercing the start and end positions to numeric
- `1:123-456;2:234-345;2:456-789`)
Currently, genomic ranges encoding only gene-coding regions may be retrieved from an Ensembl-based database.
This feature was adapted from the web-application implemented in the ensembldb package.
\bioccomment{ In the future, an interface to query transcript and exon annotations may be added to the web-application. }
Action:
- Paste `SLC24A5` in the text field
Alternatively: click the Sample input button
\fixme{ Genomic features located on contigs may cause problems when working with one VCF file per chromosome. In the future, an option may be added to ignore contigs. }
At the core of the TVTB package, variants must be imported from one or more VCF file(s) annotated by the Ensembl Variant Effect Predictor (VEP) script [@RN1].
Considering the large size of most VCF files, it is common practice to split genetic variants into multiple files, each file used to store variants located on a single chromosome (more generally, a single sequence). The Shiny Variant Explorer supports two situations:
- `seqnames` slot of the genomic ranges described above ("Multi-VCF mode").
In addition, VCF files can store a plethora of information in their various fields. It is often useful to select only a subset of fields relevant for a particular analysis, to limit memory usage.
The web-application uses the VariantAnnotation `scanVcfHeader` method to parse the header of the VCF file (Single-VCF mode) or the first VCF file (Multi-VCF mode), to display the list of available fields that users may choose to import.
A few considerations must be made:
- `"GT"` key be present in the FORMAT field.
This mode displays an action button that must be used to select the VCF file from which to import variants.
Action:
- Click on the Browse action button
- Navigate to the `extdata` folder of the TVTB installation directory
- Select the file `chr15.phase3_integrated.vcf.gz`
Alternatively: click the Sample file button
This mode requires two pieces of information:
- `"%s"` to declare the position of the sequence (i.e. chromosome) name in the pattern.
Note that a summary of the VCF file(s) detected using the given folder and pattern is displayed on the right, to help users determine whether the parameters are correct. In addition, the content of the given folder is displayed at the bottom of the page, beside the same content filtered for the VCF file naming pattern.
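The role of the `"%s"` placeholder can be sketched with `sprintf` in base R. The pattern below is an assumption derived from the name of the demonstration file, and the temporary folder merely simulates a VCF folder; the web-application may build and check file names differently.

```r
# Assumed naming pattern, where "%s" marks the position of the sequence name
vcfPattern <- "chr%s.phase3_integrated.vcf.gz"

# Sequence names that would come from the genomic ranges (hypothetical values)
seqnames <- c("15", "X")

# One expected file name per sequence name
expectedFiles <- sprintf(vcfPattern, seqnames)

# Simulated VCF folder that contains only the chromosome 15 file
vcfFolder <- tempfile("vcf")
dir.create(vcfFolder)
invisible(file.create(file.path(vcfFolder, "chr15.phase3_integrated.vcf.gz")))

# Expected files that are actually present in the folder
foundFiles <- expectedFiles[file.exists(file.path(vcfFolder, expectedFiles))]
```

In this sketch, only the file for sequence `15` is detected; a missing file for sequence `X` is the kind of mismatch that the summary displayed by the web-application helps to spot.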
Action:
None. The text fields should already be filled with default values, pointing to the single example VCF file (`chr15.phase3_integrated.vcf.gz`).
This panel allows users to select the INFO and FORMAT fields to import (in the `info` and `geno` slots of the `VCF` object, respectively).
It is important to note that the FORMAT/GT and INFO/`<vep>` fields (where `<vep>` stands for the INFO key where Ensembl VEP predictions are stored) are implicitly imported from the VCF.
Similarly, the mandatory FIXED fields `CHROM`, `POS`, `ID`, `REF`, `ALT`, `QUAL`, and `FILTER` are automatically imported to populate the `rowRanges` slot of the `VCF` object.
Action:
- Click the Deselect all action button under the INFO fields selection input to import only the INFO/CSQ and FORMAT/GT fields.
- Click the Import variants action button
A summary of variants, phenotypes, and samples imported will appear beside the action button.
This panel allows users to select a pre-installed annotation package.
Currently, only `EnsDb` annotation packages are supported, and only gene-coding regions may be queried.
Action:
- If none of the `EnsDb` packages is installed, it will not be possible to use the `ensembl` interface of the Genomic ranges input tab.
- If the `EnsDb.Hsapiens.v75` package is the only `EnsDb` package installed, no action is required; the package should already be pre-selected.
- If the `EnsDb.Hsapiens.v75` package is not the only `EnsDb` package installed, users should select it in the list of choices.
This panel demonstrates the use of three methods implemented in the TVTB package, namely `addFrequencies`, `addOverallFrequencies`, and `addPhenoLevelFrequencies`.
This panel allows users to Add and Remove INFO fields that contain genotype counts (i.e. homozygote reference, heterozygote, homozygote alternate) and allele frequencies (i.e. alternate allele frequency, minor allele frequency) calculated across all the samples and variants imported. The web-application uses the homozygote reference, heterozygote, and homozygote alternate genotypes defined in the Advanced settings panel.
Importantly, the names of the INFO keys that are used to store the calculated values can be defined in the Advanced settings panel.
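To make the calculated values concrete, the sketch below derives the three genotype counts and the two allele frequencies from a small genotype matrix in base R. It assumes diploid genotypes and the usual definitions of alternate and minor allele frequency; the genotype codes, the matrix, and the helper `countGenotypes` are hypothetical, and the TVTB methods named above may be implemented differently.

```r
# Hypothetical genotype matrix: variants in rows, samples in columns
genotypes <- matrix(
    c("1|1", "0|1", "1|1", "1|0",
      "0|0", "0|0", "0|1", "0|0"),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("variant1", "variant2"), paste0("sample", 1:4))
)

# Assumed genotype codes (phased, diploid)
refCodes <- "0|0"
hetCodes <- c("0|1", "1|0")
altCodes <- "1|1"

# Count, for each variant, the samples whose genotype is among the given codes
countGenotypes <- function(gt, codes) {
    rowSums(matrix(gt %in% codes, nrow = nrow(gt), dimnames = dimnames(gt)))
}

REF <- countGenotypes(genotypes, refCodes)
HET <- countGenotypes(genotypes, hetCodes)
ALT <- countGenotypes(genotypes, altCodes)

# Alternate allele frequency: alternate alleles over all observed alleles
AAF <- (HET + 2 * ALT) / (2 * (REF + HET + ALT))
# Minor allele frequency: the smaller of the two allele frequencies
MAF <- pmin(AAF, 1 - AAF)

info <- data.frame(REF, HET, ALT, AAF, MAF)
```

Each column of `info` plays the role of one INFO field added by the panel.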
Action:
- Click the Add action button
- See the Latest changes message update at the top of the screen.
- Optionally, the Views panel can be used to examine the new fields
This panel allows users to Refresh the list of INFO fields that contain genotype counts and allele frequencies calculated within groups of samples associated with various levels of a given phenotype.
Action:
- Select `super_pop` in the list of phenotypes
- Click the Select all action button
- Click the Refresh action button
- See the Latest changes message update at the top of the screen.
- Optionally, the Views panel can be used to examine the new fields
The VCF filter rules are one of the flagship features of the TVTB package. They extend the S4Vectors `FilterRules` class to new classes of filter rules that can be evaluated within environments defined by the various slots of `VCF` objects.
Generally speaking, `FilterRules` greatly facilitate the design and combination of powerful filter rules for table-like objects, such as the `fixed` and `info` slots of VariantAnnotation `VCF` objects, as well as Ensembl VEP predictions stored in the meta-columns of `GRanges` returned by the ensemblVEP `parseCSQToGRanges` method.
A separate vignette describes in greater detail the use of classes that contain VCF filter rules. A simple example is shown below.
Action:
- Select `VEP` as the Type of filter
- Paste `grepl("missense",Consequence)` in the text field
- Leave the Active? checkbox ticked
- Click the Add filter action button
- See the list of rules update at the bottom of the screen
- Click the Apply filters action button
- See the summary of filtered variants update beside the action button
- Optionally, the Views panel can be used to examine the new fields
Alternatively: click the Sample input button
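The idea of evaluating a filter expression within a table-like object can be illustrated in base R. The sketch below is a plain stand-in, not the `FilterRules` machinery itself: the `vep` table and its values are hypothetical, and the expression is simply evaluated with the columns of the table as variables.

```r
# Hypothetical table of VEP predictions (one row per prediction)
vep <- data.frame(
    Consequence = c(
        "missense_variant",
        "synonymous_variant",
        "missense_variant&splice_region_variant"
    ),
    SYMBOL = "SLC24A5",
    stringsAsFactors = FALSE
)

# Filter expression, as it would be pasted in the text field
rule <- quote(grepl("missense", Consequence))

# Evaluate the expression using the columns of the table as variables
keep <- eval(rule, envir = vep)

# Retain only the predictions that pass the filter
filtered <- vep[keep, , drop = FALSE]
```

In this example the first and third predictions pass the filter, because their consequence contains `missense`.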
This panel offers the chance to examine the main objects of the session, namely:
- `rowRanges` and selected meta-columns of the filtered variants.
- `info` slot (of the filtered variants).
- `ggplot`).
Action:
- In the various panels, select fields to examine each object
- In particular, note the INFO fields that contain genotype counts and allele frequencies calculated earlier
- Go to the Heatmap tab of the Genotypes panel
- Click the Go! action button to calculate and display the heatmap
This panel demonstrates the use of two methods implemented in the TVTB package, namely `tabulateVepByPhenotype` and `densityVepByPhenotype`.
This panel stores more advanced settings that users may not need to edit as frequently, if at all. Those settings are divided into two sub-panels:
It is critical to accurately identify and define how the different genotypes (homozygote reference, heterozygote, and homozygote alternate) are encoded in the VCF file, to produce accurate genotype counts and frequencies, for instance.
This generally requires examining the content of the FORMAT/GT field outside of the web-application.
For instance, the functions `unique` and `table` may be used to identify (and count) all the distinct genotype codes in the `geno` slot (`"GT"` key) of a `VCF` object.
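As a minimal sketch of this inspection, the following base R code applies `unique` and `table` to a plain character matrix. The matrix `gt` is a hypothetical stand-in for the genotype matrix stored under the `"GT"` key of the `geno` slot; with a real `VCF` object, that matrix would first need to be extracted.

```r
# Hypothetical genotype matrix: variants in rows, samples in columns
gt <- matrix(
    c("0|0", "0|1", "1|1",
      "0|0", "1|0", "0|0"),
    nrow = 2, byrow = TRUE
)

# Distinct genotype codes present in the data
genotypeCodes <- sort(unique(as.vector(gt)))

# Number of occurrences of each genotype code
genotypeCounts <- table(gt)
```

The distinct codes found this way indicate which values to select as homozygote reference, heterozygote, and homozygote alternate genotypes.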
The default selected values are immediately compatible with the demonstration data set. Users who wish to select genotype codes not yet available among the current choices may either contact the package maintainer to add them in a future release, or edit the Global configuration file of the web-application locally.
Currently, the three calculated genotype counts and two allele frequencies require five INFO fields to store their respective values.
Considering that TVTB offers the possibility to calculate counts and frequencies for the overall data set, and for each level of each phenotype, it is important to define a clear and consistent naming mechanism that does not conflict with INFO keys imported from the VCF file(s).
In the TVTB package, a suffix is required for each type of genotype and frequency calculated, to generate INFO keys as follows:
- Overall data set: `<suffix>`
- Each level of each phenotype: `<phenotype>_<level>_<suffix>`
Again, the default values are immediately compatible with the demonstration data set. For other data sets, it may be necessary to change those values, either by preference, or to avoid conflict with INFO keys imported from the VCF file(s).
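The naming scheme can be sketched in base R as follows. The suffix values, the phenotype levels, and the `importedKeys` vector are assumptions chosen for illustration, and `phenoLevelKeys` is a hypothetical helper rather than a function of the TVTB package.

```r
# Assumed suffixes: three genotype counts and two allele frequencies
suffixes <- c("REF", "HET", "ALT", "AAF", "MAF")

# INFO keys for the overall data set: the suffixes themselves
overallKeys <- suffixes

# Hypothetical helper: INFO keys for each level of a phenotype
phenoLevelKeys <- function(phenotype, levels, suffixes) {
    keys <- outer(levels, suffixes, function(level, suffix) {
        paste(phenotype, level, suffix, sep = "_")
    })
    as.vector(keys)
}

levelKeys <- phenoLevelKeys("super_pop", c("AFR", "EUR"), suffixes)

# Hypothetical INFO keys already present in the VCF file(s)
importedKeys <- c("CSQ", "AAF")

# Keys that would conflict with imported INFO fields
conflicts <- intersect(c(overallKeys, levelKeys), importedKeys)
```

In this example, the overall `AAF` key conflicts with an imported INFO key, which is the kind of situation where a different suffix would be needed.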
Other rarely used settings in this panel include:
- Rsamtools documentation.
Several functionalities of the TVTB package are applied to independent subsets of data (e.g. counting genotypes in various levels of a given phenotype). Such processes can benefit from multi-threaded calculations. Multi-threading settings in the Shiny web-application are somewhat experimental, as they have been validated only on a small set of operating systems, while some issues have been reported for others.
| Report | Operating System | Cluster Class | Cluster type | # Cores |
| :----: | :---: | :-----------: | :----------: | :-----: |
| OK | Ubuntu 14.04 | Multicore | FORK | 2 |
| OK | Scientific Linux 6.7 | Multicore | FORK | 2 |
| Hang~1~ | OS X El Capitan | Snow | SOCK | 2 |
\bioccomment{ Users are welcome to send feedback to report additional successful configurations, as well as newly identified issues. }
The last panel of the Shiny Variant Explorer offers detailed views of objects and settings in the current session, including:
- `sessionInfo()` value
- `VCF` object
- `GRanges` that store the Ensembl VEP predictions
- `geno` slot (`"GT"` key) of the raw variants
Most default values are stored in the `global.R` file of the web-application.
All the files of the web-application are stored in the `extdata/shinyApp` folder of the TVTB installation directory (see an earlier section to identify this directory).
Users who wish to change the default values of certain input widgets (e.g. genotype codes) may edit the `global.R` file accordingly. However, the file will be reset at each package update.
\bioccomment{ In the future, a mechanism may be implemented to override global settings locally, without risk of seeing this custom configuration overwritten at the next package update (e.g. a file in the user home folder that would be parsed to overwrite certain settings). }
Here is the output of `sessionInfo()` on the system on which this document was compiled:
sessionInfo()