knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(jellyfisher)
Jellyfisher is an R package (a htmlwidget) for visualizing tumor evolution and subclonal compositions using Jellyfish plots. The package is based on the Jellyfish visualization tool, bringing its functionality to R users. Jellyfisher supports both ClonEvol results and plain data frames, making it compatible with various tools and workflows.
The input data should follow specific structures for samples, phylogeny, subclonal compositions, and optional ranks.
The jellyfisher
package includes an example data set
(jellyfisher_example_tables
) based on the following publication:
Lahtinen, A., Lavikka, K., Virtanen, A., et al. "Evolutionary states and
trajectories characterized by distinct pathways stratify patients with ovarian
high-grade serous carcinoma." Cancer Cell 41, 1103–1117.e12 (2023). DOI:
10.1016/j.ccell.2023.04.017.
# Subset the data to keep the compiled vignette a bit smaller jellyfisher_example_tables <- jellyfisher_example_tables |> select_patients(c("EOC69", "EOC677", "EOC495", "EOC809"))
sample
(string): specifies the unique identifier for each sample.displayName
(string, optional): allows for specifying a custom name for each sample. If the column is omitted, the sample
column is used as the display name.rank
(integer): specifies the position of each sample in the Jellyfish plot. For example, different stages of a disease can be ranked in chronological order: diagnosis (1), interval (2), and relapse (3). The zeroth rank is reserved for the root of the sample tree. Ranks can be any integer, and unused ranks are automatically excluded from the plot. If the rank
column is
absent, ranks are assigned based on each sample’s depth in the sample tree.parent
(string): identifies the parent sample for each entry. Samples without a specified parent are treated as children of an imaginary root sample.head(jellyfisher_example_tables$samples, 25)
subclone
(string): specifies subclone IDs, which can be any string.parent
(string): designates the parent subclone. The subclone without a parent is considered the root of the phylogeny.color
(string, optional): specifies the color for the subclone. If the column is omitted, colors will be generated automatically.branchLength
(number): specifies the length of the branch leading to the subclone. The length may be based on, for example, the number of unique mutations in the subclone. The branch length is shown in the Jellyfish plot's legend as a bar chart. It is also used when generating a phylogeny-aware color scheme.head(jellyfisher_example_tables$phylogeny, 25)
Subclonal compositions are specified in a tidy format, where each row represents a subclone in a sample.
sample
(string): specifies the sample ID.subclone
(string): specifies the subclone ID.clonalPrevalence
(number): specifies the clonal prevalence of the subclone in the sample. The clonal prevalence is the proportion of the subclone in the sample. The clonal prevalences in a sample must sum to 1.head(jellyfisher_example_tables$compositions, 25)
The ranks in the example data set are used to indicate the time points when the samples were acquired.
rank
(integer): specifies the rank number. The zeroth rank is reserved for the inferred root of the sample tree. However, you are free to define a title for it.title
(string): specifies the title for the rank.head(jellyfisher_example_tables$ranks, 6)
The three tables are passed to the jellyfisher
function as a named list. The
function generates an interactive Jellyfish plot based on the input data. If the
data set contains multiple patients, the Jellyfisher htmlwidget shows navigation
buttons to switch between patients.
jellyfisher(jellyfisher_example_tables, width = "100%", height = 450)
jellyfisher(jellyfisher_example_tables, options = list( sampleHeight = 70, sampleTakenGuide = "none", tentacleWidth = 3, showLegend = FALSE ), width = "100%", height = 400)
When plotting multiple patients, Jellyfisher shows buttons (Previous and Next)
to navigate between patients. When the data contains only one patient, these
buttons are hidden. The package also provides a select_patients
function to
filter the data set with ease.
jellyfisher_example_tables |> select_patients("EOC677") |> jellyfisher(width = "100%", height = 400)
The sample trees in the example data set were constructed as follows:
"For each sample, we checked whether an earlier time point included exactly one sample from the same anatomical location. If such a sample existed, it was assigned as the parent; otherwise, the inferred root was used as the parent."
However, this mechanistic approach may not always produce credible sample trees.
The r1Bow1 (bowel) sample in the following jellyfish plot is derived from an earlier bowel sample p2Bow1_c, which has no traces of the subclone 12.
jellyfisher_example_tables |> select_patients("EOC809") |> jellyfisher(width = "100%", height = 600)
Using the set_parents
function, we can adjust the parent of the r1Bow1 sample
to be p2Per1_cO (peritoneum), which is a possible source of the metastasis due
to its proximity. The high prevalence of subclone 12 in this sample suggests
that it is the likely source of the metastasis in the r1Bow1 sample.
jellyfisher_example_tables |> select_patients("EOC809") |> set_parents(list("EOC809_r1Bow1_DNA1" = "EOC809_p2Per1_cO_DNA2")) |> jellyfisher(width = "100%", height = 600)
While ranks (the columns) can indicate the time points when the samples were acquired, they can also be used to simply show the sample's depth in the sample tree. For instance, the following plot shows all the samples on the same rank, indicating that they were diagnostic samples acquired at the same time.
jellyfisher_example_tables |> select_patients("EOC495") |> jellyfisher(width = "100%", height = 650)
However, one can argue that the LN (lymph node) samples represent a later development in the disease, and thus, they should be placed on a later rank. We can remove the existing ranks, define new parent-child relationships, and let Jellyfisher assign the ranks based on the sample tree depth.
tables <- jellyfisher_example_tables |> select_patients("EOC495") # Remove existing ranks. The ranks will be assigned automatically based # on samples' depths in the sample tree. tables$samples$rank <- NA # Rank titles should be removed as well because they are no longer valid. tables$ranks <- NULL tables |> set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1", "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |> jellyfisher(width = "100%", height = 500)
If we think that the lymph node samples represent an even later development,
we can manually assign ranks to the samples. The set_ranks
function provides
an easy way to do this.
tables |> set_parents(list("EOC495_pLNL1_DNA1" = "EOC495_pLNR_DNA1", "EOC495_pLNL2_DNA1" = "EOC495_pLNL1_DNA1")) |> set_ranks(list("EOC495_pLNR_DNA1" = 2, "EOC495_pLNL1_DNA1" = 3, "EOC495_pLNL2_DNA1" = 4), default = 1) |> jellyfisher(width = "100%", height = 400)
rm(tables)
Typically in tumor evolution studies, the focus is on the subclonal compositions
of the tumor cells. Thus, the root node in the phylogenetic tree represents the
founding clone. However, the data may also contain non-aberrant cells, i.e., the
tumor purity is less than 100%. For these cases, the normalsInPhylogenyRoot
option instructs Jellyfisher to treat the root node as non-aberrant cells. The
option has two consequences: (1) No tentacles are drawn between the root
subclones, and (2) the root subclone is colored as white when the
phylogeny-aware color scheme is used.
# Subclone N at the root represents the non-aberrant cells. # The letter N has no special meaning in Jellyfisher. non_aberrant <- list( samples = data.frame(sample = c("A", "B")), compositions = data.frame( sample = c("A", "A", "A", "B", "B", "B"), subclone = c("N", "1", "2", "N", "1", "2"), clonalPrevalence = c(0.2, 0.4, 0.4, 0.3, 0.3, 0.4) ), phylogeny = data.frame( subclone = c("N", "1", "2"), parent = c(NA, "N", "1") ) )
non_aberrant |> jellyfisher(options = list( normalsAtPhylogenyRoot = TRUE ), width = "100%", height = 350)
In the above plot, the root clone, which represents the non-aberrant cells, is hidden from the inferred root sample. However, sometimes a patient may have multiple independent clones, and in these cases the root clone is shown:
# Change the parent of subclone 2 to N non_aberrant$phylogeny$parent[non_aberrant$phylogeny$subclone == "2"] <- "N" non_aberrant |> jellyfisher(options = list( normalsAtPhylogenyRoot = TRUE ), width = "100%", height = 350)
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.