knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Data from a cohort of patients can be represented as a dataframe with at least 7 columns, where every row represents one genomic alteration annotated for the analysis.
| Field | Type | Description |
| --- | --- | --- |
| Misc
| string | Customary annotation which is never used to analyse the data, but that might be good to carry around |
| patientID
| string | A patient identifier |
| variantID
| string | An alteration identifier |
| cluster
| string | Group ID (eg., a clone, with CCF data) |
| is.driver
| logical | TRUE if the alteration is a driver to correlate when repeated trajectores are extracted |
| is.clonal
| logical | TRUE if the alteration belongs to the group of clonal (truncal) events; there should only one such clonal group per patient |
| CCF
| string | A parsable format for storage of input CCFs or binary data (see below) |
All identifiers cannot contain spaces or dash/ hyphen (-
) symbols, and must be unique:
the patientID
must be unique across patients;
the variantID
for driver events (those with is.driver = TRUE
) must be shared across all patients that have the same driver event, and cannot appear multiple times in the data of a single patient. Notice that you should not annotate an alteration as driver if it is not found in multiple patients.
the variantID
for non-driver events (those with is.driver = FALSE
) is in practice never used and therefore can be set freely.
Notice that the input dataframe has the same structure for both CCF and binary input data.
The scope of supported alterations is broad, and include any SNV, larger chromosomal re-arrangment or other covariate that can be encoded in CCF or binary format.
The variantID
field of driver alterations (is.driver=TRUE
) will be matched to correlate trajectories in multiple patients, and can be you whatever you find more suitable for your analysis, for instance:
a Hugo_Symbol (BRAF
)
a name for a well-known SNV (BRAF_600E
)
a reference to some cytoband (3q26.32
)
your custom annotation (MyFavoritePathway
).
Alterations are also associated to groups (via cluster
), which constitute the nodes of the computed trees. A group can have 0 drivers annotated, but every patient should have at least one driver to be analyzed with REVOLVER
.
See also the Frequently Asked Questions if you are interested in modelling parallel evolution.
Field CCF
represents data from both types of supported datasets:
real-valued CCF values (in [0, 1]),
or input binary values (either 0 or 1).
Since patients can have different number of samples/ regions associated, CCF
is a general string.
The format that we propose is simple, and easy to parse it:
# Example string that reports a CCF value 0.86 in region R1, 1 in R2 etc. R1:0.86;R2:1;R3:1 # Similarly, in binary presence/ absence format ... R1:0;R2:1;R3:1
If you use this format, REVOLVER
provides you a parsing function
that you can pass to the cohort construction function
An example cohort is released in the evoverse.datasets
R package, which also provides a REVOLVER
objectwith the final analysis of the input cohort.
# Load the package and runs available_data() to print what is available library(evoverse.datasets) # Load an example dataset - the TRACERx cohort data("TRACERx_NEJM_2017", package = "evoverse.datasets") # This is a dataset that follows the specifications given above print(TRACERx_NEJM_2017)
Once you have a dataframe/ tibble in the required format, you can use a cohort construction function.
In this case, to make it faster we only use 3 patients from the released cohort.
require(tidyverse) require(revolver) subset_data = TRACERx_NEJM_2017 %>% filter(patientID %in% c("CRUK0001", "CRUK0002", "CRUK0003")) # Create a cohort my_cohort = revolver_cohort( subset_data, CCF_parser = CCF_parser, ONLY.DRIVER = FALSE, MIN.CLUSTER.SIZE = 0, annotation = "My small REVOLVER dataset" ) # This is now a REVOLVER cohort with S3 methods available print(my_cohort) plot(my_cohort)
The cohort is now ready to be inspected and analysed (note that since we only used 3 patients a warning is raised to mention that the cohort contains drivers annotated which are not really recurrent and that, therefore, should be removed; if you construct the cohort object with the full set of 99 patients that warning will vanish).
# Load a full cohort - the TRACERx cohort data("TRACERx_NEJM_2017_REVOLVER", package = "evoverse.datasets") # We can use S3 object functions to retrieve simple information about the cohort. # The `print` functions runs also the `revolver_check_cohort` function which # tells us that some patient have only 1 clone with drivers, and therefore they # can just be expanded. print(TRACERx_NEJM_2017_REVOLVER) # Plot plot(TRACERx_NEJM_2017_REVOLVER)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.