knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL ## Related to ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html )
## Bib setup library("RefManageR") ## Write bibliography information bib <- c( R = citation(), BiocStyle = citation("BiocStyle")[1], knitr = citation("knitr")[1], RefManageR = citation("RefManageR")[1], rmarkdown = citation("rmarkdown")[1], sessioninfo = citation("sessioninfo")[1], testthat = citation("testthat")[1], ISAnalytics = citation("ISAnalytics")[1], ClonalTrackingPaper = BibEntry( bibtype = "Article", title = paste( "Efficient gene editing of human long-term hematopoietic", "stem cells validated by clonal tracking" ), author = "Ferrari Samuele, Jacob Aurelien, Beretta Stefano", journaltitle = "Nat Biotechnol 38, 1298–1308", date = "2020-11", doi = "https://doi.org/10.1038/s41587-020-0551-y" ) ) ngs_exp_fig <- fs::path("../man", "figures", "ngs_data_exp.png")
ISAnalytics is an R package developed to analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies.
ISAnalytics
can be installed quickly in different ways:
devtools
There are always 2 versions of the package active:
RELEASE
is the latest stable versionDEVEL
is the development version, it is the most up-to-date version where
all new features are introducedRELEASE version:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("ISAnalytics")
DEVEL version:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") # The following initializes usage of Bioc devel BiocManager::install(version='devel') BiocManager::install("ISAnalytics")
RELEASE:
if (!require(devtools)) { install.packages("devtools") } devtools::install_github("calabrialab/ISAnalytics", ref = "RELEASE_3_17", dependencies = TRUE, build_vignettes = TRUE)
DEVEL:
if (!require(devtools)) { install.packages("devtools") } devtools::install_github("calabrialab/ISAnalytics", ref = "devel", dependencies = TRUE, build_vignettes = TRUE)
ISAnalytics
has a verbose option that allows some functions to print
additional information to the console while they're executing.
To disable this feature do:
# DISABLE options("ISAnalytics.verbose" = FALSE) # ENABLE options("ISAnalytics.verbose" = TRUE)
Some functions also produce report in a user-friendly HTML format, to set this feature:
# DISABLE HTML REPORTS options("ISAnalytics.reports" = FALSE) # ENABLE HTML REPORTS options("ISAnalytics.reports" = TRUE)
In the newer version of ISAnalytics, we introduced a "dynamic variables system",
to allow more flexibility in terms of input formats. Before starting with the
analysis workflow, you can specify how your inputs are structured so that
the package can process them. For more information on how to do this
take a look at vignette("workflow_start", package = "ISAnalytics")
.
The first steps of the analysis workflow involve the import and parsing of data and metadata files from disk.
import_association_file()
and/or
import_Vispa2_stats()
import_single_Vispa2Matrix()
or
import_parallel_Vispa2Matrices()
Refer to the vignette
vignette("workflow_start", package = "ISAnalytics")
for
more details.
ISAnalytics offers several different functions for cleaning and pre-processing your data.
compute_near_integrations()
outlier_filter()
remove_collisions()
, see also the dedicated vignette
vignette("workflow_start", package = "ISAnalytics")
purity_filter()
aggregate_values_by_key()
, aggregate_metadata()
, see also the
dedicated vignette
vignette("workflow_start", package = "ISAnalytics")
You can answer very different biological questions by using the provided functions with appropriate inputs.
sample_statistics()
compute_abundance()
, integration_alluvial_plot()
top_integrations()
top_targeted_genes()
CIS_grubbs()
,
CIS_volcano_plot()
gene_frequency_fisher()
, fisher_scatterplot()
, circos_genomic_density()
is_sharing()
, iss_source()
, sharing_heatmap()
,
sharing_venn()
HSC_population_size_estimate()
,
HSC_population_plot()
For more, please refer to the full function reference.
ISAnalytics is designed to be flexible concerning input formats, thus it is suited to process various kinds of data provided the correct dynamic configuration is set.
We demonstrate this with an example that uses barcodes data.
The matrix is publicly available here r Citep(bib[["ClonalTrackingPaper"]])
, metadata
was provided to us by the authors and it is available in the package
additional files.
library(ISAnalytics) # Set appropriate data and metadata specs ---- metadata_specs <- tibble::tribble( ~names, ~types, ~transform, ~flag, ~tag, "ProjectID", "char", NULL, "required", "project_id", "SubjectID", "char", NULL, "required", "subject", "Tissue", "char", NULL, "required", "tissue", "TimePoint", "int", NULL, "required", "tp_days", "CellMarker", "char", NULL, "required", "cell_marker", "ID", "char", NULL, "required", "pcr_repl_id", "SourceFileName", "char", NULL, "optional", NA_character_, "Link", "char", NULL, "optional", NA_character_ ) set_af_columns_def(metadata_specs) mandatory_specs <- tibble::tribble( ~names, ~types, ~transform, ~flag, ~tag, "BarcodeSeq", "char", NULL, "required", NA_character_ ) set_mandatory_IS_vars(mandatory_specs) # Files ---- data_folder <- tempdir() utils::unzip(zipfile = system.file("testdata", "testdata.zip", package = "ISAnalytics"), exdir = data_folder, overwrite = TRUE) meta_file <- "barcodes_example_af.tsv.xz" matrix_file <- "GSE144340_Matrix_542.tsv.xz" # Data import ---- af <- import_association_file(fs::path(data_folder, meta_file), report_path = NULL ) af matrix <- import_single_Vispa2Matrix(fs::path(data_folder, matrix_file), sample_names_to = "ID" ) matrix # Descriptive stats ---- desc_stats <- sample_statistics(matrix, af, sample_key = pcr_id_column(), value_columns = "Value" )$metadata |> dplyr::rename(distinct_barcodes = "nIS") desc_stats # Aggregation and new stats ---- agg_key <- c("SubjectID") agg <- aggregate_values_by_key(matrix, af, key = agg_key, group = "BarcodeSeq", join_af_by = pcr_id_column() ) agg agg_meta_functions <- tibble::tribble( ~Column, ~Function, ~Args, ~Output_colname, "TimePoint", ~ mean(.x, na.rm = TRUE), NA, "{.col}_avg", "CellMarker", ~ length(unique(.x)), NA, "distinct_cell_marker_count", "ID", ~ length(unique(.x)), NA, "distinct_id_count" ) agg_meta <- aggregate_metadata( af, aggregating_functions = agg_meta_functions, grouping_keys = agg_key ) agg_meta agg_stats <- sample_statistics(agg, agg_meta, sample_key = agg_key, value_columns = "Value_sum" )$metadata |> dplyr::rename(distinct_barcodes = "nIS") agg_stats # Abundance ---- abundance <- compute_abundance(agg, columns = "Value_sum", key = agg_key) abundance reset_dyn_vars_config()
The package provides a simple Shiny interface for data exploration and plotting. To start the interface use:
NGSdataExplorer()
The application main page will show a loading screen for a file. It is possible to load files also from the R environment, for example, before opening the app, we can load the included association file:
data("association_file")
Once in the application we can choose "association_file"
from the
R environment loading option screen and click on "Import data". Once
data is imported, we can click on the "Explore" tab in the upper navbar:
here we will see 2 tabs, one allows interactive exploration of data in tabular
form, in the other tab we can plot data. It is possible to customize several
different parameters for the plot and finally save it to file with the
dedicated button at the end of the page.
The Shiny interface is still currently under active development and new features will be added in the near future.
Several implemented functions produce static HTML reports that can be saved on disk, or tabular files. Reports contain the relevant information on how the function was called, inputs and outputs statistics, and session info for reproducibility.
ISAnalytics has it's dedicated package website where you can browse the documentation and vignettes easily, in addition to keeping up to date with all relevant updates. Visit the website at https://calabrialab.github.io/ISAnalytics/
If you have any issues the documentation can't solve, get in touch by opening an issue on GitHub or contacting the maintainers
## Print bibliography PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.