knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" ) # test data library(dplyr) library(isoswitch) seurat <- readRDS("/data/10x_data/10x_5psanger/output/adult50.rds") switch_markers <- read.table("/data/10x_data/10x_5psanger/d537/output/06-isoswitch_markers_meta.csv", sep=",")
The goal of isoswitch is to facilitate the characterization of isoform expression
in long-read single-cell datasets. It includes a set of functions and reports
built on top of Seurat
, ggplot
and rmarkdown
that can be used to search,
visualize and document isoform expression patterns, and particularly isoform
switches between cell identities.
You can install the development version of isoswitch from GitHub with:
# install.packages("devtools") devtools::install_github("atienza-ipmc/isoswitch")
Below is a short overview of the package functionality:
Isoswitch is designed to work with Seurat
objects with gene- and isoform-level counts.
In particular, the ScNaUmi-seq protocol (Lebrigand et al 2020) generates two
count matrices that need to be loaded into respective Seurat assays before starting
the analysis, more specifically:
Note
isoswith expects the row names of the isoform count matrix to follow the format "[gene_name]..[transcript_id]", example: "BCS1L..ENST00000359273"
head(rownames(seurat@assays$multi@counts))
After loading up gene and isoform counts on the Seurat object, the method iso_preprocess()
prepares the isoform matrix by removing low-expression transcripts as well as removing
single-isoform genes which are irrelevant for the isoform switch analysis.
The resulting "multi-isoform" matrix is stored as a new assay on the input Seurat object.
seurat <- iso_preprocess(seurat, assay="ISO", new_assay="multi", filter_threshold=5)
The method iso_compute_stats()
parses the isoform raw count matrix returning
a data frame with stats on the expression of each transcript
feature
, gene_id
, transcript_id
gene/transcript identifierssum
Total UMI counts for the transcripttotal_gene
Total UMI counts for the genen_isofs
Number of distinct isoforms detected for the genemax_sum
Max sum value for the isoform with the highest expressionperc
Relative percentage of isoform UMI count vs gene total.is_major
(Boolean) is this considered a major isoformis_top
(Boolean) is this highest expressed isoformstats <- iso_compute_stats(seurat@assays$multi@counts) %>% arrange(gene_id) head(stats, n=4)
The method plot_assay_stats()
builds on this data to plot a summary with number
of genes, number of transcripts, distribution of isoforms and number of genes per
cell type that can be used to describe succintly the isoform distribution in the
dataset.
plot_assay_stats(seurat, "isoform")
The term “isoform switch” refers to an event where two isoforms of the same gene are considered markers of different clusters.
The marker search is implemented on the method ISO_SWITCH_ALL()
. Any extra
parameters are passed on to the underlying Seurat's FindMarkers call to fine tune
the search space.
clusters <- levels(seurat@active.ident) switch_markers <- ISO_SWITCH_ALL(seurat, clusters, assay="isoform", min.pct=0, logfc.threshold=0.40)
ISO_SWITCH_ALL()
returns a data frame of transcripts identified as markers of
a given cluster for a given gene, one transcript per row.
To facilitate the graphical interpretation of the results:
plot_marker_matrix()
produces a heatmap of number of unique genes per contrast between clusters plot_marker_score()
produces a volcano plot showing p-values and average logFC for each gene with an isoform switch pl1 <- plot_marker_matrix(seurat, switch_markers) pl2 <- plot_marker_score(adult, switch_markers, facet=FALSE, overlaps=16) pl1 | pl2
plot_marker_score()
can also plot individual volcano plots for each cluster
analyzed with the parameter facet=TRUE
plot_marker_score(adult, switch_markers, facet=TRUE, ncol=3)
The method compute_switches()
and gene_switch_table()
compute isoform
switchs from a list of markrs and return a data.frame and an html table respectively
(only compatible with html output documents such as Rmd reports)
switches <- compute_switches(switch_markers) gene_switch_table(switch_markers)
After identifying genes of interest, twere are two ways of drilling down and producing gene-level reports:
isoswitch_report()
method produces a compact plot of the gene. This function requires
an extract from a gene annotation file and transcript metadata, documented hereisoswitch_report(seurat, "isoform", gene="HYAL2", marker_list=switch_markers, gtf_df, transcript_metadata)
render_html_gene_report()
method renders an html version of the reportAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.