Lifecycle:maturing

Brings SummarizedExperiment to the tidyverse!

website: stemangiola.github.io/tidySummarizedExperiment/

Please also have a look at

library(knitr)
knitr::opts_chunk$set(warning=FALSE, message=FALSE)

Introduction

tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment [@morgan2020summarized] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Bioconductor SummarizedExperiment object as a tidyverse tibble, and provides SummarizedExperiment-compatible dplyr, tidyr, ggplot and plotly functions. This allows users to get the best of both Bioconductor and tidyverse worlds.

Functions/utilities available

SummarizedExperiment-compatible Functions | Description ------------ | ------------- all | After all tidySummarizedExperiment is a SummarizedExperiment object, just better

tidyverse Packages | Description ------------ | ------------- dplyr | Almost all dplyr APIs like for any tibble tidyr | Almost all tidyr APIs like for any tibble ggplot2 | ggplot like for any tibble plotly | plot_ly like for any tibble

Utilities | Description ------------ | ------------- as_tibble | Convert cell-wise information to a tbl_df

Installation

if (!requireNamespace("BiocManager", quietly=TRUE)) {
      install.packages("BiocManager")
  }

BiocManager::install("tidySummarizedExperiment")

From Github (development)

devtools::install_github("stemangiola/tidySummarizedExperiment")

Load libraries used in the examples.

library(ggplot2)
library(tidySummarizedExperiment)

Create tidySummarizedExperiment, the best of both worlds!

This is a SummarizedExperiment object but it is evaluated as a tibble. So it is fully compatible both with SummarizedExperiment and tidyverse APIs.

pasilla_tidy <- tidySummarizedExperiment::pasilla 

It looks like a tibble

pasilla_tidy

But it is a SummarizedExperiment object after all

assays(pasilla_tidy)

Tidyverse commands

We can use tidyverse commands to explore the tidy SummarizedExperiment object.

We can use slice to choose rows by position, for example to choose the first row.

pasilla_tidy %>%
    slice(1)

We can use filter to choose rows by criteria.

pasilla_tidy %>%
    filter(condition == "untreated")

We can use select to choose columns.

pasilla_tidy %>%
    select(.sample)

We can use count to count how many rows we have for each sample.

pasilla_tidy %>%
    count(.sample)

We can use distinct to see what distinct sample information we have.

pasilla_tidy %>%
    distinct(.sample, condition, type)

We could use rename to rename a column. For example, to modify the type column name.

pasilla_tidy %>%
    rename(sequencing=type)

We could use mutate to create a column. For example, we could create a new type column that contains single and paired instead of single_end and paired_end.

pasilla_tidy %>%
    mutate(type=gsub("_end", "", type))

We could use unite to combine multiple columns into a single column.

pasilla_tidy %>%
    unite("group", c(condition, type))

We can also combine commands with the tidyverse pipe %>%.

For example, we could combine group_by and summarise to get the total counts for each sample.

pasilla_tidy %>%
    group_by(.sample) %>%
    summarise(total_counts=sum(counts))

We could combine group_by, mutate and filter to get the transcripts with mean count > 0.

pasilla_tidy %>%
    group_by(.feature) %>%
    mutate(mean_count=mean(counts)) %>%
    filter(mean_count > 0)

Plotting

my_theme <-
    list(
        scale_fill_brewer(palette="Set1"),
        scale_color_brewer(palette="Set1"),
        theme_bw() +
            theme(
                panel.border=element_blank(),
                axis.line=element_line(),
                panel.grid.major=element_line(size=0.2),
                panel.grid.minor=element_line(size=0.1),
                text=element_text(size=12),
                legend.position="bottom",
                aspect.ratio=1,
                strip.background=element_blank(),
                axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
                axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
            )
    )

We can treat pasilla_tidy as a normal tibble for plotting.

Here we plot the distribution of counts per sample.

pasilla_tidy %>%
    ggplot(aes(counts + 1, group=.sample, color=`type`)) +
    geom_density() +
    scale_x_log10() +
    my_theme

Session Info

sessionInfo()

References



stemangiola/tidySE documentation built on June 2, 2024, 9:51 a.m.