Comparison with base R

Lifecycle: maturing

library(knitr)
knitr::opts_chunk$set(cache = TRUE, warning = FALSE,
                      message = FALSE, cache.lazy = FALSE)

library(dplyr)
library(tidyr)
library(tibble)
library(magrittr)
library(ggplot2)
library(ggrepel)
library(tidybulk)


my_theme =  
    theme_bw() +
    theme(
        panel.border = element_blank(),
        axis.line = element_line(),
        panel.grid.major = element_line(size = 0.2),
        panel.grid.minor = element_line(size = 0.1),
        text = element_text(size=12),
        legend.position="bottom",
        aspect.ratio=1,
        strip.background = element_blank(),
        axis.title.x  = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
        axis.title.y  = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
    )

In this article we show examples of how code written with tidybulk/tidyverse differs from the equivalent base R code. Across these examples we observed a more than 10-fold decrease in the number of variable assignments and a more than 2-fold decrease in the number of lines of code.

Create a tidybulk tibble.

tt = counts_mini %>% tidybulk(sample, transcript, count)

Aggregate duplicated transcripts

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]
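As a rough illustration of what aggregating duplicated transcripts involves, the base R sketch below sums the counts of rows that share a transcript symbol. The toy matrix `toy_counts` and the vector `symbols` are made up for this example and are not part of tidybulk or of the vignette's data; summing is only one possible way to aggregate.

```r
# Hypothetical toy data: 4 rows, 2 samples; "GENE_A" appears twice
toy_counts = matrix(
    c(10, 5, 3, 7,
      20, 1, 4, 6),
    nrow = 4,
    dimnames = list(NULL, c("sample_1", "sample_2"))
)
symbols = c("GENE_A", "GENE_B", "GENE_A", "GENE_C")

# rowsum() adds up the rows that share the same group label
aggregated = rowsum(toy_counts, group = symbols)
aggregated
```

The result has one row per unique symbol, so the duplicated "GENE_A" rows collapse into a single row.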

Scale counts

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]
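To give a feel for what scaling counts means, here is a minimal base R sketch that rescales each sample to a common library size. This is plain library-size scaling, chosen for brevity; it is not necessarily the normalisation method the package or the original chunk uses. The matrix `toy_counts` is invented for the example.

```r
# Hypothetical toy data: 3 transcripts (rows) x 2 samples (columns)
toy_counts = matrix(
    c(10, 30, 60,
      20, 60, 120),
    nrow = 3,
    dimnames = list(c("t1", "t2", "t3"), c("sample_1", "sample_2"))
)

# Library size = total counts per sample
library_size = colSums(toy_counts)

# Rescale every sample to the mean library size
scaling_factor = mean(library_size) / library_size
scaled_counts = sweep(toy_counts, 2, scaling_factor, `*`)

colSums(scaled_counts)
```

After scaling, both samples have the same total, so differences between them reflect relative abundance and not sequencing depth.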

Filter variable transcripts

We may want to identify and filter variable transcripts.

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]
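One simple way to pick variable transcripts in base R is to rank them by the variance of their log-transformed counts and keep the top few. The sketch below assumes that approach; the data, the object names and the cutoff `top_n` are all hypothetical.

```r
set.seed(1)
# Hypothetical toy data: 10 transcripts (rows) x 5 samples (columns)
toy_counts = matrix(
    rpois(50, lambda = 20),
    nrow = 10,
    dimnames = list(paste0("t", 1:10), paste0("sample_", 1:5))
)
# Make two transcripts clearly variable across samples
toy_counts["t3", ] = c(1, 200, 5, 400, 2)
toy_counts["t7", ] = c(300, 2, 150, 1, 500)

# Variance of each transcript on the log scale
log_counts = log1p(toy_counts)
transcript_variance = apply(log_counts, 1, var)

# Keep the top_n most variable transcripts
top_n = 2
keep = names(sort(transcript_variance, decreasing = TRUE))[seq_len(top_n)]
variable_counts = toy_counts[keep, , drop = FALSE]
rownames(variable_counts)
```

Working on the log scale stops highly expressed transcripts from dominating the ranking purely because of their magnitude.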

Reduce dimensions

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

PCA

Tidy transcriptomics

tt.norm.PCA =
  tt.norm %>%
  reduce_dimensions(method = "PCA", .dims = 2)

Base R

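# Assumes count_m is a transcripts x samples matrix built in an earlier step
count_m_log = log(count_m + 1)
pc = count_m_log %>% prcomp(scale. = TRUE)
# Proportion of variance explained by the first six components
variance = pc$sdev^2
variance = (variance / sum(variance))[1:6]
# With transcripts in rows, the per-sample loadings are in pc$rotation;
# look up each sample's cell type in the long-format counts table
pc$cell_type = counts$`Cell type`[
    match(rownames(pc$rotation), counts$sample)
]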

tSNE

Tidy transcriptomics

tt.norm.tSNE =
    breast_tcga_mini %>%
    tidybulk(sample, ens, count_scaled) %>%
    identify_abundant() %>%
    reduce_dimensions(
        method = "tSNE",
        perplexity = 10,
        pca_scale = TRUE
    )

Base R

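count_m_log = log(count_m + 1)

tsne = Rtsne::Rtsne(
    t(count_m_log),
    perplexity = 10,
    pca_scale = TRUE
)$Y
# Rtsne returns a plain matrix without row names; restore the sample names
tsne = as.data.frame(tsne)
rownames(tsne) = colnames(count_m_log)
tsne$cell_type = tidybulk::counts$`Cell type`[
    match(rownames(tsne), tidybulk::counts$sample)
]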

Rotate dimensions

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Test differential abundance

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Adjust counts

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Deconvolve cell type composition

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Cluster samples

k-means

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]
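For orientation, a base R k-means clustering of samples might look like the sketch below. It uses `stats::kmeans` on a small simulated log-scale matrix with samples in rows; the data, the object names and the choice of two centres are assumptions made for the example only.

```r
set.seed(42)
# Hypothetical log-scale data: 6 samples (rows) x 4 transcripts (columns),
# simulated as two well-separated groups of three samples
group_a = matrix(rnorm(12, mean = 2, sd = 0.1), nrow = 3)
group_b = matrix(rnorm(12, mean = 8, sd = 0.1), nrow = 3)
toy_log = rbind(group_a, group_b)
rownames(toy_log) = paste0("sample_", 1:6)

# Cluster the samples; nstart repeats the random initialisation
km = kmeans(toy_log, centers = 2, nstart = 10)

# Attach the cluster label back to the sample names
clusters = data.frame(sample = rownames(toy_log), cluster = km$cluster)
clusters
```

Note the extra step needed to join the cluster labels back to the sample annotation, which is the kind of bookkeeping the tidy approach is meant to remove.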

SNN

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Drop redundant transcripts

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]
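One common notion of redundancy is high pairwise correlation. The base R sketch below assumes that definition and greedily drops any transcript that correlates strongly with an earlier row; the toy matrix and the `threshold` value are hypothetical, and the package's own procedure may differ.

```r
# Hypothetical log-scale data: 3 transcripts (rows) x 5 samples (columns)
toy_log = rbind(
    t1 = c(1, 2, 3, 4, 5),
    t2 = c(2, 4, 6, 8, 10.5),   # nearly proportional to t1
    t3 = c(5, 1, 4, 2, 3)
)
colnames(toy_log) = paste0("sample_", 1:5)

# Pairwise Pearson correlation between transcripts
correlation = cor(t(toy_log))
# Consider each pair once and ignore the diagonal
correlation[lower.tri(correlation, diag = TRUE)] = 0

threshold = 0.9
# Flag a transcript if it correlates above the threshold with any earlier row
redundant = apply(correlation, 2, function(x) any(abs(x) > threshold))
non_redundant = toy_log[!redundant, , drop = FALSE]
rownames(non_redundant)
```

Here `t2` is flagged because it is almost a multiple of `t1`, while `t3` is kept.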

Draw heatmap

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Draw density plot

Tidy transcriptomics: [code chunk not preserved]
Base R: [code chunk not preserved]

Appendix

sessionInfo()



tidybulk documentation built on April 7, 2021, 6 p.m.