In othomantegazza/fluidigr: Analize Fluidigm qPCR Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Here, you will learn how to scale, inspect and visualize the gene expression data that you have parsed in the previous vignette (Load and Normalize the qPCR Output).

Setup

library(fluidgr)
library(magrittr)
library(dplyr)
library(stringr)
library(ggplot2)
library(forcats)
library(scales)
library(devtools)

Data

We provide the fluidigm_expr object, that contains normalized gene expression measurement.

This object is exactly the same at the one produced in the vignette (Load and Normalize the qPCR Output).

You can load it by just typing fluidigm_expr:

fluidigm_expr

Exploratory Data Analysis: Scale and Visualize

When we explore medium sized gene expression datasets, such as the one produced by qPCR, we often just concentrate on statistical testing and statistical relevance. But in this kind of datasets, you can observe many gene expression patterns and behaviour by Exploratory Data Analysis, when you iteratively scale and plot your data.

Scaling and plotting are tightly connected, because plots can help you figure out how scaling transforms your expression data; and appropriate scaling can help you detect useful expression patterns in your plots.

What variables are in your data?

Make the variables in your data explicit.

Often medium sized qPCR gene expression measurememtns are multivatiate. If you want to visualizalize them effectively, make all variables explicit. Also, keep in mind which variable you suspect that determines the response in your dataset and eventually what is the question that you are trying to anwser.

For example in the sample dataset we are measuring the expression of four genes in 4 developmental stages in 5 species. The column target_name records the "gene" variable, while the other two variables, developmental stage and species, are hidden in the sample_name variable. We can make them explicit.

The sample_name variable is encoded as "J-N1R1":

The first letter "J" is the species.
The letter and the number after the hyphen "N1" encode for the stage.
The last letter and number "R1" encode for the replicate.

# make variable explicit
fluidigm_expr <- 
  fluidigm_expr %>%
  # make species explicit
  mutate(species = str_split_fixed(string = sample_name,
                                   pattern = "-",
                                   n = 2)[, 1]) %>%
  # make stage explicit
  mutate(stage = str_sub(string = sample_name,
                         start = 3,
                         end = 4)) %>%
  # make replicate explicit
  mutate(replicate = str_sub(string = sample_name,
                         start = 5,
                         end = 6))

Scale your data

When you want want to explore gene expression, scaling is not always necessary, and often you can use, explore, visualize and communicate directly the normalized expression values; stored in the expression column of the fluidigm_expr dataset.

Indeed scaling is not always necessary, but, together with exploratory plots, scaling can help you detect patterns in your data.

Exploratory plots can help understand the effect that scaling has on expression data. This is why we explain scaling and visualisation of expression data together.

Scale with the `scale_fluidigm()` function

Fluidgr provides the scale_fluidigm() function, that scales expression values:

by z-score in one column (scaled_expression),
in range 0 to 1 in another(scaled_01_expression),

according to the groups provided in the argument .group.

This function takes as input the output of normalize() and returns the same dataset with the two extra column scaled_expression.

scaled_dat <- 
  fluidigm_expr %>% 
  scale_fluidigm(.group = target_name)

# inspect the output
scaled_dat %>% select(scaled_expression, scaled_01_expression) %>% summary()

Write your own scaling function

You can write your own scaling function, becaus scale_fluidigm() function is just a soft wrapper around dplyr's group_by() and mutate(). If you want to try different scaling methods, you can replicate and tweak the output of scale_fluidigm() with:

fluidigm_expr %>%
  group_by(target_name) %>%
  mutate(scaled_expression = scale(expression))

This should provide to you a good basis and enough freedom to explore and visualize your data as you prefer.

Visualize

Fluidigmr does not provide plotting function, because all the functions that you need to plot this kind of data are already available in ggplot2. Indeed the dataset output of is already in a tidy/gathered format.

Here we provide some ideas of how you can explore, plot and visualize your Fluidigm data.

Lineplot

A lineplot can be informative directly on normalized data, without scaling.

You can use facets to display multivariate data.

p <- fluidigm_expr %>%
  ggplot(aes(x = stage,
             y = expression)) +
  # plot each expression value into a point
  geom_point(size = 2,
               col = "#DA614D",
               alpha = .8) +
  # it might be complicated to add a line that connect means
  # across stages,
  # because stages are encoded as a categorical variable,
  # we must turn them into numeric
  stat_summary(aes(x = stage %>%
                     as.factor(.) %>%
                     as.numeric(.)),
               fun.y = mean,
               geom="line",
               size = 1.2,
               alpha = .7,
               linejoin = "round") +
  # implement facetting with independent y values
  # for each genes, since expression values are
  # not comparable across genes
  facet_grid(target_name ~ species,
               scales = "free_y") +
  theme_bw() 

p %>% print()

Some patterns in your data might be more evident if you plot them in log scale.

p_log <- 
  p + scale_y_log10()

p_log %>% print()

Also using heatmap with ggplot2

p_heat <- 
  fluidigm_expr %>%
  ggplot(aes(x = stage,
             y = target_name,
             fill = expression)) +
  geom_tile() +
  facet_grid(. ~ species) +
  scale_fill_viridis_c()

p_heat %>% print()

Heatmaps

Sometime heatmap don't convey muich invormation if you don't scale your data.

For example we can scale every gene on z-score to make them comparable.

scaled_gene <- 
  fluidigm_expr %>% 
  group_by(target_name) %>%
  mutate(scaled_expression = expression %>%
           rescale(from = range(.), to = c(0,1)))

Check if the heatmap has improved

remeber to map the colours to the new scaled_expression values

p_heat <- 
  scaled_gene %>%
  ggplot(aes(x = stage,
             y = species,
             fill = scaled_expression)) +
  geom_tile() +
  facet_grid(. ~ target_name) +
  scale_fill_viridis_c()

p_heat %>% print()

We can try to set more groups.

scaled_dat <- 
  fluidigm_expr %>%
  group_by(target_name, species) %>%
  mutate(scaled_expression = expression %>%
           rescale(from = range(.), to = c(0,1)))

Check if the heatmap has improved

remeber to map the colours to the new scaled_expression values

p_heat <- 
  scaled_dat %>%
  ggplot(aes(x = stage,
             y = species,
             fill = scaled_expression)) +
  geom_tile() +
  facet_grid(. ~ target_name) +
  scale_fill_viridis_c()

p_heat %>% print()