knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Make sure you did all the previous steps explained in the README file. Install the qpcrviia7 package and the other required packages. Make sure your data file and annotation files are correct. You can use this example as a template to make your own script for analysis. Rmarkdown makes it possible to run the analysis in chunks to understand the individual steps better and can knit everything together in an html file.
We first need to load the packages that we will use into our session. Make sure you've installed all packages before else you will get an error.
library(qpcrviia7) library(readxl) library(dplyr) library(tidyr) library(ggplot2) library(ggthemes) library(RColorBrewer)
Data needs to be imported into the R session, we will use our own functions for this matter. read_qpcr() reads the raw excel file obtained from the Viia7 after qPCR and read_annotation reads our annotation files.
qpcr <- read_qpcr("data/2017-09-20 124839 FAP-C2C12-Muscle qPCR validation MC.xls", sheet = 3)
ann_samples <- read_annotation("data/Annotations_124839_samples.xlsx") ann_genes <- read_annotation("data/Annotations_124839_primers.xlsx")
After loading in the files we can see that they are still untidy. We will merge all files into one big dataframe and change the undetermined values in the CT column to NA so that our data is ready for analysis.
join_and_clean_qpcr(qpcr)
To start we plot all CT values to check the distribution and spot eventual outliers
par(mfrow=c(1,1)) hist(qpcr$CT, main = "CT values")
We plot the CT values per Sample to check the distribution per Sample and to spot outliers. You should also be able to see the blanc sample or if the blanc was totally not contaminated you won't be able to see it. Bad Samples can be removed with remove_sample(). Check the minimal level of CT values to select it later in the function set_min_max_CT(). The value of the blanc sample can be used to set the maximum level of CT.
ggplot(qpcr, aes(Sample, CT)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90)) + ggtitle("CT values per sample")
We plot the CT values per Gene to check the distribution per Gene and to spot outliers. Bad Genes can be removed with remove_primer().
ggplot(qpcr, aes(Gene, CT)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90)) + ggtitle("CT values per primer")
After we checked the blanc we don't need it anymore in further analysis so it needs to be removed. We use the function remove_blanc() for this.
qpcr <- remove_blanc(qpcr) hist(qpcr$CT, main = "CT after removal blanc")
Now we need to seperate our data into a set of housekeeping genes and endogenous genes. Housekeeping genes will be saved in a dataframe called hkg and the endogenous genes in a dataframe called endog. We need to this to be able to run the qc_hkg() function on the housekeeping genes.
split_genes(qpcr)
Undetermined CT values is also biological information, with set_min_max_CT() we put the lowest CT values at 10 and the highest at 40. Every CT that is below 10 or undetermined will become 40.
endog <- set_min_max_CT(endog, 10, 40) hist(endog$CT, main = "CT after set baseline")
Print a list of all genes with double meltcurve peaks with list_double_meltcurves().
list_double_meltcurves(endog)
Remove bad meltcurves with remove_bad_meltcurves().
remove_bad_meltcurves()
Now we will perform some quality control measures on our housekeeping genes. We recommend to use three housekeeping genes that show stable expression accross samples. the qc_hkg() function will check the distrubtion of CT values between samples of all housekeeping genes seperately. Here, we can see if the variability of any of the housekeeping genes is too high.
qc_hkg(hkg)
In this example Rpl13a shows a binomial expression throughout the dataset. This is because we compare cell lines in vitro with skeletal muscle. Therefore, we remove this housekeeping gene with remove_primer().
remove_primer(endog, "Rpl13a")
When we assessed the quality of our housekeeping genes seperately it's time to take the average of these genes.
mean_hkg <- calculate_mean_hkg(hkg)
After taking the average we plot quality control measures again to check for outliers.
qc_mean_hkg(mean_hkg)
Outliers are defined by boxplots, every sample that is outside 1.5 * IQR is regarded as an outliers and can skew analysis of that data. Therefore, we remove these with the function remove_outliers().
mean_hkg$CT_avg_hkg <- remove_outliers(mean_hkg$CT_avg_hkg)
When quality control is done we can normalize over housekeeping genes. So far this package can calculate Delta CT and Relative expression compared to housekeeping genes. Functions calculating fold change will be incorporated soon.
The function calculate_DCT calculates Delta CT and relative expression. Make sure to select your annotation columns that are important for plotting! Fill them in under the cols argument of calculate_DCT(). In this example we selected Cell_type, "Condition" and "Mouse".
calculate_DCT(endog, mean_hkg, cols = c("Cell_type", "Condition", "Mouse"))
Make a summary plot of the Delta CT values of all genes and samples together. Therefore, you need to fill in the x.var, y.var and col.var variable in function plot_scatter(). As x.var we use Cell_type to plot the different cell types over the x axis, y.var is DCt to plot Delta CT values on the y axis and col.var is Condition this is because we are comparing between healthy and dystrohpic cells. The function automatically makes a plot per Gene in the data set.
plot_scatter(endog, x.var = "Cell_type", y.var = "DCt", col.var = "Condition", theme = 8)
You can also plot a specific gene with plot_scatter_per_gene(), you can select the Gene under the variable gene.
plot_scatter_per_gene(endog, "Dcn", x.var = "Cell_type", y.var = "DCt", col.var = "Condition", theme = 10) + labs(y = "Delta CT") + scale_fill_discrete(name = "Cell type") + scale_x_discrete(labels = c("C2C12 GFP" = "C2C12", "FAP" = "FAP", "TA" = "Tibialis\nanterior"))
We can also plot the relative expression towards the housekeeping genes. We generate the same plots as under Delta CT. The only difference is that we use rel_expr as argument for y.var.
plot_scatter(endog, x.var = "Cell_type", y.var = "rel_expr", col.var = "Condition", theme = 8)
plot_scatter_per_gene(endog, "Dcn", x.var = "Cell_type", y.var = "rel_expr", col.var = "Condition", theme = 10) + labs(y = "Relative expression") + scale_fill_discrete(name = "Cell type") + scale_x_discrete(labels = c("C2C12 GFP" = "C2C12", "FAP" = "FAP", "TA" = "Tibialis\nanterior"))
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.