knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Make sure you did all the previous steps explained in the README file. Install the qpcrviia7 package and the other required packages. Make sure your data file and annotation files are correct. You can use this example as a template to make your own script for analysis. Rmarkdown makes it possible to run the analysis in chunks to understand the individual steps better and can knit everything together in an html file.
library(qpcrviia7) library(readxl) library(dplyr) library(tidyr) library(ggplot2) library(ggthemes) library(RColorBrewer) library(Hmisc)
Data needs to be imported into the R session, we will use our own functions for this matter. read_qpcr() reads the raw excel file obtained from the Viia7 after qPCR and read_annotation reads our annotation files.
qpcr <- read_qpcr("data/2017-10-05 FAP-C2C12-Muscle qPCR validation MC.xls")
ann_samples <- read_annotation("data/Annotations_samples_124839_technical_replicates.xlsx") ann_genes <- read_annotation("data/Annotations_primers_124839_technical_replicates.xlsx")
Before merging our data and annotation files we first check the quality of our technical replicates.
This function prints a list of all technical replicates above a certain standard deviation. The standard deviation is put to 0.4 but you change it inside the function under argument threshold.
list_bad_tech_rep(qpcr, threshold = 0.4)
After setting the right threshold we merge all our technical replicates together by taking the average with qc_tech_rep(). The standard deviation is put to 0.4 but you change it inside the function under argument threshold.
qpcr <- qc_tech_rep(qpcr, threshold = 0.4) #remove Avg from col names for further functions colnames(qpcr) <- c("Target", "Sample", "CT", "SD_CT", "Tm1", "SD_Tm1", "Tm2", "SD_Tm2")
After loading in the files we can see that they are still untidy. We will merge all files into one big dataframe and change the undetermined values in the CT column to NA so that our data is ready for analysis.
join_and_clean_qpcr(qpcr)
To start we plot all CT values to check the distribution and spot eventual outliers.
par(mfrow=c(1,1)) hist(qpcr$CT, main = "CT values")
We plot the CT values per Sample to check the distribution per Sample and to spot outliers. You should also be able to see the blanc sample or if the blanc was totally not contaminated you won't be able to see it. Bad Samples can be removed with remove_sample(). Check the minimal level of CT values to select it later in the function set_min_max_CT(). The value of the blanc sample can be used to set the maximum level of CT.
ggplot(qpcr, aes(Sample, CT)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90)) + ggtitle("CT values per sample")
We plot the CT values per Gene to check the distribution per Gene and to spot outliers. Bad Genes can be removed with remove_primer().
ggplot(qpcr, aes(Gene, CT)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 90)) + ggtitle("CT values per primer")
After we checked the blanc we don't need it anymore in further analysis so it needs to be removed. We use the function remove_blanc() for this.
qpcr <- remove_blanc(qpcr) hist(qpcr$CT, main = "CT after removal blanc")
Now we need to seperate our data into a set of housekeeping genes and endogenous genes. Housekeeping genes will be saved in a dataframe called hkg and the endogenous genes in a dataframe called endog. We need to this to be able to run the qc_hkg() function on the housekeeping genes.
split_genes(qpcr)
Undetermined CT values are also biological information, with set_min_max_CT() we put the lowest CT values at 10 and the highest at 40. Every CT that is below 10 or undetermined will become 40.
endog <- set_min_max_CT(endog, CT = "CT", 10, 40) hist(endog$CT, main = "CT after set baseline")
Print a list of all genes with double meltcurve peaks with list_double_meltcurves()
list_double_meltcurves(endog)
Remove bad meltcurves with remove_bad_meltcurves()
remove_bad_meltcurves()
Now we will perform some quality control measures on our housekeeping genes. We recommend to use three housekeeping genes that show stable expression accross samples. the qc_hkg() function will check the distrubtion of CT values between samples of all housekeeping genes seperately. Here, we can see if the variability of any of the housekeeping genes is too high.
qc_hkg(hkg)
In this example Rpl13a shows a binomial expression throughout the dataset. This is because we compare cell lines in vitro with skeletal muscle. Therefore, we remove this housekeeping gene with remove_primer()
hkg <- remove_primer(hkg, "Rpl13a")
When we assessed the quality of our housekeeping genes seperately it's time to take the average of these genes.
mean_hkg <- calculate_mean_hkg(hkg)
After taking the average we plot quality control measures again to check for outliers.
qc_mean_hkg(mean_hkg)
Outliers are defined by boxplots, every sample that is outside 1.5 * IQR is regarded as an outliers and can skew analysis of that data. Therefore, we remove these with the function remove_outliers()
mean_hkg$CT_avg_hkg <- remove_outliers(mean_hkg$CT_avg_hkg)
In this example new outliers pop up after a first outlier removal. It is possible to rerun qc_mean_hkg() and remove_outliers() to remove additional outliers.
qc_mean_hkg(mean_hkg) mean_hkg$CT_avg_hkg <- remove_outliers(mean_hkg$CT_avg_hkg) shapiro.test(mean_hkg$CT_avg_hkg)
When quality control is done we can normalize over housekeeping genes. So far this package can calculate Delta CT and Relative expression compared to housekeeping genes. Functions calculating fold change will be incorporated soon.
The function calculate_DCT calculates Delta CT and relative expression. Make sure to select your annotation columns that are important for plotting! Fill them in under the cols argument of calculate_DCT(). In this example we selected Cell_type, "Condition" and "Mouse".
calculate_DCT(endog, mean_hkg, cols = c("Cell_type", "Condition", "Mouse"))
Make a summary plot of the Delta CT values of all genes and samples together. Therefore, you need to fill in the x.var, y.var and col.var variable in function plot_scatter(). As x.var we use Cell_type to plot the different cell types over the x axis, y.var is DCt to plot Delta CT values on the y axis and col.var is Condition this is because we are comparing between healthy and dystrohpic cells. The function automatically makes a plot per Gene in the data set.
plot_scatter(endog, x.var = "Cell_type", y.var = "DCt", col.var = "Condition", theme = 8)
You can also plot a specific gene with plot_scatter_per_gene(), you can select the gene under the gene argument.
plot_scatter_per_gene(endog, gene = "Dcn", x.var = "Cell_type", y.var = "DCt", col.var = "Condition", theme = 10)
We can also plot the relative expression towards the housekeeping genes. We generate the same plots as under Delta CT. The only difference is that we use rel_expr as argument for y.var.
plot_scatter(endog, x.var = "Cell_type", y.var = "rel_expr", col.var = "Condition", theme = 8) + labs(title = "Scatterplot all genes", y = "Relative expression", x = "Cell type")
plot_scatter_per_gene(endog, gene = "Dcn", x.var = "Cell_type", y.var = "rel_expr", col.var = "Condition", theme = 10)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.