PC-PR2 (Principal Component Partial R-square) is a statistical method, developed by Fages & Ferrari et al. (1), for investigating sources of variability in metabolomics and other omics data (1,2). In brief, it combines features of principal component analysis and multivariable linear regression to estimate the relative effects of metadata variables on a large matrix of omics measurements.
The pcpr2 R package simplifies the analysis so that it can easily be incorporated into larger workflows. To run the analysis, a complete matrix of omics data, denoted X, and a corresponding table of subject metadata, denoted Z, are passed to the function runPCPR2(). The main output is the proportion of variation in the omics data attributed to each Z-variable, expressed as a partial R2 value. A barplot of these values can be quickly generated by passing the output to plot().
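To make the idea concrete, the following is a minimal base R sketch of the general PC-PR2 approach on simulated data. It is not the package's implementation: the helper name pcpr2_sketch, the simulated X and Z, and the exact rules used here (cumulative-variance cut-off, partial R2 from full versus reduced models, eigenvalue weighting) are assumptions made for illustration only.

```r
# Hypothetical helper: a simplified PC-PR2-style analysis in base R.
# This is an illustrative sketch, NOT the code used by runPCPR2().
pcpr2_sketch <- function(X, Z, pct.threshold = 0.8) {
  # Principal components of the column-centred omics matrix
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  eig <- pca$sdev^2

  # Keep the first PCs whose cumulative variance reaches the threshold
  n_pc <- which(cumsum(eig) / sum(eig) >= pct.threshold)[1]
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  w <- eig[seq_len(n_pc)] / sum(eig[seq_len(n_pc)])

  dat <- data.frame(Z)
  vars <- names(dat)
  p_r2 <- matrix(NA_real_, nrow = length(vars), ncol = n_pc,
                 dimnames = list(vars, NULL))

  for (k in seq_len(n_pc)) {
    dat$pc_score <- scores[, k]
    # Full model: PC score regressed on all Z-variables
    full <- lm(reformulate(c("1", vars), response = "pc_score"), data = dat)
    rss_full <- deviance(full)
    for (v in vars) {
      # Reduced model: the same regression without variable v
      reduced <- lm(reformulate(c("1", setdiff(vars, v)),
                                response = "pc_score"), data = dat)
      rss_red <- deviance(reduced)
      p_r2[v, k] <- (rss_red - rss_full) / rss_red
    }
  }

  # Eigenvalue-weighted mean of the partial R2 values, as a percentage
  drop(p_r2 %*% w) * 100
}

# Simulated data: 60 subjects, 50 features, a group effect on 20 features
set.seed(1)
n <- 60
Z <- data.frame(group = factor(rep(c("a", "b"), each = n / 2)),
                age = rnorm(n))
X <- matrix(rnorm(n * 50), nrow = n, ncol = 50)
X[, 1:20] <- X[, 1:20] + 2 * (Z$group == "b")

res <- pcpr2_sketch(X, Z)
print(res)
```

In this simulation the group variable should account for far more variability than age, because only group was used to shift the data. For real analyses, use runPCPR2() rather than this sketch.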
A sample of a transcriptomics dataset is included as test data. This consists of a matrix of 3000 transcriptomics intensities and five descriptive Z-variables (two categorical, three numeric) for 124 subjects.
To start, update all installed R packages.
update.packages()
The devtools package is needed to install development packages from GitHub. If it is not already installed on your system, install it from CRAN. Now install pcpr2 as follows:
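# Install devtools from CRAN if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
library(devtools)
install_github("JoeRothwell/pcpr2")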
You may be prompted to update other packages.
PC-PR2 is performed using the function runPCPR2(), which outputs an object of class pcpr2 containing the partial R2 value for each covariate. The proportion of variability in the omics data to be explained can be set with the optional argument pct.threshold, which defaults to 0.8.
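As a rough illustration of what such a threshold does, the sketch below counts how many principal components are needed to reach a given share of the total variance. The eigenvalues are made up, the helper n_components is hypothetical, and the package's own selection rule may differ in detail.

```r
# Hypothetical eigenvalues (variances) of five principal components
eig <- c(5, 3, 1, 0.5, 0.5)

# Cumulative proportion of variance explained: 0.50 0.80 0.90 0.95 1.00
cum_var <- cumsum(eig) / sum(eig)

# Hypothetical helper: number of leading PCs needed to reach a threshold
n_components <- function(cum_var, pct.threshold) {
  which(cum_var >= pct.threshold)[1]
}

n_70 <- n_components(cum_var, 0.70)  # two components reach 70%
n_85 <- n_components(cum_var, 0.85)  # three components reach 85%
n_99 <- n_components(cum_var, 0.99)  # all five are needed for 99%
print(c(n_70, n_85, n_99))
```

A higher threshold retains more components, so more of the minor variation in the data enters the regression step.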
The package example is run as follows:
library(pcpr2)
output <- runPCPR2(transcripts, Z_metadata, pct.threshold = 0.8)
output$pR2
           sex         height         weight smoking.status     age.sample             R2
    1.24647643     2.48569520     0.10218837     2.94946793     0.03072886     4.91513509
For detailed output, use summary().
summary(output)
To generate a barplot of the results, pass the pcpr2 object to plot(). The default is grey bars and no title, but the plot can be customised using other barplot() arguments.
plot(output, col = "red", main = "Variability in transcriptomics data explained by covariates")
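For readers who want to try barplot() arguments without installing the package, the sketch below draws a comparable chart in base R from the values printed in the example output. It only mimics the idea and is not the package's plot method; the colour, label orientation, and axis label are arbitrary choices.

```r
# Values copied from the example output of output$pR2 shown earlier
pr2 <- c(sex = 1.24647643, height = 2.48569520, weight = 0.10218837,
         smoking.status = 2.94946793, age.sample = 0.03072886,
         R2 = 4.91513509)

# barplot() returns the bar midpoints, which can be used to add labels
mids <- barplot(pr2, col = "steelblue", las = 2, ylab = "Partial R2",
                main = "Variability explained by covariates")
text(mids, pr2, labels = round(pr2, 2), pos = 3, xpd = TRUE)
```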
(1) Fages & Ferrari et al. (2014) Investigating sources of variability in metabolomic data in the EPIC study: the Principal Component Partial R-square (PC-PR2) method. Metabolomics 10(6): 1074-1083, DOI: 10.1007/s11306-014-0647-9
(2) Perrier et al. (2018) Identifying and correcting epigenetics measurements for systematic sources of variation. Clin Epigenetics 10: 38, DOI: 10.1186/s13148-018-0471-6