The VIQCing (Visualization, Imputation, Quality Control) package has been made to help processing Metabolomic Data. It contains a robust pipeline that can be adapted depending on the input type and it needs.
Source code can be downloaded from github.
git clone https://github.com/AurelieGuilbault/VIQCing.git
You can install in Rstudio using:
devtools::install_github("AurelieGuilbault/VIQCing")
The data used as input usually is a LC-MS or NMR Feature Matrix, with the rows as metabolites and columns as samples. That said, as long as it looks like the following dataset, the data can be used with this package: (From the datasets::swiss)
d <- t(datasets::swiss) dat <- data.frame(dummyCompound=row.names(d), dummyMetabolite=NA, d, row.names = NULL) dat[,1:5]
If you data contains NA values, you can filter the row with the qualityControl() function.
Produces the following files:
1. "QC_data.txt": Summary of the QC;
2. "REMOVED_QC_data.txt": Summary of the removed metabolite;
3. output, cleaned dataset file(optional);
4. "REMOVED_output.txt", the removed set of metabolite (optional)
It will also warn you about potential problems. (e.g. if a line has sd of 0 or NA) The function returns the cleaned dataset and QC summary.
#Write the data in a file write.table(dat, file = "dummySet.txt",sep = "\t", row.names = FALSE) result <- VIQCing::qualityControl("dummySet.txt", missing=0.2, compound=1, metabolite=2, sampleStart = 3) result$dataset[,1:5] result$QC
If we had a line with only NA, another one with all the same value and a duplicate :
newDat <- result$dataset newDat[nrow(newDat)+1,] <- 0 newDat[nrow(newDat)+1,] <- 1
#Write the data in a file write.table(newDat, file = "falsedummySet.txt",sep = "\t", row.names = FALSE) result <- VIQCing::qualityControl("falsedummySet.txt", missing=1, compound=1, metabolite=2, sampleStart = 3) result$dataset[,1:5] result$QC
# Use the customisation output function: VIQCing::QCcustomization(result$QC, REMOVE=FALSE)
Impute the given dataset with different method options. Produces filename_imputed.txt, containing the imputed dataset;
Available imputation methods:
returns the imputed Dataset
If we create some holes in the previous dataset:
holeDataset <- read.table("holesdummySet.txt", sep="\t", header = TRUE) holeDataset[,1:5]
result <- VIQCing::imputation("holesdummySet.txt", method = "SVD", transformation = "scale", compound = 1, metabolite = 2, sampleStart = 3) result[, 1:5]
You can use the NRMSE function to evaluate the accuracy of the imputation:
# Only input the Samples, not the compound/metabolite columns VIQCing::NRMSE(result[,3:dim(result)[2]], dat[,3:dim(dat)[2]])
You can also use the imputationTest() function, which will produce a more complete output of the NRMSE when asked. It is advised to test most of the methods and transformation on your own datasets to determine the optimal imputation method.
VIQCing::imputationTest("dummySet.txt", method="SVD", transformation = "scale", nbTest=15, sampleStart = 3)
It is possible to visualize the distribution of your metabolomic data with Violin Plots. It will produce a .pdf file.
VIQCing::violinPlotQC("dummySet.txt", na=TRUE, compound=1, metabolite=2, sampleStart = 3)
You can use violinPlotImp() to compare the distribution of your data before and after imputation:
VIQCing::violinPlotImp("holesdummySet.txt", "holesdummySet_imputed.txt",na=TRUE, compound=1, metabolite=2, sampleStart = 3, compoundImp = 1, metaboliteImp = 2, sampleStartImp = 3)
It is possible to build a correlation matrix and its associated correlation tree for the given dataset. Both plots are optional and the correlation test can be decided.
The function also produces:
1. "filename.pdf", containing the asked plots;
2. "filname_pairs.txt", containing the correlation pairs
returns the correlation matrix "r" and the p-value matrix "P"
VIQCing::corMatrix("dummySet.txt", na=TRUE, compound=1, metabolite=2, sampleStart=3, testType="spearman", textSize = 5)
Output file "dummySet_pairs.txt" :
read.table("dummySet_pairs.txt", sep = "\t", header = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.