Introduction

The VIQCing (Visualization, Imputation, Quality Control) package has been made to help processing Metabolomic Data. It contains a robust pipeline that can be adapted depending on the input type and it needs.

Download

Source code can be downloaded from github.

git clone https://github.com/AurelieGuilbault/VIQCing.git

You can install in Rstudio using:

devtools::install_github("AurelieGuilbault/VIQCing") 

Tutorial

The data used as input usually is a LC-MS or NMR Feature Matrix, with the rows as metabolites and columns as samples. That said, as long as it looks like the following dataset, the data can be used with this package: (From the datasets::swiss)

d <- t(datasets::swiss)
dat <- data.frame(dummyCompound=row.names(d), dummyMetabolite=NA, d, row.names = NULL)
dat[,1:5]

QC Functions

quality control

If you data contains NA values, you can filter the row with the qualityControl() function.
Produces the following files:
1. "QC_data.txt": Summary of the QC;
2. "REMOVED_QC_data.txt": Summary of the removed metabolite;
3. output, cleaned dataset file(optional);
4. "REMOVED_output.txt", the removed set of metabolite (optional)

It will also warn you about potential problems. (e.g. if a line has sd of 0 or NA) The function returns the cleaned dataset and QC summary.

#Write the data in a file
write.table(dat, file = "dummySet.txt",sep = "\t", row.names = FALSE)

result <- VIQCing::qualityControl("dummySet.txt", missing=0.2, compound=1, metabolite=2, sampleStart = 3)

result$dataset[,1:5]
result$QC

If we had a line with only NA, another one with all the same value and a duplicate :

newDat <- result$dataset
newDat[nrow(newDat)+1,] <- 0 
newDat[nrow(newDat)+1,] <- 1
#Write the data in a file
write.table(newDat, file = "falsedummySet.txt",sep = "\t", row.names = FALSE)

result <- VIQCing::qualityControl("falsedummySet.txt", missing=1, compound=1, metabolite=2, sampleStart = 3)

result$dataset[,1:5]
result$QC

Output customization

# Use the customisation output function:
VIQCing::QCcustomization(result$QC, REMOVE=FALSE)

Imputation

Imputation of incomplete dataset

Impute the given dataset with different method options. Produces filename_imputed.txt, containing the imputed dataset;

Available imputation methods:

  1. "knn": From the impute package, use the k nearest neighboors to impute the values;
  2. "RF": From the missForest package, use RandomForest algorithm to impute the values;
  3. "QRILC": From the imputeLCMD package, use Quantile regression to impute the values;
  4. "SVD": From the pcaMethods package, use SVDimpute algorithm as proposed by Troyanskaya et al, 2001. to impute the values;
  5. "mean","median", ""median", "0", "HM": simple value replacement, either by the mean, median, 0 of Half minimum of the row;

returns the imputed Dataset

If we create some holes in the previous dataset:

holeDataset <- read.table("holesdummySet.txt", sep="\t", header = TRUE)
holeDataset[,1:5]
result <- VIQCing::imputation("holesdummySet.txt", method = "SVD", transformation = "scale", compound = 1, metabolite = 2, sampleStart = 3)
result[, 1:5]

Imputation evaluation

You can use the NRMSE function to evaluate the accuracy of the imputation:

# Only input the Samples, not the compound/metabolite columns
VIQCing::NRMSE(result[,3:dim(result)[2]], dat[,3:dim(dat)[2]])

You can also use the imputationTest() function, which will produce a more complete output of the NRMSE when asked. It is advised to test most of the methods and transformation on your own datasets to determine the optimal imputation method.

VIQCing::imputationTest("dummySet.txt", method="SVD", transformation = "scale", nbTest=15, sampleStart = 3)

Data Visualization

Violin Plots

Violin Plot on starting data

It is possible to visualize the distribution of your metabolomic data with Violin Plots. It will produce a .pdf file.

VIQCing::violinPlotQC("dummySet.txt", na=TRUE, compound=1, metabolite=2, sampleStart = 3)

ViolinPlotQC

Violin Plot to compare imputed data with original data

You can use violinPlotImp() to compare the distribution of your data before and after imputation:

VIQCing::violinPlotImp("holesdummySet.txt", "holesdummySet_imputed.txt",na=TRUE, compound=1, metabolite=2, sampleStart = 3, compoundImp = 1, metaboliteImp = 2, sampleStartImp = 3)

ViolinPlotImp

Correlation

Matrix and Tree

It is possible to build a correlation matrix and its associated correlation tree for the given dataset. Both plots are optional and the correlation test can be decided.
The function also produces:
1. "filename.pdf", containing the asked plots;
2. "filname_pairs.txt", containing the correlation pairs
returns the correlation matrix "r" and the p-value matrix "P"

VIQCing::corMatrix("dummySet.txt", na=TRUE, compound=1, metabolite=2, sampleStart=3, testType="spearman", textSize = 5)

CorrelationMatrix CorrelationTree

Output file "dummySet_pairs.txt" :

read.table("dummySet_pairs.txt", sep = "\t", header = TRUE)


AurelieGuilbault/VIQCing documentation built on Oct. 30, 2019, 5:02 a.m.