We can use metflow2 for data normalization and data integration.
First, we need to prepare the input files for metflow2.
The peak table (csv format) can come from any software. We recommend using the Peak_table_for_cleaning.csv produced by the processData() function in metflow2.
If you use other software, please make sure that the first three columns are name (peak name), mz and rt (retention time, in seconds), and that the remaining columns contain the sample intensities.

We need the sample information (csv format) to define the detailed information of samples. Column 1 is sample.name, column 2 is injection.order, column 3 is class (such as Subject, QC, Blank), column 4 is batch and column 5 is group (such as control and case).
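As a minimal sketch of the two layouts described above, the following base R code builds a toy peak table and a toy sample information table and checks that they agree with each other. The peak names, sample names and intensity values are invented for illustration, the group label used for the QC sample is an assumption, and check_inputs() is a hypothetical helper, not a metflow2 function.

```r
# Toy peak table: the first three columns are name, mz and rt (seconds);
# the remaining columns hold one intensity column per sample.
peak_table <- data.frame(
  name = c("M100T60", "M200T120"),
  mz = c(100.0012, 200.0034),
  rt = c(60, 120),
  QC_1 = c(1500, 820),
  Subject_1 = c(1430, 790),
  Subject_2 = c(1610, 905),
  stringsAsFactors = FALSE
)

# Toy sample information: the five columns in the order described above.
# The group value for the QC sample ("QC") is an assumption.
sample_info <- data.frame(
  sample.name = c("QC_1", "Subject_1", "Subject_2"),
  injection.order = c(1, 2, 3),
  class = c("QC", "Subject", "Subject"),
  batch = c(1, 1, 1),
  group = c("QC", "control", "case"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of metflow2): verify both layouts.
check_inputs <- function(peak_table, sample_info) {
  expected_info <- c("sample.name", "injection.order", "class", "batch", "group")
  stopifnot(
    identical(colnames(peak_table)[1:3], c("name", "mz", "rt")),
    identical(colnames(sample_info)[1:5], expected_info),
    # every intensity column must be described in the sample information
    all(colnames(peak_table)[-(1:3)] %in% sample_info$sample.name)
  )
  invisible(TRUE)
}

check_inputs(peak_table, sample_info)
```

If a column is misnamed or a sample is missing from the sample information, check_inputs() stops with an error, which is easier to debug before the files are written to csv.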

Then place the peak table and sample information in a folder. Here we use the demo data from the demoData package.
library(metflow2)
library(demoData)
library(tidyverse)
## create a folder named example
path <- file.path(".", "example")
dir.create(path = path, showWarnings = FALSE)
## get demo data
demo_data <- system.file("metflow2", package = "demoData")
file.copy(
  from = file.path(demo_data, dir(demo_data)),
  to = path,
  overwrite = TRUE,
  recursive = TRUE
)
We now have two peak tables, batch1.data.csv and batch2.data.csv, and the sample information file, sample_info.csv, in the ./example folder.
Next, create the metflowClass object.
object <- create_metflow_object(
  ms1.data = c("batch1.data.csv", "batch2.data.csv"),
  sample.information = "sample_info.csv",
  path = path
)
object is a metflowClass object, so you can print it in the console.
Because there are two batch peak tables, we must first align them.
object <- align_batch(
  object = object,
  combine.mz.tol = 15,
  combine.rt.tol = 30,
  use.int.tol = FALSE
)
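To illustrate what the two tolerances could mean, here is a minimal sketch that decides whether a peak from one batch matches a peak from another. It assumes that combine.mz.tol is given in ppm and combine.rt.tol in seconds; this tutorial does not state the units. is_same_peak() is a hypothetical helper, the m/z values are invented, and the sketch is not how align_batch() is implemented.

```r
# Hypothetical helper (not part of metflow2): two peaks are treated as the
# same feature when both the m/z error and the retention time error are
# within tolerance.
is_same_peak <- function(mz1, rt1, mz2, rt2,
                         mz_tol_ppm = 15, rt_tol_sec = 30) {
  mz_error_ppm <- abs(mz1 - mz2) / mz2 * 1e6  # relative m/z error in ppm
  rt_error_sec <- abs(rt1 - rt2)              # absolute RT error in seconds
  mz_error_ppm <= mz_tol_ppm & rt_error_sec <= rt_tol_sec
}

# About 6 ppm and 15 s apart: within both tolerances
is_same_peak(mz1 = 114.0913, rt1 = 670, mz2 = 114.0920, rt2 = 655)  # TRUE

# About 32 ppm apart: outside the m/z tolerance
is_same_peak(mz1 = 114.0913, rt1 = 670, mz2 = 114.0950, rt2 = 655)  # FALSE
```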
object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5,
  type = "any",
  min.subject.blank.ratio = 2,
  according.to = "class",
  which.group = "QC"
)
Next, we should remove samples that have many missing values.
object2 <- filter_samples(object = object2, min.fraction.peak = 0.9)
object2 <- impute_mv(object = object2, method = "knn")
object2
Now we can normalize data using different methods.
object3 <- normalize_data(object = object2, method = "mean")
object3 <- normalize_data(object = object2, method = "svr", threads = 1)
# object3 <- normalize_data(object = object2, method = "pqn")
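To make the idea behind a mean-based normalization concrete, here is a minimal base R sketch that divides every sample by its own mean intensity. Whether method = "mean" in normalize_data() does exactly this is an assumption; normalize_by_sample_mean() is a hypothetical helper and the intensities are invented.

```r
# Hypothetical helper (not part of metflow2): scale each sample (column)
# by its mean intensity so that all samples end up on a common scale.
normalize_by_sample_mean <- function(intensity) {
  sample_means <- colMeans(intensity, na.rm = TRUE)
  sweep(intensity, MARGIN = 2, STATS = sample_means, FUN = "/")
}

# Toy data: S1 is systematically twice as intense as S2.
intensity <- matrix(
  c(100, 200, 300,
     50, 100, 150),
  nrow = 3,
  dimnames = list(c("peak1", "peak2", "peak3"), c("S1", "S2"))
)

normalized <- normalize_by_sample_mean(intensity)
normalized
```

After this scaling every column has a mean of 1, so the twofold difference between S1 and S2 disappears while the relative pattern across peaks is kept.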
After data normalization, you can use the get_peak_int_distribution() function to plot the intensity distribution of an individual peak.
get_peak_int_distribution(object = object3, peak_name = "M114T670", interactive = TRUE)
get_peak_int_distribution(object = object2, peak_name = "M114T670", interactive = TRUE)
Then we can use the integrate_data() function to do data integration.
object4 <- integrate_data(object = object3, method = "qc.mean")
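As a rough illustration of QC-based integration, the sketch below rescales each batch so that, for every peak, the mean of the QC samples in that batch equals the mean of all QC samples. This is one plausible reading of method = "qc.mean", not a description of how integrate_data() is implemented; integrate_by_qc_mean() is a hypothetical helper and the intensities are invented.

```r
# Hypothetical helper (not part of metflow2).
# intensity: peaks in rows, samples in columns
# batch:     batch label for each column
# is_qc:     TRUE for columns that are QC samples
integrate_by_qc_mean <- function(intensity, batch, is_qc) {
  stopifnot(ncol(intensity) == length(batch), length(batch) == length(is_qc))
  overall_qc_mean <- rowMeans(intensity[, is_qc, drop = FALSE])
  out <- intensity
  for (b in unique(batch)) {
    in_batch <- batch == b
    batch_qc_mean <- rowMeans(intensity[, in_batch & is_qc, drop = FALSE])
    # per-peak factor that moves this batch's QC mean onto the overall QC mean
    out[, in_batch] <- intensity[, in_batch, drop = FALSE] *
      (overall_qc_mean / batch_qc_mean)
  }
  out
}

# Toy data: batch 2 is measured at a higher overall intensity than batch 1.
intensity <- matrix(
  c(100, 120, 200, 240,
     10,   8,  30,  24),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("peak1", "peak2"), c("QC_b1", "S_b1", "QC_b2", "S_b2"))
)
batch <- c(1, 1, 2, 2)
is_qc <- c(TRUE, FALSE, TRUE, FALSE)

integrated <- integrate_by_qc_mean(intensity, batch, is_qc)
integrated
```

In this toy example the QC samples of both batches have identical intensities after rescaling, which is the behaviour we would expect once the batch effect is removed.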
We can also get the RSDs of all the peaks before and after data normalization and data integration.
rsd2 <- calculate_rsd(object = object2, slot = "QC")
rsd4 <- calculate_rsd(object = object4, slot = "QC")
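For reference, the relative standard deviation (RSD) of a peak is its standard deviation divided by its mean, usually expressed as a percentage. The base R sketch below computes it per peak on invented QC intensities. rsd_per_peak() is a hypothetical helper, and it is an assumption that calculate_rsd() reports values on the same percentage scale.

```r
# Hypothetical helper (not part of metflow2): RSD (%) for each peak (row).
rsd_per_peak <- function(intensity) {
  apply(intensity, 1, function(x) {
    sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100
  })
}

# Toy QC intensities: peak1 is stable, peak2 varies a lot.
qc_intensity <- matrix(
  c(90, 100, 110,
    50, 100, 150),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("peak1", "peak2"), c("QC1", "QC2", "QC3"))
)

rsd <- rsd_per_peak(qc_intensity)
rsd  # peak1 = 10, peak2 = 50
```

A lower RSD across QC samples means the peak is measured more reproducibly, which is why comparing the RSDs before and after processing is a useful quality check.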
Then we can draw the comparison plot:
library(ggplot2)
## rsd.x comes from rsd2 (before normalization), rsd.y from rsd4 (after normalization and integration)
dplyr::left_join(rsd2, rsd4, by = c("index", "name")) %>%
  dplyr::mutate(class = dplyr::case_when(
    rsd.y < rsd.x ~ "Decrease",
    rsd.y > rsd.x ~ "Increase",
    rsd.y == rsd.x ~ "Equal"
  )) %>%
  ggplot(aes(rsd.x, rsd.y, colour = class)) +
  ggsci::scale_color_jama() +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  labs(x = "RSD before normalization", y = "RSD after normalization and integration") +
  theme_bw()