knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = TRUE,
  out.width = "100%"
)
We can use metflow2 for data normalization and data integration. First, we need to prepare the data for metflow2.
The peak table (csv format) can come from any software. We recommend that you use the Peak_table_for_cleaning.csv produced by the processData() function from metflow2. If you use other software, please make sure that the first three columns are name (peak name), mz and rt (retention time, in seconds), and that all remaining columns contain the sample intensities.
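As a quick check for a peak table coming from other software, you can read it and verify the column layout (the file name my_peak_table.csv below is only a placeholder):

## read a peak table exported from other software (placeholder file name)
peak_table <- read.csv("my_peak_table.csv")
## the first three columns must be name, mz and rt (retention time in seconds);
## every remaining column is treated as a sample intensity column
stopifnot(identical(colnames(peak_table)[1:3], c("name", "mz", "rt")))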

We also need a sample information file (csv format) that describes the samples. Column 1 is sample.name, column 2 is injection.order, column 3 is class (such as Subject, QC or Blank), column 4 is batch and column 5 is group (such as control and case).

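For illustration, a minimal sample information table could be built like this (all names and values below are made up; adapt them to your own study):

## a minimal, made-up sample information table
sample_info <- data.frame(
  sample.name     = c("QC1", "QC2", "Sample1", "Sample2"),
  injection.order = c(1, 2, 3, 4),
  class           = c("QC", "QC", "Subject", "Subject"),
  batch           = c(1, 1, 1, 1),
  group           = c("QC", "QC", "control", "case")
)
write.csv(sample_info, file = "sample_info.csv", row.names = FALSE)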
Then place the peak table(s) and the sample information file in one folder. Here we use the demo data from the demoData package.
library(metflow2)
library(demoData)
library(tidyverse)
## create a folder named "example"
path <- file.path(".", "example")
dir.create(path = path, showWarnings = FALSE)
## copy the demo data into the example folder
demo_data <- system.file("metflow2", package = "demoData")
file.copy(
  from = file.path(demo_data, dir(demo_data)),
  to = path,
  overwrite = TRUE,
  recursive = TRUE
)
Now the two peak tables, batch1.data.csv and batch2.data.csv, and the sample information file, sample_info.csv, are in your ./example folder.
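If you want to confirm that the demo files were copied, you can simply list the folder contents:

## should show batch1.data.csv, batch2.data.csv and sample_info.csv
dir(path)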
Next, we create a metflowClass object with the create_metflow_object() function:

object <- create_metflow_object(
  ms1.data = c("batch1.data.csv", "batch2.data.csv"),
  sample.information = "sample_info.csv",
  path = path
)
object is a metflowClass object, so you can print it in the console.
Because there are two batches of peak tables, we must first align them.
object <- align_batch(
  object = object,
  combine.mz.tol = 15,
  combine.rt.tol = 30,
  use.int.tol = FALSE
)
Then we filter the peaks using the filter_peaks() function:

object2 <- filter_peaks(
  object = object,
  min.fraction = 0.5,
  type = "any",
  min.subject.blank.ratio = 2,
  according.to = "class",
  which.group = "QC"
)
Next, we should remove samples that have too many missing values.
object2 <- filter_samples(object = object2, min.fraction.peak = 0.9)
Then we impute the remaining missing values (here with the KNN method) and check the object again:

object2 <- impute_mv(object = object2, method = "knn")
object2
Now we can normalize the data using different methods.
object3 <- normalize_data(object = object2, method = "mean")
object3 <- normalize_data(object = object2, method = "svr", threads = 1)
# object3 <- normalize_data(object = object2, method = "pqn")
After data normalization, you can use the get_peak_int_distribution() function to see the intensity distribution plot of each peak.
## after normalization
get_peak_int_distribution(object = object3, peak_name = "M114T670", interactive = TRUE)

## before normalization
get_peak_int_distribution(object = object2, peak_name = "M114T670", interactive = TRUE)
Then we can use the integrate_data() function to do data integration.
object4 <- integrate_data(object = object3, method = "qc.mean")
We can also get the RSDs of all the peaks before and after data normalization and data integration.
rsd2 <- calculate_rsd(object = object2, slot = "QC")
rsd4 <- calculate_rsd(object = object4, slot = "QC")
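If you prefer a numeric summary to a plot, you can, for example, compare the median QC RSD before and after processing. This sketch assumes that calculate_rsd() returns a data frame with an rsd column, as the plotting code below implies:

## median QC RSD before (rsd2) and after (rsd4) normalization and integration;
## assumes both data frames contain an "rsd" column
median(rsd2$rsd, na.rm = TRUE)
median(rsd4$rsd, na.rm = TRUE)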
Then we can draw the comparison plot:
library(ggplot2)

dplyr::left_join(rsd2, rsd4, by = c("index", "name")) %>%
  dplyr::mutate(class = dplyr::case_when(
    rsd.y < rsd.x ~ "Decrease",
    rsd.y > rsd.x ~ "Increase",
    TRUE ~ "Equal"
  )) %>%
  ggplot(aes(rsd.x, rsd.y, colour = class)) +
  ggsci::scale_color_jama() +
  geom_abline(slope = 1, intercept = 0) +
  geom_point() +
  labs(x = "RSD before normalization", y = "RSD after normalization and integration") +
  theme_bw()