knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 8)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.width = 8, fig.path = "figure/")
knitr::opts_chunk$set(widgetframe_isolate_widgets = TRUE) = function(x, ...) {
knit_print.DataFrame = function(x, ...) {
registerS3method("knit_print", "data.frame",
registerS3method("knit_print", "DataFrame", knit_print.DataFrame)


Through integration with Metabolomics Workbench API, lipidr allows users, to quickly explore public lipidomics experiments. lipidr provides an easy way to re-analyze and visualize these datasets.

Loading libraries


First, we load lipidr using library command. For this vignette we will enable interactive graphics plotting by calling use_interactive_graphics().

Explore Metabolomics Workbench lipidomics datasets

Users can search all Metabolomics Workbench studies using any relevant keyword. A list of all studies matching the keyword will be returned as a table (data.frame).

list_mw_studies(keyword = "lipid")

Download datasets

Datasets can be easily downloaded and parsed into LipidomicsExperiment object using lipidr function fetch_mw_study() by supplying a study_id. The example shown in this vignette is from ST001111 link. The dataset contains positive and negative MS data for untargeted lipidomics from different breast cancer tissues.

d = fetch_mw_study("ST001111")


Note the warning that some molecules were not parsed because their names did not follow the supported patterns. We can examine these molecules, remove them from the dataset or change their names, if desired.

# list non_parsed molecules

# All of them are Ceramides, written with full chemical name
# We can replace the first part with "Cer" using RegEx
non_parsed <- non_parsed_molecules(d)
new_names <- sub("^.* \\(", "Cer (", non_parsed)
d <- update_molecule_names(d, old = non_parsed, new = new_names)

# We can check once again to make sure all molecules were parsed correctly

We can have a look at the clinical data, which was conveniently extracted from Metabolomics Workbench by lipidr.


Next, we tell lipidr that our dataset is normalized and logged.

d <- set_logged(d, "Area", TRUE)
d <- set_normalized(d, "Area", TRUE)

Quality control

We look at total ion concentration (TIC) and distribution (boxplot) for each sample.

plot_samples(d, "tic")
plot_samples(d, "boxplot")

Although the TIC plot looks similar for all samples, we can spot two outlier samples with significantly large dispersion (samples 42 and 18). We will keep the samples for now, but we may want to consider checking closely.

Because there is no QC samples or technical replicates in this dataset, we cannot assess the %CV of molecules. Also, there is no need to summarize_transitions in untargeted datasets.


mvaresults = mva(d, measure="Area", method="PCA")
plot_mva(mvaresults, color_by="SampleType", components = c(1,2))

We can see mild separation between benign and cancer samples, but not between cancer and metastasis. We can also spot the sample outlier samples that should consider removing. The low variance explained by PC1 and PC2 (cumulative displayed in the plot as R2X) indicate highly variable lipid profiles in these clinical samples.

keep_samples <- !colnames(d) %in% c("18", "42")
d <- d[, keep_samples]

Univariate Analysis

Two group comparison

For simple analysis, we can compare cancer vs benign and cancer vs metastasis. From PCA plot, we can expect very little difference between cancer and metastasis.

two_group <- de_analysis(d, Cancer-Benign, Cancer-Metastasis)

By quickly looking at the volcano plots, we can confirm the minute difference between cancer and metastatic samples. A fairly large difference is observed between cancer and benign samples, with PCs and PGs up-regulated and CLs and TGs down-regulated in cancer tissues.

Multi-group comparison

Instead of two-group comparison, we might be interested in lipids are differentially expressed in any group. We can perform ANOVA-style multi-group comparison using de_design function, which allows users to provide a custom design matrix. de_design is extremely helpful in complex experimental designs, where several factors should be accounted for in the analysis.

In this example, we will use cancer stage as our grouping variable. In our dataset, we can see samples with Stages I-IV. Using ANOVA-style analysis, we will identify all lipid molecules likely to be different among cancer stages.

multi_group <- de_design(d, ~ Stage)

Here we used the formula tilde to define our predictor variables. ~ Stage formula indicates that we are interested in features (lipid molecules) that are associated with Stage.


Surprisingly, Cancer Stage does not appear to affect lipid molecules profiled in this experiment.

Factorial analysis

In complex experimental designs and clinical samples, we may need to correct for confounding variables. This is done simply by adding the variable to the formula in de_design. For example, below, we are interested in Cancer effect while correcting for Race effect.

factorial_de <- de_design(d, ~ Race + SampleType, coef = "SampleTypeCancer")

In this case, we are seeing similar pattern as the two-group comparison, which indicates a small Race effect.

Users interested in creating more complex design matrices are referred to Limma User Guide and edgeR tutorial.

Multivariate analysis

Orthogonal multivariate analysis

Supervised multivariate analyses, such as OPLS and OPLS-DA can be performed to determine which lipids are associated with a group (y-variable) of interest. In this example we use "Diet" as grouping, and display the results in a scores plot.

mvaresults = mva(d, method = "OPLS-DA", group_col = "SampleType", groups=c("Benign", "Cancer"))
plot_mva(mvaresults, color_by="SampleType")

We can also plot the loadings and display important lipids contributing to the separation between different (Diet) groups.

plot_mva_loadings(mvaresults, color_by="Class", top.n=10)

Alternatively, we can extract top N lipids along with their annotations.

top_lipids(mvaresults, top.n=10)

Supervised multivariate analysis with continuous response variable

OPLS-DA can only be applied to in two-group comparison settings. In some cases, we might be interested in a lipid molecules ....

In this example, we will format Cancer Stage as a numeric vector.

stage <- d$Stage
stage[stage == "I"] <- 1
stage[stage == "II"] <- 2
stage[stage == "III"] <- 3
stage[stage == "IV"] <- 4
stage <- as.numeric(stage)

We can see stage contains missing values. We should filter them out first.

d_filtered <- d[, !]
stage <- stage[!]

mvaresults = mva(d_filtered, method = "OPLS", group_col = stage )
plot_mva_loadings(mvaresults, color_by="Class", top.n=10)

Enrichment analysis

enrich_results = lsea(two_group, = "logFC")

Visualization of enrichment analysis results. The enriched lipid classes are highlighted.

plot_enrichment(two_group, significant_lipidsets(enrich_results), annotation="class")

Alternatively, we can highlight chain lengths that were significantly enriched.

plot_enrichment(two_group, significant_lipidsets(enrich_results), annotation="length")

Lipid chain analysis


Session information


ahmohamed/lipidr documentation built on July 7, 2023, 2:22 a.m.