Introduction to PARAFAC modelling
In parafac4microbiome: Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 12,
  fig.height = 6
)
options(tibble.print_min = 6L, tibble.print.max = 6L, digits = 3)
options(rmarkdown.html_vignette.check_title = FALSE)

Introduction

Welcome to the parafac4microbiome R package! In this vignette we explain the datasets that available in this package, how you can model them using PARAFAC, and how to plot the outcome. We require the following packages and dependencies.

library(parafac4microbiome)
library(dplyr)
library(ggplot2)

Datasets

The parafac4microbiome package comes with three exemplary datasets: Fujita2023, Shao2019 and vanderPloeg2024. These refer to the first authors of their respective papers. Each of the dataset objects are lists with the following contents:

data: the data cube of microbiome counts.
mode1: metadata corresponding to the subject mode.
mode2: metadata corresponding to the feature (microbial abundances) mode.
mode3: metadata corresponding to the time mode.

We briefly show what the data in these datasets look like and then focus on Fujita2023 for the remainder of this vignette. For details on the datasets we refer to the parafac4microbiome paper and to the original papers as listed in their respective help files. Modelling and selecting the appropriate number components are explained in more detail in the vignettes corresponding to each dataset: vignette("Fujita2023_analysis"), vignette("Shao2019_analysis") and vignette("vanderPloeg2024_analysis").

dim(Fujita2023$data)
dim(Shao2019$data)
dim(vanderPloeg2024$data)

# We focus on Fujita2023
head(Fujita2023$data[,,1])
head(Fujita2023$mode1)
head(Fujita2023$mode2)
head(Fujita2023$mode3)

Analysis

Processing the data cube

As shown above, the data cube in Fujita2023$data contains unprocessed counts. The function processDataCube() performs the processing of these counts with the following steps:

It performs feature selection based on the sparsityThreshold setting. Sparsity is here defined as the fraction of samples where a microbial abundance (ASV/OTU or otherwise) is zero.
It performs a centered log-ratio transformation of each sample using the compositions::clr() function with a pseudo-count of one (on all features, prior to selection based on sparsity).
It centers and scales the three-way array. This is a complex subject, for which we refer to a paper by Rasmus Bro and Age Smilde. By centering across the subject mode, we make the subjects comparable to each other within each time point. Scaling within the feature mode avoids the PARAFAC model focusing on features with abnormally high variation.

The outcome of processing is a new version of the dataset. Please refer to the documentation of processDataCube() for more information.

processedFujita = processDataCube(Fujita2023, sparsityThreshold=0.99, CLR=TRUE, centerMode=1, scaleMode=2)
head(processedFujita$data[,,1])

Making a PARAFAC model

The processed data is ready to be modeled using Parallel Factor Analysis. Here we arbitrarily set the number of factors (i.e. the number of components) to be three. This is normally the outcome of a more detailed investigation into the correct number of components, described in vignette("Fujita2023_analysis"), vignette("Shao2019_analysis") and vignette("vanderPloeg2024_analysis"). The output of the function is a list object containing the PARAFAC loadings in each mode in model$Fac and some statistics like R-squared and the sum of squared error.

set.seed(0) # for reproducibility
model = parafac(processedFujita$data, nfac=3, verbose=FALSE)

head(model$Fac[[1]])
head(model$Fac[[2]])
head(model$Fac[[3]])
model$varExp

This model explains r model$varExp percent of the variation in the processed data cube.

Plotting a PARAFAC model

The plotPARAFACmodel() function gives the user full control over how they want to visualize their model. As such, there are a lot of plotting options that can be used, which you can see below. A brief overview:

colourCols: per mode (subject, feature, time), specifies by what variable the loading bar plot should be colored.
legendTitles: titles of the legends per mode, if colourCode is not specified for a mode, a legend will not be generated.
xLabels: labels for the x axis of each mode.
legendColNums: the number of columns in the legend for each mode. If colourCode is not specified for a mode, a legend will not be generated.
arrangeModes: a vector of boolean values specifying if the loadings should be grouped by their colourCol for easier inspection.
continuousModes: a vector of boolean values specifying if the loadings should be visualized with a line plot instead of the default bar plot.
overallTitle: title of the plot.

For a full overview, please refer to the documentation of plotPARAFACmodel().

plotPARAFACmodel(model$Fac, processedFujita,
  numComponents = 3,
  colourCols = c("", "Genus", ""),
  legendTitles = c("", "Genus", ""),
  xLabels = c("Replicate", "Feature index", "Time point"),
  legendColNums = c(0,5,0),
  arrangeModes = c(FALSE, TRUE, FALSE),
  continuousModes = c(FALSE,FALSE,TRUE),
  overallTitle = "Fujita PARAFAC model")

This concludes the introduction to the parafac4microbiome package. We hope this gives you sufficient information to get started. For more details on modelling specific datasets, please refer to vignette("Fujita2023_analysis"), vignette("Shao2019_analysis") and vignette("vanderPloeg2024_analysis").

Any scripts or data that you put into this service are public.

parafac4microbiome documentation built on June 8, 2025, 11:40 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

parafac4microbiome
Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Introduction to PARAFAC modelling
In parafac4microbiome: Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Introduction

Datasets

Analysis

Processing the data cube

Making a PARAFAC model

Plotting a PARAFAC model

Try the parafac4microbiome package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

parafac4microbiome Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Introduction to PARAFAC modelling In parafac4microbiome: Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Introduction

Datasets

Analysis

Processing the data cube

Making a PARAFAC model

Plotting a PARAFAC model

Try the parafac4microbiome package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

parafac4microbiome
Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Introduction to PARAFAC modelling
In parafac4microbiome: Parallel Factor Analysis Modelling of Longitudinal Microbiome Data