The phytoclass package
In phytoclass: Estimate Chla Concentrations of Phytoplankton Groups

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(phytoclass)

Introduction

The phytoclass package uses non-negative matrix factorization and simulated annealing to determine the biomass of different phytoplankton groups from pigment concentrations. The methodology is discussed in [@Hayward2023] and is similar to the CHEMTAX method of @Mackey1996.

For ease of use, most functions have default options. However the user also has the option to set their own parameters within the program, and instructions on how to do this are listed for each function.

It is important to highlight that naming conventions for phytoplankton groups and their pigments should be adhered to when using the default samples. For example, the user should ensure that pigment names in their sample matrix (S) match the same pigment names in the pigment – Chl a ratio matrix (F).

At present the use of DV Chl a with Prochlorococcus is not supported, however it will be in a future release.

The main function of the package is simulated_annealing() with an associated helper function Cluster()

Other helper functions covered in this document are:

Matrix_checks
Bounded_weights
Steepest_Desc
NNLS_MF

Main function: simulated_annealing() {.tabset}

This is the main function for the phytoclass package.

It takes in the inputs (listed below) and returns the updated pigment to Chl a ratios, the Chl a biomass of each phytoplankton group, error associated with each group, and a graph displaying the Chl a concentration for each group.

It is important that samples are clustered appropriately before using the function (see the Cluster function).

Arguments:

S = Sample matrix – a matrix of pigment samples. Ensure that Chl a is the final column

F = Pigment to Chl a matrix. If left blank default values will be used. Ensure that pigment columns are in the same order as S and column naming conventions match.

user_defined_min_max = If blank default values are used. To create different min_max values, follow the same structure as the phytoclass::min_max file. See the example below.

do_matrix_checks = this should only be set to true when using the default values. This will remove pigment columns that have column sums of 0. Set to FALSE if using customised names for pigments and phytoplankton groups.

niter = number of iterations. Default value is 500.

step = step ratio used. Default value is 0.009.

weight.upper.bound = the upper limit of the weights applied. Default value is 30.

When using the default values, the only argument required is the sample matrix. However, make sure that the pigment names match those in the built-in pigment to Chl matrix Fm.

For the examples that follow the argument niter equals one for processing speed, but should be set much higher to obtain convergence.

Cluster function

Prior to analysis using simulated annealing, pigment samples require clustering.

The Cluster function divides all pigment concentrations by the total Chl a concentration. Following this the data undergoes BoxCox transformation, and the data is hierarchically clustered using the Ward method based on the Manhattan distances between pigment samples. The DynamicTreeCut method of [@Langfelder2008] is then used to prune the dendogram into reasonable clusters of specified size(s).

The function returns a list of the clusters and the cluster dendrogram.

An example, using the built-in sample data set Sm:

Cluster.result <- Cluster(Sm, 14)

# list of clusters
Cluster.result$cluster.list
# plot of clusters
plot(Cluster.result$cluster.plot)

Example without clustering

The example here uses the built-in sample matrix Sm.

set.seed("7683")
Results <- simulated_annealing(Sm, niter = 1)

Results$`condition number`
Results$RMSE
Results$MAE
Results$Error

Results$`F matrix`
Results$`Class abundances`

Results$Figure

Example with clustering

Clust1 <- Cluster(Sm, min_cluster_size = 14)$cluster.list[[1]]
# Remove the cluster column/label
Clust1$Clust <- NULL

set.seed("7683")
Results <- simulated_annealing(Clust1, niter = 1)

Results$`condition number`
Results$RMSE
Results$MAE
Results$Error

Results$`F matrix`
Results$`Class abundances`

Results$Figure

Example using non-default values

#Create Fm (F matrix). Alternatively, a .csv file can be uploaded.
#Create Fm (F matrix). Alternatively, a .csv file can be uploaded.
Fu <- data.frame(
  Per = c(0, 0, 0, 0, 1, 0, 0, 0),
  X19but = c(0, 0, 0, 0, 0, 1, 1, 0),
  Fuco = c(0, 0, 0, 1, 0, 1, 1, 0),
  Pra = c(1, 0, 0, 0, 0, 0, 0, 0),
  X19hex = c(0, 0, 0, 0, 0, 1, 0, 0),
  Allo = c(0, 0, 1, 0, 0, 0, 0, 0),
  Zea = c(1, 1, 0, 0, 0, 0, 0, 1),
  Chl_b = c(1, 1, 0, 0, 0, 0, 0, 0),
  Tchla = c(1, 1, 1, 1, 1, 1, 1, 1)
)

rownames(Fu) <- c(
  "Prasinophytes", "Chlorophytes", "Cryptophytes"
  , "Diatoms-2", "Dinoflagellates-1",
  "Haptophytes", "Pelagophytes", "Syn"
)

Min_max <- data.frame(
  Class = c(
    "Syn", "Chlorophytes", "Chlorophytes", "Prasinophytes", "Prasinophytes",
    "Prasinophytes", "Cryptophytes", "Diatoms-2", "Diatoms-2", "Pelagophytes",
    "Pelagophytes", "Pelagophytes", "Dinoflagellates-1", "Haptophytes",
    "Haptophytes", "Haptophytes", "Haptophytes", "Diatoms-2", "Cryptophytes",
    "Prasinophytes", "Chlorophytes", "Syn", "Dinoflagellates-1", "Pelagophytes"
  ),
  Pig_Abbrev = c(
    "Zea", "Zea", "Chl_b", "Pra", "Zea", "Chl_b", "Allo", "Chl_c3",
    "Fuco", "Chl_c3", "X19but", "Fuco", "Per", "X19but", "X19hex",
    "Fuco", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla",
    "Tchla"
  ),
  min = as.numeric(c(
    0.0800, 0.0063, 0.1666, 0.0642, 0.0151, 0.4993, 0.2118, 0.0189,
    0.3315, 0.1471, 0.2457, 0.3092, 0.3421, 0.0819, 0.2107, 0.0090,
    1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
  )),
  max = as.numeric(c(
    1.2123, 0.0722, 0.9254, 0.4369, 0.1396, 0.9072, 0.5479, 0.1840,
    0.9332, 0.2967, 1.0339, 1.2366, 0.8650, 0.2872, 1.3766, 0.4689,
    1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
  ))
)

set.seed("7683")
Results <- simulated_annealing(
  S = Sm, 
  F = Fu,
  user_defined_min_max = Min_max,
  do_matrix_checks = TRUE,
  niter = 1,
  step = 0.01,
  weight.upper.bound = 30
)

set.seed("7683")
Results <- simulated_annealing(
  S = Sm, 
  F = Fu,
  user_defined_min_max = Min_max,
  do_matrix_checks = TRUE,
  niter = 1,
  step = 0.01,
  weight.upper.bound = 30
)

Results$`condition number`
Results$RMSE
Results$MAE
Results$Error

Results$`F matrix`
Results$`Class abundances`

Results$Figure

Helper functions {.tabset}

Matrix_checks

This function will perform checks on the F and S matrices. If the columns sum is 0 or very small it will be eliminated from both the S and F matrix. To use this function, naming for both pigments and phytoplankton classes must follow the same conventions as the default values. The output of this function will be new S and F matrices.

Arguments:

S = S matrix

F = F matrix

MC <- Matrix_checks(Sm, Fm)  
Snew <- MC$Snew
Fnew <- MC$Fnew

Steep_Desc

This algorithm will perform the steepest descent algorithm without being bound by any limits. The results may show unrealistic pig:Chl a ratios for certain groups, although the error can often be very low.

Arguments:

F = F matrix

S = S matrix

num.loops = the number of loops/iterations to perform

MC <- Matrix_checks(Sm, Fm)  
Snew <- MC$Snew
Fnew <- MC$Fnew
SDRes <- Steepest_Desc(Fnew, Snew, num.loops = 10)

Bounded_weights

This function determines the weights to use for the S matrix, for which the default upper bound is 30 in the simulated_annealing() function call.

Arguments:

S = sample matrix

weight.upper.bound = maximum weights upper bound

Bounded_weights(Sm, weight.upper.bound = 30)

NNLS_MF

This performs non-negative least least squares matrix factorization between the S and F matrices. Ensure that Chl a is the final column and that the columns in S and F are in the same order. If the weights are left blank then no weights will be used.

Arguments:

Fn = F matrix

S = sample matrix

cm = weights for each column

MC <- Matrix_checks(Sm, Fm)  
Snew <- MC$Snew
Fnew <- MC$Fnew
cm <- Bounded_weights(Snew, weight.upper.bound = 30)

Results <- NNLS_MF(Fnew, Snew, cm)
Results$`F matrix`
Results$RMSE
Results$`C matrix`

References

Any scripts or data that you put into this service are public.

phytoclass documentation built on April 4, 2025, 4:02 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

phytoclass
Estimate Chla Concentrations of Phytoplankton Groups

The phytoclass package
In phytoclass: Estimate Chla Concentrations of Phytoplankton Groups

Introduction

Main function: simulated_annealing() {.tabset}

Cluster function

Example without clustering

Example with clustering

Example using non-default values

Helper functions {.tabset}

Matrix_checks

Steep_Desc

Bounded_weights

NNLS_MF

References

Try the phytoclass package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

phytoclass Estimate Chla Concentrations of Phytoplankton Groups

The phytoclass package In phytoclass: Estimate Chla Concentrations of Phytoplankton Groups

Introduction

Main function: simulated_annealing() {.tabset}

Cluster function

Example without clustering

Example with clustering

Example using non-default values

Helper functions {.tabset}

Matrix_checks

Steep_Desc

Bounded_weights

NNLS_MF

References

Try the phytoclass package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

phytoclass
Estimate Chla Concentrations of Phytoplankton Groups

The phytoclass package
In phytoclass: Estimate Chla Concentrations of Phytoplankton Groups