knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(phytoclass)
The phytoclass package uses non-negative matrix factorization and simulated annealing to determine the biomass of different phytoplankton groups from pigment concentrations. The methodology is discussed in [@Hayward2023] and is similar to the CHEMTAX method of @Mackey1996.
For ease of use, most functions have default options. However the user also has the option to set their own parameters within the program, and instructions on how to do this are listed for each function.
It is important to highlight that naming conventions for phytoplankton groups and their pigments should be adhered to when using the default samples. For example, the user should ensure that pigment names in their sample matrix (S) match the same pigment names in the pigment – Chl a ratio matrix (F).
At present the use of DV Chl a with Prochlorococcus is not supported, however it will be in a future release.
The main function of the package is simulated_annealing() with an associated helper function Cluster()
Other helper functions covered in this document are:
This is the main function for the phytoclass package.
It takes in the inputs (listed below) and returns the updated pigment to Chl a ratios, the Chl a biomass of each phytoplankton group, error associated with each group, and a graph displaying the Chl a concentration for each group.
It is important that samples are clustered appropriately before using the function (see the Cluster function).
Arguments:
S = Sample matrix – a matrix of pigment samples. Ensure that Chl a is the final column
F = Pigment to Chl a matrix. If left blank default values will be used. Ensure that pigment columns are in the same order as S and column naming conventions match.
user_defined_min_max = If blank default values are used. To create different min_max values, follow the same structure as the phytoclass::min_max file. See the example below.
do_matrix_checks = this should only be set to true when using the default values. This will remove pigment columns that have column sums of 0. Set to FALSE if using customised names for pigments and phytoplankton groups.
niter = number of iterations. Default value is 500.
step = step ratio used. Default value is 0.009.
weight.upper.bound = the upper limit of the weights applied. Default value is 30.
When using the default values, the only argument required is the sample matrix. However, make sure that the pigment names match those in the built-in pigment to Chl matrix Fm.
For the examples that follow the argument niter equals one for processing speed, but should be set much higher to obtain convergence.
Prior to analysis using simulated annealing, pigment samples require clustering.
The Cluster function divides all pigment concentrations by the total Chl a concentration. Following this the data undergoes BoxCox transformation, and the data is hierarchically clustered using the Ward method based on the Manhattan distances between pigment samples. The DynamicTreeCut method of [@Langfelder2008] is then used to prune the dendogram into reasonable clusters of specified size(s).
The function returns a list of the clusters and the cluster dendrogram.
An example, using the built-in sample data set Sm:
Cluster.result <- Cluster(Sm, 14)
# list of clusters Cluster.result$cluster.list # plot of clusters plot(Cluster.result$cluster.plot)
The example here uses the built-in sample matrix Sm.
set.seed("7683") Results <- simulated_annealing(Sm, niter = 1)
Results$`condition number` Results$RMSE Results$MAE Results$Error Results$`F matrix` Results$`Class abundances`
Results$Figure
Clust1 <- Cluster(Sm, min_cluster_size = 14)$cluster.list[[1]] # Remove the cluster column/label Clust1$Clust <- NULL set.seed("7683") Results <- simulated_annealing(Clust1, niter = 1)
Results$`condition number` Results$RMSE Results$MAE Results$Error Results$`F matrix` Results$`Class abundances`
Results$Figure
#Create Fm (F matrix). Alternatively, a .csv file can be uploaded. #Create Fm (F matrix). Alternatively, a .csv file can be uploaded. Fu <- data.frame( Per = c(0, 0, 0, 0, 1, 0, 0, 0), X19but = c(0, 0, 0, 0, 0, 1, 1, 0), Fuco = c(0, 0, 0, 1, 0, 1, 1, 0), Pra = c(1, 0, 0, 0, 0, 0, 0, 0), X19hex = c(0, 0, 0, 0, 0, 1, 0, 0), Allo = c(0, 0, 1, 0, 0, 0, 0, 0), Zea = c(1, 1, 0, 0, 0, 0, 0, 1), Chl_b = c(1, 1, 0, 0, 0, 0, 0, 0), Tchla = c(1, 1, 1, 1, 1, 1, 1, 1) ) rownames(Fu) <- c( "Prasinophytes", "Chlorophytes", "Cryptophytes" , "Diatoms-2", "Dinoflagellates-1", "Haptophytes", "Pelagophytes", "Syn" ) Min_max <- data.frame( Class = c( "Syn", "Chlorophytes", "Chlorophytes", "Prasinophytes", "Prasinophytes", "Prasinophytes", "Cryptophytes", "Diatoms-2", "Diatoms-2", "Pelagophytes", "Pelagophytes", "Pelagophytes", "Dinoflagellates-1", "Haptophytes", "Haptophytes", "Haptophytes", "Haptophytes", "Diatoms-2", "Cryptophytes", "Prasinophytes", "Chlorophytes", "Syn", "Dinoflagellates-1", "Pelagophytes" ), Pig_Abbrev = c( "Zea", "Zea", "Chl_b", "Pra", "Zea", "Chl_b", "Allo", "Chl_c3", "Fuco", "Chl_c3", "X19but", "Fuco", "Per", "X19but", "X19hex", "Fuco", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla", "Tchla" ), min = as.numeric(c( 0.0800, 0.0063, 0.1666, 0.0642, 0.0151, 0.4993, 0.2118, 0.0189, 0.3315, 0.1471, 0.2457, 0.3092, 0.3421, 0.0819, 0.2107, 0.0090, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000 )), max = as.numeric(c( 1.2123, 0.0722, 0.9254, 0.4369, 0.1396, 0.9072, 0.5479, 0.1840, 0.9332, 0.2967, 1.0339, 1.2366, 0.8650, 0.2872, 1.3766, 0.4689, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000 )) )
set.seed("7683") Results <- simulated_annealing( S = Sm, F = Fu, user_defined_min_max = Min_max, do_matrix_checks = TRUE, niter = 1, step = 0.01, weight.upper.bound = 30 )
set.seed("7683") Results <- simulated_annealing( S = Sm, F = Fu, user_defined_min_max = Min_max, do_matrix_checks = TRUE, niter = 1, step = 0.01, weight.upper.bound = 30 )
Results$`condition number` Results$RMSE Results$MAE Results$Error Results$`F matrix` Results$`Class abundances`
Results$Figure
This function will perform checks on the F and S matrices. If the columns sum is 0 or very small it will be eliminated from both the S and F matrix. To use this function, naming for both pigments and phytoplankton classes must follow the same conventions as the default values. The output of this function will be new S and F matrices.
Arguments:
S = S matrix
F = F matrix
MC <- Matrix_checks(Sm, Fm) Snew <- MC$Snew Fnew <- MC$Fnew
This algorithm will perform the steepest descent algorithm without being bound by any limits. The results may show unrealistic pig:Chl a ratios for certain groups, although the error can often be very low.
Arguments:
F = F matrix
S = S matrix
num.loops = the number of loops/iterations to perform
MC <- Matrix_checks(Sm, Fm) Snew <- MC$Snew Fnew <- MC$Fnew SDRes <- Steepest_Desc(Fnew, Snew, num.loops = 10)
This function determines the weights to use for the S matrix, for which the default upper bound is 30 in the simulated_annealing() function call.
Arguments:
S = sample matrix
weight.upper.bound = maximum weights upper bound
Bounded_weights(Sm, weight.upper.bound = 30)
This performs non-negative least least squares matrix factorization between the S and F matrices. Ensure that Chl a is the final column and that the columns in S and F are in the same order. If the weights are left blank then no weights will be used.
Arguments:
Fn = F matrix
S = sample matrix
cm = weights for each column
MC <- Matrix_checks(Sm, Fm) Snew <- MC$Snew Fnew <- MC$Fnew cm <- Bounded_weights(Snew, weight.upper.bound = 30) Results <- NNLS_MF(Fnew, Snew, cm) Results$`F matrix` Results$RMSE Results$`C matrix`
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.