fclust: Build a functional clustering for one or more performances
In functClust: Functional Clustering of Redundant Components of a System


Fit a primary tree of component clustering to observed assemblage performances, then prune the primary tree for its predictive ability and its parsimony, and finally retain a validated secondary tree together with the corresponding predictions, statistics and other information.

fclust(dat, nbElt,
       weight     = rep(1, dim(dat)[2] - nbElt - 1),
       opt.na     = FALSE,
       opt.repeat = FALSE,
       opt.method = "divisive",
       affectElt  = rep(1, nbElt),
       opt.mean   = "amean",
       opt.model  = "byelt",
       opt.jack   = FALSE,   jack = c(3,4) )
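The default `weight` in the usage above depends on the column layout of `dat`. The following is a minimal sketch of that layout on made-up data, assuming the order identity, occurrences, performances described for `dat`. The variable names `nbAss`, `nbXpr`, `mOccur` and `fobs` are borrowed from the returned list for readability only; the sketch does not reproduce the internals of `fclust`.

```r
# Toy input in the layout fclust() expects (all names and values are made up).
nbElt <- 3                                  # number of components
dat <- data.frame(
  assemblage = c("a1", "a2", "a3", "a4"),   # column 1: assemblage identity
  sp1 = c(1, 1, 0, 1),                      # columns 2..(nbElt + 1): occurrence (0/1)
  sp2 = c(0, 1, 1, 1),
  sp3 = c(0, 0, 1, 1),
  biomass = c(2.1, 3.4, 2.9, 4.0),          # remaining columns: observed performances
  cover   = c(0.30, 0.55, 0.40, 0.70),
  stringsAsFactors = FALSE
)

nbAss  <- dim(dat)[1]                 # number of observed assemblages
nbXpr  <- dim(dat)[2] - nbElt - 1     # number of observed performances
weight <- rep(1, nbXpr)               # equal weights, same expression as the default

mOccur <- as.matrix(dat[, 1 + seq_len(nbElt)])        # occurrence matrix
fobs   <- as.matrix(dat[, (nbElt + 2):dim(dat)[2]])   # observed performances

# Basic consistency checks on the layout
stopifnot(all(mOccur %in% c(0, 1)), length(weight) == ncol(fobs))
```

With two performance columns, `weight` has length 2, which is the length a user-supplied `weight` would also need.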

`dat`	a data.frame or matrix that brings together: a vector of assemblage identities, a matrix of occurrence of components within the system, and one or more vectors of observed performances. Consequently, the dimensions of the data.frame or matrix are: `dim(dat)[1]` = the number of observed assemblages, and `dim(dat)[2]` = 1 + number of system components + number of observed performances. The first line (colnames) holds: the assemblage identity, the list of components identified by their names, and the list of performances identified by their names. Each following line (one line per assemblage) holds: the name of the assemblage (read as character), a sequence of 0 (absence) and 1 (presence) for each component within the assemblage (this is the matrix of occurrence of components within the system), and a sequence of numeric values, one for each observed performance (this is the set of observed performances).
`nbElt`	an integer that specifies the number of components belonging to the interactive system. `nbElt` is used to determine the dimensions of the matrix of occurrence.
`weight`	a numeric vector that specifies the weight of each performance. By default, all performances are equally weighted. If `weight` is specified, its length must equal the number of observed performances.
`opt.na`	a logical. The records for an assemblage can have `NA` in the matrix of occurrence or in the observed assemblage performances. If `opt.na = FALSE` (by default), an error is returned. If `opt.na = TRUE`, the records with `NA` are ignored.
`opt.repeat`	a logical. In any case, the function looks for different assemblages with identical elemental composition. Messages report these identical assemblages. If `opt.repeat = FALSE` (by default), their performances are averaged. If `opt.repeat = TRUE`, nothing is done, and the data are processed as they are.
`opt.method`	a string that specifies the method to use, one of `"divisive"`, `"agglomerative"` or `"apriori"`. The three methods generate hierarchical trees. Each tree is complete, running from a unique trunk to as many leaves as components. If `opt.method = "divisive"`, the components are clustered by using a divisive method, from the trivial cluster where all components are together, towards the clustering where each component is a cluster. This method gives the best result for several reasons, explained in detail in the accompanying vignettes (see "The options of fclust"). If `opt.method = "agglomerative"`, the components are clustered by using an agglomerative method, from the trivial clustering where each component is a cluster, towards the cluster where all components are brought together. If not all possible assemblages are observed (which is generally the case in practice), the first clustering of a few components can have no effect on the convergence criterion, inducing a non-optimum result. If `opt.method = "apriori"`, the user knows and gives an "a priori" partitioning of the system components under study. The partition is arbitrary, in any number of clusters of components, but it must be specified (see the following option `affectElt`). The tree is then built: (i) by using `opt.method = "divisive"` from the defined component clustering towards as many leaves as components; (ii) by using `opt.method = "agglomerative"` from the component clustering towards the trunk of the tree.
`affectElt`	a vector of characters or integers, as long as the number of components `nbElt`, that indicates the label of the functional cluster to which each component belongs. Each functional cluster is labelled as a character or an integer, and each component must be identified by its name in `names(affectElt)`. The number of functional clusters defined in `affectElt` determines an a priori level of component clustering (`level <- length(unique(affectElt))`). If `affectElt = NULL` (by default), the option `opt.method` must be specified. If `affectElt` is specified, the option `opt.method` switches to `"apriori"`.
`opt.mean`	a character, equal to `"amean"` or `"gmean"`. If `opt.mean = "amean"`, means are computed using an arithmetic formula; if `opt.mean = "gmean"`, means are computed using a geometric formula.
`opt.model`	a character, equal to `"bymot"` or `"byelt"`. If `opt.model = "bymot"`, the modelled performances are the means of performances of assemblages that share the same assembly motif, including all assemblages that belong to that assembly motif. If `opt.model = "byelt"`, the modelled performances are the average of mean performances of assemblages that share the same assembly motif and that contain the same components as the assemblage to predict. This procedure corresponds to a linear model within each assembly motif based on the component occurrence in each assemblage. If no assemblage contains a component belonging to the assemblage to predict, the performance is the mean performance of all assemblages, as in `opt.model = "bymot"`.
`opt.jack`	a logical that selects the cross-validation method. If `opt.jack = FALSE` (by default), a Leave-One-Out method is used: predicted performances are computed as the mean of performances of assemblages that share the same assembly motif, experiment by experiment, excluding the assemblage to predict. If `opt.jack = TRUE`, a jackknife method is used: the set of assemblages belonging to the same assembly motif is divided into `jack[2]` subsets of `jack[1]` assemblages. Predicted performances of each subset of `jack[1]` assemblages are computed, experiment by experiment, by using the other (`jack[2] - 1`) subsets of assemblages. If the total number of assemblages belonging to the assembly motif is lower than `jack[1]*jack[2]`, predictions are computed by the Leave-One-Out method.
`jack`	an integer vector of length `2`. The vector specifies the parameters for the jackknife method. The first integer `jack[1]` specifies the size of each subset, the second integer `jack[2]` specifies the number of subsets.
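The two cross-validation modes controlled by `opt.jack` and `jack` can be illustrated on the performances of assemblages that share one assembly motif. This is a minimal sketch in base R, not the package's implementation: the helper names `predict_loo` and `predict_jack` are hypothetical, the arithmetic mean is used, and the rule that fills the `jack[2]` subsets in turn is an assumption (the text above does not say how assemblages are assigned to subsets).

```r
# Leave-One-Out: each value is predicted by the mean of all the others.
predict_loo <- function(y) {
  n <- length(y)
  if (n < 2) return(rep(NA_real_, n))   # nothing left to average
  (sum(y) - y) / (n - 1)
}

# Jackknife-style prediction with jack[2] subsets.
predict_jack <- function(y, jack = c(3, 4)) {
  n <- length(y)
  # Too few assemblages in the motif: fall back to Leave-One-Out, as documented
  if (n < jack[1] * jack[2]) return(predict_loo(y))
  fold <- rep_len(seq_len(jack[2]), n)  # hypothetical assignment rule
  pred <- numeric(n)
  for (k in seq_len(jack[2])) {
    held <- fold == k
    pred[held] <- mean(y[!held])        # predict a subset from the other subsets
  }
  pred
}

y_small <- c(2, 4, 6)                   # fewer than 3 * 4 values
pred_small <- predict_jack(y_small)     # Leave-One-Out: 5 4 3

y_big <- as.numeric(1:12)               # exactly 3 * 4 values
pred_big <- predict_jack(y_big)         # four subsets of three values
```

In the second call, every value in a subset receives the same prediction, namely the mean of the nine values held in the other three subsets.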

See the vignette "The options of fclust".
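To make the options `opt.mean` and `opt.model = "bymot"` more concrete, here is a minimal base-R sketch on made-up data. It assumes that the assembly motif of an assemblage is the set of functional clusters with at least one component present in it; this definition, the toy values, and the helper names `amean`, `gmean` and `model_bymot` are illustrative assumptions, not the package's code.

```r
amean <- function(x) mean(x)
gmean <- function(x) exp(mean(log(x)))   # defined for strictly positive values

# Toy occurrence matrix: 5 assemblages x 4 components (made-up data).
mOccur <- matrix(c(1, 0, 0, 0,
                   0, 1, 0, 0,
                   1, 0, 1, 0,
                   0, 1, 0, 1,
                   1, 1, 1, 1),
                 ncol = 4, byrow = TRUE,
                 dimnames = list(paste0("a", 1:5), paste0("sp", 1:4)))
fobs <- c(a1 = 2, a2 = 8, a3 = 5, a4 = 20, a5 = 9)

# A priori partition of the components into two functional clusters.
affectElt <- c(sp1 = 1, sp2 = 1, sp3 = 2, sp4 = 2)

# Assumed definition: the motif of an assemblage is the set of clusters
# that have at least one component present in it.
motif <- apply(mOccur, 1, function(occ) {
  paste(sort(unique(affectElt[occ == 1])), collapse = "+")
})

# "bymot"-style modelled performance: mean over assemblages sharing a motif.
model_bymot <- function(fobs, motif, fmean = amean) {
  ave(fobs, motif, FUN = fmean)
}

fit_amean <- model_bymot(fobs, motif, amean)
fit_gmean <- model_bymot(fobs, motif, gmean)
```

Here assemblages `a1` and `a2` share the motif `"1"`, while `a3`, `a4` and `a5` share the motif `"1+2"`, so each group receives a single modelled value per choice of mean.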

Return a list containing the primary tree of component clustering, predictions of assembly performances and statistics computed by using the primary and secondary trees of component clustering.

Recall of inputs:

nbElt, nbAss, nbXpr: the number of components that belong to the interactive system, the number of assemblages and the number of performances observed, respectively.
opt.method, opt.mean, opt.model, opt.jack, jack, opt.na, opt.repeat, affectElt: the options used for computing the resulting clustering trees, respectively.
fobs, mOccur, xpr: the vector or matrix of observed performances of assemblages, the binary matrix of occurrence of components, and the vector of weight of different performances, respectively.

Primary and secondary, fitted and validated trees, of component clustering and associated statistics:

tree.I, tree.II, nbOpt: the primary tree of component clustering, the validated secondary tree of component clustering, and the optimum number of functional clusters, respectively. A tree is a list of a square matrix of dimensions nbLev * nbElt (with nbLev = nbElt), and of a vector of coefficients of determination (of length nbLev).
mCal, mPrd, tCal, tPrd: the numeric matrices of modelled values and of values predicted by cross-validation, using the primary tree (mCal and mPrd) or the secondary tree (tCal and tPrd), respectively. All matrices have the same dimension nbLev * nbAss. rownames contains the number of component clusters, that is from 1 to nbElt clusters. colnames contains the names of assemblages.
mMotifs, tNbcl: the matrix of assignment of assemblages to the different assembly motifs, coded as integers, and the matrix of the last tree levels used for predicting assemblage performances. All matrices have the same dimension nbLev * nbAss. rownames contains the number of component clusters, that is from 1 to nbElt clusters. colnames contains the names of assemblages.
mStats, tStats: the matrices of associated statistics. rownames contains the number of component clusters, that is from 1 to nbElt clusters. colnames = c("missing", "R2cal", "R2prd", "AIC", "AICc").
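For readers unfamiliar with the column names of `mStats` and `tStats`, the sketch below computes a coefficient of determination, an AIC and an AICc from observed and fitted values using textbook least-squares definitions. It is only an orientation aid: the exact formulas used by `fclust`, and in particular how it counts the number of parameters `k`, are not stated here, so `k` and the helper names `r2`, `aic_ls` and `aicc_ls` are assumptions of the sketch.

```r
# Coefficient of determination: 1 - residual sum of squares / total sum of squares
r2 <- function(obs, fit) {
  1 - sum((obs - fit)^2) / sum((obs - mean(obs))^2)
}

# Least-squares AIC, up to an additive constant, with k fitted parameters
aic_ls <- function(obs, fit, k) {
  n <- length(obs)
  n * log(sum((obs - fit)^2) / n) + 2 * k
}

# Small-sample correction of the AIC (requires n > k + 1)
aicc_ls <- function(obs, fit, k) {
  n <- length(obs)
  aic_ls(obs, fit, k) + 2 * k * (k + 1) / (n - k - 1)
}

obs <- c(2, 8, 5, 20, 9)                 # made-up observed performances
fit <- c(5, 5, 34/3, 34/3, 34/3)         # made-up fitted values (two group means)

stats <- c(R2   = r2(obs, fit),
           AIC  = aic_ls(obs, fit, k = 2),
           AICc = aicc_ls(obs, fit, k = 2))
```

With five observations and `k = 2`, the AICc exceeds the AIC by exactly 6, which shows how strongly the correction weighs on small data sets.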

Jaillard, B., Richon, C., Deleporte, P., Loreau, M. and Violle, C. (2018) An a posteriori species clustering for quantifying the effects of species interactions on ecosystem functioning. Methods in Ecology and Evolution, 9:704-715. https://doi.org/10.1111/2041-210X.12920.

Jaillard, B., Deleporte, P., Loreau, M. and Violle, C. (2018) A combinatorial analysis using observational data identifies species that govern ecosystem functioning. PLoS ONE 13(8): e0201135. https://doi.org/10.1371/journal.pone.0201135.

fclust: build a functional clustering,
fclust_plot: plot the results of a functional clustering,
fclust_write: save the results of a functional clustering,
fclust_read: read the results of a functional clustering.

library(functClust)

# Enable the comments (verbose messages)
oldOption <- getOption("verbose")
if (!oldOption) options(verbose = TRUE)

nbElt <- 16   # number of components
# index = Identity, Occurrence of components, a Performance
index <- c(1, 1 + 1:nbElt, 1 + nbElt + 1)
dat.2004 <- CedarCreek.2004.2006.dat[ , index]
res <- fclust(dat.2004, nbElt)
names(res)
res$tree.II

options(verbose = oldOption)