dimac: Divisive magnetic clustering

Description

The function dimac implements the divisive magnetic clustering algorithm to partition fatty acid signatures into clusters. The DiMaC algorithm was modified from the diana algorithm of the package cluster (Maechler et al. 2016). dimac is intended to be called by the user, but only after the fatty acid signatures have been prepared for analysis by calls to the functions prep_fa and prep_sig. Consequently, error checking of the arguments associated with the signatures (sigs, id, type, and loc) is necessarily limited, and calling dimac without preceding calls to prep_fa and prep_sig could return meaningless results. Please see Details or the vignette for additional information.

Usage

dimac(sigs, id, type, loc, dist_meas = 1, gamma = 1)

Arguments

sigs

A numeric matrix of fatty acid signatures in column-major format.

id

A character vector with a unique sample ID for each signature.

type

A character vector of prey or predator type names.

loc

A numeric matrix specifying the location of signatures within sigs for each type.

dist_meas

An integer indicator of the distance measure to use. Default value 1.

gamma

The power parameter of the chi-square distance measure. Default value 1.

Value

A list containing the following elements:

clust

A data frame denoting cluster assignments at each iteration of the algorithm.

clust_dist

A numeric matrix of the summed distance within clusters at each iteration.

err_code

An integer error code (0 if no error is detected).

err_message

A string containing a brief summary of the results.

Details

The signatures in sigs are presumed to be ready for analysis, which is best accomplished by a call to the function prep_sig. Please refer to the documentation for prep_sig and/or the vignette for additional details.

The matrix loc provides a mapping of the location of data for each type within sigs. It must contain a row for each type and two columns, which contain integers designating the first and last signature of each type within sigs. Such a matrix is returned by the function prep_sig.
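The layout of loc can be illustrated with a minimal base R sketch. The vector sig_type is a hypothetical per-signature type label introduced only for this illustration; in practice loc should come from prep_sig rather than being built by hand.

```r
# Hypothetical type label for each signature (each column of sigs),
# with signatures of the same type stored contiguously.
sig_type <- c("Type_1", "Type_1", "Type_2", "Type_2", "Type_3", "Type_3")

# Run lengths give the number of signatures belonging to each type.
runs <- rle(sig_type)
last <- cumsum(runs$lengths)
first <- last - runs$lengths + 1

# One row per type; the two columns hold the first and last signature index.
loc <- cbind(first, last)
rownames(loc) <- runs$values
loc
```

For the six signatures used in the Examples, this gives the same first and last indices as matrix(c(1, 3, 5, 2, 4, 6), ncol=2).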

Please refer to the documentation for the function dist_between_2_sigs for information regarding permissible values for the arguments dist_meas and gamma.

The DiMaC algorithm is initialized with all signatures in one cluster. The first two magnets are chosen as the two signatures having the greatest distance between them and each non-magnet signature is placed in the cluster associated with the closest magnet. The algorithm then enters an iterative phase. At each iteration, the cluster with the greatest average distance between its signatures and the mean signature is identified as the "active" cluster. The two signatures within the active cluster having the greatest distance between them are selected as new magnets. One of the two new magnets replaces the original magnet for the active cluster and the second starts the formation of an additional cluster. Each non-magnet signature is placed in the cluster associated with the closest magnet, without regard for its cluster designation in the preceding iteration. Consequently, the algorithm is not simply bifurcating, but rather is much more dynamic and flexible. The iterations continue until each signature is in its own cluster.
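The steps described above can be sketched in base R for a single type. This is a simplified illustration, not the package's implementation: the function name magnet_sketch and its argument n_clust are hypothetical, Euclidean distance stands in for the distance measures selected by dist_meas and gamma, and the signatures are assumed to be distinct.

```r
# Sketch of the magnet-based divisive steps for the signatures of one type.
# Assumption: Euclidean distance replaces the package's distance measures.
magnet_sketch <- function(sigs, n_clust) {
  n_clust <- min(n_clust, ncol(sigs))
  d <- as.matrix(dist(t(sigs)))            # pairwise distances between columns

  # Initial magnets: the two signatures with the greatest distance between them.
  magnets <- unname(which(d == max(d), arr.ind = TRUE)[1, ])

  repeat {
    # Place every signature in the cluster of its closest magnet,
    # regardless of its cluster in the preceding iteration.
    clust <- apply(d[, magnets, drop = FALSE], 1, which.min)
    if (length(magnets) >= n_clust) break

    # Active cluster: greatest average distance to its mean signature.
    avg_dist <- sapply(seq_along(magnets), function(k) {
      members <- sigs[, clust == k, drop = FALSE]
      mean(sqrt(colSums((members - rowMeans(members))^2)))
    })
    active <- which.max(avg_dist)

    # New magnets: the two most distant signatures within the active cluster.
    idx <- which(clust == active)
    sub <- d[idx, idx, drop = FALSE]
    pair <- idx[which(sub == max(sub), arr.ind = TRUE)[1, ]]
    magnets[active] <- pair[1]             # replaces the active cluster's magnet
    magnets <- c(magnets, pair[2])         # starts an additional cluster
  }
  clust
}

# The six signatures from the Examples, one signature per column.
sigs <- matrix(c(0.05, 0.10, 0.30, 0.55,
                 0.04, 0.11, 0.29, 0.56,
                 0.10, 0.05, 0.35, 0.50,
                 0.12, 0.03, 0.37, 0.48,
                 0.10, 0.06, 0.35, 0.49,
                 0.05, 0.15, 0.35, 0.45), ncol = 6)
clust_2 <- magnet_sketch(sigs, n_clust = 2)
clust_6 <- magnet_sketch(sigs, n_clust = 6)
```

Because every signature is reassigned to its closest magnet at each iteration, a signature can move between clusters as magnets change, which is why the procedure is not a simple bifurcation.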

Unfortunately, there is no objective method to determine the most appropriate number of clusters for each prey or predator type. Our suggestion is to examine the distance results and identify any substantial reductions in distance, which are likely caused by the discovery of structure within that type, that are followed by a more gradual decrease in distance as the number of clusters increases. For diet estimation applications, partitioning a prey library into more clusters than the number of fatty acids used to estimate diet may result in estimates that are not unique. In such a case, estimates of diet composition need to be pooled into a smaller number of "reporting groups" (e.g., Bromaghin 2008; Meynier et al. 2010).
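One way to examine the distance results is to tabulate the reduction in distance gained by each additional cluster. The sketch below uses a hypothetical vector dist_by_k of made-up values for illustration only; it is not output from dimac, and the 50% threshold is an arbitrary screening aid rather than an objective rule.

```r
# Hypothetical summed within-cluster distances for one type, indexed by the
# number of clusters (1, 2, 3, ...). Illustrative values, not dimac output.
dist_by_k <- c(10.0, 6.0, 3.0, 2.7, 2.5, 2.4)

# Reduction in distance achieved by adding each successive cluster.
reduction <- -diff(dist_by_k)
data.frame(n_clust = seq_along(dist_by_k)[-1], reduction = reduction)

# Largest cluster count whose reduction is still at least half of the
# largest reduction; later reductions are comparatively gradual.
threshold <- 0.5
k_suggest <- max(which(reduction >= threshold * max(reduction))) + 1
k_suggest
```

Any value suggested this way is best treated as a starting point for inspecting the distances, and should be weighed against the number of fatty acids used for diet estimation.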

Utility functions called by dimac:

References

Bromaghin, J.F. 2008. BELS: Backward elimination locus selection for studies of mixture composition or individual assignment. Molecular Ecology Resources 8:568-571.

Maechler, M., P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. 2016. cluster: cluster analysis basics and extensions. R package version 2.0.4.

Meynier, L., P.C.H. Morel, B.L. Chilvers, D.D.S. Mackenzie, and P. Duignan. 2010. Quantitative fatty acid signature analysis on New Zealand sea lions: model sensitivity and diet estimates. Journal of Mammalogy 91:1484-1495.

Examples

dimac(sigs = matrix(c(0.05, 0.10, 0.30, 0.55,
                      0.04, 0.11, 0.29, 0.56,
                      0.10, 0.05, 0.35, 0.50,
                      0.12, 0.03, 0.37, 0.48,
                      0.10, 0.06, 0.35, 0.49,
                      0.05, 0.15, 0.35, 0.45), ncol=6),
      id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5", "ID_6"),
      type = c("Type_1", "Type_2", "Type_3"),
      loc = matrix(c(1, 3, 5, 2, 4, 6), ncol=2),
      dist_meas = 1,
      gamma = NA)
dimac(sigs = matrix(c(0.05, 0.10, 0.30, 0.55,
                      0.04, 0.11, 0.29, 0.56,
                      0.10, 0.05, 0.35, 0.50,
                      0.12, 0.03, 0.37, 0.48,
                      0.10, 0.06, 0.35, 0.49,
                      0.05, 0.15, 0.35, 0.45), ncol=6),
      id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5", "ID_6"),
      type = c("Type_1", "Type_2", "Type_3"),
      loc = matrix(c(1, 3, 5, 2, 4, 6), ncol=2),
      dist_meas = 2,
      gamma = NA)
dimac(sigs = matrix(c(0.05, 0.10, 0.30, 0.55,
                      0.04, 0.11, 0.29, 0.56,
                      0.10, 0.05, 0.35, 0.50,
                      0.12, 0.03, 0.37, 0.48,
                      0.10, 0.06, 0.35, 0.49,
                      0.05, 0.15, 0.35, 0.45), ncol=6),
      id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5", "ID_6"),
      type = c("Type_1", "Type_2", "Type_3"),
      loc = matrix(c(1, 3, 5, 2, 4, 6), ncol=2),
      dist_meas = 3,
      gamma = 0.5)
dimac(sigs = matrix(c(0.05, 0.10, 0.30, 0.55,
                      0.04, 0.11, 0.29, 0.56,
                      0.10, 0.05, 0.35, 0.50,
                      0.12, 0.03, 0.37, 0.48,
                      0.10, 0.06, 0.35, 0.49,
                      0.05, 0.15, 0.35, 0.45), ncol=6),
      id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5", "ID_6"),
      type = c("Type_1", "Type_2", "Type_3"),
      loc = matrix(c(1, 3, 5, 2, 4, 6), ncol=2))

qfasar documentation built on March 20, 2020, 1:10 a.m.