simprof: simprof
In fcorra/mod_clustsig: Modification of Significant Cluster Analysis



The `simprof` function from clustsig calculates the distance matrix internally and does not implement the Hellinger distance. This makes it inconvenient to follow the approach of S. Primpke. To avoid writing a custom distance function each time, this version implements a new distance method, `"hellinger"`.

simprof(data, num.expected = 1000, num.simulated = 999,
  method.cluster = "average", method.distance = "euclidean",
  method.transform = "identity", alpha = 0.05,
  sample.orientation = "row", const = 0, silent = TRUE,
  increment = 100, undef.zero = TRUE, warn.braycurtis = TRUE)

`data`	Input data in a matrix.
`num.expected`	The number of similarity profiles to generate for creating the expected distribution of the data. This value should be large.
`num.simulated`	The number of similarity profiles to generate for use in comparing the observed test statistic with its null distribution. This value should be large.
`method.cluster`	The method of clustering to use with `hclust`. Standard values from `hclust` are `"ward"`, `"single"`, `"complete"`, `"average"`, `"mcquitty"`, `"median"` or `"centroid"`.
`method.distance`	This value should be either an option to pass to the function `dist` (standard values are `"euclidean"`, `"maximum"`, `"manhattan"`, `"canberra"`, `"binary"` or `"minkowski"`), `"braycurtis"` or `"czekanowski"` for Czekanowski Dissimilarity (referred to as Bray-Curtis Dissimilarity in some fields, particularly marine biology), or `"actual-braycurtis"` for the true Bray-Curtis Dissimilarity where the data are standardized before the dissimilarity is calculated. This value can also be any function which returns a `"dist"` object. In this version of clustsig, the `"hellinger"` distance is also implemented.
`method.transform`	An option to specify a transform, if any, to be applied to the data. Possible values are `"identity"` (no transformation), `"squareroot"`, `"log"`, `"PA"`(Presence/Absence), or any numeric value (of type `"double"`). This transform is applied before the adjustment constant is applied, so choose a constant accordingly.
`alpha`	The alpha level at which to reject the null hypothesis. If the null is rejected, the test continues and tests each sub-tree recursively until either all subtrees are exhausted by reaching the individual level or no further significant differences are found. Due to the nature of multiple testing inherent in this process, care should be taken when choosing this alpha level.
`sample.orientation`	The orientation of the data, either `"row"` or `"column"`. The practical effect of this is that the transpose will be examined if `"column"` is chosen.
`const`	The value of the constant to be used in adjusting the Bray-Curtis Dissimilarity coefficient, if any is to be used. Any positive value of `"const"` will be appended as a new variable to each sample, acting as a sort of “dummy species” (where that interpretation is appropriate).
`silent`	A logical value indicating whether anything should be printed during the code execution. If `FALSE`, a message will be printed every `increment` (see below) number of times in the main looping procedure. This was implemented because the code can take a while to run due to many permutations and its recursive nature; however, for the same reason, many messages could be printed.
`increment`	An integer value indicating, if `silent=FALSE`, on which iterations a message should be printed. (If the iteration number modulo `increment` equals 0, that number will be printed.)
`undef.zero`	A logical value indicating whether undefined values arising from a denominator equal to 0 in the Bray-Curtis/Czekanowski Dissimilarity Indices should result in `NA` or 0. This defaults to `TRUE` so that NA values are replaced by 0. This default is to retain backward compatibility with the previous version of the package but may be changed in a future release.
`warn.braycurtis`	A logical value indicating whether a warning should be printed when using the `"braycurtis"` option because of the naming confusion in some fields with the Czekanowski Dissimilarity Index. This defaults to `TRUE` but may change in future releases. For more information, see Yoshioka (2008) listed in the references.
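
The description of `method.distance` says that any function returning a `"dist"` object can be supplied. As a rough sketch of what a Hellinger distance looks like in that form, the helper below (the name `hellinger_dist` is hypothetical and not part of clustsig or this package) assumes one common definition: the Euclidean distance between square-rooted row proportions. The built-in `"hellinger"` option may differ in details such as scaling or the handling of empty rows.

```r
# Hypothetical helper, not taken from clustsig or this package.
# Assumed definition: Euclidean distance between sqrt(row proportions).
hellinger_dist <- function(x) {
  x <- as.matrix(x)
  totals <- rowSums(x)
  # Rows summing to zero would give 0/0; divide by 1 so they stay all-zero.
  props <- x / ifelse(totals == 0, 1, totals)
  stats::dist(sqrt(props), method = "euclidean")
}

# Same data preparation as in the Examples section.
usarrests <- USArrests[, c(1, 2, 4)]
rownames(usarrests) <- state.abb

d <- hellinger_dist(usarrests)
class(d)          # a "dist" object, as method.distance requires
attr(d, "Size")   # number of samples (rows)
```

Assuming `simprof` calls a user-supplied function with the data matrix as its only argument, a call such as `simprof(data = usarrests, method.distance = hellinger_dist)` would use this helper; in this version, `method.distance = "hellinger"` is the intended shortcut.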

A tool for determining the number of significant clusters produced using hclust() with the assumption of no a priori groups.
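
To illustrate the roles of `num.expected` and `num.simulated`, the following base-R sketch runs a single similarity profile test on the whole data set, loosely following Clarke et al. (2008). It is only an approximation under stated assumptions: the helper names `sim_profile` and `permute_columns` are hypothetical, plain Euclidean distance is used, the permutation counts are kept small for speed, and the p-value formula is a common permutation-test convention rather than necessarily what `simprof` computes internally.

```r
set.seed(1)

# Hypothetical helpers, not part of clustsig.
# A similarity profile: all pairwise distances, sorted.
sim_profile <- function(x) sort(as.vector(dist(x)))
# Null data: shuffle each variable independently across samples.
permute_columns <- function(x) apply(x, 2, sample)

x <- as.matrix(USArrests[, c(1, 2, 4)])

observed <- sim_profile(x)

# Analogue of num.expected: profiles used to build the mean expected profile.
expected <- replicate(100, sim_profile(permute_columns(x)))
mean_profile <- rowMeans(expected)

# Test statistic: summed absolute deviation from the mean expected profile.
pi_obs <- sum(abs(observed - mean_profile))

# Analogue of num.simulated: fresh profiles giving the null distribution.
pi_null <- replicate(99, sum(abs(sim_profile(permute_columns(x)) - mean_profile)))

# Assumed permutation p-value convention.
p_value <- (sum(pi_null >= pi_obs) + 1) / (length(pi_null) + 1)
p_value
```

In `simprof`, a test of this kind is applied recursively to the sub-trees of the `hclust` dendrogram, which is why the choice of `alpha` interacts with multiple testing.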

S4 object of class simprof. It has the following components:

numgroups The number of groups which are found to be statistically significant.
significantclusters A list of length numgroups with each element containing the sample IDs (row/column numbers in the corresponding original data) that are in each significant cluster.
pval The merge component from the hclust results with an extra column of p-values. These p-values are for testing whether the two groups in that row are statistically different.
hclust An object of class hclust which is just the results of running hclust on the original data.

Douglas Whitaker and Mary Christman.

Clarke, K.R., Somerfield, P.J., and Gorley, R.N., 2008. Testing of null hypotheses in exploratory community analyses: similarity profiles and biota-environment linkage. J. Exp. Mar. Biol. Ecol. 366, 56–69.
Yoshioka, P.M., 2008. Misidentification of the Bray-Curtis similarity index. Mar. Ecol. Prog. Ser. 368, 309–310.

hclust

## Not run: 
# Load the USArrests dataset included with R
# And use abbreviations of state names
# We leave out the third column because
# it is on a different scale
usarrests<-USArrests[,c(1,2,4)]
rownames(usarrests)<-state.abb
# Run simprof on the data
res <- simprof(data=usarrests, method.distance="braycurtis")
pl.color <- simprof.plot(res)

## End(Not run)