simprof: simprof


View source: R/simprof.R

Description

simprof() from clustsig calculates the distance matrix internally and does not implement the Hellinger distance. This makes it inconvenient to follow S. Primpke's approach. To avoid writing a custom distance function each time, this version adds a new method, "hellinger".

Usage

simprof(data, num.expected = 1000, num.simulated = 999,
  method.cluster = "average", method.distance = "euclidean",
  method.transform = "identity", alpha = 0.05,
  sample.orientation = "row", const = 0, silent = TRUE,
  increment = 100, undef.zero = TRUE, warn.braycurtis = TRUE)

Arguments

data

Input data in a matrix.

num.expected

The number of similarity profiles to generate for creating the expected distribution of the data. This value should be large.

num.simulated

The number of similarity profiles to generate for use in comparing the observed test statistic with its null distribution. This value should be large.
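
The roles of num.expected and num.simulated can be illustrated with a simplified permutation test. This is only a sketch of the general similarity-profile idea, not the package's code: the helper names (profile_of, permute_columns, pi_stat), the Euclidean distance, the toy data, and the p-value formula are all assumptions made for illustration.

```r
set.seed(1)

# Sorted pairwise distances: the "similarity profile" of a data matrix.
profile_of <- function(x) sort(as.vector(dist(x)))

# Permute each column independently to break structure among samples.
permute_columns <- function(x) apply(x, 2, sample)

# Toy data (assumption): 10 samples, 4 variables, two well-separated groups.
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 4),
           matrix(rnorm(20, mean = 5), ncol = 4))

num_expected  <- 200  # kept small so the sketch runs quickly
num_simulated <- 199

# Expected profile: mean of the profiles of num_expected permuted data sets.
expected <- rowMeans(replicate(num_expected, profile_of(permute_columns(x))))

# Test statistic: total absolute departure from the expected profile.
pi_stat <- function(p) sum(abs(p - expected))
observed <- pi_stat(profile_of(x))

# Null distribution of the statistic from num_simulated further permutations.
null <- replicate(num_simulated, pi_stat(profile_of(permute_columns(x))))

# One common permutation p-value convention (assumed here).
p_value <- (sum(null >= observed) + 1) / (num_simulated + 1)
```

Larger values of both arguments make the expected profile and the null distribution more stable, at the cost of run time.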

method.cluster

The method of clustering to use with hclust. Standard values from hclust are "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".

method.distance

This value should be either an option to pass to the function dist (standard values are "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"), "braycurtis" or "czekanowski" for Czekanowski Dissimilarity (referred to as Bray-Curtis Dissimilarity in some fields, particularly marine biology), or "actual-braycurtis" for the true Bray-Curtis Dissimilarity where the data are standardized before the dissimilarity is calculated. This value can also be any function which returns a "dist" object. This version of clustsig also implements the "hellinger" distance.
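
Because method.distance also accepts a function returning a "dist" object, a Hellinger distance could be supplied by hand as in the sketch below. The name hellinger_dist is hypothetical, and the definition used (Euclidean distance between square-rooted row proportions, without a 1/sqrt(2) scaling) is an assumption; the built-in "hellinger" method may use a different convention.

```r
# Hypothetical helper: Hellinger distance computed as the Euclidean distance
# between square-rooted row proportions (assumed convention, unscaled).
hellinger_dist <- function(x) {
  x <- as.matrix(x)
  totals <- rowSums(x)
  props <- x / totals           # divide each row by its total
  props[totals == 0, ] <- 0     # all-zero rows: treat proportions as 0
  stats::dist(sqrt(props), method = "euclidean")
}

# Toy data (assumption): three samples, two variables.
m <- matrix(c(1, 0,
              0, 1,
              1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("sp1", "sp2")))

d <- hellinger_dist(m)

# With this package loaded, the function could then be passed on, e.g.:
# res <- simprof(data = m, method.distance = hellinger_dist)
```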

method.transform

An option to specify a transform, if any, to be applied to the data. Possible values are "identity" (no transformation), "squareroot", "log", "PA" (presence/absence), or any numeric value (of type "double"). This transform is applied before the adjustment constant is applied, so choose a constant accordingly.

alpha

The alpha level at which to reject the null hypothesis. If the null is rejected, the test continues and tests each subtree recursively until either all subtrees are exhausted by reaching the individual level or no further significant structure is found. Due to the nature of multiple testing inherent in this process, care should be taken when choosing this alpha level.

sample.orientation

The orientation of the data, either "row" or "column". The practical effect of this is that the transpose will be examined if "column" is chosen.

const

The value of the constant to be used in adjusting the Bray-Curtis Dissimilarity coefficient, if any is to be used. Any positive value of "const" will be appended as a new variable to each sample, acting as a sort of “dummy species” (where that interpretation is appropriate).
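
The effect of the "dummy species" can be seen on a pair of toy samples. This is a minimal sketch: czekanowski_pair and with_dummy are hypothetical helpers written for this illustration, not functions from the package.

```r
# Czekanowski dissimilarity between two samples (hypothetical helper).
czekanowski_pair <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Toy samples (assumption).
sparse_a <- c(0, 0, 1)
sparse_b <- c(0, 2, 0)
empty    <- c(0, 0, 0)

const <- 1
# Append the constant as one extra variable, as described for "const".
with_dummy <- function(v) c(v, const)

czekanowski_pair(empty, empty)                                # NaN (0/0)
czekanowski_pair(with_dummy(empty), with_dummy(empty))        # 0
czekanowski_pair(sparse_a, sparse_b)                          # 1
czekanowski_pair(with_dummy(sparse_a), with_dummy(sparse_b))  # 0.6
```

The dummy variable makes the dissimilarity between two empty samples defined and pulls sparse samples with no shared variables closer together.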

silent

A logical value indicating whether anything should be printed during the code execution. If FALSE, a message will be printed every increment (see below) number of times in the main looping procedure. This was implemented because the code can take a while to run due to many permutations and its recursive nature; however, for the same reason, many messages could be printed.

increment

An integer value indicating, if silent=FALSE, on which iterations a message should be printed. (If the iteration number modulo increment equals 0, that number will be printed.)

undef.zero

A logical value indicating whether undefined values arising from a denominator equal to 0 in the Bray-Curtis/Czekanowski Dissimilarity Indices should result in NA or 0. This defaults to TRUE so that NA values are replaced by 0. This default is to retain backward compatibility with the previous version of the package but may be changed in a future release.
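
The choice described above can be sketched as follows. The helper resolve_undefined is hypothetical and only mimics the documented behaviour for a single pair of samples; the package's internal handling may be organised differently.

```r
# Two samples with no counts at all (toy data, an assumption).
a <- c(0, 0, 0)
b <- c(0, 0, 0)

# The denominator is 0, so the raw index is undefined (0/0 gives NaN in R).
raw <- sum(abs(a - b)) / sum(a + b)

# Hypothetical helper mimicking the documented undef.zero switch.
resolve_undefined <- function(d, undef.zero = TRUE) {
  if (is.nan(d)) {
    if (undef.zero) 0 else NA_real_
  } else {
    d
  }
}

resolve_undefined(raw)                      # 0
resolve_undefined(raw, undef.zero = FALSE)  # NA
```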

warn.braycurtis

A logical value indicating whether a warning should be printed when using the "braycurtis" option because of the naming confusion in some fields with the Czekanowski Dissimilarity Index. This defaults to TRUE but may change in future releases. For more information, see Yoshioka (2008) listed in the references.

Details

A tool for determining the number of significant clusters produced using hclust() with the assumption of no a priori groups.
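
For comparison, plain hierarchical clustering with hclust() leaves the number of groups to the analyst. The sketch below uses only base R and the built-in USArrests data; the choice of k = 3 is arbitrary and is exactly the kind of a priori decision that a significance-based procedure aims to avoid.

```r
# Same columns as in the Examples section (third column left out).
usarrests <- USArrests[, c(1, 2, 4)]
rownames(usarrests) <- state.abb

# Distance matrix and average-linkage clustering.
d  <- dist(usarrests, method = "euclidean")
hc <- hclust(d, method = "average")

# Without a significance test, the number of clusters must be picked by hand.
groups <- cutree(hc, k = 3)
table(groups)
```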

Value

S4 object of class simprof. It has the following components:

Author(s)

Douglas Whitaker and Mary Christman.

References

See Also

hclust

Examples

## Not run: 
# Load the USArrests dataset included with R
# And use abbreviations of state names
# We leave out the third column because
# it is on a different scale
usarrests <- USArrests[, c(1, 2, 4)]
rownames(usarrests) <- state.abb
# Run simprof on the data
res <- simprof(data=usarrests, method.distance="braycurtis")
pl.color <- simprof.plot(res)

## End(Not run)

fcorra/mod_clustsig documentation built on Jan. 24, 2020, 1:26 a.m.