mcarlo: Monte Carlo simulation of dissimilarities

View source: R/mcarlo.R

mcarlo R Documentation

Monte Carlo simulation of dissimilarities

Description

Permutations and Monte Carlo simulations to define critical values for dissimilarity coefficients for use in MAT reconstructions.

Usage

mcarlo(object, ...)

## Default S3 method:
mcarlo(object, nsamp = 10000,
       type = c("paired", "complete", "bootstrap", "permuted"),
       replace = FALSE, 
       method = c("euclidean", "SQeuclidean", "chord", "SQchord",
                  "bray", "chi.square", "SQchi.square",
                  "information", "chi.distance", "manhattan",
                  "kendall", "gower", "alt.gower", "mixed"),
       is.dcmat = FALSE, diag = FALSE, ...)

## S3 method for class 'mat'
mcarlo(object, nsamp = 10000,
       type = c("paired", "complete", "bootstrap", "permuted"),
       replace = FALSE, diag = FALSE, ...)

## S3 method for class 'analog'
mcarlo(object, nsamp = 10000,
       type = c("paired", "complete", "bootstrap", "permuted"),
       replace = FALSE, diag = FALSE, ...)

Arguments

object

an R object. Currently only objects of class "mat" or "analog", or a matrix-like object of species data, are allowed.

nsamp

numeric; number of permutations or simulations to draw.

type

character; the type of permutation or simulation to perform. See Details, below.

replace

logical; should sampling be done with replacement?

method

character; for raw species matrices, the dissimilarity coefficient to use. This is predefined when fitting a MAT model with mat or when performing analogue matching via analog, and is ignored in the "mcarlo" methods for classes "mat" and "analog".

is.dcmat

logical; is "object" a dissimilarity matrix? Not meant for general use; used internally by the "mat" and "analog" methods to tell the "default" method that "object" is already a dissimilarity matrix, so there is no need to recalculate it.

diag

logical; should the dissimilarities include the diagonal (zero) values of the dissimilarity matrix? See Details.

...

arguments passed to or from other methods.

Details

Only type = "paired" and type = "bootstrap" are currently implemented.

The distance function produces square, symmetric dissimilarity matrices for training sets. The upper triangle of these matrices duplicates the lower triangle and is therefore redundant. mcarlo works on the lower triangle of these dissimilarity matrices, which represents all pairwise dissimilarity values for training set samples. By default, the diagonal (zero) values of the dissimilarity matrix are not included. If you consider these diagonal (zero) values to be part of the population of dissimilarities, use "diag = TRUE" to include them in the permutations.
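
The lower-triangle extraction described above can be sketched in base R. This is an illustration only, not the package's internal code: the toy matrix spp and all variable names are assumptions, and base dist() stands in for distance.

```r
## toy training set: 4 samples x 3 species (hypothetical values)
spp <- matrix(c(0.2, 0.5, 0.3,
                0.1, 0.6, 0.3,
                0.4, 0.4, 0.2,
                0.3, 0.3, 0.4),
              nrow = 4, byrow = TRUE)

## square, symmetric dissimilarity matrix; base dist() stands in for distance()
dij <- as.matrix(dist(spp, method = "euclidean"))

## lower triangle only: the n(n-1)/2 pairwise dissimilarities (as diag = FALSE)
lower <- dij[lower.tri(dij, diag = FALSE)]

## lower triangle plus the n diagonal zeros (as diag = TRUE)
lower.diag <- dij[lower.tri(dij, diag = TRUE)]

length(lower)       # 6
length(lower.diag)  # 10
```

With n = 4 samples, the first vector holds the 6 distinct pairwise values and the second adds the 4 diagonal zeros.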

Value

A vector of simulated dissimilarities of length "nsamp". The "method" used is stored in attribute "method".

Note

The performance of these permutation and simulation techniques still needs to be studied. This function is provided for pedagogic reasons. Although recommended by Sawada et al. (2004), sampling with replacement ("replace = TRUE") and including diagonal (zero) values ("diag = TRUE") simulates too many zero distances. This is because the same training set sample can, on occasion, be drawn twice, leading to a zero distance. It is impossible to find two samples in nature that are perfectly similar, so sampling with replacement and "diag = TRUE" seems undesirable at best.
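
The excess of zero distances can be illustrated with a small base-R sketch. It assumes that a "paired" draw means picking two training set samples at random and recording their dissimilarity; that reading, the random toy data and the variable names are assumptions for illustration, and the code does not reproduce the internals of mcarlo.

```r
set.seed(42)

## hypothetical training set: 20 samples x 5 species
n <- 20
spp <- matrix(runif(n * 5), nrow = n)
dij <- as.matrix(dist(spp))   # base dist() stands in for distance()

nsamp <- 1000

## with replacement: both members of a pair are drawn independently,
## so the same sample can be drawn twice (a diagonal, zero entry)
i <- sample(n, nsamp, replace = TRUE)
j <- sample(n, nsamp, replace = TRUE)
with.repl <- dij[cbind(i, j)]

## without replacement within a pair: the two samples always differ
pairs <- replicate(nsamp, sample(n, 2, replace = FALSE))
without.repl <- dij[cbind(pairs[1, ], pairs[2, ])]

sum(with.repl == 0)     # one zero for every draw where i == j
sum(without.repl == 0)  # 0
```

Under this scheme roughly one draw in n pairs a sample with itself, which is the source of the surplus zero distances described above.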

Author(s)

Gavin L. Simpson

References

Sawada, M., Viau, A.E., Vettoretti, G., Peltier, W.R. and Gajewski, K. (2004) Comparison of North-American pollen-based temperature and global lake-status with CCCma AGCM2 output at 6 ka. Quaternary Science Reviews 23, 87–108.

See Also

mat for fitting MAT models and analog for analogue matching. roc as an alternative method for determining critical values for dissimilarity measures when one has grouped data.

plot.mcarlo provides a plotting method to visualise the distribution of simulated dissimilarities.

Examples

## Imbrie and Kipp example
## load the example data
data(ImbrieKipp)
data(SumSST)
data(V12.122)

## merge training and test set on columns
dat <- join(ImbrieKipp, V12.122, verbose = TRUE)

## extract the merged data sets and convert to proportions
ImbrieKipp <- dat[[1]] / 100
V12.122 <- dat[[2]] / 100

## perform the modified method of Sawada et al. (2004): paired sampling,
## without replacement
ik.mcarlo <- mcarlo(ImbrieKipp, method = "chord", nsamp = 1000,
                    type = "paired", replace = FALSE)
ik.mcarlo

## plot the simulated distribution
layout(matrix(1:2, ncol = 1))
plot(ik.mcarlo)
layout(1)

analogue documentation built on Sept. 30, 2024, 9:41 a.m.