Simulate missing morphometric data with taxonomic bias

Description

This function simulates a higher frequency of missing data points in groups that are numerically less well represented in the whole sample than other groups. These groups may represent taxa (as used in Brown et al., 2012), but may also represent any other grouping of interest (e.g. populations, trials, subsamples, etc.). From a morphometric dataset, this function selects a number of specimens from which to remove data points, and a number of measurements to remove from each of these specimens, based on the distribution of missing data produced by missing.data. A vector containing the number of measurements to remove from each specimen is produced and sorted into descending order. Specimens are then sampled without replacement, each with a probability proportional to the total sample size divided by the number of specimens in its respective group. The order in which specimens are sampled determines the number of data points removed from each (i.e. the first specimen sampled has the most removed). A complete mathematical description may be found in Brown et al. (2012).
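The weighting step described above can be sketched in a few lines of base R. This is an illustration under stated assumptions and not the package source: the group sizes and the vector of removals per specimen (n_remove) are made up for the example, whereas the function derives the latter from missing.data.

```r
# Hypothetical illustration of the weighted sampling step; not the package source.
set.seed(1)
groups <- c(rep(1, 10), rep(2, 5), rep(3, 2))  # assumed group membership per specimen
n <- length(groups)                            # total sample size (17 here)

# Weight for each specimen: total sample size / number of specimens in its group
group_sizes <- table(groups)
w <- n / as.numeric(group_sizes[as.character(groups)])

# Assumed numbers of measurements to remove, sorted into descending order
n_remove <- sort(c(1, 4, 2, 1, 3), decreasing = TRUE)

# Sample specimens without replacement; sample() rescales the weights to
# probabilities, so specimens from small groups tend to be drawn earlier
picked <- sample(seq_len(n), size = length(n_remove), replace = FALSE, prob = w)

# The first specimen drawn is paired with the largest number of removals
plan <- data.frame(specimen = picked, group = groups[picked], removed = n_remove)
print(plan)
```

With these made-up group sizes, a specimen in group 3 carries a weight of 17/2 against 17/10 for a specimen in group 1, so it is five times as likely to be picked on the first draw.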

Usage

byclade(x, remperc, ngroups, groups)

Arguments

x

An n x m matrix of morphometric data with n specimens and m variables

remperc

The proportion of data to be removed from the matrix, expressed as a decimal (e.g. 30 percent is entered as 0.3)

ngroups

The number of taxonomic groups present in the data matrix

groups

A vector of length n specifying the taxonomic group membership of each specimen as integers (e.g. c(1,1,2,2,3,3,...))

Value

Returns an n x m matrix of morphometric data with missing values entered as NA
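A minimal usage sketch follows, assuming the package that provides byclade is already loaded. The data are randomly generated placeholders, and the names x_missing and the chosen group sizes are hypothetical. The call is guarded so that the snippet still runs when byclade is not available.

```r
# Hypothetical example data; not taken from the package or the paper.
set.seed(42)
groups  <- c(rep(1, 12), rep(2, 6), rep(3, 2))  # group membership, length n
ngroups <- length(unique(groups))               # number of groups
x <- matrix(rnorm(length(groups) * 5, mean = 10, sd = 2),
            nrow = length(groups), ncol = 5)    # n x m matrix of measurements
remperc <- 0.3                                  # remove 30 percent of the data

# Run only if byclade is available in the current session
if (exists("byclade", mode = "function")) {
  x_missing <- byclade(x, remperc, ngroups, groups)
  print(dim(x_missing))          # dimensions of the returned matrix
  print(mean(is.na(x_missing)))  # overall proportion of NA cells
  # Mean proportion of NA cells per specimen, summarised by group
  print(tapply(rowMeans(is.na(x_missing)), groups, mean))
}
```

Comparing the per-group summary against the group sizes is one simple way to check that the smaller groups received proportionally more missing values.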

Author(s)

J. Arbour and C. Brown

References

Brown, C., Arbour, J. and Jackson, D. 2012. Testing of the Effect of Missing Data Estimation and Distribution in Morphometric Multivariate Data Analyses. Systematic Biology 61(6):941-954.

See Also

missing.data, obliterator