Simulate missing morphometric data with taxonomic bias
This function simulates higher frequency of missing data points in groups that are less numerically well represented in the whole sample, relative to other group. These groups may represent taxa (as used in Brown et al., In Press), but may also represent any other group of interest (e.g. populations, trials, subsamples, etc.). From a morphometric dataset, this function selects a number of specimens to have data points removed from and a number of measurements to remove from each of these specimens based on the distribution of missing data produced by
missing.data. A vector containing the number of measurements to remove from each specimen is produced and sorted into descending order. Specimens are then sampled without replacement with a probability relative to the sum of the entire sample sizes divided by the number of specimens its respective group. The order the specimens are sampled determines the number of data points to be removed (i.e. the first to be sampled has the most removed). A complete mathematical description may be found in Brown et al. (In Press).
byclade(x, remperc, ngroups, groups)
A n X m matrix of morphometric data with n specimens and m variables
The percentage of data to be removed from the matrix, expressed as a decimal (ex: 30 percent would be entered as 0.3)
The number of taxonomic groups present in the data matrix
A vector of length n specifying taxonomic group membership as integers (ex: c(1,1,2,2,3,3,...) )
returns a n X m matrix of morphometric data with missing variables input as 'NA'
J. Arbour and C. Brown
Brown, C., Arbour, J. and Jackson, D. 2012. Testing of the Effect of Missing Data Estimation and Distribution in Morphometric Multivariate Data Analyses. Systematic Biology 61(6):941-954.
Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.