minimum_subset_distance: Method 1: Minimum subset distance
In statdivlab/rre: Replicate Richness Estimator


The regularization parameter λ is chosen for its ability to produce subset estimates with low between-subset variance.

minimum_subset_distance(fct_list, lambda_vec = seq(0, 20, by = 2),
  starts = data.frame(alpha = c(0.01, 0.01), delta = c(0.01, 1e-04)),
  partitions = 10, multiplier = 20, c_seq_len = 96, ...)

`fct_list`	A list of frequency count tables, assumed to be replicates.
`lambda_vec`	The candidate values of the penalty parameter λ from which one is selected.
`starts`	Starting values for `alpha` and `delta` in the MLE procedure.
`partitions`	An integer indicating the number of times to partition the data into two subsets.
`multiplier`	The upper bound of the grid of candidate C values, expressed as a multiple of the maximum observed richness (c). For example, if c is 50 and multiplier is 10, the method evaluates the likelihood over a C grid from 50 to 500.
`c_seq_len`	The number of points in the C grid search.
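
As a rough illustration of how `multiplier` and `c_seq_len` could jointly define the C grid, the following sketch builds a grid for the example values above. The even spacing is an assumption made for illustration; the package may construct or round its grid differently. The names `c_obs` and `c_grid` are hypothetical.

```r
# Illustrative values: maximum observed richness of 50, as in the example above.
c_obs <- 50
multiplier <- 10
c_seq_len <- 96

# Assumption: the grid is evenly spaced from c_obs to multiplier * c_obs.
c_grid <- seq(c_obs, multiplier * c_obs, length.out = c_seq_len)

range(c_grid)   # lower and upper bounds of the grid
length(c_grid)  # number of candidate C values
```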

Method 1 is motivated by the belief that if we resample from the same population, an ideal estimator of C should have low variance. Because the data are replicates, we can repeatedly partition them into two subsets and compute an estimate from each subset. We select the λ that yields the lowest between-subset variance. The partitioning is repeated `partitions` times to average out the arbitrary choice of subsets. See the paper or the source code for more detail.
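
The selection loop described above can be sketched in a few lines of R. This is a toy under stated assumptions, not the package's implementation: `toy_estimate` is a hypothetical stand-in for the penalized estimator, the replicate data are made-up richness values rather than frequency count tables, the subsets are assumed to be equal-sized halves, and the between-subset variance is assumed to be measured as a squared difference.

```r
set.seed(1)

# Hypothetical stand-in for a penalized richness estimator (NOT the rre
# estimator). It shrinks a subset-based estimate toward a fixed value as
# lambda grows, purely so that the sketch runs end to end.
toy_estimate <- function(richness_subset, lambda) {
  shrink <- lambda / (lambda + 5)
  (1 - shrink) * 2 * mean(richness_subset) + shrink * 100
}

# Made-up observed richness for six replicates.
replicate_richness <- c(48, 52, 50, 61, 45, 55)
lambda_vec <- seq(0, 20, by = 2)
partitions <- 10

# For each candidate lambda, average the squared distance between the two
# subset estimates over `partitions` random splits.
mean_sq_dist <- sapply(lambda_vec, function(lambda) {
  d <- replicate(partitions, {
    idx <- sample(seq_along(replicate_richness))  # random permutation
    half <- floor(length(idx) / 2)
    est_a <- toy_estimate(replicate_richness[idx[seq_len(half)]], lambda)
    est_b <- toy_estimate(replicate_richness[idx[-seq_len(half)]], lambda)
    (est_a - est_b)^2
  })
  mean(d)
})

# Select the lambda with the smallest average between-subset distance.
best_lambda <- lambda_vec[which.min(mean_sq_dist)]
```

Averaging over several random splits keeps any single arbitrary partition from driving the choice of λ, which is the role the `partitions` argument plays in the description above.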