minimum_subset_distance: Method 1: Minimum subset distance


Source: R/tuning_proposals.R

Description

Selects the regularization parameter λ whose subset estimates of C have the lowest between-subset variance.

Usage

minimum_subset_distance(fct_list, lambda_vec = seq(0, 20, by = 2),
  starts = data.frame(alpha = c(0.01, 0.01), delta = c(0.01, 1e-04)),
  partitions = 10, multiplier = 20, c_seq_len = 96, ...)

Arguments

fct_list

A list of frequency count tables, assumed to be replicates.

lambda_vec

The candidate values of the penalty parameter λ to select from.

starts

A data frame of starting values for alpha and delta in the maximum likelihood estimation (MLE) procedure.

partitions

An integer giving the number of times to partition the data into two subsets.

multiplier

The upper bound of the grid of candidate C values, stated as a multiple of the maximum observed richness (c). For example, if c is 50 and multiplier is 10, the method evaluates the likelihood on a C grid from 50 to 500.
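To make the relationship between c, multiplier and c_seq_len concrete, here is a minimal sketch of how such a grid could be built. It assumes (not confirmed by this page) that each frequency count table is a data frame whose first column is the frequency j and whose second column is f_j, the number of species observed exactly j times, and that the grid is evenly spaced. The helpers observed_richness and make_c_grid are hypothetical and are not part of rre.

```r
# Hypothetical helper (not part of rre): observed richness of one replicate,
# ASSUMING column 1 is the frequency j and column 2 is f_j, the number of
# species observed exactly j times.
observed_richness <- function(fct) {
  sum(fct[[2]])
}

# Hypothetical helper: candidate C values from the maximum observed richness c
# up to multiplier * c. Even spacing is an ASSUMPTION; values are not rounded.
make_c_grid <- function(fct_list, multiplier = 20, c_seq_len = 96) {
  c_max <- max(vapply(fct_list, observed_richness, numeric(1)))
  seq(c_max, multiplier * c_max, length.out = c_seq_len)
}

# Two toy replicates with observed richness 35 and 50, so c = 50.
fct_a <- data.frame(j = 1:3, f_j = c(20, 10, 5))
fct_b <- data.frame(j = 1:3, f_j = c(30, 15, 5))
grid <- make_c_grid(list(fct_a, fct_b), multiplier = 10, c_seq_len = 10)
print(grid)
```

With c = 50 and multiplier = 10, this grid runs from 50 to 500, matching the example in the argument description; the package itself may space or round its grid differently.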

c_seq_len

The number of points in the C grid search.

Details

Method 1 is motivated by the idea that, if we resample from the same population, an ideal estimator of C should have low variance. Because replicate data are available, the method repeatedly partitions the replicates into two subsets and computes an estimate of C from each subset. It then selects the λ that yields the lowest between-subset variance. The number of random partitions is set by the partitions argument; averaging over them smooths out the arbitrary choice of subsets. See the accompanying paper or the source code for more detail.
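The selection loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: select_lambda_min_subset and toy_estimate_C are hypothetical names, the split into two near-equal random halves is an assumption, and toy_estimate_C is a stand-in for rre's penalized estimator (which also uses starts, multiplier and c_seq_len). It simply shrinks mean observed richness toward 100, so it trivially favours the largest λ and only exercises the mechanics.

```r
# Hypothetical sketch of Method 1 (not rre's code). `estimate_C` is any
# function(fct_sublist, lambda) returning one numeric estimate of C.
select_lambda_min_subset <- function(fct_list, lambda_vec, estimate_C,
                                     partitions = 10) {
  n <- length(fct_list)
  stopifnot(n >= 2)
  scores <- matrix(NA_real_, nrow = partitions, ncol = length(lambda_vec))
  for (p in seq_len(partitions)) {
    # ASSUMPTION: split the replicates at random into two near-equal halves.
    idx <- sample(n)
    half <- floor(n / 2)
    subset1 <- fct_list[idx[seq_len(half)]]
    subset2 <- fct_list[idx[(half + 1):n]]
    for (k in seq_along(lambda_vec)) {
      est1 <- estimate_C(subset1, lambda_vec[k])
      est2 <- estimate_C(subset2, lambda_vec[k])
      # Sample variance of the two subset estimates, i.e. (est1 - est2)^2 / 2.
      scores[p, k] <- var(c(est1, est2))
    }
  }
  # Average over partitions, then pick the lambda with the smallest
  # mean between-subset variance.
  mean_scores <- colMeans(scores)
  list(lambda = lambda_vec[which.min(mean_scores)],
       mean_variance = mean_scores)
}

# Hypothetical toy estimator: shrinks mean observed richness toward 100.
# ASSUMES column 2 of each table holds the f_j counts.
toy_estimate_C <- function(fcts, lambda) {
  c_obs <- mean(vapply(fcts, function(f) sum(f[[2]]), numeric(1)))
  (c_obs + lambda * 100) / (1 + lambda)
}

set.seed(1)
# Four toy replicates with observed richness 10, 20, 40 and 80.
reps <- lapply(c(10, 20, 40, 80), function(r) data.frame(j = 1, f_j = r))
res <- select_lambda_min_subset(reps, seq(0, 20, by = 2), toy_estimate_C,
                                partitions = 5)
print(res$lambda)
```

Passing the estimator in as a function keeps the partition-and-compare logic separate from the model fit, so the same loop works with any estimator that accepts a subset of replicates and a λ.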

Examples


statdivlab/rre documentation built on Nov. 5, 2019, 9:20 a.m.