cv_replicates: Method 2 and Method 4


View source: R/tuning_proposals.R

Description

Method 2 is cross-validation that uses the likelihood in the evaluation step. Method 4 is cross-validation that uses a goodness-of-fit statistic in the evaluation step. Both are procedures for selecting the penalty parameter λ.

Usage

cv_replicates(fct_list, lambda_vec = seq(0, 20, by = 2),
  starts = data.frame(alpha = c(0.01, 0.01), delta = c(0.01, 1e-04)),
  partitions = 10, eval_function = "gof_chi_sq", multiplier = 20,
  c_seq_length = 96, ...)

Arguments

fct_list

A list of frequency count tables, assumed to be replicates.

lambda_vec

Candidate values of the penalty parameter λ to compare when selecting λ.

starts

Starting values for alpha and delta in the MLE procedure.

partitions

An integer giving the number of times the data are randomly split into training and evaluation subsets.

eval_function

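The name of the function used to score how well a set of parameters fits a list of frequency count tables; lower scores indicate a better fit. The default, "gof_chi_sq", uses a goodness-of-fit statistic (Method 4). "neg_unreg_like" uses the negative of the likelihood (Method 2); the likelihood is negated so that, as with the goodness-of-fit statistic, lower scores are better.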

multiplier

The upper bound of the grid of candidate C values, stated as a multiple of the maximum observed richness (c). For example, if c is 50 and multiplier is 10, the method evaluates the likelihood on a C grid from 50 to 500.

c_seq_length

The number of points in the C grid search.
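To make the roles of multiplier and c_seq_length concrete, here is a small sketch that assumes the grid is evenly spaced from the maximum observed richness c up to multiplier * c. The helper c_grid and the rounding to whole numbers are hypothetical illustrations, not taken from the rre source; the actual grid construction in cv_replicates may differ.

```r
# Hypothetical helper (not part of rre): candidate values of total richness C,
# assumed to be evenly spaced from c_obs up to multiplier * c_obs.
c_grid <- function(c_obs, multiplier = 20, c_seq_length = 96) {
  pts <- seq(from = c_obs, to = multiplier * c_obs, length.out = c_seq_length)
  # Assumption: richness is a count, so round and drop duplicates from rounding
  unique(round(pts))
}

# The example from the multiplier description, with 10 grid points
grid <- c_grid(50, multiplier = 10, c_seq_length = 10)
grid  # 50 100 150 200 250 300 350 400 450 500
```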

Details

Methods 2 and 4 have a very similar structure, so we've included them both in the same function. To run each method, use:

  1. Method 2: cv_replicates(fct_list, ..., eval_function = "neg_unreg_like")

  2. Method 4: cv_replicates(fct_list, ..., eval_function = "gof_chi_sq")

In each method we randomly split the data partitions times into training and evaluation subsets. For each split, an estimate is computed from the training subset for each λ in lambda_vec, and each estimate is scored on the evaluation subset with eval_function. The evaluation step is what distinguishes the two methods; see the paper or the source code for details of how these functions work.
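The partition-and-evaluate loop can be sketched in base R as follows. This is an illustration under stated assumptions, not rre's implementation: cv_sketch, fit_fn, eval_fn, the 50/50 train/evaluation split, and averaging scores over splits to pick λ are all hypothetical. The toy estimator shrinks a Poisson mean toward zero by λ, and the toy score is a negative Poisson log-likelihood on the held-out replicates, in the spirit of Method 2 (lower is better).

```r
# Hypothetical sketch (not rre code) of cross-validation over replicates.
# fit_fn(train, lambda) returns parameter estimates from the training tables;
# eval_fn(params, test) scores them on the held-out tables (lower = better).
cv_sketch <- function(fct_list, lambda_vec, partitions, fit_fn, eval_fn,
                      train_frac = 0.5) {
  n <- length(fct_list)
  stopifnot(n >= 2)  # need at least one training and one evaluation replicate
  # Assumed split size: train_frac of the replicates, at least 1, at most n - 1
  n_train <- min(n - 1, max(1, floor(train_frac * n)))
  # scores[p, k] holds the evaluation score of lambda_vec[k] on split p
  scores <- matrix(NA_real_, nrow = partitions, ncol = length(lambda_vec))
  for (p in seq_len(partitions)) {
    train_idx <- sample(n, n_train)            # random training subset
    train <- fct_list[train_idx]
    test  <- fct_list[-train_idx]              # remaining replicates
    for (k in seq_along(lambda_vec)) {
      params <- fit_fn(train, lambda_vec[k])   # estimate with penalty lambda
      scores[p, k] <- eval_fn(params, test)    # score on evaluation subset
    }
  }
  # Assumption: average the scores over splits and keep the lowest mean
  mean_scores <- colMeans(scores)
  list(mean_scores = mean_scores,
       best_lambda = lambda_vec[which.min(mean_scores)])
}

# Toy stand-ins: a shrunken Poisson mean and its negative log-likelihood
shrink_fit <- function(train, lambda) {
  x <- unlist(train)
  sum(x) / (length(x) + lambda)
}
neg_loglik <- function(mu, test) -sum(dpois(unlist(test), mu, log = TRUE))

set.seed(1)
toy <- replicate(6, rpois(20, 5), simplify = FALSE)
res <- cv_sketch(toy, lambda_vec = seq(0, 20, by = 2), partitions = 10,
                 fit_fn = shrink_fit, eval_fn = neg_loglik)
res$best_lambda
```

The toy estimator and score only stand in for the real ones; swapping in a different eval_fn (for example, a goodness-of-fit statistic) changes the evaluation step while leaving the loop unchanged, which mirrors why the two methods can share one function.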

Examples

cv_replicates(nb_fct_simulation(100, 0.1, 0.1, 2))

statdivlab/rre documentation built on Nov. 5, 2019, 9:20 a.m.