The function iteratively searches for the smallest set of observations that must be excluded from the data to reach a conservative 'goal value' for the statistic of interest. It relies on a genetic algorithm, which efficiently explores the (usually vast) space of possible subsets. The result can uncover influential subsamples and inform discussions of robustness. Required arguments are the data frame, a function that computes the statistic of interest ('statistic_computation'; see the examples), and the goal value.
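As an illustration of what a 'statistic_computation' function might look like, here is a minimal sketch. The function name `correlation_statistic`, the toy data, and the column names `x` and `y` are assumptions made for this example and are not taken from the package.

```r
# Hypothetical statistic function: takes a data.frame, returns a single number.
# The column names x and y are assumptions made for this illustration.
correlation_statistic <- function(data) {
  cor(data$x, data$y)
}

# Toy data with a moderate positive relationship between x and y.
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- 0.5 * toy$x + rnorm(50)

# Statistic on the full data.
full_value <- correlation_statistic(toy)

# The same function works on any subset of rows, which is what a
# search over candidate exclusions needs.
subset_value <- correlation_statistic(toy[-c(1, 2, 3), ])
```

Any function of this shape (a data frame in, one numeric value out) can be re-evaluated on each candidate subset.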
data |
A data.frame containing the observations as rows. |
goal_value |
The conservative target value for the statistic (e.g., a small effect size). |
statistic_computation |
A function that takes 'data' as input and returns the statistic of interest. |
max_exclusions |
Maximum number of cases that may be excluded. |
pop |
Number of 'individuals' in each generation of the genetic algorithm. |
max_generations |
Maximum number of generations the algorithm runs. |
exclusion_cost |
Used to calibrate the fitness function. |
prop_included_cases |
Initial proportion of included cases (e.g. .90). |
chance_of_mutation |
Probability that a gene mutates; higher values are slower but more accurate (e.g., .02). |
stop_search |
Number of generations without change after which the 'converged' result is returned. |
random_seed |
Seed for replicability. |
large_sample_drops |
Drops one row from the best solution (helps convergence with large samples).
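To show how arguments of this kind could interact, here is a toy genetic search in base R. It is a sketch under stated assumptions, not the package's implementation: the fitness function, the uniform crossover, the elitism step, and the goal of pushing a mean down to the goal value are all choices made for this illustration. Only the variable names mirror the arguments listed above.

```r
# Toy sketch only: NOT the package's implementation.
# Aim: exclude few rows so that mean(x) falls to goal_value or below.
set.seed(42)                                   # plays the role of random_seed
data <- data.frame(x = c(rnorm(40), 6, 7, 8))  # three influential rows
statistic_computation <- function(data) mean(data$x)

goal_value          <- 0.2
pop                 <- 30
max_generations     <- 200
exclusion_cost      <- 0.01
prop_included_cases <- 0.90
chance_of_mutation  <- 0.02
stop_search         <- 30

n <- nrow(data)

# Assumed fitness (lower is better): distance above the goal plus a
# penalty for every excluded row.
fitness <- function(included) {
  if (sum(included) < 2) return(Inf)
  value <- statistic_computation(data[included, , drop = FALSE])
  max(value - goal_value, 0) + exclusion_cost * sum(!included)
}

# Each individual is a logical vector: TRUE means the row is included.
population <- replicate(pop, runif(n) < prop_included_cases, simplify = FALSE)

best <- NULL
best_fit <- Inf
stale <- 0
for (generation in seq_len(max_generations)) {
  scores <- vapply(population, fitness, numeric(1))
  if (min(scores) < best_fit) {
    best_fit <- min(scores)
    best <- population[[which.min(scores)]]
    stale <- 0
  } else {
    stale <- stale + 1
  }
  if (stale >= stop_search) break              # no change for stop_search generations

  # Keep the better half as parents, then breed a new generation.
  parents <- population[order(scores)][seq_len(pop %/% 2)]
  population <- lapply(seq_len(pop), function(i) {
    pair  <- sample(parents, 2)
    child <- ifelse(runif(n) < 0.5, pair[[1]], pair[[2]])  # uniform crossover
    flip  <- runif(n) < chance_of_mutation                 # per-gene mutation
    child[flip] <- !child[flip]
    child
  })
  population[[1]] <- best                      # elitism: carry the best forward
}

excluded_rows <- which(!best)
```

In this sketch a larger `exclusion_cost` favours solutions that drop fewer rows, while a larger `chance_of_mutation` widens the search at the cost of slower convergence.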
A named list that includes how many and which rows were excluded, plus the original and new values of the statistic.