multilevel_break: Find smallest subset to exclude from clustered sample for...
In hannesrosenbusch/StatBreak: Examine robustness of sample statistics to case deletions


View source: R/multilevel_break.R

Description

The function iteratively searches for the smallest set of groups (see argument grouping_var) that must be excluded from the data for the statistic of interest to reach a conservative 'goal value'. It does so with a genetic algorithm, which efficiently explores the (usually vast) space of possible subsets. The result can uncover impactful subsamples and fuel discussions of robustness. Required arguments are the data.frame, a function that computes the statistic of interest ('statistic_computation'; see Examples), the name of the column holding the grouping variable ('grouping_var'), and the goal value.
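To give a sense of how quickly that space grows: with k groups, of which at most m may be excluded, there are sum over j = 0..m of choose(k, j) candidate exclusion sets (including the empty set). The helper below is a hypothetical name, not part of StatBreak; it simply counts these sets in base R.

```r
# Count candidate exclusion sets for k groups when at most m may be excluded:
# sum over j = 0..m of choose(k, j). Hypothetical helper, not a StatBreak function.
n_candidate_subsets <- function(k, m = k) {
  sum(choose(k, 0:m))
}

n_candidate_subsets(9, 8)   # 511 (9 groups, max_exclusions = 8)
n_candidate_subsets(50)     # 2^50, roughly 1.1e15
```

For the 9 groups used in the Examples section an exhaustive search would still be cheap, but with a few dozen groups enumeration quickly becomes infeasible, which is where a heuristic search such as a genetic algorithm pays off.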

Usage

multilevel_break(
  data = NULL,
  goal_value = NULL,
  statistic_computation = NULL,
  grouping_var = NULL,
  max_exclusions = NULL,
  pop = 200,
  max_generations = 300,
  exclusion_cost = 0.01,
  prop_included_cases = 0.9,
  chance_of_mutation = 0.05,
  stop_search = 100,
  random_seed = 42
)

Arguments

`data`	A data.frame containing the observations as rows.
`goal_value`	The conservative target value for the statistic (e.g., a small effect size).
`statistic_computation`	A function that takes the data ('data') as input and returns the statistic of interest.
`grouping_var`	Name of the column in data that identifies the groups (clusters) considered for exclusion.
`max_exclusions`	Maximum number of groups to be excluded.
`pop`	Number of 'individuals' in each generation of the genetic algorithm.
`max_generations`	Maximum number of generations the algorithm runs.
`exclusion_cost`	Used to calibrate the fitness function.
`prop_included_cases`	Initial proportion of included groups (e.g., .90).
`chance_of_mutation`	Probability that a gene mutates; higher values are slower but more accurate (e.g., .02).
`stop_search`	Number of generations without change after which the 'converged' result is returned.
`random_seed`	Seed for reproducibility.
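The arguments above parameterize a genetic algorithm. The base-R sketch below shows one plausible way such parameters can interact; it is NOT StatBreak's implementation. The function name `ga_group_exclusion`, the fitness form (absolute distance of the statistic from goal_value plus exclusion_cost per excluded group), size-2 tournament selection, uniform crossover and elitism are all assumptions chosen for illustration.

```r
# Illustrative sketch only, NOT StatBreak's implementation.
# ga_group_exclusion is hypothetical; fitness form, tournament selection,
# uniform crossover and elitism are assumptions.
ga_group_exclusion <- function(data, goal_value, statistic_computation,
                               grouping_var, max_exclusions, pop = 200,
                               max_generations = 300, exclusion_cost = 0.01,
                               prop_included_cases = 0.9,
                               chance_of_mutation = 0.05, stop_search = 100,
                               random_seed = 42) {
  stopifnot(pop >= 2)
  set.seed(random_seed)                     # note: changes the global RNG state
  groups <- data[[grouping_var]]
  group_ids <- sort(unique(groups))
  k <- length(group_ids)                    # one gene per group; 1 = excluded

  # Assumed fitness (lower is better): distance of the statistic on the retained
  # rows from goal_value, plus exclusion_cost for every excluded group.
  fitness <- function(genes) {
    n_excl <- sum(genes)
    if (n_excl > max_exclusions || n_excl == k) return(Inf)
    kept <- !(groups %in% group_ids[genes == 1])
    stat <- tryCatch(statistic_computation(data[kept, , drop = FALSE]),
                     error = function(e) NA_real_)
    if (length(stat) != 1 || !is.finite(stat)) return(Inf)
    abs(stat - goal_value) + exclusion_cost * n_excl
  }

  # Initial population: each group is excluded with probability 1 - prop_included_cases.
  population <- matrix(rbinom(pop * k, 1, 1 - prop_included_cases), nrow = pop)
  best <- integer(k)                        # start from "exclude nothing"
  best_fit <- fitness(best)
  stale <- 0

  for (gen in seq_len(max_generations)) {
    fit <- apply(population, 1, fitness)
    if (min(fit) < best_fit) {
      best_fit <- min(fit)
      best <- population[which.min(fit), ]
      stale <- 0
    } else {
      stale <- stale + 1
      if (stale >= stop_search) break       # no change for stop_search generations
    }
    # Tournament selection of size 2: keep the fitter of two random individuals.
    pick <- function() {
      i <- sample.int(pop, 2)
      population[i[which.min(fit[i])], ]
    }
    children <- matrix(0, nrow = pop, ncol = k)
    children[1, ] <- best                   # elitism: carry over the best solution
    for (j in 2:pop) {
      p1 <- pick()
      p2 <- pick()
      child <- ifelse(runif(k) < 0.5, p1, p2)   # uniform crossover
      flip <- runif(k) < chance_of_mutation
      child[flip] <- 1 - child[flip]            # bit-flip mutation
      children[j, ] <- child
    }
    population <- children
  }
  # Observation-level 0/1 vector (1 = excluded), mirroring the documented Value.
  as.integer(groups %in% group_ids[best == 1])
}

# Usage on toy data: group 5 pulls the mean of x away from the goal of 0.
toy <- data.frame(g = rep(1:5, each = 4), x = c(rep(0, 16), rep(10, 4)))
res <- ga_group_exclusion(toy, goal_value = 0,
                          statistic_computation = function(d) mean(d$x),
                          grouping_var = "g", max_exclusions = 4,
                          pop = 100, max_generations = 100, stop_search = 30)
```

Elitism guarantees the best solution found so far is never lost, so the "no change for stop_search generations" rule is a simple convergence check in this sketch.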

Value

A vector of zeros and ones with length equal to the number of observations (rows) in data. Ones indicate exclusion.
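A 0/1 exclusion vector of this shape can be used directly to subset the data and recompute the statistic. The sketch below uses toy data and a hand-made vector standing in for a return value; neither comes from running the package.

```r
# Toy data and a hand-made exclusion vector (stand-in for a returned result).
df <- data.frame(groups = rep(0:2, each = 3),
                 v1 = c(1, 2, 3, 4, 5, 6, 10, 11, 12))
excluded <- as.integer(df$groups == 2)    # 1 = excluded, 0 = retained

# Keep only retained rows and recompute the statistic of interest.
retained <- df[excluded == 0, , drop = FALSE]
mean(df$v1)         # full sample: 6
mean(retained$v1)   # after exclusions: 3.5

# Which groups were dropped?
unique(df$groups[excluded == 1])   # 2
```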

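Examples

# Install from GitHub first, e.g. remotes::install_github('hannesrosenbusch/StatBreak')
library(StatBreak)

set.seed(42)
# Group (cluster) membership of each observation: 9 groups, labelled 0 to 8
groups = c(0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,0,0,0,0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8)
v1 = rnorm(length(groups)) + 0.7
v2 = rnorm(length(groups))

df = data.frame(groups, v1, v2)

# Statistic of interest: p-value of a (Welch) two-sample t-test of v1 vs. v2
st = function(data){
  t.test(data$v1, data$v2)$p.value
}

# Which groups must be excluded for the p-value to reach 0.05?
multilevel_break(df, statistic_computation = st, goal_value = 0.05,
                 grouping_var = 'groups', max_exclusions = 8)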

hannesrosenbusch/StatBreak documentation built on Feb. 12, 2020, 10:35 a.m.
