In familiar: End-to-End Automated Machine Learning and Model Evaluation

create_randomised_groups

R Documentation

Create randomised groups

Description

The default fast mode is based on random sampling, whereas the slow mode is based on the probabilistic joining of adjacent groups. As the name suggests, the fast mode is considerably more efficient.
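To make "random sampling" concrete, the sketch below shows one way to form randomised groups of adjacent values by drawing random cut points in the sorted data. This is an illustration only: the function `random_cut_groups` is hypothetical, and it is not claimed to be how familiar's fast mode works internally.

```r
# Illustration only (hypothetical helper, not familiar's fast mode):
# draw random cut points among the sorted samples so that each group
# contains adjacent values of x.
random_cut_groups <- function(x, n_groups) {
  n <- length(x)
  stopifnot(n_groups >= 1, n_groups <= n)
  # Choose n_groups - 1 distinct cut positions between consecutive samples.
  cuts <- sort(sample.int(n - 1L, n_groups - 1L))
  sizes <- diff(c(0L, cuts, n))          # every group holds at least 1 sample
  group <- integer(n)
  group[order(x)] <- rep(seq_len(n_groups), sizes)
  group                                  # group label per sample, in input order
}

set.seed(1)
x <- runif(20)
table(random_cut_groups(x, 4))
```

Because cuts are placed between neighbouring ranks, every group is a contiguous run of sorted `x`, which is the property the Hosmer-Lemeshow style tests rely on.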

Usage

create_randomised_groups(
  x,
  y = NULL,
  sample_identifiers,
  n_max_groups = NULL,
  n_min_groups = NULL,
  n_min_y_in_group = NULL,
  n_groups_init = 30,
  fast_mode = TRUE
)

Arguments

`x`	Vector with data used for sorting. Groups are formed based on adjacent values.
`y`	Vector of event markers. Values should be 0 (no event) or 1 (event).
`sample_identifiers`	data.table with sample identifiers. If provided, a list of grouped sample identifiers is returned; otherwise, integers are returned.
`n_max_groups`	Maximum number of groups to form.
`n_min_groups`	Minimum number of groups to form.
`n_min_y_in_group`	Minimum number of y = 1 values (events) in a group for the group to be valid.
`n_groups_init`	Number of initial groups (default: 30).
`fast_mode`	Enables the fast randomised grouping mode (default: TRUE).
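The relationship between a grouping on sorted `x` and the two return forms described for `sample_identifiers` can be sketched as follows. The helper `groups_from_sizes` is hypothetical, a plain `data.frame` stands in for the data.table, and the exact shape of familiar's return value may differ.

```r
# Hypothetical sketch: map contiguous group sizes (in order of x) back to
# samples. Without identifiers, return a list of integer sample indices per
# group; with identifiers, return the matching identifier rows per group.
groups_from_sizes <- function(x, sizes, sample_identifiers = NULL) {
  stopifnot(sum(sizes) == length(x))
  group <- integer(length(x))
  group[order(x)] <- rep(seq_along(sizes), sizes)  # group label per sample
  idx <- unname(split(seq_along(x), group))        # sample indices per group
  if (is.null(sample_identifiers)) return(idx)
  lapply(idx, function(i) sample_identifiers[i, , drop = FALSE])
}

ids <- data.frame(id = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
groups_from_sizes(c(0.9, 0.1, 0.5, 0.3), c(2L, 2L), ids)
```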

Details

Creates randomised groups, e.g. for tests that depend on splitting (continuous) data into groups, such as the Hosmer-Lemeshow test.

Determine the maximum number of groups: 10, or the number of groups for which each group contains 5 events, whichever is smaller.
Determine the minimum number of groups: half the maximum, or 2. Groups cannot exceed the corresponding maximum group size.
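The two bounds can be computed as in the sketch below. The helper `group_bounds` is hypothetical, and two readings are assumptions: "half the maximum, or 2" is taken as whichever is larger, and the maximum group size is taken as the number of samples divided by the minimum number of groups, rounded up.

```r
# Hypothetical helper: derive the group-count bounds described above.
# Assumptions: minimum = max(2, half the maximum); the maximum group size
# is the number of samples divided by the minimum number of groups.
group_bounds <- function(y, n_events_per_group = 5L) {
  n_events <- sum(y == 1)
  # At most 10 groups, fewer if 10 groups cannot each hold 5 events.
  n_max_groups <- min(10L, floor(n_events / n_events_per_group))
  # Half the maximum, but never fewer than 2 groups.
  n_min_groups <- max(2L, floor(n_max_groups / 2))
  # Largest admissible group size implied by the minimum number of groups.
  max_group_size <- ceiling(length(y) / n_min_groups)
  list(n_max_groups = n_max_groups,
       n_min_groups = n_min_groups,
       max_group_size = max_group_size)
}

group_bounds(c(rep(1, 30), rep(0, 70)))
```

With 30 events among 100 samples this gives at most 6 groups, at least 3 groups, and a maximum group size of 34.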
Start with 50 very small groups.
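Forming the initial small groups amounts to sorting by `x` and cutting the ranks into contiguous blocks of (nearly) equal size. The sketch below is a hypothetical helper, `initial_groups`, whose `n_init` default of 50 mirrors the step above.

```r
# Hypothetical sketch: assign samples to n_init contiguous groups of
# (nearly) equal size after sorting by x.
initial_groups <- function(x, n_init = 50L) {
  n <- length(x)
  n_init <- min(n_init, n)            # cannot form more groups than samples
  ord <- order(x)                     # sample indices sorted by x
  group <- integer(n)
  # Rank i goes to group ceiling(i * n_init / n), so groups are contiguous.
  group[ord] <- as.integer(ceiling(seq_len(n) * n_init / n))
  group
}

initial_groups(c(5, 1, 4, 2, 3, 6), 3L)
# [1] 3 1 2 1 2 3
```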
Iterate while the number of groups exceeds the maximum number of groups.
- The selection probability of group j is 1/n_j, where n_j is the number of samples in group j.
- If a group exceeds the maximum group size, its selection probability is 0.
  - Break if all groups have exceeded the maximum size.
- Compute the cumulative probability and normalise it by the total.
- Draw a random number between 0 and 1.
- Select the group whose cumulative probability range contains the random number.
- Draw a random number to decide whether to join the selected group with its left or right adjacent group, and assign the group number to the adjacent group.
  - The probability depends on the size of the adjacent groups: smaller groups are more likely to be joined.
  - Groups that already exceed the maximum group size are never joined.
  - If both adjacent groups exceed the maximum group size, force the selection probability of the current group to 0.
  - If joining is possible, update the group size and the selection probability of the new group.
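The joining loop can be sketched on the vector of group sizes, since groups are contiguous runs of sorted `x`. The function `join_groups` is hypothetical. Assumptions: missing neighbours at either end count as unavailable, and neighbour weights are inversely proportional to neighbour size.

```r
# Hypothetical sketch of the slow-mode joining loop on contiguous group sizes.
join_groups <- function(sizes, n_max_groups, max_size) {
  blocked <- rep(FALSE, length(sizes))   # groups forced to probability 0
  while (length(sizes) > n_max_groups) {
    # Selection probability 1/n_j; 0 for oversized or blocked groups.
    p <- ifelse(sizes > max_size | blocked, 0, 1 / sizes)
    if (all(p == 0)) break               # no group can be joined any more
    # Cumulative probabilities: pick the group whose range contains u.
    cum_p <- cumsum(p)
    u <- runif(1) * cum_p[length(cum_p)]
    j <- findInterval(u, cum_p) + 1L
    # Candidate neighbours: adjacent groups not exceeding the maximum size.
    cand <- c(j - 1L, j + 1L)
    cand <- cand[cand >= 1L & cand <= length(sizes)]
    cand <- cand[sizes[cand] <= max_size]
    if (length(cand) == 0L) {            # no joinable neighbour
      blocked[j] <- TRUE
      next
    }
    # Smaller neighbours are more likely to be joined (weight 1/size).
    if (length(cand) == 1L) {
      k <- cand
    } else {
      k <- cand[sample.int(length(cand), 1L, prob = 1 / sizes[cand])]
    }
    sizes[k] <- sizes[k] + sizes[j]      # merge group j into neighbour k
    sizes <- sizes[-j]
    blocked <- blocked[-j]
  }
  sizes
}

set.seed(1)
join_groups(rep(2L, 50), n_max_groups = 10, max_size = 15)
```

Each pass either merges two groups or blocks one more group, so the loop always terminates; the total number of samples is preserved.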
Check that each group contains at least 5 events. Try to join each group with fewer than 5 events with its neighbours.
Start over if the number of groups is smaller than the minimum number of groups.
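The event check can be sketched as repeatedly merging the first deficient group into a neighbour. The helper `fix_event_counts` is hypothetical, and the choice of neighbour (the one with fewer events, ties going left) is an assumption; the description only says "try to join with neighbours".

```r
# Hypothetical sketch: merge groups with fewer than n_min_events events
# into a neighbour until every group is valid or one group remains.
# Assumption: the neighbour with fewer events is chosen (ties go left).
fix_event_counts <- function(sizes, events, n_min_events = 5L) {
  while (length(sizes) > 1L && any(events < n_min_events)) {
    j <- which(events < n_min_events)[1]         # first deficient group
    cand <- c(j - 1L, j + 1L)
    cand <- cand[cand >= 1L & cand <= length(sizes)]
    k <- cand[which.min(events[cand])]           # neighbour to join
    sizes[k] <- sizes[k] + sizes[j]
    events[k] <- events[k] + events[j]
    sizes <- sizes[-j]
    events <- events[-j]
  }
  list(sizes = sizes, events = events)
}

fix_event_counts(c(10L, 10L, 10L, 10L), c(6L, 2L, 7L, 3L))
```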
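The restart rule can be sketched as an outer loop around one complete grouping pass. In this hypothetical skeleton, `attempt` is a stand-in for a full pass that returns group sizes, and `max_tries` is an added safeguard that the description does not mention.

```r
# Hypothetical skeleton of the restart loop. `attempt` stands in for one
# full randomised grouping pass returning a vector of group sizes.
group_with_restarts <- function(attempt, n_min_groups, max_tries = 100L) {
  for (i in seq_len(max_tries)) {
    sizes <- attempt()
    if (length(sizes) >= n_min_groups) return(sizes)  # enough groups: accept
  }
  stop("no valid grouping found after ", max_tries, " attempts")
}

set.seed(42)
toy_attempt <- function() rep(10L, sample(1:6, 1))  # toy pass: 1 to 6 groups
length(group_with_restarts(toy_attempt, n_min_groups = 3)) >= 3
```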