create_randomised_groups: Create randomised groups Creates randomised groups, e.g. for...

View source: R/RandomGrouping.R

create_randomised_groups    R Documentation

Create randomised groups

Creates randomised groups, e.g. for tests that depend on splitting (continuous) data into groups, such as the Hosmer-Lemeshow test.

Description

The default fast mode is based on random sampling, whereas the slow mode is based on probabilistic joining of adjacent groups. As the name suggests, fast mode operates considerably more efficiently.
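
This page does not say how the random sampling in fast mode is carried out. As an illustration only, the sketch below shows one way to form randomised groups of adjacent values by sampling cut positions in the sorted data. The function name sketch_fast_groups and its n_groups argument are hypothetical and are not part of the package.

```r
# Illustrative sketch only: NOT the package implementation.
# Assumption: groups are formed by sampling random cut positions
# between adjacent values of the sorted data.
sketch_fast_groups <- function(x, n_groups = 10L) {
  n <- length(x)
  stopifnot(n_groups >= 1L, n_groups <= n)

  # Sample n_groups - 1 distinct cut positions between sorted samples.
  cuts <- sort(sample.int(n - 1L, size = n_groups - 1L))

  # Group sizes are the gaps between consecutive cut positions.
  sizes <- diff(c(0L, cuts, n))

  # Map group numbers from sorted order back to the original order.
  group <- integer(n)
  group[order(x)] <- rep(seq_len(n_groups), times = sizes)
  group
}

set.seed(42)
group <- sketch_fast_groups(stats::runif(50), n_groups = 5L)
table(group)
```

Each call with a different seed gives a different partition, while every group still consists of adjacent values of x.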

Usage

create_randomised_groups(
  x,
  y = NULL,
  sample_identifiers,
  n_max_groups = NULL,
  n_min_groups = NULL,
  n_min_y_in_group = NULL,
  n_groups_init = 30,
  fast_mode = TRUE
)

Arguments

x

Vector with data used for sorting. Groups are formed based on adjacent values.

y

Vector with markers, e.g. events. Values should be 0 or 1, where 1 marks an event.

sample_identifiers

data.table with sample_identifiers. If provided, a list of grouped sample_identifiers will be returned; otherwise, integers are returned.

n_max_groups

Maximum number of groups that need to be formed.

n_min_groups

Minimum number of groups that need to be formed.

n_min_y_in_group

Minimum number of y=1 in each group for a valid group.

n_groups_init

Number of initial groups (default: 30)

fast_mode

Enables fast randomised grouping mode (default: TRUE)

Details

Creates randomised groups, e.g. for tests that depend on splitting (continuous) data into groups, such as the Hosmer-Lemeshow test

  • Determine the maximum number of groups: either 10, or the number of groups for which each group has 5 events, whichever is smaller.

  • Determine the minimum number of groups (half the maximum, or 2). Groups cannot exceed the corresponding group size.

  • Start with n_groups_init very small groups (default: 30).

  • Iterate while the maximum number of groups has not been reached.

    • Selection probability is 1/n_j

    • If a group exceeds the maximum group size, selection probability is 0.

      • Break if all groups have exceeded the maximum size.

    • Get cumulative probability and normalise by total.

    • Draw random number between 0 and 1.

    • Select the group which has a cumulative probability range that contains the random number.

    • Draw a random number to decide whether to join the group with its left or right adjacent group, and assign the group number to the adjacent group. The probability depends on the sizes of the adjacent groups: smaller groups have a greater probability of being joined. Groups that already exceed the maximum group size are not joined. If the group is surrounded on both sides by such groups, force the selection probability of the current group to 0. If joining is possible, update the group size and the selection probability of the new group.

  • Check that 5 events are present in each group. For each group with < 5 events, try to join with neighbours.

  • Start over if the number of groups is smaller than the minimum number.
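
The joining steps above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the package source: the names sketch_group_bounds and sketch_join_groups are hypothetical, the maximum group size is assumed to be the number of samples divided by the minimum number of groups, and the neighbour is assumed to be chosen with weight 1/size. The final event check and the restart step are left out.

```r
# Illustrative sketch only: NOT the package implementation.

# Hypothetical helper: group-count bounds as read from the first two steps.
sketch_group_bounds <- function(y, n_min_y_in_group = 5L) {
  n_max <- min(10L, floor(sum(y == 1) / n_min_y_in_group))
  n_min <- max(2L, floor(n_max / 2))
  list(n_max_groups = n_max, n_min_groups = n_min)
}

# Hypothetical helper: probabilistic joining of adjacent groups.
sketch_join_groups <- function(x, n_max_groups, n_min_groups,
                               n_groups_init = 30L) {
  n <- length(x)
  n_groups_init <- min(n_groups_init, n)
  # Assumption: maximum group size follows from the minimum group count.
  max_size <- ceiling(n / n_min_groups)
  # Sizes of the initial, roughly equal groups over the sorted data.
  sizes <- tabulate(ceiling(seq_len(n) * n_groups_init / n),
                    nbins = n_groups_init)

  while (length(sizes) > n_max_groups) {
    k <- length(sizes)
    open <- sizes <= max_size
    # Selection probability is 1 / n_j. It is 0 for groups that exceed
    # the maximum size or that have no joinable neighbour.
    has_open_neighbour <- c(FALSE, open[-k]) | c(open[-1L], FALSE)
    p <- ifelse(open & has_open_neighbour, 1 / sizes, 0)
    if (all(p == 0)) break

    # Cumulative probability, normalised by the total.
    cum_p <- cumsum(p)
    cum_p <- cum_p / cum_p[k]
    # Select the group whose cumulative range contains the random number.
    j <- which(cum_p >= stats::runif(1))[1L]

    # Candidate neighbours; smaller neighbours are more likely to be joined.
    nb <- c(j - 1L, j + 1L)
    nb <- nb[nb >= 1L & nb <= k]
    nb <- nb[open[nb]]
    w <- 1 / sizes[nb]
    pick <- if (length(nb) == 1L) {
      nb
    } else {
      nb[1L + (stats::runif(1) > w[1L] / sum(w))]
    }

    # Merge the two adjacent groups and update the group size.
    first <- min(j, pick)
    sizes[first] <- sizes[j] + sizes[pick]
    sizes <- sizes[-(first + 1L)]
  }

  # Map group numbers from sorted order back to the original order.
  group <- integer(n)
  group[order(x)] <- rep(seq_along(sizes), times = sizes)
  group
}

set.seed(42)
x <- stats::rnorm(200)
group <- sketch_join_groups(x, n_max_groups = 10L, n_min_groups = 5L)
table(group)
```

Because small groups are selected more often and prefer small neighbours, group sizes tend to stay balanced while the exact boundaries still vary between runs.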

Value

List of group sample ids or indices.
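
As a rough illustration of the two return shapes, the sketch below turns integer group indices into a list of grouped sample ids using base R. The vectors sample_ids and group are made-up example data, and a plain character vector stands in for the data.table of sample identifiers.

```r
# Illustrative sketch only: made-up data, not output of the package.
sample_ids <- paste0("sample_", 1:6)
group <- c(1L, 1L, 2L, 2L, 3L, 3L)

# One list element per group, each holding the ids assigned to it.
grouped_ids <- split(sample_ids, group)
str(grouped_ids)
```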


familiar documentation built on Sept. 30, 2024, 9:18 a.m.