match_groups: Creates a matched group via backward selection.


View source: R/ldamatch.R

Description

Creates a matched group via backward selection.

Usage

match_groups(
  condition,
  covariates,
  halting_test,
  thresh = 0.2,
  method = ldamatch::matching_methods,
  props = prop.table(table(condition)),
  replicates = get("RND_DEFAULT_REPLICATES", .ldamatch_globals),
  min_preserved = length(levels(condition)),
  print_info = get("PRINT_INFO", .ldamatch_globals),
  max_removed_per_cond = NULL,
  tiebreaker = NULL,
  lookahead = 2,
  all_results = FALSE,
  prefer_test = TRUE,
  max_removed_per_step = 1,
  max_removed_percent_per_step = 0.5,
  ratio_for_slowdown = 0.5
)
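
As a minimal sketch of a call, assuming the ldamatch package is installed and that "heuristic2" is accepted as a method name (inferred from the search_heuristic2 help page listed under See Also), the following tries to match two simulated groups on one covariate. The data, the seed, and the variable names are illustrative only.

```r
# Illustrative data: two groups whose means differ slightly on one covariate.
set.seed(42)
condition <- factor(rep(c("A", "B"), each = 30))
covariates <- matrix(
  c(rnorm(30, mean = 100, sd = 15), rnorm(30, mean = 105, sd = 15)),
  ncol = 1, dimnames = list(NULL, "score")
)

# Run only if ldamatch is available, so the sketch degrades gracefully.
if (requireNamespace("ldamatch", quietly = TRUE)) {
  is_in <- ldamatch::match_groups(
    condition, covariates,
    halting_test = ldamatch::t_halt,
    thresh = 0.2,
    method = "heuristic2",
    print_info = FALSE
  )
  # Number of subjects kept in each group.
  print(table(condition[is_in]))
}
```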

Arguments

condition

A factor vector containing condition labels.

covariates

A columnwise matrix containing covariates to match the conditions on.

halting_test

A function to apply to 'covariates' (in matrix form) which is TRUE iff the conditions are matched. Signature: halting_test(condition, covariates, thresh). The following halting tests are part of this package: t_halt, U_halt, l_halt, ad_halt, ks_halt, wilks_halt, f_halt. You can create the intersection of two or more halting tests using create_halting_test.
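
To illustrate the signature, here is a sketch of a user-defined halting test written in base R. The function name min_t_p_value is hypothetical and not part of the package. It assumes exactly two condition levels, and it assumes that the caller compares the returned number against thresh, as the description of thresh suggests.

```r
# Hypothetical halting test: smallest Welch t-test p-value across covariate columns.
# 'thresh' is accepted only to satisfy the expected signature; it is not used here.
min_t_p_value <- function(condition, covariates, thresh) {
  covariates <- as.matrix(covariates)
  groups <- levels(droplevels(condition))
  stopifnot(length(groups) == 2)
  p_values <- apply(covariates, 2, function(x) {
    stats::t.test(x[condition == groups[1]], x[condition == groups[2]])$p.value
  })
  min(p_values)
}

# Clearly different groups give a very small p-value, so they count as unmatched.
condition <- factor(rep(c("A", "B"), each = 20))
covariates <- cbind(score = c(1:20, 101:120))
p <- min_t_p_value(condition, covariates, thresh = 0.2)
is_matched <- p >= 0.2
```

Returning the minimum over the columns means every covariate has to pass on its own before the groups count as matched.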

thresh

The return value of halting_test has to be greater than or equal to thresh for the matched groups.

method

The choice of search method, one of "random", "heuristic2", "heuristic3", "heuristic4", or "exhaustive". You can get more information about each method on the help page for "search_<method_name>" (e.g. "search_exhaustive").

props

Either the desired proportions of the sample for each condition (as fractions of the sample) as a named vector, or the names of the conditions for which we prefer to preserve the subjects, in decreasing order of preference. If not specified, the (full) sample proportions are used. This is preferred among configurations with the same taken into account by the other methods to some extent. For example, c(A = 0.4, B = 0.4, C = 0.2) means that we would like the number of subjects in groups A, B, and C to be around 40%, 40%, and 20% of the total number of subjects, respectively, whereas c("A", "B", "C") means that, if possible, we would like to keep all subjects in group A, and prefer keeping subjects in B even if it results in losing more subjects from C.
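
The two accepted forms, and the default taken from the Usage section, can be sketched in base R as follows. The condition vector here is made up for illustration.

```r
# Illustrative condition labels: 10 subjects in A, 10 in B, 5 in C.
condition <- factor(c(rep("A", 10), rep("B", 10), rep("C", 5)))

# Default from the Usage section: the full-sample proportions.
default_props <- prop.table(table(condition))

# Form 1: desired proportions as a named numeric vector.
props_as_proportions <- c(A = 0.4, B = 0.4, C = 0.2)

# Form 2: condition names in decreasing order of preference for preservation.
props_as_preference <- c("A", "B", "C")
```

For this sample the default proportions happen to coincide with the named vector in form 1.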

replicates

The maximum number of random replications to be performed. This is only used for the "random" method.

min_preserved

The minimum number of preserved subjects. It can be used to ensure that the search will not take forever to run, but instead fail when a solution is not found when preserving this number of subjects.

print_info

If TRUE, prints summary information on the input and the results, as well as progress information for the exhaustive search and random algorithms. Default: TRUE; can be changed using set_param("PRINT_INFO", FALSE).

max_removed_per_cond

A named integer vector, containing the maximum number of subjects that can be removed from each group. Specify 0 for groups if you want to preserve all of their subjects. If you do not specify a value for a group, it defaults to one less than the group size. Values outside the valid range of 0..(N-1) (where N is the number of subjects in the group) are corrected without a warning.
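
The silent correction to the range 0..(N-1) can be pictured with the following base R sketch. The helper clamp_max_removed is hypothetical and only mimics the documented behavior for the groups that are explicitly specified; it is not the package's internal code.

```r
# Hypothetical helper: clamp user-supplied limits into 0..(N-1) per group.
clamp_max_removed <- function(condition, max_removed) {
  group_sizes <- table(condition)
  stopifnot(all(names(max_removed) %in% names(group_sizes)))
  upper <- as.integer(group_sizes[names(max_removed)]) - 1L
  clamped <- pmin(pmax(as.integer(max_removed), 0L), upper)
  names(clamped) <- names(max_removed)
  clamped
}

condition <- factor(c(rep("A", 10), rep("B", 4)))
# Keep every subject in A; the request of 7 for B exceeds its size and is reduced.
clamped <- clamp_max_removed(condition, c(A = 0, B = 7))
```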

tiebreaker

NULL, or a function similar to halting_test, used to decide between cases for which halting_test yields equal values.

lookahead

The lookahead to use: a positive integer. It is used by the heuristic3 and heuristic4 algorithms, with a default of 2. The running time is O(N ^ lookahead), where N is the number of subjects.

all_results

If TRUE, returns all results found by method in a list. (A list is returned even if there is only one result.) If FALSE (the default), it returns the first result (a logical vector).

prefer_test

If TRUE, prefers a higher test statistic over closeness to the expected group size proportions; default is TRUE. Used by all algorithms except exhaustive, which always

max_removed_per_step

The number of equivalent subjects that can be removed in each step. (The actual allowed number may be less depending on the p-value / threshold ratio.) This parameter is used by the heuristic3 and heuristic4 algorithms, with a default value of 1.

max_removed_percent_per_step

The fraction of the remaining subjects that can be removed in each step. Used when max_removed_per_step > 1, with a default value of 0.5.

ratio_for_slowdown

The p-value / threshold ratio at which it starts removing subjects one by one. Used when max_removed_per_step > 1, with a default value of 0.5.

Details

The exhaustive, heuristic3, and heuristic4 search methods use the foreach package to parallelize computation. To take advantage of this, you must register a cluster. For example, to use all but one of the CPU cores, run:

doParallel::registerDoParallel(cores = max(1, parallel::detectCores() - 1))

To use sequential processing without getting a warning, run:

foreach::registerDoSEQ()

Value

A logical vector that contains TRUE for the subjects that are in the matched groups; or, if all_results = TRUE, a list of such vectors.
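
The returned vector is typically used to subset the original inputs. The sketch below uses a hand-written logical vector in place of a real result, so all values in it are hypothetical. It also shows one way to handle the list returned when all_results = TRUE.

```r
# Hypothetical result for six subjects; in practice this comes from match_groups().
condition <- factor(c("A", "A", "A", "B", "B", "B"))
covariates <- cbind(age = c(8, 9, 15, 9, 10, 8))
is_in <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Keep only the matched subjects; drop = FALSE preserves the matrix form.
matched_condition <- droplevels(condition[is_in])
matched_covariates <- covariates[is_in, , drop = FALSE]

# Count how many subjects were removed from each group.
removed_per_group <- table(condition[!is_in])

# With all_results = TRUE the value is a list of such vectors, even for one result.
all_found <- list(is_in, c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE))
kept_counts <- vapply(all_found, sum, integer(1))
```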

See Also

calc_p_value for calculating the test statistic for a group setup.

calc_metrics for calculating multiple metrics about the goodness of the result.

compare_ldamatch_outputs for comparing multiple different results from this function.

search_heuristic2, search_heuristic3, search_heuristic4, search_random, search_exhaustive for


ldamatch documentation built on May 23, 2021, 5:06 p.m.