groupmatch: Optimal full matching with control groups

groupmatch R Documentation

Optimal full matching with control groups

Description

This is an adaptation of fullmatch that allows restrictions when control observations are "grouped". The motivating use case is multiple observations of control data for each control subject, in which case the grouping variable is the subject. We may want to impose restrictions: for example, that only one observation of a subject can be matched or, in one:many matching, that a given control subject can be matched to a given treated subject only once.

Usage

groupmatch(
  x,
  group = NULL,
  allow_duplicates = FALSE,
  min.controls = 0,
  max.controls = Inf,
  omit.fraction = NULL,
  mean.controls = NULL,
  tol = 0.001,
  data = NULL,
  ...
)

Arguments

x

Any valid input to match_on. groupmatch will use x and any optional arguments to generate a distance before performing the matching.

If x is a numeric vector, a vector z indicating grouping must also be passed. Both vectors must be named.

Alternatively, a precomputed distance may be supplied: either a matrix of non-negative discrepancies, each indicating the permissibility and desirability of matching the unit corresponding to its row (a 'treatment') to the unit corresponding to its column (a 'control'), or, better, a distance specification as produced by match_on. A simple distance specification (for example, a matrix of propensity score distances) can be enhanced by combining it with matrices representing exact-match or other caliper restrictions. The final matrix, including these constraints, is a valid input to groupmatch.

group

Grouping variable for control group. In the case of rolling enrollment, this will be a unique subject identifier pertaining to all 'copies' or 'versions' of the same subject.
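As an illustration of this layout, the sketch below builds a small rolling-enrollment data set in base R. The column names (subject, version, treated) and all values are assumptions made for the example; they are not part of the package.

```r
# Hypothetical rolling-enrollment data: each control subject contributes
# several 'versions' (one per candidate enrollment period), while each
# treated subject appears once.
dat <- data.frame(
  subject = c("T1", "T2", "C1", "C1", "C1", "C2", "C2"),
  version = c(1, 1, 1, 2, 3, 1, 2),
  treated = c(1, 1, 0, 0, 0, 0, 0)
)
rownames(dat) <- paste0(dat$subject, "_v", dat$version)

# The grouping variable is the subject identifier shared by all versions.
group <- dat$subject

# Number of versions available for each control subject.
versions_per_control <- table(group[dat$treated == 0])
```

Here subject C1 has three versions and C2 has two, so the vector group ties those rows back to two unique comparison subjects.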

allow_duplicates

When allow_duplicates is FALSE, the algorithm ensures that exactly one 'copy' or 'version' of each unique potential comparison subject is included in the matched comparison group, corresponding to Problem A in Pimentel et al. (2019). When allow_duplicates is TRUE, the algorithm permits different versions of the same potential comparison subject to match to different treatment subjects, corresponding to Problem B in Pimentel et al. (2019). So, for example, when allow_duplicates is FALSE, only one version of unique potential comparison subject C1 could match to any treatment subject; when allow_duplicates is TRUE, versions C1a and C1d could match to treatment subjects T4 and T7, respectively.
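The difference between the two settings can be sketched as a post hoc check on a table of matched pairs. The helper check_groups and its column names are hypothetical, and it only tests that no control subject is reused in a forbidden way; it is not the optimization that groupmatch performs.

```r
# Hypothetical check: 'matched' has one row per matched control version,
# with columns treated_id and control_subject.
check_groups <- function(matched, allow_duplicates = FALSE) {
  if (allow_duplicates) {
    # Problem B style: a control subject may be reused across treated
    # subjects, but not twice for the same treated subject.
    !any(duplicated(matched[, c("treated_id", "control_subject")]))
  } else {
    # Problem A style: no control subject contributes more than one version.
    !any(duplicated(matched$control_subject))
  }
}

# Versions C1a and C1d of subject C1 matched to T4 and T7, as in the text.
matched <- data.frame(
  treated_id      = c("T4", "T7"),
  control_subject = c("C1", "C1"),
  control_version = c("C1a", "C1d")
)

ok_a <- check_groups(matched, allow_duplicates = FALSE)  # FALSE: C1 used twice
ok_b <- check_groups(matched, allow_duplicates = TRUE)   # TRUE: different treated subjects
```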

min.controls

The minimum ratio of controls to treatments that is to be permitted within a matched set: should be non-negative and finite. If min.controls is not a whole number, the reciprocal of a whole number, or zero, then it is rounded down to the nearest whole number or reciprocal of a whole number.

Currently, groupmatch requires that min.controls be greater than or equal to 1. A value of min.controls less than 1 implies matching with replacement, a scenario that is still under development.

When matching within subclasses (such as those created by exactMatch), min.controls may be a named numeric vector separately specifying the minimum permissible ratio of controls to treatments for each subclass. The names of this vector should include the names of all subproblems of the distance.

max.controls

The maximum ratio of controls to treatments that is to be permitted within a matched set: should be positive and numeric. If max.controls is not a whole number, the reciprocal of a whole number, or Inf, then it is rounded up to the nearest whole number or reciprocal of a whole number.

When matching within subclasses (such as those created by exactMatch), max.controls may be a named numeric vector separately specifying the maximum permissible ratio of controls to treatments in each subclass.
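The rounding rules stated for min.controls (round down) and max.controls (round up) can be sketched in base R as follows. The two helper functions are hypothetical names introduced for illustration and assume the rules exactly as described above; they are not the package's internal code.

```r
# Round down to the nearest whole number or reciprocal of a whole number.
round_min_controls <- function(x) {
  ifelse(x >= 1, floor(x), 1 / ceiling(1 / x))
}

# Round up to the nearest whole number or reciprocal of a whole number.
round_max_controls <- function(x) {
  ifelse(x >= 1, ceiling(x), 1 / floor(1 / x))
}

min_rounded <- round_min_controls(c(0, 0.4, 1, 2.7))    # 0, 1/3, 1, 2
max_rounded <- round_max_controls(c(0.4, 0.5, 2.2, Inf)) # 0.5, 0.5, 3, Inf
```

Values that are already whole numbers, reciprocals of whole numbers, zero (for min.controls), or Inf (for max.controls) pass through unchanged.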

omit.fraction

Optionally, specify what fraction of controls or treated subjects are to be rejected. If omit.fraction is a positive fraction less than one, then groupmatch leaves up to that fraction of the control reservoir unmatched. If omit.fraction is a negative number greater than -1, then groupmatch leaves up to |omit.fraction| of the treated group unmatched. Positive values are only accepted if max.controls >= 1; negative values, only if min.controls <= 1. If neither omit.fraction nor mean.controls is specified, then only those treated and control subjects without permissible matches among the control and treated subjects, respectively, are omitted.

When matching within subclasses (such as those created by exactMatch), omit.fraction specifies the fraction of controls to be rejected in each subproblem, a parameter that can be made to differ by subclass by setting omit.fraction equal to a named numeric vector of fractions.

At most one of mean.controls and omit.fraction can be non-NULL.
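To make the sign convention concrete, the following base R sketch reports which group a given omit.fraction trims and the largest number of units that may be left unmatched. The helper omit_summary and the sample sizes are hypothetical, and how a fractional count would be rounded is not covered here.

```r
# Hypothetical helper: positive values trim the control reservoir,
# negative values trim the treated group.
omit_summary <- function(omit.fraction, n_treated, n_controls) {
  stopifnot(omit.fraction != 0, abs(omit.fraction) < 1)
  if (omit.fraction > 0) {
    list(group = "controls", max_omitted = omit.fraction * n_controls)
  } else {
    list(group = "treated", max_omitted = abs(omit.fraction) * n_treated)
  }
}

# 50 treated subjects and 200 controls (made-up numbers).
pos <- omit_summary(0.25, n_treated = 50, n_controls = 200)  # up to 50 controls
neg <- omit_summary(-0.1, n_treated = 50, n_controls = 200)  # up to 5 treated
```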

mean.controls

Optionally, specify the average number of controls per treatment to be matched. Must be no less than min.controls and no greater than either max.controls or the ratio of the total number of controls to the total number of treated. Some controls will likely not be matched to ensure meeting this value. If neither omit.fraction nor mean.controls is specified, then only those treated and control subjects without permissible matches among the control and treated subjects, respectively, are omitted.

When matching within subclasses (such as those created by exactMatch), mean.controls specifies the average number of controls per treatment per subproblem, a parameter that can be made to differ by subclass by setting mean.controls equal to a named numeric vector.

At most one of mean.controls and omit.fraction can be non-NULL.
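The bounds on mean.controls can be checked with simple arithmetic, as in this base R sketch. The helper check_mean_controls and the sample sizes are hypothetical. The last line computes the share of controls left unmatched under the assumption that every treated subject is matched, which is an illustration of the arithmetic rather than a statement about the package internals.

```r
# Hypothetical validity check for mean.controls.
check_mean_controls <- function(mean.controls, min.controls, max.controls,
                                n_treated, n_controls) {
  upper <- min(max.controls, n_controls / n_treated)
  mean.controls >= min.controls && mean.controls <= upper
}

# 50 treated subjects and 200 controls (made-up numbers).
ok  <- check_mean_controls(2, min.controls = 1, max.controls = 3,
                           n_treated = 50, n_controls = 200)    # TRUE
bad <- check_mean_controls(5, min.controls = 1, max.controls = Inf,
                           n_treated = 50, n_controls = 200)    # FALSE: 5 > 200 / 50

# With 2 controls per treated subject on average, 100 of 200 controls are used.
unmatched_share <- 1 - 2 * 50 / 200  # 0.5
```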

tol

Because of internal rounding, groupmatch may solve a slightly different matching problem than the one specified, in which the match generated by groupmatch may not coincide with an optimal solution of the specified problem. tol times the number of subjects to be matched specifies the extent to which groupmatch's output is permitted to differ from an optimal solution to the original problem, as measured by the sum of discrepancies for all treatments and controls placed into the same matched sets.
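As a small worked example of this tolerance, the sketch below computes the permitted excess for the default tol and a made-up sample size, then compares two hypothetical totals of within-set discrepancies. All numbers are assumptions for illustration.

```r
tol <- 0.001        # default value from the Usage section
n_subjects <- 500   # hypothetical number of subjects to be matched

# Largest permitted gap between the returned and the optimal total discrepancy.
max_excess <- tol * n_subjects  # 0.5

# Hypothetical totals of discrepancies across matched sets.
optimal_total  <- 812.30
returned_total <- 812.65
within_tolerance <- (returned_total - optimal_total) <= max_excess
```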

data

Optional data.frame or vector used to set the order of the final matching factor. If a data.frame, the rownames are used. If a vector, the names are tried first; otherwise the contents are treated as a character vector of names. Useful to pass if you want to combine a match (using, e.g., cbind) with the data that were used to generate it (for example, in a propensity score matching).

...

Additional arguments, including within, which may be passed to match_on.

Value

An optmatch object (factor) indicating matched groups.
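Because the result is a factor, it can be inspected and attached to the source data with base R tools. The sketch below uses a plain named factor as a stand-in for a real result, with NA marking unmatched units; the unit names, set labels, and data frame are hypothetical.

```r
# Stand-in for a matching result: names identify units, levels identify
# matched sets, and NA marks units left unmatched.
m <- factor(c(T1 = "1.1", C1_v2 = "1.1", T2 = "1.2", C2_v1 = "1.2",
              C1_v1 = NA, C1_v3 = NA, C2_v2 = NA))

# Hypothetical source data, in a different row order than the factor.
dat <- data.frame(
  treated = c(1, 1, 0, 0, 0, 0, 0),
  row.names = c("T1", "T2", "C1_v1", "C1_v2", "C1_v3", "C2_v1", "C2_v2")
)

# Align the factor to the data by name before attaching it.
dat$matched_set <- m[rownames(dat)]

set_sizes <- table(m)             # units per matched set
unmatched <- names(m)[is.na(m)]   # units not placed in any set
```

Passing the data argument described above is meant to give the returned factor the same order as the data, which would make the explicit alignment step unnecessary.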

References

Pimentel, SD, Forrow, LV, Gellar, J, and Li, J (2019). Optimal matching approaches in health policy evaluations under rolling enrolment. Journal of the Royal Statistical Society Series A 183(4), 1411-1435. https://doi.org/10.1111/rssa.12521


jgellar/GroupMatch documentation built on Nov. 8, 2022, 10:48 p.m.