knitr::opts_chunk$set(collapse = TRUE, prompt=TRUE) library(magrittr)
Optmatch finds optimal matches by translating them to min-cost flow problems (Rosenbaum, 1991, JRSS-B; Hansen and Klopfer, 2006, JCGS), relying on a min-cost flow solver that works by dual ascent. Through 2018 (version 0.9*), no attempt was made to store the dual problem solution found by this solver; accordingly, it was not possible to use that solution as a starting value for related matching problems. This document lays out a roadmap for "daylighting" the min-cost flow material, i.e. making it accessible to the R user, for warm starts and supplementary calculations.
SubProbInfo
An SubProbInfo
is an S4-typed data frame bearing information about (sub)problems passed to the solver, and their translation to and from units acceptable to the solver. Columns:
groups
, character (which subproblem?);flipped
, logical (do upstream nodes of MCF representation correspond to columns of match distance matrix, as opposed to rows, the default?);hashed_dist
, character (hashed ID for a double precision distance);resolution
, double (grid-width for discretization of distances & node prices before rounding and handing off to solver);lagrangian_value
, numeric (as determined by back-transformed node prices, arc costs and arc flows);dual_value
, numeric (as determined by back-transformed node prices and arc costs);feasible
, logical;exceedance
, double (the legacy criterion regret calculation).Rules/conventions:
feasible
encodes whether the flow-price pair was found to be in a complementary slackness relationship, prior to backtransformation of distances & prices (if applicable). In other words, feasible==FALSE
indicates that the subproblem imposed infeasible matching constraints.NodeInfo
& subclassesA NodeInfo
is an S4-typed data frame bearing a
non-null row.names
attribute and columns:
name
, character;price
, double or integer (node prices);upstream_not_down
, logical (node type indicator, see below);supply
, integer;groups
, factor (optional; name of a subproblem); andRules/conventions:
NodeInfo
table.NodeInfo
object.NodeInfo
's name
column and its node labels are the same.
The reason for the presence of both is to avoid duplication among node labels,
which might otherwise occur when a NodeInfo
arises by concatenation of solutions
to distinct matching problems (which so happened to involve distinct units
that shared names).groups
, values of name
should be unique. (Presently the validity
checker only looks for this when groups
has a single level, in the interests of avoiding
slowdowns when combining NodeInfo's.)upstream_not_down
column: TRUE ~
upstream, i.e.
"T(f)" nodes in Fig. 2 of H.&K. '06, FALSE ~
downstream or "C(f)"
nodes in H.&K. '06 Fig. 2; NA ~
bookkeeping nodes, e.g. "Overflow" or "Sink"
in H.&K. '06 Fig. 2.price
can be NA_real_
, but only if upstream_not_down=FALSE
. See also Rules/conventions pertaining to MCFSolutions
objects, below.NodeInfo
.ArcInfo
An ArcInfo
has 2 slots:
@matches
, data frame with columnsgroups
, factor,upstream
, factor,downstream
, factor;@bookkeeping
, data frame with columnsgroups
, factor,start
, factor,end
, factor,flow
, integer (nonnegative),capacity
, integer (nonnegative).Rules/conventions:
@matches
encodes a corresponding flow of 1; absence encodes flow 0. (So having the row means its upstream
/start
node and downstream
/end
nodes were matched, absence means not matched.)upstream
, downstream
, start
and end
all have the same levels()
.capacity
. Their flow
values must fall in this range (inclusive of endpoints).@bookkeeping
d.f. must have a row for each of the bookkeeping arcs of the problem.groups
, upstream
) pair must appear as a (groups
, name
) pair in the NodeInfo table, with upstream_not_down==TRUE
; each (groups
, downstream
) pair must appear as a (groups
, name
) NodeInfo pair, with upstream_not_down==FALSE
.groups
, start
) or (groups
, end
) pair carried in this table must have in the NodeInfo table a corresponding (groups
, name
) entry.MCFSolutions
)In practice, current plans call only for passing dual solutions (arrays of node prices), not also primal solutions (flow vectors), back to the solver. But in principle the relax4f solver could accommodate dual-primal pairs as start values, at the cost of rejiggering our interactions; see comments to issue 162. Also assessing CS requires the combo of a primal and dual solution. So we'll save both of them, as an S4 object.
Slots for class MCFSolutions
:
subproblems
, a "SubProbInfo" object (see above);nodes
, a NodeInfo
object (see above); andarcs
, a ArcInfo
object (see above).Rules/conventions:
@arcs
all have precisely the same collection
of levels (as is enforced by the ArcInfo
validity checker). In addition, they
coincide with row.names(.@nodes)
. The latter to be enforced by MCFSolutions
's
validity check.@nodes
table has to have a groups
column.@node$price
value may be NA_real_
and references to them will not appear
in corresponding ArcInfo tables.row.names(@nodes)
).@nodes
table carries an explicit record of their subgroup, which
@subproblems
and @nodes
combine to give row/column or
treatment/control status viaifelse(with(object@subproblems, flipped[match(object@nodes$groups,groups)]), !object@nodes$upstream_not_down, object@nodes$upstream_not_down)
c()
should not routinely check validity
on its result; rather the check should be applied to smaller objects.@.Data
, @names
and @levels
slots for likely future use;
see below notes re future Optmatch
S4 class.Different kinds of matching problem have different bookkeeping nodes, implicit arc capacity constraints etc. Encode all this by declaring appropriate subclasses of MCFSolutions()
.
FullmatchMCFSolutions
A type extending MCFSolutions()
, w/ characteristics:
@nodes
, exactly 2 node labels per subproblem s.t. is.na(upstream_not_down)
: a string beginning with "(_Sink_)
" and another beginning with "(_End_)
";@nodes
, also each of those appears exactly once per subproblem;@nodes
, each node with upstream_not_down == TRUE
has positive supply
;@nodes
, each node with upstream_not_down == FALSE
has supply==0
;@nodes
, each node with is.na(upstream_not_down)
has nonpositive supply
;@arcs@bookkeeping
, each node with !is.na(upstream_not_down)
appears as the start
of an arc that has end node (a string beginning w/) (_End_)
;@arcs@bookkeeping
, each node with upstream_not_down == FALSE
appears as the start
of an arc that has end node named by a string beginning with (_Sink_)
.fullmatch
/ pairmatch
(*ptmatch)Return an object bearing S3 class "optmatch", as we did before these changes.
Embed an MCFSolutions
into it as an attribute.
Return object of a new S4 class, Optmatch
, inheriting from factor as well as from MCFSolutions
.
as.optmatch()
/as.factor()
for MCFSolutions
objectsnode.labels()
,node.labels<-
for MCFSolutions
objects (to pull /set row.names(@nodes)
)getMCFSolution()
: extractor for optmatch objects (& later for Optmatch objects)c()
for MCFSolutions
: rbind()
s the various constituent data frames, enforcing req's that subproblems have distinct names, de-duping bookkeeping nodes and enforcing requirement that non-bookkeeping nodes' names be distinct.addArcs()
for MCFSolutions
objectsaddNodes()
for MCFSolutions
objectsAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.