quickblock constructs near-optimal threshold blockings. The function
expects the user to provide distances measuring the similarity of
units and a required minimum block size. It then constructs a blocking
so that units assigned to the same block are as similar as possible while
satisfying the minimum block size.
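As a rough illustration of the objective described above, the following base R sketch summarises a given blocking by its smallest block and its largest within-block distance. The helper `summarise_blocking` is hypothetical (it is not part of quickblock), the block labels are assigned by hand, and Euclidean distances from `dist` are assumed.

```r
# Hypothetical helper (not part of quickblock): summarise a blocking given
# a covariate matrix and an integer vector of block labels.
summarise_blocking <- function(covariates, block_labels) {
  d <- as.matrix(dist(covariates))  # Euclidean distances between all units
  # TRUE where two distinct units share a block label
  same_block <- outer(block_labels, block_labels, "==")
  same_block[is.na(same_block)] <- FALSE
  diag(same_block) <- FALSE
  list(
    min_block_size = min(table(block_labels)),
    max_within_block_distance = if (any(same_block)) max(d[same_block]) else 0
  )
}

# Four units forming two tight pairs
covariates <- cbind(x1 = c(0.0, 0.1, 0.9, 1.0), x2 = c(0.0, 0.0, 1.0, 1.0))

# Blocking that keeps similar units together
good <- summarise_blocking(covariates, c(1L, 1L, 2L, 2L))

# Blocking that pairs dissimilar units
bad <- summarise_blocking(covariates, c(1L, 2L, 1L, 2L))
```

Both blockings satisfy a minimum block size of two, but the first has a much smaller maximum within-block distance, which is the sense in which a blocking is better.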
size_constraint: integer with the required minimum number of units in each block.
caliper: restricts the maximum within-block distance.
break_large_blocks: logical indicating whether large blocks should be broken up into smaller blocks.
additional parameters to be sent either to the
The caliper parameter constrains the maximum distance between units
assigned to the same block. This is implemented by restricting the
edge weight in the graph used to construct the blocks (see
sc_clustering for details). As a result, the caliper
will affect all blocks and, in general, make it harder for
the function to find good matches even for blocks where the caliper is not
binding. In particular, an overly tight caliper can lead to discarded
units that otherwise would be assigned to a block satisfying both the
matching constraints and the caliper. For this reason, it is recommended
to set the caliper value quite high and only use it to avoid particularly
poor blocks. It is strongly recommended to use the caliper parameter only
with primary_unassigned_method = "closest_seed" in the underlying
sc_clustering function (which is the default).
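To get a feel for when a caliper is tight enough to discard units, one can count the units that have too few other units within the caliper before calling quickblock. The sketch below uses base R only. `count_isolated_units` is a hypothetical diagnostic, not a quickblock function, and it assumes Euclidean distances and that the caliper bounds pairwise distances within a block.

```r
# Hypothetical diagnostic (not part of quickblock): count units that have
# fewer than `size_constraint - 1` other units within `caliper`. Under the
# assumptions above, such units cannot belong to any block that satisfies
# both the size constraint and the caliper.
count_isolated_units <- function(covariates, caliper, size_constraint = 2L) {
  d <- as.matrix(dist(covariates))  # Euclidean distances between all units
  diag(d) <- Inf                    # a unit is not its own neighbour
  neighbours_within_caliper <- rowSums(d <= caliper)
  sum(neighbours_within_caliper < size_constraint - 1L)
}

set.seed(1)
my_data <- data.frame(x1 = runif(100), x2 = runif(100))

# A tight caliper isolates at least as many units as a loose one
isolated_tight <- count_isolated_units(my_data, caliper = 0.01)
isolated_loose <- count_isolated_units(my_data, caliper = 0.5)
```

This is only a necessary condition: having enough neighbours within the caliper does not guarantee that a unit ends up in a block, since the caliper also changes the graph used to construct all other blocks.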
The main algorithm used to construct the blocking may produce
some blocks that are much larger than the minimum size constraint. If
break_large_blocks is TRUE, all blocks twice as large as
size_constraint will be broken into two or more smaller blocks. Blocks
are broken so as to ensure that the new blocks satisfy the size constraint.
In general, large blocks are produced when units are highly clustered,
so breaking up large blocks will often only lead to small improvements. The
blocks are broken using the
seed_method = "inwards_updating". The seed_method parameter
governs how the seeds are selected in the nearest neighborhood graph that
is used to construct the blocks (see sc_clustering
for details). The
"inwards_updating" option generally works well
and is safe with most datasets. Using
seed_method = "exclusion_updating"
often leads to better performance (in the sense of blocks with more
similar units), but it may increase run time. Discrete data (or, more generally,
data where units tend to be at equal distance to many other units) will lead to
particularly poor run time with this option. If the dataset has at least one
continuous covariate, "exclusion_updating" is typically quick. A third
option is seed_method = "lexical", which decreases the run time relative
to "inwards_updating" (sometimes considerably) at the cost of performance.
quickblock passes parameters on to sc_clustering,
so to change seed_method, call
quickblock with the parameter
specified as usual:
quickblock(..., seed_method = "exclusion_updating").
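The run time trade-off between seed methods can be checked on one's own data. The following is a minimal sketch that assumes the quickblock and distances packages are installed. The data set size and the random seed are arbitrary choices for illustration, and timings will differ between machines.

```r
library(distances)
library(quickblock)

# Example data with two continuous covariates
set.seed(123)
my_data <- data.frame(x1 = runif(1000), x2 = runif(1000))
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))

# The three seed methods discussed above
seed_methods <- c("lexical", "inwards_updating", "exclusion_updating")

# Elapsed run time (in seconds) of quickblock for each seed method
run_times <- sapply(seed_methods, function(method) {
  system.time(quickblock(my_distances, seed_method = method))[["elapsed"]]
})
run_times
```

With continuous covariates like these, all three options should finish quickly. The differences are expected to matter more for large or discrete data sets.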
Returns a qb_blocking object with the constructed blocks.
Higgins, Michael J., Fredrik Sävje and Jasjeet S. Sekhon (2016), ‘Improving massive experiments with threshold blocking’, Proceedings of the National Academy of Sciences, 113:27, 7369–7376. http://www.pnas.org/lookup/doi/10.1073/pnas.1510504113
sc_clustering for the underlying function used
to construct the blocks.
```r
# Construct example data
my_data <- data.frame(x1 = runif(100), x2 = runif(100))

# Make distances
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))

# Make blocking with at least two units in each block
quickblock(my_distances)

# Require at least three units in each block
quickblock(my_distances, size_constraint = 3)

# Impose caliper
quickblock(my_distances, caliper = 0.2)

# Break large blocks
quickblock(my_distances, break_large_blocks = TRUE)

# Call `quickblock` directly with covariate data (i.e., not pre-calculating distances)
quickblock(my_data[c("x1", "x2")])

# Call `quickblock` directly with covariate data using Mahalanobis distances
quickblock(my_data[c("x1", "x2")], normalize = "mahalanobize")
```