Compute Student Matches for all Pairs of Schools


Iterates over all possible treated-control school pairs, optionally computes and stores an optimal student match for each one, and generates a distance matrix for schools based on the quality of each student match.


matchStudents(students, treatment, school.id, match.students,
  student.vars, school.caliper = NULL, verbose, penalty.qtile, min.keep.pctg)



students: a dataframe containing student covariates, with one row per student.


treatment: the column name of the binary treatment status indicator in the students dataframe.

school.id: the column name of the unique school ID in the students dataframe.


match.students: a logical value. If TRUE, students are matched within school pairs and some students will be excluded. If FALSE, all students will be retained in the matched sample for each school pair.


student.vars: column names of the variables in students on which to match students and to assess the balance of student matches when evaluating match quality.
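
As a rough illustration of the input layout described above, the sketch below builds a toy students dataframe in base R. The column names (school, treated, math, female) and the data are hypothetical, and the check that treatment is constant within a school is an assumption drawn from the treated-control school pairing, not a documented requirement.

```r
# Toy data: 4 schools (2 treated, 2 control), 5 students each.
# All column names and values here are hypothetical.
set.seed(1)
students <- data.frame(
  school  = rep(c("A", "B", "C", "D"), each = 5),  # unique school ID
  treated = rep(c(1, 1, 0, 0), each = 5),          # binary treatment indicator
  math    = rnorm(20, mean = 50, sd = 10),         # student covariate
  female  = rbinom(20, size = 1, prob = 0.5)       # student covariate
)

# Assumption: treatment is assigned at the school level, so every
# school should carry exactly one treatment value.
trt_by_school <- tapply(students$treated, students$school,
                        function(z) length(unique(z)))
stopifnot(all(trt_by_school == 1))

# Covariate names that would be supplied as student.vars.
student_vars <- c("math", "female")
stopifnot(all(student_vars %in% names(students)))
```

The strings "treated" and "school" would then be the values passed for the treatment and school ID column names.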


school.caliper: a matrix with one row for each treated school and one column for each control school, containing zeroes for pairings allowed by the caliper and Inf values for forbidden pairings. When NULL, no caliper is imposed.
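
A caliper matrix of this shape can be assembled in base R. The sketch below is a minimal example with hypothetical school labels. Whether the function relies on row and column names or only on row and column order is not stated here, so the dimnames are an assumption added for readability.

```r
# Hypothetical school labels: 2 treated schools, 3 control schools.
treated_schools <- c("A", "B")
control_schools <- c("C", "D", "E")

# Start with every pairing allowed (all zeroes).
school_caliper <- matrix(
  0,
  nrow = length(treated_schools),
  ncol = length(control_schools),
  dimnames = list(treated_schools, control_schools)  # assumption: names are optional
)

# Forbid specific treated-control pairings by setting them to Inf.
school_caliper["A", "E"] <- Inf
school_caliper["B", "C"] <- Inf

# Logical view of which pairings remain allowed.
allowed <- is.finite(school_caliper)
print(school_caliper)
```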


verbose: a logical value indicating whether detailed output should be printed.


penalty.qtile: a numeric value between 0 and 1 specifying a quantile of the distribution of all student-student matching distances. The algorithm will prefer to exclude treated students rather than form pairs with distances exceeding this quantile.


min.keep.pctg: the minimum percentage of students in the smaller school of a pair that must be retained, even when treated students are excluded.


The penalty.qtile and min.keep.pctg arguments control the rate at which students are trimmed from the match. If the quantile is high enough, no students should be excluded from any match; if the quantile is very low, min.keep.pctg still ensures a minimum sample size in each match.
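
To make the interplay of the two settings concrete, here is a small base R sketch on made-up distances. It does not reproduce the package's internal algorithm. The distance matrix is simulated, the 0-100 scale for the percentage and the use of ceiling() for rounding are assumptions, and all variable names are hypothetical.

```r
set.seed(42)

# Hypothetical student-student distances for one school pair:
# 6 treated students (rows) by 8 control students (columns).
dist_mat <- matrix(rexp(48), nrow = 6, ncol = 8)

# Distance threshold implied by a quantile of all distances.
penalty_qtile <- 0.9
threshold <- quantile(dist_mat, probs = penalty_qtile)

# Candidate pairs farther apart than the threshold: the kind of pair
# the match would rather avoid by dropping the treated student.
too_far <- dist_mat > threshold

# Floor on the retained sample, assuming a 0-100 percentage scale
# and rounding up (both are assumptions).
min_keep_pctg <- 80
n_smaller <- min(nrow(dist_mat), ncol(dist_mat))
min_retained <- ceiling(n_smaller * min_keep_pctg / 100)

cat("pairs above threshold:", sum(too_far), "\n")
cat("minimum students retained:", min_retained, "\n")
```

Raising penalty_qtile toward 1 leaves fewer pairs above the threshold, while lowering it marks more pairs as too distant and leaves the percentage floor as the binding limit on trimming.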


A list with two elements:


a list with one element for each treated school. Each element is a list with one element for each control school, and each element of these secondary lists is a dataframe containing the matched sample for the corresponding treated-control pairing.


a matrix with one row for each treated school and one column for each control school, giving matching distances based on the student match.
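
The nested structure of the return value can be navigated as in the sketch below, which builds a mock object of the same shape instead of calling the function. The element names (student_matches, school_distances), the school labels, and the distances are all hypothetical, and naming the list elements by school is an assumption.

```r
treated <- c("A", "B")   # hypothetical treated school IDs
control <- c("C", "D")   # hypothetical control school IDs

# Stand-in for a matched sample: one treated and one control student.
mock_pair <- function(trt, ctl) {
  data.frame(school = c(trt, ctl), treated = c(1, 0))
}

# Outer list: one element per treated school.
# Inner list: one dataframe per control school.
matches <- lapply(treated, function(trt) {
  inner <- lapply(control, function(ctl) mock_pair(trt, ctl))
  names(inner) <- control
  inner
})
names(matches) <- treated

# Treated-by-control matrix of made-up matching distances.
distances <- matrix(c(0.4, 0.9, 1.3, 0.2), nrow = 2,
                    dimnames = list(treated, control))

result <- list(student_matches = matches, school_distances = distances)

# Matched sample for treated school A paired with control school D.
pair_AD <- result[[1]][["A"]][["D"]]

# Closest control school for each treated school.
best_control <- colnames(result[[2]])[apply(result[[2]], 1, which.min)]
```

Indexing by position (result[[1]], result[[2]]) avoids depending on the element names, which are placeholders here.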


Luke Keele, Penn State University

Sam Pimentel, University of Pennsylvania
