# OSDR: Finds an Optimal System of Distinct Representatives from a... In OSDR: Finds an Optimal System of Distinct Representatives

## Description

The function finds an Optimal Set of Distinct Representatives (OSDR) from an ordered family of subsets. Optimality is order-wise in the sense specified by Gale: any other SDR cannot be larger and its elements cannot be chosen from more important sets (see details).

## Usage

 1 OSDR(M) 

## Arguments

 M An ordered family of subsets. In assignment problems the list specifying the feasible applicants for each job. In matching problems the list specifying the controls matchable with each treated unit. The list is assumed ordered by decreasing assignment/matching priority.

## Details

Consider a set of jobs and a set of suitable applicants for each job. It was shown by D.Gale that there exists an optimally assignable subset of the jobs in the sense that if {a_1, … a_k} is optimal and {b_1, … b_h} is another assignable set of jobs then: a) k ≥ h and b) a_i ≤ b_i for all i. The latter means that job a_i has as higher a priority as job b_i. From a graph perspective, if J is the set of jobs and A is the set of applicants the OSDR is an order wise maximum size matching of the bipartite graph with vertex set J union A.

Function OSDR finds the optimal matching using an algorithmic proof of Hall's theorem due to logician D.J. Shoesmith. The algorithm assigns greedily until possible and correct backward when necessary. A message is given when a correction is attempted (when it fails another message warns that Hall's condition is not satisfied).

## Value

A list containing three elements: the OSDR (a zero in position i indicates impossibility of assigning job i), the index of optimally assignable jobs and the index of unassignable jobs. If the latter is not empty M does not meet Hall's condition, i.e., a complete SDR is not possible.

## Note

In statistical matching problems the sets J and A are the sets of treated and control units and it is usually required to find the minimum covariate distance matching of treated versus control units. When a complete matching does not exist it can be convenient to find either an order optimal matching or a minimum cost matching on the OSDR subset (see the gender gap example).

## Author(s)

Massimo Cannas <massimo.cannas@unica.it>

## References

Gale, D. (1968) Optimal matching in an ordered set: an application of matroid theory. Journal of Combinatorial Theory 4: 176-180.

Anderson, I. (1989) A first course in Combinatorial Mathematics. Oxford University Press.

See also matlist,  listmat
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 ### example 1 # M is the list of suitable applicants for five jobs M1 <- c("A" , "B") M2 <- c("B" , "C") M3 <- c("B") M4 <- c("A" , "C") M5 <- c("B" , "C" , "D") M <- list(M1 , M2 , M3 , M4 , M5) OSDR(M) # $OSDR # [1] "A" "C" "B" "0" "D" #$matched # [1] 1 2 3 5 # $unmatched # [1] 4 # job 4 unmatched so Hall's condition is not satisfied: it's impossible to fill all the jobs # note that there are (order-\emph{suboptimal}) assignments of the same length of the optimal: # eg: 0CBAD , BC0AD #### example 2: sligthly modified: more than one order optimal matching M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) OSDR(M) # note there are other order optimal matchings: ACB0D or CAB0D # note there are also other maximum size matchings (not order optimal): # e.g. 0CBAD or BC0AD #### Case Study: matching men and women executives # load executive data data(exdata) # descriptives on: # sex(0=M; 1=F) ; # position (4=top manager, 3=medium/first line manager, 2 =supervisor); table(exdata$sex)# there are more women table(exdata$position,exdata$sex)# and more in apical position # order by matching priority (high-rank women first) # see e.g. Lynn and Thompson(1997), J. of Appl. Psych. 82(3)) data <- exdata[order(-exdata$sex,-exdata$position, -exdata$seniority),] # covariate distance matrix require(optmatch) dist <- match_on(sex ~ position+education+year_born+contract+part_fulltime+seniority , data=exdata) # use broad caliper to avoid very bad matches dist <- caliper(dist,4,values=TRUE) # minimum distance pair matching (package optmatch) copt <- pairmatch(dist,data=exdata) summary(copt) sum(matched.distances(copt,dist)) # total cost 19 ### find osdr #order dist by priority order (i.e by decreasing position) dist <- as.matrix(dist)[order(match(rownames(dist),rownames(exdata))),] mylist <- matlist(dist) res <- OSDR(mylist) # index and labels of treated and untreated in OSDR ord_dist<-as.matrix(dist)[order(match(names(mylist),rownames(exdata))),] index_t<-res$matching; names_t<-rownames(ord_dist)[index_t] index_ut<-res$unmatched;names_ut<-rownames(ord_dist)[index_ut] # compare matched treated: optmatch vs ordmatch #matched treated optmatch matched(copt)[which(exdata$sex==1)] #matched treated ordmatch rownames(data)[which(exdata\$sex==1)] %in% names_t # compare total matching cost: optmatch vs ordmatch # case 1: distance matrix is zero infinity: same cost (0) # case 2: distance matrix is not zero infinity # find minimum cost matching on osdr: data2<-exdata[-match(names_ut,rownames(exdata)),] dist2<-as.matrix(dist)[-match(names_ut,rownames(dist)),] copt2 <- pairmatch(dist2,data=data2) summary(copt2);copt2 sum(matched.distances(copt2,dist2)) # 22 sum(matched.distances(copt,dist)) # 19