match.dummy: Create dummy individuals or sinks to a data matrix or a...


View source: R/matchtools.R

Description

Dummy observations can be added so that the number of observations is divisible by the number of elements in each submatch: for pairwise matching the number of observations should be even, for triangular matching it should be divisible by 3, and so on. This can be done either by adding column-averaged individuals to the original data frame (parameter 'dat'), or by adding zero-distance sinks to the distance/dissimilarity matrix (parameter 'd'). The latter approach favors dummies being matched to extreme real observations, while the former favors dummies being matched to real observations that lie close to the mean.
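The two padding strategies can be illustrated with a minimal base R sketch. The helpers `add_mean_dummies` and `add_zero_sinks` and the toy data are hypothetical names introduced here for illustration only; they assume all-numeric columns and are not the hamlet implementation of match.dummy.

```r
# Hypothetical helpers (not part of hamlet) sketching the two padding strategies.

# Append rows of column means until nrow(dat) is divisible by g.
# Assumes every column of dat is numeric.
add_mean_dummies <- function(dat, g = 2) {
  k <- (g - nrow(dat) %% g) %% g          # number of dummies needed
  if (k == 0) return(dat)
  means <- colMeans(dat, na.rm = TRUE)
  dummies <- as.data.frame(matrix(
    means, nrow = k, ncol = length(means), byrow = TRUE,
    dimnames = list(paste0("dummy", seq_len(k)), names(means))
  ))
  rbind(dat, dummies)
}

# Append all-zero rows and columns until nrow(d) is divisible by g.
add_zero_sinks <- function(d, g = 2) {
  n <- nrow(d)
  k <- (g - n %% g) %% g                  # number of sinks needed
  if (k == 0) return(d)
  out <- matrix(0, n + k, n + k)
  out[seq_len(n), seq_len(n)] <- d        # original distances stay untouched
  ids <- if (is.null(rownames(d))) as.character(seq_len(n)) else rownames(d)
  labels <- c(ids, paste0("sink", seq_len(k)))
  dimnames(out) <- list(labels, labels)
  out
}

# Toy data: five observations, the last one is an outlier
toy <- data.frame(x = c(1, 2, 3, 4, 10), y = c(2, 4, 6, 8, 30))

padded <- add_mean_dummies(toy, g = 3)
dim(padded)                               # 6 2

d_avg  <- as.matrix(dist(padded))         # dummy sits at the column means
d_sink <- add_zero_sinks(as.matrix(dist(toy)), g = 3)
dim(d_sink)                               # 6 6

d_avg["dummy1", ]                         # positive distances, largest to the outlier
d_sink["sink1", ]                         # all zeros
```

In this sketch the averaged dummy is cheapest to group with observations near the mean, whereas a sink costs nothing to group with any observation, so an optimal matching tends to spend sinks on the observations that would otherwise contribute the largest distances.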

Usage

match.dummy(dat, d, g = 2)

Arguments

dat

A data.frame of the original observations, to which new column-averaged dummy observations are added

d

An N x N distance/dissimilarity matrix, to which zero-distance sinks are added

g

The desired number of elements per submatch, i.e. the size of the clusters. The number of added dummies is the smallest number that fulfills (N + dummy) %% g == 0
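The smallest number of dummies satisfying this condition can be computed in closed form. The helper name `n_dummies` below is hypothetical and only illustrates the arithmetic; how match.dummy arrives at the count internally is not shown here.

```r
# Hypothetical helper: smallest k >= 0 such that (N + k) %% g == 0
n_dummies <- function(N, g = 2) (g - N %% g) %% g

n_dummies(10, 3)    # 2  (10 + 2 = 12 is divisible by 3)
n_dummies(12, 3)    # 0  (already divisible, nothing to add)
n_dummies(7, 2)     # 1  (an odd count needs one dummy for pairwise matching)
n_dummies(10, 2:4)  # 0 2 2
```

The outer `%% g` maps the case where N is already divisible by g to zero additions instead of g.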

Value

Depending on whether the dat or the d parameter was provided, the function either:

dat: adds new averaged individuals according to the column means and returns the extended data matrix

d: adds zero-distance sinks to the distance/dissimilarity matrix and returns the extended distance/dissimilarity matrix

Note

Adding zero-distance sinks to the distance matrix and adding averaged individuals to the original data frame produce different results and affect the optimal matching task differently.

Author(s)

Teemu Daniel Laajala <teelaa@utu.fi>

See Also

match.allocate match.mat2vec match.vec2mat match.bb

Examples

library(hamlet)
data(vcapwide)

# Ten observations cannot be split into triplets, so two dummies are needed
exdat <- vcapwide[1:10,c("PSAWeek10", "BWWeek10")]
dim(exdat)
# Add two dummy individuals (column averages) to the original data
avgdummies <- match.dummy(dat=exdat, g=3)
dim(avgdummies)
# Construct a Euclidean distance matrix from the padded data
# and run the branch and bound matching
bb3 <- match.bb(as.matrix(dist(avgdummies)), g=3)
str(bb3)

# Alternatively, construct the Euclidean distance matrix first
# and then add two zero-distance sinks to it
exd <- as.matrix(dist(vcapwide[1:10,c("PSAWeek10", "BWWeek10")]))
dim(exd)
d <- match.dummy(d=exd, g=3)
dim(d)
# 10 is not divisible by 3, so 2 sinks are added to make d 12x12
bb3 <- match.bb(d, g=3)
str(bb3)

# Note that sinks produce a much smaller target function cost than averaged individuals

Example output

[1] 10  2
[1] 12  2
[1] "Performing initial sorting for a good initial guess"
[1] "Computing boundaries for minimum distances in possible combinations..."
[1] "Starting branch and bound"
[1] "Branches: 44"
[1] "Bounds: 816"
[1] "Ends visited: 4"
[1] "Solution cost 85.0064247147366"
[1] "Solution: 4,3,4,1,3,4,2,1,1,3,2,2"
List of 6
 $ branches: num 44
 $ bounds  : num 816
 $ ends    : num 4
 $ matrix  : num [1:12, 1:12] 0 0 1 0 0 1 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:12] "ID003" "ID007" "ID008" "ID009" ...
  .. ..$ : chr [1:12] "ID003" "ID007" "ID008" "ID009" ...
 $ solution: Named num [1:12] 4 3 4 1 3 4 2 1 1 3 ...
  ..- attr(*, "names")= chr [1:12] "ID003" "ID007" "ID008" "ID009" ...
 $ cost    : num 85
[1] 10 10
[1] 12 12
[1] "Performing initial sorting for a good initial guess"
[1] "Computing boundaries for minimum distances in possible combinations..."
[1] "Starting branch and bound"
[1] "Branches: 89"
[1] "Bounds: 1527"
[1] "Ends visited: 4"
[1] "Solution cost 47.1926533983434"
[1] "Solution: 4,3,4,1,3,2,1,1,3,2,4,2"
List of 6
 $ branches: num 89
 $ bounds  : num 1527
 $ ends    : num 4
 $ matrix  : num [1:12, 1:12] 0 0 1 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:12] "ID003" "ID007" "ID008" "ID009" ...
  .. ..$ : chr [1:12] "ID003" "ID007" "ID008" "ID009" ...
 $ solution: Named num [1:12] 4 3 4 1 3 2 1 1 3 2 ...
  ..- attr(*, "names")= chr [1:12] "ID003" "ID007" "ID008" "ID009" ...
 $ cost    : num 47.2

hamlet documentation built on May 1, 2019, 8:40 p.m.