View source: R/match.template.R
match.template | R Documentation |
This function solves the multidimensional assignment problem with decomposable costs (MDADC) by matching the data to a pre-specified set of vectors (the template). The dissimilarity function is the squared Euclidean distance.
match.template(x, template = 1L, unit = NULL, w = NULL, method = c("hungarian","bruteforce"), equal.variance = FALSE)
x |
data: matrix of dimensions mn \times p or 3D array of dimensions (p,m,n) with m = number of labels/classes, n = number of sample units, and p = number of variables) |
template |
integer (= which sample unit to take as template) or (p,m) matrix |
unit |
integer (=number of units/datasets) or vector mapping rows of |
w |
weights for the loss function. Can be specified as a p-vector of diagonal weights or as a full p \times p (positive definite) matrix |
method |
method for the linear assignment problem: |
equal.variance |
logical; if TRUE, resp. FALSE, return common, resp. label-specific, covariance of matched features |
Given n datasets or statistical units, each containing m feature vectors, the one-to-one matching problem is to find a set of n label permutations that produce the best match of feature vectors across units. The objective function to minimize is the sum of squared (Euclidean) distances between all feature vectors having the same (new) label. This amounts to minimizing the sum of the within-label variances.
The template-based method consists in relabeling successively each sample unit to best match a template matrix of feature vectors. This method is very fast but its optimization performance is only as good as the template. For best results, the template should be representative of the collected data.
If x
is a matrix, the rows should be sorted by increasing unit label and unit
should be a nondecreasing sequence of integers, for example (1,...,1,2,...,2,...,n,...,n) with each integer 1,...,n replicated m times.
The argument w
can be specified as a vector of positive numbers (will be recycled to length p if needed) or as a positive definite matrix of size (p,p).
A list of class matchFeat
with fields
sigma
best assignement as set of permutations (m\times n matrix)
cluster
best assignement as cluster indicators (m \times n matrix)
objective
minimum objective value
mu
mean vector for each class/label (p \times m matrix)
V
covariance matrix for each class/label (p \times p \times m array if equal.variance
is FALSE, p \times p matrix otherwise
call
function call
Degras (2022) "Scalable feature matching across large data collections."
doi: 10.1080/10618600.2022.2074429
https://en.wikipedia.org/wiki/Assignment_problem
https://en.wikipedia.org/wiki/Hungarian_algorithm
match.2x
, match.bca
,
match.bca.gen
, match.gaussmix
, match.kmeans
, match.rec
## Generate data n <- 10 k <- 3 d <- 5 mu <- matrix(1:k, nrow=d, ncol=k, byrow=TRUE) sigma <- 0.3 x <- array(mu, c(d,k,n)) + rnorm(d*k*n,sigma) ## Match all feature vectors with first case as template result <- match.template(x,1) ## Display results result$cost # cost function xmatched <- array(dim=dim(x)) # re-arranged (matched) feature vectors for (i in 1:n) xmatched[,,i] <- x[,result$sigma[,i],i]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.