fEMDDetailed: emdist::emd is quite restrictive. A more liberal alternative.


fEMDDetailed    R Documentation

emdist::emd is quite restrictive. A more liberal alternative.

Description

So long as you can calculate a distance matrix between the two datasets, you can use this function to calculate the earth mover's distance (EMD) between them. This lets you calculate the EMD between any two datasets, whereas emdist::emd only accepts numeric datasets of up to four dimensions. The function additionally gives you the flows, i.e. how much weight is moved between each pair of observations when transforming one dataset into the other, so you can make more detailed inferences, such as which observations contribute most to the distance.
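For reference, the EMD is usually defined as the optimum of a transportation problem. The block below is the textbook formulation, not necessarily the exact LP that fEMDDetailed builds: i indexes the N observations of dataset one (SNO1), j the M observations of dataset two (SNO2), d_ij is the Distance for that pair, w_i and v_j are the observation weights (as in dtWeights1 and dtWeights2), and f_ij is the flow, i.e. the amount of weight moved from i to j.

```latex
\begin{aligned}
\min_{f}\; & \sum_{i=1}^{N} \sum_{j=1}^{M} f_{ij}\, d_{ij} \\
\text{s.t.}\; & f_{ij} \ge 0, \qquad \sum_{j=1}^{M} f_{ij} \le w_i, \qquad \sum_{i=1}^{N} f_{ij} \le v_j, \\
& \sum_{i=1}^{N} \sum_{j=1}^{M} f_{ij} = \min\Big(\sum_{i} w_i,\ \sum_{j} v_j\Big), \\
\mathrm{EMD} &= \frac{\sum_{i,j} f^{*}_{ij}\, d_{ij}}{\sum_{i,j} f^{*}_{ij}}
\end{aligned}
```

With equal weights of 1/N and 1/M, both totals are 1, so the inequality constraints become equalities and the EMD is simply the minimal total cost.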

Usage

fEMDDetailed(SNO1, SNO2, Distance, dtWeights1 = NULL, dtWeights2 = NULL)

Arguments

SNO1

Index (observation number) of the observation from dataset one in each pair, e.g. c(1,2,3,4,1,2,3,4,1,2,3,4)

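SNO2

Index (observation number) of the observation from dataset two in each pair, e.g. c(1,1,1,1,2,2,2,2,3,3,3,3). Together with the SNO1 example, this lists all 12 pairs formed by four observations in dataset one and three in dataset two.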

Distance

The distance between the observations identified by the corresponding SNO1 and SNO2 entries; a numeric vector of the same length as SNO1 and SNO2.

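dtWeights1

Optional weights for the observations of dataset one: a data.table with two columns, SNO1 and Weight. If NULL (the default), every observation gets equal weight.

dtWeights2

The equivalent of dtWeights1 for dataset two: a data.table with two columns, SNO2 and Weight.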
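As a sketch of how these arguments fit together for non-numeric data, the code below builds one row per (SNO1, SNO2) pair for two small sets of strings, looks up each pair's Levenshtein (edit) distance with utils::adist, and creates equal-weight tables. The strings are purely illustrative, and the SNO2 column name in dtWeights2 is an assumption based on symmetry with dtWeights1.

```r
library(data.table)

# Two non-numeric datasets: emdist::emd cannot take these, but a distance
# between every pair of observations is all that is needed here
words1 <- c("kitten", "sitting", "mitten")
words2 <- c("sitting", "kitten")

# Edit distance between every word in words1 (rows) and words2 (columns)
D <- adist(words1, words2)

# One row per pair; expand.grid varies SNO1 fastest, matching the
# c(1,2,3,1,2,3) / c(1,1,1,2,2,2) pattern described above
dtPairs <- data.table(expand.grid(SNO1 = seq_along(words1),
                                  SNO2 = seq_along(words2)))
dtPairs[, Distance := D[cbind(SNO1, SNO2)]]

# Optional equal weights, one row per observation (column names assumed)
dtWeights1 <- data.table(SNO1 = seq_along(words1), Weight = 1 / length(words1))
dtWeights2 <- data.table(SNO2 = seq_along(words2), Weight = 1 / length(words2))
```

The columns would then be passed as fEMDDetailed(SNO1 = dtPairs$SNO1, SNO2 = dtPairs$SNO2, Distance = dtPairs$Distance, dtWeights1 = dtWeights1, dtWeights2 = dtWeights2).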

Details

The output is an lpExtPtr object, the linear program representation used by the lpSolveAPI package. lpSolveAPI provides many functions that operate on such an object; for instance, get.variables returns the values of the LP's decision variables, i.e. the mapping (flows) found by the EMD, which is useful for understanding which observations contribute most to the distance.
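To illustrate what get.variables and get.objective return, here is a hand-built toy LP using lpSolveAPI directly. It is a hypothetical example, and fEMDDetailed's internal construction may differ: two observations per dataset, each with weight 1/2, and one flow variable per (SNO1, SNO2) pair in the order given. Multiplying each flow by its distance shows how much each pair contributes to the total.

```r
library(lpSolveAPI)

# Toy inputs in the long format used by fEMDDetailed (hypothetical values)
SNO1     <- c(1, 2, 1, 2)
SNO2     <- c(1, 1, 2, 2)
Distance <- c(1, 4, 2, 2)

lprec <- make.lp(0, length(Distance))   # one flow variable per pair, >= 0
set.objfn(lprec, Distance)              # minimise sum(flow * distance)
for (i in 1:2) {                        # each SNO1 ships out its weight of 1/2
  add.constraint(lprec, c(1, 1), "=", 0.5, indices = which(SNO1 == i))
}
for (j in 1:2) {                        # each SNO2 receives its weight of 1/2
  add.constraint(lprec, c(1, 1), "=", 0.5, indices = which(SNO2 == j))
}
status <- solve(lprec)                  # 0 means an optimum was found

flows <- get.variables(lprec)           # the mapping: flow for each pair
contribution <- flows * Distance        # each pair's share of the total cost
print(data.frame(SNO1, SNO2, Distance, flows, contribution))
print(get.objective(lprec))             # total weight is 1, so this is the EMD
```

Here the flows come out as 0.5, 0, 0, 0.5 and the objective as 1.5: all of the weight moves along the two cheap pairs, and the (2, 2) pair accounts for two thirds of the distance.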

Examples

library(data.table)
# Two random three-dimensional datasets, with 7 and 10 observations
a = data.table(matrix(runif(21), ncol = 3))
b = data.table(matrix(runif(30), ncol = 3))
# adding serial numbers to each observation
a[, SNO := .I]
b[, SNO := .I]
# Cross join on a constant key to pair every observation in a with every
# observation in b
a[, k := 'k']
b[, k := 'k']
dtDistances = merge(a, b, 'k', allow.cartesian = TRUE)
# Euclidean distance for each pair
dtDistances[,
   Distance := (
      (( V1.x - V1.y) ^ 2) +
      (( V2.x - V2.y) ^ 2) +
      (( V3.x - V3.y) ^ 2)
   ) ^ 0.5
]
# Computing the EMD between the two datasets
lprec = fEMDDetailed(
   SNO1 = dtDistances[, SNO.x],
   SNO2 = dtDistances[, SNO.y],
   Distance = dtDistances[, Distance]
)
fGetEMDFromDetailedEMD(lprec)
# This value should match the one computed by emdist::emd, which needs the
# weight of each point in the first column. fEMDDetailed weights all points
# equally by default, so each point gets a weight of 1/N
emdist::emd(
   as.matrix(
      a[, list(1/.N, V1,V2,V3)]
   ),
   as.matrix(
      b[, list(1/.N, V1,V2,V3)]
   )
)

thecomeonman/CodaBonito documentation built on April 24, 2023, 11:41 a.m.