getMetrics: Generate four evaluation metrics from pairwise comparisons


View source: R/anyData.R

Description

This function runs hierarchical clustering using one of five linkage methods: single, complete, average, centroid, or minimax linkage. For a data set with $n$ items, the resulting hierarchy yields clusterings with 1 through $n$ clusters. For each of these clusterings, four evaluation metrics are computed: 1. maximum minimax radius (see Bien et al. 2011), 2. misclassification rate, 3. precision, 4. recall.
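To make the four metrics concrete, the following base R sketch computes them for every cut of a small hierarchy. It is not the package's implementation: the toy data, the helper names `maxMinimaxRadius` and `metricsForK`, and the pairwise definitions used here (precision and recall over item pairs, misclassification rate as the share of pairs whose predicted match status is wrong) are assumptions made for illustration, and complete linkage stands in for minimax linkage because `stats::hclust` does not provide it.

```r
# Hypothetical toy data: 6 items on a line, with known true groups
x <- c(0.0, 0.1, 0.2, 5.0, 5.1, 9.0)
truth <- c(1, 1, 1, 2, 2, 3)
n <- length(x)

d <- dist(x)                          # pairwise Euclidean distances
hc <- hclust(d, method = "complete")  # stats::hclust has no minimax method
dmat <- as.matrix(d)

pairs <- t(combn(n, 2))               # all n * (n - 1) / 2 item pairs
trueMatch <- truth[pairs[, 1]] == truth[pairs[, 2]]

# For each cluster, find the member whose largest distance to the other
# members is smallest (the minimax radius), then take the largest such
# radius over all clusters.
maxMinimaxRadius <- function(dmat, labels) {
  max(sapply(unique(labels), function(g) {
    idx <- which(labels == g)
    min(apply(dmat[idx, idx, drop = FALSE], 1, max))
  }))
}

# Metrics for the clustering obtained by cutting the tree into k clusters
metricsForK <- function(k) {
  labels <- cutree(hc, k = k)
  predMatch <- labels[pairs[, 1]] == labels[pairs[, 2]]
  tp <- sum(predMatch & trueMatch)    # pairs correctly linked
  fp <- sum(predMatch & !trueMatch)   # pairs linked but not true matches
  fn <- sum(!predMatch & trueMatch)   # true matches left unlinked
  data.frame(
    k = k,
    maxMinimax = maxMinimaxRadius(dmat, labels),
    misClass = (fp + fn) / nrow(pairs),
    precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall = if (tp + fn > 0) tp / (tp + fn) else NA_real_
  )
}

# One row per clustering, for k = 1, ..., n
sketchMetrics <- do.call(rbind, lapply(seq_len(n), metricsForK))
print(sketchMetrics)
```

In this sketch precision is reported as `NA` when no pairs are linked (k = n), since the ratio is undefined there; how the package handles that edge case is not stated in this page.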

Usage

getMetrics(
  allPairwise,
  pairColNums,
  matchColNum,
  distSimCol,
  linkage,
  myDist = TRUE
)

Arguments

allPairwise

name of data frame containing all pairwise comparisons. It must have at least four columns: one for the first item in the comparison, one for the second item, one for the true match/non-match status, and one for a distance or similarity metric. These columns are identified by the next three parameters.

pairColNums

vector of length 2 giving the column numbers in 'allPairwise' of 1. the first item in the comparison, 2. the second item in the comparison

matchColNum

column number in 'allPairwise' of the column indicating true match/non-match status

distSimCol

name of column in 'allPairwise' containing distances or similarities, input as character, e.g. "l2dist". If this is a similarity rather than a distance, set the 'myDist' parameter to FALSE. If a similarity measure is used, distance is calculated as 1 - similarity.
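The conversion described above can be illustrated in a few lines of R. The data frame `toyPairs` and its column names are hypothetical, and the sketch assumes similarities lie in [0, 1] so that the resulting distances are non-negative.

```r
# Hypothetical pairwise table with a similarity column named "sim"
toyPairs <- data.frame(
  item1 = c(1, 1, 2),
  item2 = c(2, 3, 3),
  match = c(1, 0, 0),
  sim   = c(0.9, 0.2, 0.35)
)

# The conversion stated in the documentation: distance = 1 - similarity
toyPairs$distFromSim <- 1 - toyPairs$sim
print(toyPairs)
```

A highly similar pair (0.9) ends up with a small distance, so it is merged early in the hierarchy, which is the behavior one would expect from passing `myDist = FALSE`.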

linkage

one of "single", "complete", "average", "centroid", "minimax"

myDist

is 'distSimCol' a distance or similarity measure? Default TRUE, i.e. distance measure
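As a rough illustration of how the arguments fit together, the sketch below builds a small pairwise table and passes it to `getMetrics`. The toy items, the column names `item1`, `item2` and `match`, and the 0/1 coding of match status are assumptions, not requirements stated in this page. The call itself is guarded and runs only if a package named `clusterTruster` is installed and exports `getMetrics`, which is also an assumption.

```r
# Hypothetical items: an id, a 1-D feature, and a true group label
items <- data.frame(
  id = 1:4,
  x = c(0.0, 0.2, 3.0, 3.1),
  group = c("a", "a", "b", "b")
)

# All pairwise comparisons: one row per unordered pair of items
idx <- t(combn(nrow(items), 2))
toyPairwise <- data.frame(
  item1 = items$id[idx[, 1]],
  item2 = items$id[idx[, 2]],
  match = as.integer(items$group[idx[, 1]] == items$group[idx[, 2]]),
  l2dist = abs(items$x[idx[, 1]] - items$x[idx[, 2]])
)

# Columns 1 and 2 hold the items, column 3 the match status,
# and "l2dist" is a distance, so myDist stays TRUE.
if (requireNamespace("clusterTruster", quietly = TRUE)) {
  outMetrics <- clusterTruster::getMetrics(
    allPairwise = toyPairwise,
    pairColNums = c(1, 2),
    matchColNum = 3,
    distSimCol = "l2dist",
    linkage = "single",
    myDist = TRUE
  )
  print(outMetrics)
}
```

Note that the item and match columns are given by position while the distance column is given by name, following the argument descriptions above.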

Value

outMetrics, a data frame with each row representing a clustering. For a data set with $n$ items, there will be $n$ rows. Columns are the four evaluation metrics, 'maxMinimax', 'misClass', 'precision' and 'recall'.


xhtai/clusterTruster documentation built on May 22, 2020, 10:56 a.m.