View source: R/similarity-scores.R
Compares the statistics for a player of interest to the statistics for a large group of players. Returns the distance from each player in the group to the player of interest.
findDiffs2(stats_fp, stats_comps, weights)
stats_fp
data.frame of one row. The statistics for the focus player.
stats_comps
data.frame with identical columns to
weights
named vector of length
findDiffs2 separates the statistics into quantitative and categorical statistics. It then passes the categorical statistics (for both the focus player and the comps) and the categorical weights to catDiffs2 to calculate differences for categorical variables. It passes the quantitative statistics and weights to numDiffs2 to calculate differences for quantitative variables.
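The split into quantitative and categorical statistics can be sketched in base R as follows. This is only an illustration under the assumption that the column type decides the group; the variable names are hypothetical and the code is not taken from the package source.

```r
# Illustrative sketch only: split columns and weights by type.
# All names here (is_num, num_stats, cat_stats, ...) are hypothetical.
comps <- head(iris)
wts <- setNames(1:5, names(iris))

# TRUE for quantitative columns, FALSE for categorical ones
is_num <- vapply(comps, is.numeric, logical(1))

num_stats <- comps[, is_num, drop = FALSE]
cat_stats <- comps[, !is_num, drop = FALSE]

# Weights follow their columns by name
num_wts <- wts[names(num_stats)]
cat_wts <- wts[names(cat_stats)]
```

Matching the weights by name rather than by position keeps each weight attached to its statistic even if the columns are reordered.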
Using Gower's formula, the distance for a categorical variable between player j and the focus player is 0 if the players have the same value of the variable and 1 if the values differ. The difference for the i^{th} quantitative variable for the j^{th} player is d_{ij} = (stat_{ij} - stat_{i,fp}) / range(stat_i).
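As a rough illustration of these two per-variable rules, the sketch below uses two hypothetical helpers, cat_diff and num_diff. They are not the package's catDiffs2 or numDiffs2. The sketch assumes the range is taken over the comps and the focus player together, and it keeps the sign of the quantitative difference as the formula is written; wrap the result in abs() if only the magnitude is wanted.

```r
# Hypothetical helpers, not the package's catDiffs2 / numDiffs2.

# Categorical rule: 0 if the comp matches the focus player, 1 otherwise
cat_diff <- function(comp_values, fp_value) {
  as.numeric(as.character(comp_values) != as.character(fp_value))
}

# Quantitative rule: difference scaled by the range of the statistic.
# Assumption: the range covers the comps and the focus player together.
num_diff <- function(comp_values, fp_value) {
  rng <- diff(range(c(comp_values, fp_value), na.rm = TRUE))
  (comp_values - fp_value) / rng
}

comps <- head(iris)
cat_diffs <- cat_diff(comps$Species, "setosa")
num_diffs <- num_diff(comps$Sepal.Length, 5)
```

Missing values propagate as NA in both helpers, so they can be flagged later when the per-variable differences are combined.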
findDiffs2 then records missing data as m_{ij} = 0 if the i^{th} statistic for player j or for the focus player is missing, and m_{ij} = 1 otherwise. Let delta_{j} be the distance between player j and the focus player and w_i be the weight for the i^{th} statistic (i = 1...I). Using Gower's distance metric,
delta_j = sum(i=1:I; w_i m_{ij} d_{ij}) / sum(i=1:I; w_i m_{ij})
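The weighted combination can be sketched as below. gower_combine is a hypothetical function written for illustration, not part of the package. It assumes d is a matrix of non-negative per-variable differences with one row per comp and one column per statistic, that NA marks a missing statistic, and that the sum is normalised by the total weight of the observed statistics, as in Gower (1971).

```r
# Hypothetical sketch of the weighted combination, not the package's code.
# d: matrix of per-variable differences (rows = comps, columns = statistics),
#    with NA where a statistic is missing. Assumed non-negative.
# w: numeric vector of weights, one per column of d.
gower_combine <- function(d, w) {
  m <- ifelse(is.na(d), 0, 1)   # m_ij = 0 if missing, 1 otherwise
  d0 <- ifelse(is.na(d), 0, d)  # missing differences contribute nothing
  num <- as.vector(d0 %*% w)    # sum over i of w_i * m_ij * d_ij
  den <- as.vector(m %*% w)     # sum over i of w_i * m_ij
  num / den
}

d <- rbind(c(0.125, 0, 1),
           c(0.5, NA, 0))
w <- c(1, 2, 3)
delta <- gower_combine(d, w)
```

Because the weights appear in both the numerator and the denominator, rescaling them by a constant, for example gower_combine(d, w / 10), leaves the distances unchanged.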
A vector of differences. The jth element of the vector corresponds to the difference between stats_fp and the jth row of stats_comps. Differences are calculated using Gower's distance formula.
Gower, J. C. (1971) A general coefficient of similarity and some of its properties, Biometrics 27, 857-874.
### Using the iris data
library(magrittr)  # provides %>%
# one weight per column of iris, named by column
wts <- (1:5) %>% setNames(names(iris))
# focus player: a single row with the same columns as iris
fp <- data.frame(5,4,3,2,"setosa") %>% setNames(names(iris))
i <- head(iris)
findDiffs2(fp, i, wts)
# only the ratios of the weights matter
wts <- wts / 10
findDiffs2(fp, i, wts) # same thing!
# columns not named in wts will be removed
wts <- wts[-4]
findDiffs2(fp, i, wts) # still works
# can handle missing data!
i <- head(iris)
i[3,4] <- NA
findDiffs2(fp, i, wts)