findDiffs2: Calculates Differences for a Specified Player


View source: R/similarity-scores.R

Description

Compares the statistics for a player of interest (the focus player) to the statistics for a large group of players. Returns the distance from each player in the group to the focus player.

Usage

findDiffs2(stats_fp, stats_comps, weights)

Arguments

stats_fp

data.frame of one row. The statistics for the focus player.

stats_comps

data.frame with identical columns to stats_fp. The statistics for the group of players to be compared. Each row corresponds to the statistics for a player. Differences will be calculated by comparing each row of stats_comps to stats_fp.

weights

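named vector specifying a weight for each variable, normally of length ncol(stats_comps). Columns not named in weights are removed before differences are calculated (see Examples).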

Details

findDiffs2 separates the statistics into quantitative and categorical statistics. It passes the categorical statistics (for both the focus player and the comps) and the categorical weights to catDiffs2 to calculate differences for categorical variables. It passes the quantitative statistics and weights to numDiffs2 to calculate differences for quantitative variables.
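The split described above can be sketched in base R. This is only an illustration: split_by_type is a hypothetical helper, not a function exported by the package, and it assumes that every non-numeric column counts as categorical.

```r
# Hypothetical helper (not part of simScores): split a data.frame and its
# named weights into quantitative and categorical parts.
split_by_type <- function(stats, weights) {
  is_num <- vapply(stats, is.numeric, logical(1))
  num_names <- names(stats)[is_num]
  cat_names <- names(stats)[!is_num]
  list(
    num   = stats[, num_names, drop = FALSE],  # quantitative columns
    cat   = stats[, cat_names, drop = FALSE],  # categorical columns
    w_num = weights[num_names],                # weights matched by name
    w_cat = weights[cat_names]
  )
}

wts <- setNames(1:5, names(iris))
parts <- split_by_type(head(iris), wts)
names(parts$num)  # the four measurement columns
names(parts$cat)  # "Species"
```

Matching weights by name rather than by position keeps each weight attached to its column after the split.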

Using Gower's formula, the difference for a categorical variable between player j and the focus player is 0 if the players have the same value of the variable and 1 if the values differ. The difference for the i^{th} quantitative variable for the j^{th} player is d_{ij} = |stat_{ij} - stat_{i,fp}| / range(stat_i).
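A minimal sketch of the two per-variable differences follows. cat_diff and num_diff are hypothetical names used for illustration only; they are not the package's catDiffs2 and numDiffs2. The sketch takes the absolute difference, as in Gower (1971), and assumes the range is computed over the comps together with the focus player, which the documentation does not specify.

```r
# Hypothetical helpers illustrating the per-variable differences.

# Categorical: 0 if the value matches the focus player's, 1 otherwise.
cat_diff <- function(x, x_fp) {
  as.numeric(as.character(x) != as.character(x_fp))
}

# Quantitative: absolute difference scaled by the range of the variable.
# Assumption: the range includes the focus player's value.
num_diff <- function(x, x_fp) {
  rng <- diff(range(c(x, x_fp), na.rm = TRUE))
  abs(x - x_fp) / rng
}

cat_diff(factor(c("setosa", "virginica")), "setosa")  # 0 1
num_diff(c(5.1, 4.9, 4.7), 5)
```

Both helpers return NA where the input is NA, so missing values can be masked in a later step.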

findDiffs2 then records missing data as m_{ij} = 0 if the i^{th} statistic for player j or for the focus player is missing, and m_{ij} = 1 otherwise. Let delta_{j} be the distance between player j and the focus player and w_i be the weight for the i^{th} statistic (i = 1, ..., I). Using Gower's distance metric,

delta_j = sum(i=1:I; w_i m_{ij} d_{ij}) / sum(i=1:I; w_i m_{ij})
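The combination step can be sketched as a weighted average over the statistics that are present, normalising by the total weight of the non-missing statistics as in Gower's coefficient. gower_combine is a hypothetical helper, not the package's implementation; it assumes d is a matrix with one row per player and one column per statistic, with NA marking missing differences.

```r
# Hypothetical helper: combine per-variable differences into one distance
# per player. d and m are J x I matrices; w is a weight vector of length I.
gower_combine <- function(d, m, w) {
  d[m == 0] <- 0                       # masked entries contribute nothing
  num <- as.vector((d * m) %*% w)      # sum_i w_i m_ij d_ij
  den <- as.vector(m %*% w)            # sum_i w_i m_ij
  num / den
}

d <- rbind(c(0.5, 1, 0.2),
           c(NA,  0, 0.4))             # player 2 is missing statistic 1
m <- (!is.na(d)) * 1                   # 1 where present, 0 where missing
w <- c(1, 2, 3)

res <- gower_combine(d, m, w)
res
```

Because the weights appear in both the numerator and the denominator, rescaling them (for example w / 10) leaves the distances unchanged.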

Value

a vector of differences. The jth element of the vector corresponds to the difference between stats_fp and the jth row of stats_comps. Differences are calculated using Gower's distance formula.

References

Gower, J. C. (1971) A general coefficient of similarity and some of its properties, Biometrics 27, 857 - 874.

Examples

### Using the iris data
library(magrittr) # provides %>%
wts <- (1:5) %>% setNames(names(iris))
fp <- data.frame(5,4,3,2,"setosa") %>% setNames(names(iris))
i <- head(iris)
findDiffs2(fp, i, wts)
# only the ratios between the weights matter
wts <- wts / 10
findDiffs2(fp, i, wts) # same thing!
# columns not named in wts will be removed
wts <- wts[-4]
findDiffs2(fp, i, wts) # still works

# can handle missing data!
i <- head(iris)
i[3,4] <- NA
findDiffs2(fp, i, wts)

guytuori/simScores documentation built on May 17, 2019, 9:29 a.m.