The comperes package is designed for working with competition results (CR).
This vignette describes the supported CR formats and the operations that can be performed on them.
It is assumed that a competition consists of multiple games (matches, comparisons, etc.). In general, games can have a variable number of players. Inside a game all players are treated equally. In every game every player has some score: a value of arbitrary nature that fully characterizes the player's performance in that particular game (in most cases it is some numeric value).
Long format is the most general way to represent CR because it naturally allows a game to have a variable number of players. Results should be in a "data.frame-like" format in which the observational unit (row) is the score of a particular player in a particular game.
In comperes this format is supported by the longcr S3 class, which inherits from tibble. Data in this format should have at least three columns with the following names:
game - game identifier;
player - player identifier;
score - score of a particular player in a particular game.
Extra columns are allowed. Note that if a longcr object is converted to widecr (the wide format, which is described next), they will be dropped. So it is better to store extra information about a game-player pair inside a list-column score, which will stay untouched.
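To make the long layout concrete, here is a minimal base R sketch. It does not use comperes at all; the data and the names cr_long, players_per_game and has_duplicates are made up for illustration.

```r
# Toy competition results in long format: one row per player per game.
# Game 1 has three players, game 2 has two.
cr_long <- data.frame(
  game   = c(1, 1, 1, 2, 2),
  player = c("A", "B", "C", "A", "C"),
  score  = c(10, 7, 3, 5, 9),
  stringsAsFactors = FALSE
)

# Number of players in each game
players_per_game <- table(cr_long$game)

# A game-player pair should appear at most once
has_duplicates <- any(duplicated(cr_long[, c("game", "player")]))
```

The variable number of rows per game is exactly what the wide format cannot express without padding.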
As an example of a longcr object one can use ncaa2005, which is a built-in dataset in comperes:
print(ncaa2005, n = 6)
There is an S3 generic function for easy conversion to longcr: to_longcr. Its default method converts the argument to a tibble and adds longcr to the result's class. If the argument of to_longcr is already a proper longcr object, it is left untouched. For a widecr object the actual conversion to longcr is performed, preserving all extra columns.
to_longcr has an argument repair. If it is TRUE, the result is ensured to have a proper structure: there will be columns game, player and score, and there will be no duplicated game-player pairs. If any of these columns is not detected in the original data, it is created as a new column filled with NAs and an appropriate message is given. In case of imperfect matching of column names there will also be a message.
For more details see ?to_longcr.
Wide format is preferred if all games consist of the same number of players. Results should be in a "data.frame-like" format in which the observational unit (row) is one particular game. Data should be organized in pairs of player-score columns. The identifier of a pair should go after the respective keyword and consist only of digits. For example: player1, score1, player2, score2. The order of columns doesn't matter. Extra columns are allowed.
To account for R standard string ordering, identifiers of pairs should be formatted with possible leading zeros. For example: player01, score01, ..., player10, score10.
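The effect of leading zeros can be checked with a small base R sketch. The identifiers are illustrative; method = "radix" is used so that the ordering does not depend on the session locale.

```r
ids <- c(1L, 2L, 10L)

# Without padding, "player10" is ordered before "player2"
unpadded <- paste0("player", ids)
sorted_unpadded <- sort(unpadded, method = "radix")

# Zero-padded to two digits, string order matches numeric order
padded <- sprintf("player%02d", ids)
sorted_padded <- sort(padded, method = "radix")
```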
A column game for the game identifier is optional. If present, it will be used in conversion to the longcr format via to_longcr.
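As a rough illustration of how the wide layout maps back to the long one, the following base R sketch stacks the player-score pairs by their identifiers. It is only a sketch of the idea on made-up data, not the code that to_longcr actually runs; the names wide, ids and long are hypothetical.

```r
# Toy wide-format data: one row per game, two player-score pairs
wide <- data.frame(
  game    = 1:2,
  player1 = c("A", "B"), score1 = c(10, 5),
  player2 = c("B", "C"), score2 = c(7, 9),
  stringsAsFactors = FALSE
)

# Pair identifiers are the digits that follow the "player" keyword
ids <- sub("^player", "", grep("^player[0-9]+$", names(wide), value = TRUE))

# Stack every player-score pair into game/player/score rows
long <- do.call(rbind, lapply(ids, function(id) {
  data.frame(
    game   = wide$game,
    player = wide[[paste0("player", id)]],
    score  = wide[[paste0("score", id)]],
    stringsAsFactors = FALSE
  )
}))
long <- long[order(long$game), ]
```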
Here is the widecr version of ncaa2005:
print(to_widecr(ncaa2005, repair = FALSE), n = 3)
As with longcr, there is an S3 generic function for easy conversion to widecr: to_widecr. Its default method converts the argument to a tibble and adds widecr to the result's class. If the argument of to_widecr is already a proper widecr object, it is left untouched. For a longcr object the actual conversion to widecr is performed using only the columns game, player and score.
to_widecr has an argument repair. If it is TRUE, it detects possible player-score pairs by the identifier of a pair (the characters that go after the respective keywords). If some column doesn't have a pair, the missing one is created as a new column filled with NAs.
For more details see ?to_widecr.
A useful case of the wide CR format is pairgames: CR in which games are held between two players. It is just a widecr object with two players. It is also the most popular case of CR for rating and ranking systems. There is a function to_pairgames to create pairgames from general CR: it drops games with one player, and every game with three or more players is transformed into a set of separate games between unordered pairs of players. It accepts CR in a format ready for to_longcr. For more details see ?to_pairgames. The usage example is as follows:
cr_data <- data.frame(game = rep(1, 3), player = 11:13, score = 101:103)
to_pairgames(cr_data)
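The splitting of a multi-player game into unordered pairs can be sketched in base R with combn. This only illustrates the idea on the same toy data; it is not the implementation of to_pairgames, and the names pair_idx and pairs are hypothetical.

```r
# One game with three players, as in the example above
cr_data <- data.frame(game = rep(1, 3), player = 11:13, score = 101:103)

# All unordered pairs of row indices within the game (one pair per column)
pair_idx <- combn(nrow(cr_data), 2)

# One row per pair, in a player1/score1/player2/score2 layout
pairs <- data.frame(
  player1 = cr_data$player[pair_idx[1, ]],
  score1  = cr_data$score[pair_idx[1, ]],
  player2 = cr_data$player[pair_idx[2, ]],
  score2  = cr_data$score[pair_idx[2, ]]
)
```

A game with three players yields three pairs; in general a game with n players yields n * (n - 1) / 2 of them.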
After conversion of CR into appropriate format one can use them for several types of operations.
Head-to-Head value is a measure of the quality of direct confrontation between two players. It is assumed that this value can be computed based only on the players' scores in their common games. If this is not true in some case, the competition results should be changed by transformation or by addition of more information (in the form of extra columns or an extra field in the score column(s), making them list-column(s)).
Head-to-Head value is computed for an ordered pair of players based on their matchups. This means that the Head-to-Head value for "player1"-"player2" may differ from the one for "player2"-"player1". This is done in order to allow asymmetric Head-to-Head values.
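The idea of ordered pairs and asymmetric values can be sketched in base R. The toy data, the self-join via merge and the "wins of the second player" value below are illustrative assumptions and say nothing about how comperes computes its matrices internally.

```r
# Toy long-format CR: two games between "A" and "B", one between "A" and "C"
cr <- data.frame(
  game   = c(1, 1, 2, 2, 3, 3),
  player = c("A", "B", "A", "B", "A", "C"),
  score  = c(10, 7, 3, 1, 8, 8),
  stringsAsFactors = FALSE
)

# Self-join on game gives every ordered pair of players within a game
matchups <- merge(cr, cr, by = "game", suffixes = c("1", "2"))

# Illustrative Head-to-Head value: number of games in which the second
# player scored strictly more than the first one
matchups$win2 <- as.integer(matchups$score2 > matchups$score1)
h2h <- xtabs(win2 ~ player1 + player2, data = matchups)
```

Here h2h["B", "A"] is 2 while h2h["A", "B"] is 0, which shows why the pair has to be ordered.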
There is a function for computing multiple Head-to-Head values in matrix form (Head-to-Head matrix): get_h2h. It accepts CR data and a Head-to-Head function h2h_fun (for more details see ?get_h2h). It returns an object of class h2h: a square matrix with the number of rows (and columns) equal to the number of players for which it is computed. Here is the Head-to-Head matrix of ncaa2005 with the Head-to-Head value being the number of wins of the second player in matchups:
get_h2h(ncaa2005, h2h_fun = h2h_num_wins)
For the list of implemented h2h_fun functions, see the corresponding help page.
get_h2h has an argument players. By default it is NULL, which means that Head-to-Head values are computed for all players present in CR. If it is not NULL, Head-to-Head values are computed only for pairs of players from the argument players. Be careful with Head-to-Head values of players with themselves: they can be misleading if players is not NULL because they will be based on possibly undesirable data. An example with the Head-to-Head value being the number of games played:
get_h2h(ncaa2005, h2h_fun = h2h_num, players = c("Duke", "Miami"))
The output can be wrongly interpreted as a Head-to-Head matrix based on CR from which only games between "Duke" and "Miami" are taken. The correct interpretation is a Head-to-Head matrix based on matchups from the whole ncaa2005 between players from the given set. So the number of games played by "Duke" in the supplied CR is 4. This can be rephrased as the number of matchups of "Duke" with itself, so the output is conceptually correct.
players can also have values that are not present in CR data. The resulting rows and columns will be filled with NAs. For dealing with absent data in the Head-to-Head matrix get_h2h has two more arguments (for detailed information see ?get_h2h):
absent_players. It is a function that should do something with data of players that don't have enough games played. To do nothing use skip_action.
absent_h2h. It is a function that should do something with those entries of the Head-to-Head matrix which are NA. To do nothing use skip_action.
Examples of usage with extra players:
get_h2h(
  ncaa2005, h2h_fun = h2h_num,
  players = c("Duke", "Miami", "Extra"),
  absent_players = skip_action
)

# Use extra argument 'fill' to supply value for 'fill_h2h'
get_h2h(
  ncaa2005, h2h_fun = h2h_num,
  players = c("Duke", "Miami", "Extra"),
  absent_players = skip_action,
  absent_h2h = fill_h2h, fill = 0
)
Given CR, it can be interesting to compute its summaries. Of course this can be done pretty easily with a combination of dplyr's verbs and grouping, but comperes provides a function for that: get_item_summary(cr_data, item, summary_fun = NULL, ...). CR should be ready for to_longcr, and all further actions are based on the longcr version of CR.
Argument item defines the columns on which grouping is made for computing the item summary. Argument summary_fun defines the function which performs the summary computation. Basically, get_item_summary applies summary_fun to the groups of the longcr version of CR data defined by item. summary_fun can be NULL, in which case a tibble is returned with columns named as stored in item and containing all unique values of that particular item (set of columns) in CR.
get_item_summary(ncaa2005, item = "player", summary_fun = summary_min_max_score)
get_item_summary(ncaa2005, item = "game", summary_fun = NULL)
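Conceptually, an item summary is a grouped computation over the long format. A rough base R equivalent on made-up data might look as follows; the output column names minScore and maxScore are chosen for illustration and are not guaranteed to match what summary_min_max_score returns.

```r
cr <- data.frame(
  game   = c(1, 1, 2, 2),
  player = c("A", "B", "A", "B"),
  score  = c(10, 6, 3, 5),
  stringsAsFactors = FALSE
)

# Group scores by player and summarise each group
min_score <- tapply(cr$score, cr$player, min)
max_score <- tapply(cr$score, cr$player, max)
player_summary <- data.frame(
  player   = names(min_score),
  minScore = as.vector(min_score),
  maxScore = as.vector(max_score),
  stringsAsFactors = FALSE
)

# With no summary function only the unique item values remain
unique_games <- unique(cr["game"])
```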
For the list of implemented summary_fun functions, see the corresponding help page.
There are also wrappers around get_item_summary for the most common items:
get_game_summary for item = "game";
get_player_summary for item = "player".
# The same as previous code
get_player_summary(ncaa2005, summary_fun = summary_min_max_score)
get_game_summary(ncaa2005, summary_fun = NULL)
item can define multiple columns:
ncaa2005 %>%
  mutate(season = rep(1:2, each = 10)) %>%
  get_item_summary(
    item = c("season", "player"),
    summary_fun = summary_min_max_score
  )
In order to modify scores in CR so that they fully characterize a player's performance in a particular game, one might need to use item summaries. Instead of manually computing them with get_item_summary and applying left_join to the result, one can use add_item_summary. For example, suppose the goal of players in ncaa2005 was not to score more points than the opponent but to score as close to the opponent as possible. In this case score doesn't fully describe the player's performance. Instead, the distance from the game's mean score can describe it:
ncaa2005 %>%
  add_item_summary(item = "game", summary_fun = summary_mean_sd_score) %>%
  mutate(score = abs(score - meanScore)) %>%
  print(n = 6)
add_item_summary can be redundant in this example but in case of complex CR with variable number of players it can be quite useful.
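The same "attach a group summary to every row, then transform the score" pattern can be sketched in base R with ave. The data are made up, and the column name meanScore simply mirrors the one used in the example above.

```r
cr <- data.frame(
  game   = c(1, 1, 2, 2),
  player = c("A", "B", "A", "B"),
  score  = c(10, 6, 3, 5)
)

# Per-game mean score, repeated for every row of that game
cr$meanScore <- ave(cr$score, cr$game, FUN = mean)

# Replace the score with its distance from the game mean
cr$score <- abs(cr$score - cr$meanScore)
```

With two players per game both distances are equal, which is why the summary step looks redundant here; with more players per game they generally differ.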
There are also wrappers for the most common items:
# The same as previous example
ncaa2005 %>%
  add_game_summary(summary_fun = summary_mean_sd_score) %>%
  mutate(score = abs(score - meanScore)) %>%
  print(n = 6)