runs raw data diagnostics for Elo rating

Description

runs some diagnostics on the data supplied to elo.seq, to check whether elo.seq will run without errors

Usage

seqcheck(winner, loser, Date, draw=NULL, presence=NULL)

Arguments

winner

factor or character vector of winner IDs

loser

factor or character vector of loser IDs

Date

character vector of form "YYYY-MM-DD" with the date of the respective interaction

draw

logical (of length(winner)). Did the interaction end undecided (i.e. drawn or tied)? By default all FALSE, i.e. no undecided interactions occurred

presence

data.frame with presence data, see elo.seq

Details

calendar dates (for the sequence as well as in the first column of presence, if supplied) need to be in "YYYY-MM-DD" format!
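
As an illustration of the format requirement, the following base R sketch flags strings that are not valid "YYYY-MM-DD" dates. The helper is_iso_date is a hypothetical name introduced here; it is not part of the package and not necessarily how seqcheck tests dates internally.

```r
# Hypothetical helper (not part of EloRating): TRUE where a string is a
# valid calendar date written as "YYYY-MM-DD"
is_iso_date <- function(x) {
  x <- as.character(x)
  # the string must have exactly the shape 4 digits - 2 digits - 2 digits
  shape_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  # as.Date() with an explicit format returns NA for impossible dates
  parsed <- as.Date(x, format = "%Y-%m-%d")
  shape_ok & !is.na(parsed)
}

dates <- c("2010-01-25", "25/01/2010", "2010-1-5", "2010-02-30")
is_iso_date(dates)
```

Only the first entry passes: the second uses a different layout, the third lacks zero padding, and the fourth is not a real calendar date.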

seqcheck returns two types of messages: warnings and errors. Errors mean that the data will NOT work when supplied to elo.seq, and need to be fixed. Warnings do not necessarily cause elo.seq to fail. Note that by default seqcheck is run as part of elo.seq, and with that default any error or warning produced by seqcheck will stop elo.seq from running. Some warnings (but not errors) can be ignored (see below): if the runcheck argument in elo.seq is set to FALSE, Elo ratings will be calculated properly in such cases.

The actual checks (and corresponding messages) that are performed are described in more detail here:

Most likely (i.e. in our experience), problems are caused by mismatches between the interaction data and the corresponding presence data.

Errors:
presence starts AFTER data: indicates that during interactions at the beginning of the sequence, no corresponding information was found in the presence data. Solution: augment presence data, or remove interactions until the date on which presence data starts
presence stops BEFORE data: refers to the corresponding problem towards the end of interaction and presence data
during the following interactions, IDs were absent...: indicates that according to the presence data, IDs were absent (i.e. "0"), but interactions with them occurred on the very date(s) according to the interaction data
the following IDs occur in the data sequence but NOT...: there is/are no columns corresponding to the listed IDs in the presence data
there appear to be gaps in your presence (days missing?)...: check whether your presence data includes a line for each date starting from the date of the first interaction through to the date of the last interaction

Warnings:
presence continues beyond data: indicates that presence and interaction data do not end on the same date.
presence starts earlier than data: indicates that presence and interaction data do not start on the same date.
the following IDs occur in the presence data but NOT...: there are more ID columns in the presence data than IDs occurring in the interaction data
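
The mismatches listed above can be pictured with a small base R sketch. It is not the code used by seqcheck; check_coverage and the toy data are hypothetical and only assume that presence has a "Date" column followed by one 0/1 column per ID.

```r
# Hypothetical helper (not part of EloRating): compare the date coverage and
# the IDs of the interaction data with those of the presence data
check_coverage <- function(winner, loser, Date, presence) {
  seq_dates  <- as.Date(as.character(Date))
  pres_dates <- as.Date(as.character(presence$Date))
  pres_ids   <- setdiff(colnames(presence), "Date")
  seq_ids    <- unique(c(as.character(winner), as.character(loser)))
  list(
    starts_after = min(pres_dates) > min(seq_dates),  # presence starts AFTER data
    stops_before = max(pres_dates) < max(seq_dates),  # presence stops BEFORE data
    has_gaps     = any(as.numeric(diff(sort(pres_dates))) > 1),  # days missing
    missing_ids  = setdiff(seq_ids, pres_ids),  # in sequence, not in presence
    extra_ids    = setdiff(pres_ids, seq_ids)   # in presence, not in sequence
  )
}

# toy data: presence starts one day late, skips a day, and lacks a column for "c"
winner <- c("a", "b", "c")
loser  <- c("b", "c", "a")
Date   <- c("2020-01-01", "2020-01-02", "2020-01-04")
presence <- data.frame(Date = c("2020-01-02", "2020-01-04"),
                       a = c(1, 1), b = c(1, 1),
                       stringsAsFactors = FALSE)
res <- check_coverage(winner, loser, Date, presence)
str(res)
```

In this toy case starts_after and has_gaps are TRUE and missing_ids contains "c", which corresponds to three of the error messages described above.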

Other warnings/errors can result from inconsistencies in either the presence or sequence data, or be of a more general nature:

Errors:
no 'Date' column found: in the presence data, no column exists with the name/header "Date". Please rename (or add) the necessary column named "Date" to your presence data.
at least one presence entry is not 1 or 0: presence data must come in binary form, i.e. an ID was either present ("1") or absent ("0") on a given date. No NAs or other values are allowed.
your data vectors do not match in length: at least one of the three mandatory arguments (winner, loser, Date) differs from the others in length. Consider handling your data in a data.frame, which avoids this error.

Warnings:
IDs occur in the data with inconsistent capitalization: because R is case-sensitive, "A" and "a" are considered different individuals. If such labelling of IDs is on purpose, ignore the warning and set runcheck=FALSE when calling elo.seq()
There is (are) X case(s) in which loser ID equals winner ID: winner and loser represent the same ID
the following individuals were observed only on one day: while not per se a problem for the calculation of Elo ratings, individuals that were observed only on one day (irrespective of the number of interactions on that day) cannot be plotted. eloplot will give a warning in such cases, too.
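
For readers who want to screen their data before calling seqcheck, the general checks above can be approximated in base R. This is a rough sketch under simplifying assumptions, not the package's implementation; check_general and the toy data are hypothetical.

```r
# Hypothetical helper (not part of EloRating): approximate the general checks
check_general <- function(winner, loser, Date, presence = NULL) {
  winner <- as.character(winner)
  loser  <- as.character(loser)
  Date   <- as.character(Date)
  same_length <- length(winner) == length(loser) &&
    length(winner) == length(Date)
  ids   <- unique(c(winner, loser))
  lower <- tolower(ids)
  out <- list(
    length_mismatch   = !same_length,
    # IDs that differ only in capitalization, e.g. "A" and "a"
    case_conflicts    = ids[duplicated(lower) | duplicated(lower, fromLast = TRUE)],
    self_interactions = NA_integer_,
    one_day_ids       = NA_character_
  )
  if (same_length) {
    # interactions in which winner and loser are the same ID
    out$self_interactions <- sum(winner == loser)
    # number of distinct observation days per ID
    days <- tapply(c(Date, Date), c(winner, loser),
                   function(d) length(unique(d)))
    out$one_day_ids <- names(days)[days == 1]
  }
  if (!is.null(presence)) {
    out$has_date_column <- "Date" %in% colnames(presence)
    # every column other than "Date" is treated as an ID column here
    id_cols <- setdiff(colnames(presence), "Date")
    values  <- unlist(presence[, id_cols, drop = FALSE], use.names = FALSE)
    out$non_binary <- any(!(values %in% c(0, 1)))  # NA also counts as invalid
  }
  out
}

winner <- c("A", "a", "b", "c")
loser  <- c("b", "b", "b", "a")
Date   <- c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03")
pres   <- data.frame(Date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                     a = c(1, 1, NA), b = c(1, 1, 1),
                     stringsAsFactors = FALSE)
res <- check_general(winner, loser, Date, pres)
str(res)
```

Here "A" and "a" are reported as a capitalization conflict, one interaction has identical winner and loser, "A" and "c" were observed on a single day only, and the NA in the presence data is flagged as a non-binary entry.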

Value

returns info about possible errors, or states that data are fine for running with elo.seq

Author(s)

Christof Neumann

Examples

data(adv)
seqcheck(winner=adv$winner, loser=adv$loser, Date=adv$Date)
data(advpres)
seqcheck(winner=adv$winner, loser=adv$loser, Date=adv$Date,
         presence=advpres)

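# create faulty presence data:
# drop the first row of the presence data
faultypres <- advpres[-1, ]
# set columns 2 to 8 to 0 (absent) in the fifth remaining row
faultypres[5,2:8] <- 0
# running the check on the faulty data (left commented out, as in the original)
# seqcheck(winner=adv$winner, loser=adv$loser, Date=adv$Date,
#          presence=faultypres)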