seqcheck: runs raw data diagnostics for Elo rating

View source: R/seqcheck.R

seqcheck    R Documentation

runs raw data diagnostics for Elo rating

Description

runs some diagnostics on the data supplied to elo.seq, to check whether elo.seq will run without errors

Usage

seqcheck(winner, loser, Date, draw = NULL, presence = NULL)

Arguments

winner

either a factor or character vector with winner IDs of dyadic dominance interactions

loser

either a factor or character vector with loser IDs of dyadic dominance interactions

Date

character vector of form "YYYY-MM-DD" with the date of the respective interaction

draw

logical, which interactions ended undecided (i.e. drawn or tied)? By default all FALSE, i.e. no undecided interactions occurred. Note that for undecided interactions, winner/loser values can be interchanged

presence

optional data.frame, to supply data about presence and absence of individuals for part of the time the data collection covered. See Details

Details

calendar dates (for the sequence as well as in the first column of presence, if supplied) need to be in "YYYY-MM-DD" format!
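If dates were recorded in another layout, they can be converted before calling seqcheck. The following is a minimal sketch in base R, assuming hypothetical raw dates in day/month/year order; the object names raw_dates, iso_dates and is_iso are illustrative and not part of the package.

```r
# Hypothetical raw dates recorded as day/month/year (an assumption for this sketch)
raw_dates <- c("03/01/2010", "15/01/2010", "02/02/2010")

# Parse with an explicit input format, then write back as "YYYY-MM-DD" strings
iso_dates <- format(as.Date(raw_dates, format = "%d/%m/%Y"), "%Y-%m-%d")

# Simple format check: the pattern must match and the string must parse as a date
is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", iso_dates) &
  !is.na(as.Date(iso_dates, format = "%Y-%m-%d"))
```

Stating the input format explicitly avoids relying on as.Date guessing the layout of ambiguous strings.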

seqcheck returns two types of messages: errors and warnings. Errors mean that the data will NOT work when supplied to elo.seq, and need to be fixed. Warning messages do not necessarily lead to failure of executing elo.seq. Note that by default seqcheck is run as part of elo.seq. In that default setting, if seqcheck produces any error or warning, the data will not work in elo.seq. Some warning (but not error) messages can be ignored (see below): if the runcheck argument in elo.seq is set to FALSE, Elo-ratings will be calculated properly in such cases.

The actual checks (and corresponding messages) that are performed are described in more detail here:

Most likely (i.e. in our experience), problems are caused by mismatches between the interaction data and the corresponding presence data.

Errors:

Presence starts AFTER data: indicates that during interactions at the beginning of the sequence, no corresponding information was found in the presence data. Solution: augment presence data, or remove interactions until the date on which presence data starts.

Presence stops BEFORE data: refers to the corresponding problem towards the end of interaction and presence data

During the following interactions, IDs were absent...: indicates that according to the presence data, IDs were absent (i.e. "0"), but interactions with them occurred on those very date(s) according to the interaction data

The following IDs occur in the data sequence but NOT...: there is/are no columns corresponding to the listed IDs in the presence data

There appear to be gaps in your presence (days missing?)...: check whether your presence data includes a line for each date starting from the date of the first interaction through to the date of the last interaction
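Several of these mismatches can be inspected by hand before running seqcheck. The following is a rough sketch in base R, not the package's own implementation; the toy objects (winner, loser, int_dates, pres) are hypothetical and only mimic the layout described above (a Date column followed by one 0/1 column per ID).

```r
# Hypothetical interaction data (illustrative only)
winner <- c("a", "b", "c")
loser  <- c("b", "a", "a")
int_dates <- as.Date(c("2010-01-01", "2010-01-03", "2010-01-05"))

# Hypothetical presence data: one row per day, one 0/1 column per ID
pres <- data.frame(
  Date = c("2010-01-01", "2010-01-02", "2010-01-04", "2010-01-05"),
  a = c(1, 1, 1, 1),
  b = c(1, 1, 0, 1)
)
pres_dates <- as.Date(pres$Date)

# Does presence cover the full span of the interaction data?
starts_in_time <- min(pres_dates) <= min(int_dates)
ends_in_time   <- max(pres_dates) >= max(int_dates)

# Days between first and last interaction that have no presence row
all_days <- seq(min(int_dates), max(int_dates), by = "day")
missing_days <- all_days[!all_days %in% pres_dates]

# IDs that interact but have no column in the presence data
ids_without_column <- setdiff(unique(c(winner, loser)), setdiff(colnames(pres), "Date"))
```

In this toy data, missing_days holds the one skipped day and ids_without_column holds the ID lacking a presence column; both would need fixing before elo.seq can use the presence data.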

Warnings:

Presence continues beyond data: indicates that presence and interaction data do not end on the same date.

Presence starts earlier than data: indicates that presence and interaction data do not start on the same date.

The following IDs occur in the presence data but NOT...: there are more ID columns in the presence data than IDs occurring in the interaction data

Date column is not ordered: The dates are not supplied in ascending order. elo.seq will still work but the results won't be reliable because the interactions were not in the correct sequence.
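An unordered date column can be detected and repaired before the data are passed on. The sketch below uses base R only and a hypothetical data.frame named dat; it assumes the interactions are held in one data.frame with columns winner, loser and Date.

```r
# Hypothetical interaction data with dates out of order (illustrative only)
dat <- data.frame(
  winner = c("a", "b", "a"),
  loser  = c("b", "c", "c"),
  Date   = c("2010-01-03", "2010-01-01", "2010-01-02"),
  stringsAsFactors = FALSE
)

# TRUE if any date is earlier than the one before it
was_unordered <- any(diff(as.Date(dat$Date)) < 0)

# Sort rows by date; order() leaves tied dates in their original relative order
dat <- dat[order(as.Date(dat$Date)), ]
```

Sorting whole rows keeps winner, loser and Date aligned. Note that interactions on the same day stay in the order in which they were entered, so that order should already reflect the observed sequence.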

Other warnings/errors can result from inconsistencies in either the presence or sequence data, or be of a more general nature:

Errors:

No 'Date' column found: in the presence data, no column exists with the name/header "Date". Please rename (or add) the necessary column named "Date" to your presence data.

At least one presence entry is not 1 or 0: presence data must come in binary form, i.e. an ID was either present ("1") or absent ("0") on a given date. No NAs or other values are allowed.

Your data vectors do not match in length: at least one of the three mandatory arguments (winner, loser, Date) differs from the others in length. Consider handling your data in a data.frame, which avoids this error.
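The three conditions above can also be checked directly. This is a minimal base R sketch under the assumption that the presence data follow the layout described in this help page; all objects are hypothetical and deliberately faulty.

```r
# Hypothetical input vectors; Date is one element short on purpose
winner <- c("a", "b", "c")
loser  <- c("b", "c", "a")
Date   <- c("2010-01-01", "2010-01-02")

# All three vectors must have the same length
lengths_match <- length(unique(c(length(winner), length(loser), length(Date)))) == 1

# Hypothetical presence data containing an NA, which is not allowed
pres <- data.frame(
  Date = c("2010-01-01", "2010-01-02"),
  a = c(1, 0),
  b = c(1, NA)
)

# A column named exactly "Date" must exist
has_date_col <- "Date" %in% colnames(pres)

# Every entry in the ID columns must be 0 or 1 (NA fails this test)
id_values <- unlist(pres[, colnames(pres) != "Date", drop = FALSE])
all_binary <- all(id_values %in% c(0, 1))
```

Here lengths_match and all_binary are both FALSE, corresponding to the length-mismatch and non-binary-presence errors described above.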

Warnings:

IDs occur in the data with inconsistent capitalization: because R is case-sensitive, "A" and "a" are considered different individuals. If the IDs are labelled this way on purpose, ignore the warning and set runcheck = FALSE when calling elo.seq()

There is (are) X case(s) in which loser ID equals winner ID: in the affected interaction(s), the same ID is entered as both winner and loser

The following individuals were observed only on one day: while not per se a problem for the calculation of Elo ratings, individuals that were observed only on one day (irrespective of the number of interactions on that day) cannot be plotted. eloplot will give a warning in such cases, too.
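To locate the rows or IDs behind these three warnings, something along the following lines may help. It is a sketch in base R with a hypothetical data.frame dat, and does not reproduce how seqcheck itself performs the checks.

```r
# Hypothetical interaction data (illustrative only)
dat <- data.frame(
  winner = c("a", "B", "c", "d"),
  loser  = c("b", "b", "c", "a"),
  Date   = c("2010-01-01", "2010-01-01", "2010-01-02", "2010-01-03"),
  stringsAsFactors = FALSE
)

# Rows in which the same ID is entered as winner and loser
self_rows <- which(dat$winner == dat$loser)

# IDs that differ only in capitalization
ids <- unique(c(dat$winner, dat$loser))
lower <- tolower(ids)
case_clash <- ids[lower %in% lower[duplicated(lower)]]

# IDs observed on a single day only, counting appearances as winner or loser
long <- data.frame(id = c(dat$winner, dat$loser), Date = rep(dat$Date, 2),
                   stringsAsFactors = FALSE)
days_per_id <- tapply(long$Date, long$id, function(d) length(unique(d)))
one_day_ids <- names(days_per_id)[days_per_id == 1]
```

In this toy data only "a" appears on more than one day, so every other ID ends up in one_day_ids.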

Value

returns textual information about possible issues with the supplied data set, or states that data are fine for running with elo.seq

Author(s)

Christof Neumann

Examples

data(adv)
seqcheck(winner = adv$winner, loser = adv$loser, Date = adv$Date)
data(advpres)
seqcheck(winner = adv$winner, loser = adv$loser, Date = adv$Date,
         presence = advpres)

# create faulty presence data
# remove one line from presence data
faultypres <- advpres[-1, ]
# make all individuals absent on one day
faultypres[5, 2:8] <- 0
# run check
seqcheck(winner = adv$winner, loser = adv$loser, Date = adv$Date,
         presence = faultypres)

# fix first error
faultypres <- rbind(faultypres[1, ], faultypres)
faultypres$Date[1] <- "2010-01-01"

# run check again
seqcheck(winner = adv$winner, loser = adv$loser, Date = adv$Date,
         presence = faultypres)

# fix presence in row 6 (the day on which all individuals were set to absent above)
faultypres[6, 2:8] <- 1

# run check again
seqcheck(winner = adv$winner, loser = adv$loser, Date = adv$Date,
         presence = faultypres)
# all good now

gobbios/EloRating documentation built on July 19, 2024, 4:05 a.m.