screen_time: Screen Time

screen_time    R Documentation

Screen Time

Description

A dataset summarising the screen time of contestants on the TV show Survivor. It currently contains only Seasons 1 to 4 and Season 42.

Usage

screen_time

Format

This data frame contains the following columns:

version_season

Version season key

episode

Episode number

castaway_id

ID of the castaway (primary key). Also includes two special IDs: one for the host (i.e. Jeff Probst) and one for unknown faces (those the image detection could not identify with sufficient accuracy).

screen_time

Estimated screen time of the individual in the episode, in seconds.
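As a rough illustration of working with these columns, the base R sketch below totals each contestant's screen time across episodes and drops the two special IDs. The data frame `screen_time_toy`, its values, and the ID strings (`"castaway_a"`, `"host"`, `"unknown"`, and so on) are invented placeholders, not real data; the actual special ID values are not given on this page. With survivoR loaded, substituting `screen_time` for `screen_time_toy` (and the real special ID values for `special_ids`) should give the equivalent summary.

```r
# Hypothetical rows mimicking the documented columns. The values and
# IDs are invented for illustration and are not real screen-time data.
screen_time_toy <- data.frame(
  version_season = c("S01", "S01", "S01", "S01", "S01"),
  episode        = c(1, 1, 1, 2, 2),
  castaway_id    = c("castaway_a", "castaway_b", "host", "castaway_a", "unknown"),
  screen_time    = c(120, 45, 300, 90, 30)
)

# Placeholder labels for the two special IDs (host and unknown faces);
# the real ID values are an assumption to be checked against the data.
special_ids <- c("host", "unknown")

# Keep only contestants, then sum their screen time across episodes.
contestants <- screen_time_toy[!screen_time_toy$castaway_id %in% special_ids, ]
totals <- aggregate(screen_time ~ version_season + castaway_id,
                    data = contestants, FUN = sum)

# Convert seconds to minutes for readability.
totals$minutes <- totals$screen_time / 60

# Show contestants from most to least screen time.
totals[order(-totals$screen_time), ]
```

Filtering out the special IDs before aggregating keeps the host and unidentified faces from appearing in contestant rankings.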

Details

At a high level, each individual's screen time is calculated as follows:

  1. Frames are sampled from each episode at 1-second intervals.

  2. MTCNN detects the human faces in each frame.

  3. VGGFace2 maps each detected face to a vector in a 512-dimensional embedding space.

  4. A training set of labelled images (one per contestant plus three of Jeff Probst) is processed in the same way to determine where each person sits in the vector space. TODO: Accuracy could be improved by increasing the number of training images per contestant.

  5. The Euclidean distance from each face detected in the frame to each contestant in the season (plus Jeff) is calculated. If the minimum distance is greater than 1.2, the face is labelled "unknown". TODO: Review how robust this distance cutoff is; it is currently based on a manual review of Season 42.

  6. A multi-class SVM is trained on the training set to label faces. The embedding of each face not labelled "unknown" is passed to this model, which assigns it a label.

  7. All labelled faces are aggregated, assuming one full second of screen time each time a face is seen.
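The labelling and aggregation steps (5 to 7) can be sketched in base R as follows. This is not the code used to build the dataset: the embeddings are 3-dimensional rather than 512-dimensional, all vectors and labels are hypothetical, and the SVM in step 6 is replaced by a simpler nearest-reference rule (the label of the closest training image). Only the 1.2 Euclidean distance cutoff and the one-second-per-detection assumption come from the steps above.

```r
# Minimal sketch of steps 5-7. Assumptions (not taken from survivoR):
# embeddings are 3-d instead of 512-d, and the final label comes from
# the nearest reference image instead of a multi-class SVM.
unknown_cutoff <- 1.2  # distance threshold described in step 5

# Hypothetical reference embeddings: one row per labelled training image
# (one per contestant, several for the host).
reference <- rbind(
  c(0.0, 0.0, 0.0),
  c(3.0, 0.0, 0.0),
  c(0.0, 3.0, 0.0),
  c(0.2, 3.1, 0.0)
)
reference_labels <- c("castaway_a", "castaway_b", "host", "host")

# Label one face embedding: "unknown" if it is farther than the cutoff
# from every reference image, otherwise the label of the closest image.
label_face <- function(face, reference, reference_labels, cutoff) {
  # Euclidean distance from the face to each reference row.
  dists <- sqrt(rowSums(sweep(reference, 2, face)^2))
  if (min(dists) > cutoff) "unknown" else reference_labels[which.min(dists)]
}

# Hypothetical detections: one row per detected face; `frame` is the
# second of the episode the frame was sampled at (step 1).
faces <- data.frame(
  frame = c(1, 1, 2, 3, 3),
  x = c(0.1, 2.9, 0.1, 0.0, 5.0),
  y = c(0.1, 0.1, 0.0, 2.9, 5.0),
  z = c(0.0, 0.0, 0.0, 0.0, 0.0)
)

# Apply the labelling rule to each face (each row of the matrix).
faces$label <- apply(as.matrix(faces[, c("x", "y", "z")]), 1, label_face,
                     reference = reference,
                     reference_labels = reference_labels,
                     cutoff = unknown_cutoff)

# Step 7: each labelled face counts as one full second of screen time,
# so the count of detections per label is the screen time in seconds.
screen_seconds <- table(faces$label)
screen_seconds
```

Swapping the nearest-reference rule for a trained classifier (for example `svm()` from the e1071 package) would bring the sketch closer to step 6; the distance check would still run first so that faces beyond the cutoff never reach the classifier.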


survivoR documentation built on July 9, 2023, 5:21 p.m.