library("knitr")
library("dplyr")
This package supports the analysis of image annotations, e.g. annotations of pictures taken by wearable cameras. The workflow is as follows:
pictures are produced, e.g. participants of a study wear a camera that automatically produces pictures;
these pictures are then annotated by algorithms/ coders using a list of annotations;
the results of these annotations are then used to e.g. reconstruct the sequence of activities of a person during the day, or link it to pollution exposure.
This R package helps answer the following questions:
How to convert data to a format easier to deal with in R?
How to summarize annotations?
How to plot annotations?
How to calculate interrater agreement?
The data structures in the package are adapted to data produced with, for instance, the Doherty Sensecam Browser for annotation or the XnView MP software. However, you can probably reformat raw data from other tools so that the converting function can read it.
The data needed for using the package are:
The dico table, which contains three columns: Code, Meaning and Group. The Code is the unique X-digit identifier of an activity. The Meaning, preferably a single word such as washingYourTeeth, explains the code. The Group gathers activities into meaningful categories, e.g. washingYourTeeth and washingYourHands could be in the hygiene Group whereas eatingRealFood and snackingOnCheapChocolate could be in the eating Group. If you do not use abbreviations, Code and Meaning can be equal, but in any case both need to be present.
path_dico <- system.file("extdata", "dicoCoding_pinocchio.csv", package = "watchme")
sep_dico <- ";"
dico <- read.table(path_dico, sep = sep_dico, header = TRUE)
dico
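As a rough illustration of the structure described above, here is a minimal sketch in base R. The codes, the dico content and the helper check_dico() are made up for this example and are not part of the package.

```r
# Hypothetical dico with made-up two-digit codes.
dico <- data.frame(
  Code = c("01", "02", "03", "04"),
  Meaning = c("washingYourTeeth", "washingYourHands",
              "eatingRealFood", "snackingOnCheapChocolate"),
  Group = c("hygiene", "hygiene", "eating", "eating"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: check that the three columns exist and codes are unique.
check_dico <- function(dico) {
  required <- c("Code", "Meaning", "Group")
  missing_cols <- setdiff(required, names(dico))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(dico$Code) > 0) {
    stop("Codes must be unique.")
  }
  invisible(TRUE)
}

check_dico(dico)

# Number of codes in each group.
codes_per_group <- table(dico$Group)
codes_per_group
```

Running such a check before any conversion can catch a missing column or a duplicated code early.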
A table of coding results. The columns are either:
name,image_path,image_time,annotation (SQL query of the Doherty browser database, in this case the column name is a repetition of the participant name)
a column name containing the word "Filename", EXIF:Date Taken [Y-m-d_H-M-S], IPTC:Keywords (XnView MP)
The column image_path/Filename indicates the path to the picture, or its name. It only needs to be unique for each picture. The column image_time/EXIF:Date Taken [Y-m-d_H-M-S] gives the date and time at which the picture was taken, in the format "YMD HMS". The column annotation/IPTC:Keywords gives the code(s) associated with the picture. Codes can be pasted one after another, since we use grepl() for finding the unique X-digit identifiers, or they can be on separate lines, since all codes for one picture, identified by its unique path or name, will be merged.
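The two ways of storing codes mentioned above (pasted together, or spread over several lines) can be sketched as follows in base R. The data and the code_ column names are invented for this example, and the package's own implementation may differ.

```r
# Made-up coding results: "img_002.jpg" is spread over two lines,
# and "img_003.jpg" has two codes pasted together in one string.
raw <- data.frame(
  image_path = c("img_001.jpg", "img_002.jpg", "img_002.jpg", "img_003.jpg"),
  annotation = c("01", "02", "03", "01;03"),
  stringsAsFactors = FALSE
)
codes <- c("01", "02", "03")  # hypothetical fixed-width codes

# Merge all annotation lines that belong to the same picture.
merged <- aggregate(annotation ~ image_path, data = raw,
                    FUN = paste, collapse = ";")

# One Boolean column per code: TRUE if the code occurs in the merged string.
for (code in codes) {
  merged[[paste0("code_", code)]] <- grepl(code, merged$annotation, fixed = TRUE)
}
merged
```

Note that plain substring matching only works reliably when codes cannot overlap, which is one reason for using identifiers of a fixed number of digits.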
These are the two formats the package was created for:
# Format 1: comma-separated export of the Doherty browser database
path_results <- system.file("extdata", "image_level_pinocchio.csv", package = "watchme")
sep_results <- ","
coding_results <- read.table(path_results, sep = sep_results, header = TRUE)
# as_tibble() replaces the deprecated tbl_df()
coding_results <- dplyr::as_tibble(coding_results)
coding_results %>% head() %>% knitr::kable()

# Format 2: tab-separated export from XnView MP
path_results <- system.file("extdata", "sample_coding1.csv", package = "watchme")
sep_results <- "\t"
coding_results <- read.table(path_results, sep = sep_results, header = TRUE, quote = "\'")
coding_results <- dplyr::as_tibble(coding_results)
coding_results %>% head() %>% knitr::kable()
Using both these inputs, we create a tibble on which operations will be performed.
The tibble has the following variables:
participant_id, Name or ID number of the participant (character)
image_path, Path or name of the image in order to be able to identify duplicates (character)
image_time, Time and date of each image (POSIXt)
Columns of Booleans, indicating if a given code was given to a given picture.
It also carries the attribute dico, which is a tibble.
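To make this structure concrete, here is a hand-built stand-in using a base R data.frame. The participant name, file names, times and codes are all invented, and the real object is a tibble created by the package.

```r
# Hand-built stand-in for the prepared table (all values are made up).
results <- data.frame(
  participant_id = "participant01",
  image_path = c("img_001.jpg", "img_002.jpg", "img_003.jpg"),
  image_time = as.POSIXct(
    c("2016-01-05 08:00:00", "2016-01-05 08:01:00", "2016-01-05 08:02:00"),
    tz = "Asia/Kolkata"
  ),
  # One Boolean column per code.
  washingYourTeeth = c(TRUE, TRUE, FALSE),
  eatingRealFood = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# The dictionary of codes travels with the table as an attribute.
attr(results, "dico") <- data.frame(
  Code = c("01", "03"),
  Meaning = c("washingYourTeeth", "eatingRealFood"),
  Group = c("hygiene", "eating"),
  stringsAsFactors = FALSE
)

# Number of pictures that received each code.
pictures_per_code <- colSums(results[, c("washingYourTeeth", "eatingRealFood")])
pictures_per_code
```

Keeping the dictionary as an attribute means that later steps can look up the Meaning and Group of each code without a second argument.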
The function used to create such a tibble is called watchme_prepare_data.
To find both inputs and interpret them, the watchme_prepare_data function needs to know the path to each file (path_results and path_dico), the separator used in each of them (sep_results and sep_dico), and the timezone corresponding to image_time. Having to specify sep_results and sep_dico might seem to be a bit of a hassle, but we wanted to accommodate different formats.
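The following base R sketch shows why the separator and the timezone have to be given explicitly. The file content and the helper read_results() are made up for this example.

```r
# The same made-up content, written once with ";" and once with a tab.
semicolon_text <- "image_path;image_time\nimg_001.jpg;2016-01-05 08:00:00"
tab_text <- "image_path\timage_time\nimg_001.jpg\t2016-01-05 08:00:00"

# Hypothetical reader that only differs by the separator it is given.
read_results <- function(text, sep) {
  read.table(text = text, sep = sep, header = TRUE, stringsAsFactors = FALSE)
}
from_semicolon <- read_results(semicolon_text, sep = ";")
from_tab <- read_results(tab_text, sep = "\t")

# The same clock time denotes different instants depending on the timezone.
t_kolkata <- as.POSIXct(from_tab$image_time, tz = "Asia/Kolkata")
t_utc <- as.POSIXct(from_tab$image_time, tz = "UTC")
offset_hours <- as.numeric(difftime(t_utc, t_kolkata, units = "hours"))
offset_hours
```

With the wrong timezone, every picture would be shifted by the same offset, which matters as soon as image times are linked to other time series such as pollution measurements.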
Below we illustrate the use of watchme_prepare_data.
library("watchme")
path_results <- system.file("extdata", "sample_coding1.csv", package = "watchme")
sep_results <- "\t"
path_dico <- system.file("extdata", "dico_coding_2016_01.csv", package = "watchme")
sep_dico <- ";"
results_table <- watchme_prepare_data(path_results = path_results,
                                      sep_results = sep_results,
                                      path_dico = path_dico,
                                      sep_dico = sep_dico,
                                      tz = "Asia/Kolkata")
results_table %>% head() %>% knitr::kable()
In the case of the CHAI project, we had many coding results files from XnView MP, and some of them had annotations spread over several columns. Because of this we added the robust_reading option, which is FALSE by default. When it is TRUE, each results file is read twice: once for finding image_path and image_time (which need to be the first and second column, respectively), and once for finding annotation by considering each line as a single variable. This way we did not need to care too much about the number of annotation columns, even if in theory there should have been only one.
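The idea of reading a file in two passes can be sketched in base R as follows. This is only an illustration under our own assumptions (made-up file content and codes), not the code behind robust_reading.

```r
# Made-up tab-separated file with a varying number of annotation columns.
lines <- c(
  "Filename\tDate\tKeywords",
  "img_001.jpg\t2016-01-05_08-00-00\t01",
  "img_002.jpg\t2016-01-05_08-01-00\t02\t03",
  "img_003.jpg\t2016-01-05_08-02-00\t01\t02\t03"
)
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)
body <- readLines(tmp)[-1]  # drop the header line

# Pass 1: keep only the first two fields (path and time) of each line.
fields <- strsplit(body, "\t", fixed = TRUE)
image_path <- vapply(fields, `[`, character(1), 1)
image_time <- as.POSIXct(vapply(fields, `[`, character(1), 2),
                         format = "%Y-%m-%d_%H-%M-%S", tz = "Asia/Kolkata")

# Pass 2: treat the rest of each line as one single annotation string.
annotation <- sub("^[^\t]*\t[^\t]*\t?", "", body)

# Codes can then be searched in that string, however many columns there were.
has_code_03 <- grepl("03", annotation, fixed = TRUE)
data.frame(image_path, image_time, annotation, has_code_03,
           stringsAsFactors = FALSE)
```

Because the second pass never splits the annotations into columns, an extra or missing tab in the keywords does not break the reading.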
In the CHAI project, annotation of pictures was performed in several passes: in a first pass, coders looked at the set of pictures and assigned, for instance, the indoor/outdoor location; in a second one they assigned codes from the cooking group, etc. Each pass led to one file, so all results from one participant-day were contained in 5 coding files. Because of this, we wrote a function to combine them. The steps were the following:
For each pass, generate a coding tibble with a dico as attribute using watchme_prepare_data(). For this we used one dico per pass, in which only the codes of that group (e.g. only cooking codes) were described.
Use the watchme_combine_results function on the list of these 5 tibbles. common_codes in this case was only the "uncodable" picture code; all other columns had to be present in only one of the tibbles. For the merging to work, all 5 tibbles need to have the exact same image_time.
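The logic of the two steps above can be sketched in base R as follows. The helper combine_passes() and the data are made up, and the way a common code is merged here (a picture is flagged if any pass flagged it) is our assumption, not necessarily what watchme_combine_results does.

```r
times <- as.POSIXct("2016-01-05 08:00:00", tz = "Asia/Kolkata") + 60 * 0:2

# Two made-up passes over the same three pictures.
pass_io <- data.frame(image_time = times,
                      indoor = c(TRUE, FALSE, FALSE),
                      non_codable = c(FALSE, FALSE, TRUE))
pass_ck <- data.frame(image_time = times,
                      cooking = c(FALSE, TRUE, FALSE),
                      non_codable = c(FALSE, FALSE, TRUE))

# Hypothetical helper combining passes column by column.
combine_passes <- function(passes, common_codes) {
  # All passes must cover exactly the same pictures.
  same_times <- vapply(passes, function(p) {
    identical(p$image_time, passes[[1]]$image_time)
  }, logical(1))
  stopifnot(all(same_times))
  combined <- passes[[1]]
  for (p in passes[-1]) {
    for (col in setdiff(names(p), "image_time")) {
      if (!(col %in% names(combined))) {
        combined[[col]] <- p[[col]]                      # code from a new pass
      } else if (col %in% common_codes) {
        combined[[col]] <- combined[[col]] | p[[col]]    # assumed merging rule
      } else {
        stop("Code ", col, " appears in more than one pass.")
      }
    }
  }
  combined
}

oneday <- combine_passes(list(pass_io, pass_ck), common_codes = "non_codable")
oneday
```

The check on image_time mirrors the requirement that all tibbles have the exact same image_time.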
Here is one example.
passes <- c("CK", "IO", "OP", "PM", "TP")

# Prepare the results of one pass, using the dico of that pass only
create_pass_results <- function(pass){
  path_results <- system.file('extdata', paste0("oneday_", pass, ".csv"), package = 'watchme')
  sep_results <- "\t"
  path_dico <- system.file('extdata', paste0("dico_coding_2016_01_", pass, ".csv"), package = 'watchme')
  sep_dico <- ';'
  results <- watchme_prepare_data(path_results = path_results, sep_results = sep_results,
                                  path_dico = path_dico, sep_dico = sep_dico,
                                  tz = "Asia/Kolkata")
  # Remove stray double quotes from the image paths
  results$image_path <- gsub('\"', "", results$image_path)
  results
}

results_list <- passes %>% purrr::map(create_pass_results)
results_list[[1]]

oneday_results <- watchme_combine_results(results_list, common_codes = "non_codable")
oneday_results
There is a default plotting function based on ggplot2:
watchme_plot_raw(results_table)
Using the annotations from the images, one can easily deduce a sequence of events. For instance, two subsequent pictures of washingYourTeeth taken at t1 and t2, respectively, could be interpreted as a washingYourTeeth event from t1 to t2. The watchme_aggregate function converts a wearableCamImages object to a table (dplyr class tbl_df) with
event_code, group, meaning,
start_time (POSIXt),
end_time (POSIXt),
start_picture and end_picture,
no_pictures in the event,
duration in seconds.
If pictures have several codes, then there can be synchronous events.
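The deduction of events from consecutive pictures can be sketched with run-length encoding in base R. The helper find_events() and the data are made up for this example, and the package's own algorithm may differ.

```r
times <- as.POSIXct("2016-01-05 08:00:00", tz = "Asia/Kolkata") + 60 * 0:5

# Made-up Boolean columns, one per code.
codes <- data.frame(
  washingYourTeeth = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  indoor = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Hypothetical helper: each run of consecutive TRUE values is one event.
find_events <- function(flag, times) {
  runs <- rle(flag)
  end_index <- cumsum(runs$lengths)
  start_index <- end_index - runs$lengths + 1
  keep <- runs$values
  data.frame(
    start_time = times[start_index[keep]],
    end_time = times[end_index[keep]],
    no_pictures = runs$lengths[keep],
    duration = as.numeric(difftime(times[end_index[keep]],
                                   times[start_index[keep]],
                                   units = "secs"))
  )
}

# Apply the helper to every code and stack the results.
events <- do.call(rbind, lapply(names(codes), function(code) {
  ev <- find_events(codes[[code]], times)
  data.frame(event_code = rep(code, nrow(ev)), ev, stringsAsFactors = FALSE)
}))
events
```

In this made-up example the first washingYourTeeth event happens during the indoor event, so the two overlap in time.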
The function watchme_aggregate takes two arguments: a tibble created by watchme_prepare_data, and a minimal duration for the events, expressed in pictures, which is called min_no_pictures. Below are two examples.
data("coding_example")

# Default minimal duration
eventTable <- watchme_aggregate(df = coding_example)
knitr::kable(head(eventTable))

# Only keep events lasting at least 2 pictures
eventTable2 <- watchme_aggregate(df = coding_example, min_no_pictures = 2)
knitr::kable(head(eventTable2))
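The effect of a minimal duration can be pictured as a simple filter on an event table. The sketch below uses made-up events and a hypothetical helper filter_events(); whether the package compares with "at least" or "more than" is an assumption on our side.

```r
# Made-up event table with the number of pictures in each event.
events <- data.frame(
  event_code = c("washingYourTeeth", "washingYourTeeth", "eatingRealFood"),
  no_pictures = c(2, 1, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep events with at least min_no_pictures pictures.
filter_events <- function(events, min_no_pictures = 1) {
  events[events$no_pictures >= min_no_pictures, , drop = FALSE]
}

all_events <- filter_events(events)
long_events <- filter_events(events, min_no_pictures = 2)
long_events
```

A higher threshold removes events supported by a single picture, which may be isolated coding errors.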
The package provides a function using the R ggplot2 package for plotting sequences of events.
Below are three examples. In these examples there is only one group of codes, indoor_outdoor, whose codes are mutually exclusive; when plotting several groups of codes, one can use faceting.
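data("coding1")
event_table <- watchme_aggregate(df = coding1)

# Time on the x axis (default)
watchme_plot_sequence(event_table)

# Picture index on the x axis
watchme_plot_sequence(event_table, x_axis = "picture")

# One facet per group of codes; facet_grid() comes from ggplot2
watchme_plot_sequence(event_table, x_axis = "picture") +
  ggplot2::facet_grid(group ~ .)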
For plotting results from more than one coder, one could do this:
data("coding1")
data("coding2")
dico <- attr(coding1, "dico")

# One event table per coder, labelled with the coder's name
event_table1 <- watchme_aggregate(df = coding1)
event_table1 <- dplyr::mutate(event_table1, coder = "coder1")
event_table2 <- watchme_aggregate(df = coding2)
event_table2 <- dplyr::mutate(event_table2, coder = "coder2")

# Stack both tables and re-attach the dico attribute
event_table <- dplyr::bind_rows(event_table1, event_table2)
attr(event_table, "dico") <- dico

# One facet per coder; facet_grid() comes from ggplot2
watchme_plot_sequence(event_table) +
  ggplot2::facet_grid(coder ~ .)