In tca2/datavyu: Prepare and Create Output from datavyu Coding Files

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dpi = 250,
  eval = FALSE
)

Note: This vignette is not run now because of changes in how the R package works

Loading, setting up

First, load the {birdseyevyu} package and the {tidyverse} suite of packages:

library(birdseyevyu)
library(tidyverse) # install via install.packages("tidyverse")
library(irr) # install via install.packages("irr")
library(patchwork) # install via install.packages("patchwork")
library(here) # install.packages("here")

1. Preparing the data in datavyu (the software---not the package!)

Next, let's prepare the files we wish to analyze. To do so, we have to export them from the datavyu software, as follows:

Run the following Ruby script within the datavyu software by selecting Script and then Run Script; select a directory with one or more .opf files:

datavyu2csv.rb

2. Open the directory that the Ruby script created; a number of CSV files for each `.opf` file should now be created.

This is the directory (folder) passed to the datavyu functions below.

The {here} package can be used to flexibly (across computers/operating systems) specify file paths: To save on typing, the directory can be set for an entire R session via the following:

options(directory = here("irr-data", "datavyu_output_11-16-2020_14-13"))

Summarizing a column

Frequency of codes; note that the code is the code listed appended to the column name after a period.

summarize_column(column = "LogClass_IS",
                 code = "LogClass_IS.i")

Frequency of codes by file:

summarize_column(column = "LogClass_IS",
                 code = "LogClass_IS.i",
                 by_file = TRUE)

Plot of duration (note that summary = "duration" can be added to any of the above) by file:

freq_summary <- summarize_column(column = "LogClass_IS",
                                 code = "LogClass_IS.i",
                                 by_file = TRUE,
                                 summary = "duration")

plot_column_summary(freq_summary)

Inter-rater agreement

Plots

prepared_time_series_tm <- prep_time_series(column = "LogClass_IS",
                                            code = "LogClass_IS.i",
                                            specified_file = "TM 14-12-03 T201 Content Log")

plot_time_series(prepared_time_series_tm)

prepared_time_series_hh <- prep_time_series(column = "LogClass_IS",
                                            code = "LogClass_IS.i",
                                            specified_file = "HH T201 14-12-03 Content Log")

plot_time_series(prepared_time_series_hh)

These could be composed together using the patchwork library:

plot_time_series(prepared_time_series_tm) +
  plot_time_series(prepared_time_series_hh) +
  plot_layout(ncol = 1)

Agreement statistics

First, looking at data:

prepared_time_series_tm
prepared_time_series_hh

We'll do a "full" join, re] taining all time stamps for both files. First, we must rename one (or both) of the two code columns. Having done this, we can easily compare the two once joined:

prepared_time_series_tm <- rename(prepared_time_series_tm, code_tm = code)
prepared_time_series_hh <- rename(prepared_time_series_hh, code_hh = code)

joined_data <- prepared_time_series_tm %>% 
  full_join(prepared_time_series_hh, by = "ts")

joined_data

We can calculate agreement using the {irr} package, passing only the 2nd and 3rd columns (with the codes) to the function agree() (from the {irr} function):