What we need for this
Running this file in its entirety will produce the BINNED file.
Suggested directory structure/settings (assumed below):
Note: the general approach below is to read the notes, uncomment the code, edit as needed, and run bit by bit.
#notes: if you don't have these installed, uncomment the next 3 commands
# install.packages(c("devtools","tidyverse","readxl","forcats", "skimr"),
#                  repos = "http://cran.us.r-project.org")
# devtools::install_github("BergelsonLab/blabr")
# devtools::install_github("dmirman/gazer")
library(devtools)
library(tidyverse)
library(readxl)
library(forcats)
library(blabr)
library(skimr)
library(gazer)
options(tibble.width = Inf)
options(dplyr.width = 100)
Change the following to TRUE if you want to generate the result output files for all the functions below (but not the chunk at the very bottom of this rmd)
generate_output_file = FALSE
Loading fixations report:
# studyname <- fixations_report("data/eyetracking/studyname_fixrep_filename.xls")
# if you had and want to keep practice trials, use remove_practice=FALSE
Get an idea of what is in the data we just loaded
Careful, these are quite big. Uncomment the next set of lines and look CAREFULLY at the output here.
If anything looks off, look into it!
# summary(studyname)
# glimpse(studyname)
# skim(studyname)
# colnames(studyname)
Convert fixations data into 20ms bins
Each dataset is a little different, below you'll want to flag which columns you want to keep, and what each of them is. Some of them may be the same as below, but some may not! Uncomment and edit the block below as appropriate
Note: There's often nested commenting in the code blocks. You probably want to uncomment the whole chunk, but leave the comment markers on any lines that were double commented.
# studyname_bin <- binifyFixations(studyname,
#   keepCols=c("RECORDING_SESSION_LABEL", #subject number
#              "CURRENT_FIX_INTEREST_AREA_LABEL", #TARGET or DISTRACTOR
#              "CURRENT_FIX_X", #actual x coordinate
#              "CURRENT_FIX_Y", #actual y coordinate
#              "TRIAL_INDEX", #5-36
#              "RT", #from images showing up to target onset
#              "TRIAL_START_TIME",
#              "AudioTarget", #name of sound file, e.g. where_diaper.wav
#              "Carrier", #can, do, look, where
#              "DistractorImage","TargetImage", # image file name e.g. apple.jpg
#              "DistractorLoc","TargetLoc",
#              #location of target and distractor [320,512] or [960,512] for the test trials
#              "Pair",
#              "TargetSide","Trial","TrialType")) #L or R, 1-32, between or within
# #note: Trial is 1-32; TRIAL_INDEX (later renamed TrialCountingPractice; 5-36 bc
# #practice trials)
# skim(studyname_bin)
WAIT, did you actually make sure you had the right columns above and that you changed the comments to reflect the values in YOUR dataset?? DO IT
Look at binified data
#dim(studyname_bin) # put the dimensions here; useful for debugging!
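To make the binning step above concrete, here is a minimal base-R sketch of what expanding fixations into 20ms bins could look like. It is not how `binifyFixations` is implemented: the function `bin_fixations_sketch` and the columns `start`, `end`, and `label` are hypothetical names used only for illustration.

```r
# Hypothetical illustration only (not the blabr implementation):
# expand each fixation into one row per 20 ms bin that it covers.
# Assumes every fixation has end > start, with times in ms.
bin_fixations_sketch <- function(fix, bin_size = 20) {
  rows <- lapply(seq_len(nrow(fix)), function(i) {
    # first and last bin start times touched by fixation i
    first_bin <- floor(fix$start[i] / bin_size) * bin_size
    last_bin  <- floor((fix$end[i] - 1) / bin_size) * bin_size
    data.frame(time  = seq(first_bin, last_bin, by = bin_size),
               label = as.character(fix$label[i]))
  })
  do.call(rbind, rows)
}

# toy data: a 60 ms look to the target, then a 40 ms look to the distractor
toy_fix <- data.frame(start = c(0, 100),
                      end   = c(60, 140),
                      label = c("TARGET", "DISTRACTOR"))
toy_bins <- bin_fixations_sketch(toy_fix)
print(toy_bins)
```

Because every fixation becomes several rows, the binned data is much longer than the fixation report, which is why checking `dim()` at this point is a useful sanity check.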
If your study didn't have an experimenter pressing a key, you can cut this whole section!
There are often a handful of trials with a keypress issue, i.e. these trials have no log of the experimenter pressing the button when the target word started. This could be because the parent didn't say the target word, or because of experimenter error (a mistake, or because the baby was screaming, etc.). Uncomment below and look at the result!
#keypressissues_studyname <- keypress_issues(studyname_bin, study="studyname", out_csv=generate_output_file, output_dir="data/eyetracking/manual_trial_checks/")
#keypressissues_studyname
Once the keypress issues have been fixed, the line below (when uncommented) will read in the fixed times,
and then we can fix the affected trials with the retrieved times.
#ret_kp_studyname <- keypress_retrieved("data/eyetracking/manual_trial_checks/retrieved_keypress_studyname.xlsx")
# N.B.: ret_kp_studyname tells us how many trials had a missing press
#nrow(ret_kp_studyname)
#& how many were fixed offline
#filter(ret_kp_studyname, outcome=="FIX")
If your dataset doesn't have the parent saying the sentence, you may not need the message report at all.
#studyname_mesrep_all <- load_tsv("data/eyetracking/studyname_mesrep_filename.xls")
#summary(studyname_mesrep_all)
we generally just need the button press column and time, trial, and subj (called "EL_BUTTON_CRIT_WORD" in the mesrep, unless you renamed it)
#studyname_mesrep <- get_mesrep(studyname_mesrep_all, ret_kp_studyname)
Now we have to double check the outlier, really late keypress times (i.e. target onset times). The default for this function is set to a conservative 6s. Visual inspection confirms this, so we double check those over 6s. How many are there?
# qplot(studyname_mesrep$CURRENT_MSG_TIME)
#
# latetargetonset_studyname <- get_late_target_onset(studyname_mesrep,
#                                  out_csv = generate_output_file,
#                                  output_dir="data/eyetracking/manual_trial_checks/")
#
# FYI this is usually slightly more conservative than using 3sds:
# mean(studyname_mesrep$CURRENT_MSG_TIME, na.rm=T)+
#   3*sd(studyname_mesrep$CURRENT_MSG_TIME, na.rm=T)
#
# (they're basically never shorter than 3sd)
# val_lessthan3<- mean(studyname_mesrep$CURRENT_MSG_TIME, na.rm=T) -
#   3*sd(studyname_mesrep$CURRENT_MSG_TIME, na.rm=T)
#
# filter(studyname_mesrep, CURRENT_MSG_TIME< val_lessthan3)
#
# #This pulls out needed trials for checking against the video footage
# nrow(latetargetonset_studyname)
Once these have been fixed, we read in the updated Excel version. Looking at the videos of the study will reveal whether the keypress was correctly late or was an experimenter error. The latter we're able to fix from the video and message report, if we have the video.
The code below needs to be individualized for your col names, be thorough!
# checked_lt_studyname <- late_target_retrieved("data/eyetracking/manual_trial_checks/checked_latetargetonset_studyname.xlsx")
# studyname_mesrep <- studyname_mesrep_all %>%
#   mutate(Trial=as.numeric(Trial)) %>% # fyi this turns p1:p4 (practice) to NA
#   filter(CURRENT_MSG_TEXT=="PLAY_POP") %>%
#   dplyr::select(RECORDING_SESSION_LABEL, TRIAL_INDEX, Trial, RT,
#                 AudioTarget, CURRENT_MSG_TIME) %>%
#   inner_join(checked_lt_studyname %>% filter(outcome=="FIX")) %>%
#   dplyr::rename(PLAY_POP=CURRENT_MSG_TIME) %>%
#   mutate(CURRENT_MSG_TIME_test = PLAY_POP+ms_diff) %>%
#   dplyr::select(RECORDING_SESSION_LABEL, CURRENT_MSG_TIME_test,
#                 TRIAL_INDEX, AudioTarget, Trial) %>%
#   right_join(studyname_mesrep) %>%
#   mutate(CURRENT_MSG_TIME = ifelse(!is.na(CURRENT_MSG_TIME_test),
#                                    CURRENT_MSG_TIME_test, CURRENT_MSG_TIME),
#          Trial= as.character(Trial))
# # brings back p1:p4 from the join if you have it
# metanote: should turn the above into a function
Note: CURRENT_MSG_TIME_test that's not NA is a sign that late keypress was fixed
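The late-keypress correction above boils down to a join followed by a conditional overwrite. Below is a small base-R sketch of that logic on made-up data. All column names (`subj`, `trial`, `keypress_time`, `play_pop_time`, `ms_diff`) and values are hypothetical stand-ins, and `merge()` is used here in place of the dplyr joins in the template.

```r
# Toy message report: one row per trial (all values invented for illustration)
toy_mesrep <- data.frame(
  subj          = c("s01", "s01", "s02"),
  trial         = c(1, 2, 1),
  keypress_time = c(2500, 7200, 2600),  # logged target onset (ms)
  play_pop_time = c(500, 600, 550)      # reference message time (ms)
)

# Toy manual check: only trials marked as fixable appear here,
# with the offset from the reference message recovered from the video
toy_checked <- data.frame(subj = "s01", trial = 2, ms_diff = 2400)

# Left join: keep every trial, attach ms_diff where a fix exists
toy_merged <- merge(toy_mesrep, toy_checked,
                    by = c("subj", "trial"), all.x = TRUE)

# A non-NA ms_diff marks a trial whose late keypress was fixed
toy_merged$was_fixed <- !is.na(toy_merged$ms_diff)
toy_merged$target_onset <- ifelse(toy_merged$was_fixed,
                                  toy_merged$play_pop_time + toy_merged$ms_diff,
                                  toy_merged$keypress_time)
print(toy_merged)
```

Only the one checked trial changes; every other trial keeps its logged keypress time, which is the same pattern the `ifelse(!is.na(...))` call in the template relies on.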
Overview of the data
# summary(studyname_bin)
# dim(studyname_mesrep)
# summary(studyname_mesrep)
Now we can merge the binned fixation data with the message report so we get the target onset times, and rename a few columns.
# studyname_fix_mes_clean_1 <- left_join(studyname_bin, studyname_mesrep) %>%
#   dplyr::rename(SubjectNumber = RECORDING_SESSION_LABEL,
#                 TargetOnset = CURRENT_MSG_TIME,
#                 gaze = CURRENT_FIX_INTEREST_AREA_LABEL,
#                 TrialCountingPractice = TRIAL_INDEX)
#
# studyname_fix_mes_clean <- studyname_fix_mes_clean_1 %>%
#   # filter no press trials,
#   filter(!is.na(TargetOnset)) %>%
#   # divide pairs in two columns
#   # create a column for the 1st item of pair and a second column for the 2nd item
#   separate(Pair, c("first","second"), sep = "_", remove = F, extra="drop") %>%
#   mutate(gaze = fct_recode(gaze, NULL = "."), # gaze becomes a factor
#          # make the gaze variable into a binary 1,0 (for proportions later)
#          propt = ifelse(gaze == "TARGET", 1, ifelse(gaze == "DISTRACTOR", 0, NA)),
#          # remove the .jpg from the TargetImage
#          # (fixed = TRUE so the "." is matched literally, not as a regex wildcard)
#          targetnum = as.factor(gsub(".jpg", "", TargetImage, fixed = TRUE)),
#          #remove the .jpg from the TargetImage and the number (e.g. bottle3)
#          target = as.factor(gsub("\\d+", "", targetnum)),
#          #is the target the first or second image
#          num = factor(ifelse(as.character(second) == as.character(target), "two", "one")),
#          Trial = as.numeric(Trial),
#          RT = as.numeric(RT)
#   ) %>%
#   # all (and only) character columns as factors
#   characters_to_factors()
Note: if you skipped the section above due to no keypress, you probably still want to rename some columns, make sure things are the right type (numeric, factor, character) etc.
Note2: below, the dataframe is now called studyname_fix_mes_clean. Rename yours as needed if you didn't have to do the key press stuff.
Spend time summarizing & glimpsing & probing & grouping your eyetracking data tibble.
Sometimes when running subjects, experimenters accidentally name the files wrong (extra digits, typos, etc.), or the computer barfs and needs to be restarted, etc.
#unique(studyname_fix_mes_clean$SubjectNumber)
Which subjects have clear errors in naming? List those Ss here:
Do the notes say any file should be fully dropped because of experimenter error or other reasons? E.g. "s16 was just the warmup trials and then experiment crashed; s162 is the right file for that baby"
Any other weird anomalies you found? Errors in the data source? Notes you need to act on? List those here and fix them by adding code below!
#sample code for fully removing or renaming files; yours will depend on your subject names of course
# studyname_test_preexclude_fixes <- studyname_fix_mes_clean %>%
#   # filter(SubjectNumber!="y16") %>%
#   mutate(SubjectNumber = fct_recode(SubjectNumber,
#                                     "y16"= "y162",
#                                     "y18"= "y018",
#                                     "y17" = "y017"))
Now we're actually making new columns for noun onset and our windows of interest.
#n.b. the function automatically makes the most common 3 wins the lab uses,
#367-2000,3500,5000
# this will usually have been preregistered unless study is more exploratory
#studyname_test_preexclude <- get_windows(studyname_test_preexclude_fixes,
#                                         bin_size = 20,
#                                         nb_1 = 18,
#                                         short_window_time = 2000)
time: 2000 time_bin: 20 time before reaction can be linked to cue: 367
explaining the math in the FindLowData function, which tags trials with data from less than 1/3 of the window of analysis (assumes 20ms bins)
# studyname_test_taglowdata <-studyname_test_preexclude %>%
#   FindLowData(gazeData = ., "shortwin", nb_2 = 367) %>%
#   dplyr::rename('lowdata_short' = 'missing_TF')
#
# summary(studyname_test_taglowdata)
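As a rough worked example of the window arithmetic described above, the sketch below uses the three numbers given in the text (2000ms window end, 20ms bins, 367ms before a look can be linked to the cue). It is only back-of-the-envelope math under those assumptions, not the source of `FindLowData`; the exact rounding and the handling of edge bins inside the function may differ.

```r
# Back-of-the-envelope math for the short window (assumptions from the text above)
bin_size     <- 20    # ms per bin
window_end   <- 2000  # ms after target onset
window_start <- 367   # ms before a look can be linked to the cue

# index of the first bin in the window (compare nb_1 = 18 in get_windows)
first_bin_index <- floor(window_start / bin_size)

# whole bins that fit between 367 ms and 2000 ms
bins_in_window <- floor((window_end - window_start) / bin_size)

# a trial needs looking data in at least 1/3 of those bins
min_bins_needed <- ceiling(bins_in_window / 3)
min_ms_needed   <- min_bins_needed * bin_size

cat("first bin index:", first_bin_index, "\n")
cat("bins in window:", bins_in_window, "\n")
cat("minimum bins with data:", min_bins_needed,
    "(", min_ms_needed, "ms )\n")
```

Under these assumptions a trial needs a bit over half a second of on-screen looking within the short window to avoid being tagged as low data.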
First we want to take a simple look at how many data rows there are for each S, how many data rows where they're looking at T or D, and a graph of this. This doesn't yet take into account the lowdata we just tagged.
Note: if your columns names are different, you may need to adjust code below
# data_rows <- studyname_test_taglowdata %>%
#   group_by(SubjectNumber) %>%
#   tally() %>%
#   arrange(-n)
#
# td_rows <- studyname_test_taglowdata %>%
#   group_by(SubjectNumber) %>%
#   filter(gaze %in% c("DISTRACTOR", "TARGET")) %>%
#   tally() %>%
#   arrange(-n) %>%
#   rename(td_n = n)
# coarse_data_quantity_SS <- td_rows %>%
#   left_join(data_rows) %>%
#   mutate(prop_td = td_n/n,
#          prop_n_overmax_data = n/(max(n)),
#          prop_td_overmax_td = td_n/(max(td_n)))
#
# ggplot(studyname_test_taglowdata, aes(CURRENT_FIX_X, CURRENT_FIX_Y,
#                                       color = gaze))+
#   geom_point(shape=1)+facet_wrap(~SubjectNumber)
from this it will be clear if
you got essentially no data from a subject, list those Ss here:
Which Ss contributed not that much data overall? List those Ss here:
But you can't tell yet if they were perfect for the trials they did do! Now we look at this based on how many trials Ss had with looking in at least 1/3 of the window of interest
Note: your max_trial_num may not be 32! Change as needed! (Make sure you're considering practice trials, or the lack thereof, properly in your numbering of Trial, etc.)
# Ss_stopped_early <- studyname_test_taglowdata %>%
#   group_by(SubjectNumber) %>%
#   summarise(max_trial_num=max(Trial)) %>%
#   filter(max_trial_num<32)
#
# #if more than half low_data, child excluded
# #nb the reason to get rid of the NAs is that if there was NO gaze in a bin, it's
# #NA, e.g. looked totally off screen from 150ms to 3000ms after target onset would
# # have NA for lowdata_short
# excluded_short <- studyname_test_taglowdata %>%
#   filter(lowdata_short == T | is.na(lowdata_short)) %>%
#   dplyr::select(SubjectNumber, Trial) %>%
#   group_by(SubjectNumber, Trial) %>%
#   dplyr::summarize() %>%
#   dplyr::count() %>%
#   dplyr::filter(n>=16)
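The exclusion rule above can be checked by hand on a tiny example. The base-R sketch below assumes a toy study with 4 trials per subject (so the cutoff is 2 low-data trials rather than 16) and data already reduced to one row per subject and trial; the data frame `toy_flags` and its values are made up for illustration.

```r
# Toy data: one row per subject x trial, with the low-data tag
# (NA means there was no gaze at all in the window for that trial)
toy_flags <- data.frame(
  SubjectNumber = rep(c("s01", "s02"), each = 4),
  Trial         = rep(1:4, times = 2),
  lowdata_short = c(FALSE, FALSE, TRUE, FALSE,
                    TRUE,  NA,    TRUE, FALSE)
)
max_trial_num <- 4  # 32 in the template; change to match your study

# a trial counts against the child if it is tagged low data OR is NA
is_low <- toy_flags$lowdata_short | is.na(toy_flags$lowdata_short)

# number of low-data trials per subject
n_low <- tapply(is_low, toy_flags$SubjectNumber, sum)

# exclude subjects with at least half of their trials tagged
toy_excluded <- names(n_low)[n_low >= max_trial_num / 2]
print(n_low)
print(toy_excluded)
```

Here s01 has one low-data trial and stays in, while s02 has three (counting the NA trial) and is excluded, mirroring the `n>=16` cutoff for a 32-trial study.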
Which Ss are out based on <50% of trials with at least 1/3 of the window of data? List those Ss here:
Put a copy of your participant_tracking.xlsx spreadsheet in your data folder!
(You may need to generate this xlsx from the shared Google Doc.)
Usually, we get lots of overlap between who the notes say was fussy/unusable and who the data say was fussy/unusable. List the subjects that participant_tracking says are 'N/maybe' for 'usable' here, and why, e.g.
List them here, along with your decision and rationale, e.g.
Doing this painful process here lets you establish and write the 'exclude' part of your methods BEFORE you've looked at your results.
# studyname_test <- studyname_test_taglowdata %>%
#   filter(!SubjectNumber %in% excluded_short$SubjectNumber)
# summary(studyname_test)
# saveRDS(studyname_test, file = "data/eyetracking/studyname_test.Rds")
# saveRDS(studyname_test_taglowdata, file = "data/eyetracking/studyname_test_taglowdata.Rds")