To be expanded at a later time.
"Keystrokes" is a misnomer. For our purposes, we track changes in the text content using a diff algorithm. A text diff may be triggered by keyboard input, but also by other editing actions such as copy-and-paste or spell-checker use.
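As a rough illustration of what such a diff yields, the sketch below compares two snapshots of the text by trimming their common prefix and suffix. This is not the algorithm used by the logging software, which is not specified here; `simple_text_diff` is a hypothetical helper, and the 0-based `position` is an assumption.

```r
# Hypothetical helper: diff two text snapshots by trimming the common
# prefix and suffix. Returns an (assumed) 0-based position plus the
# removed and added substrings, mirroring the fields of a text.change entry.
simple_text_diff <- function(before, after) {
  b <- strsplit(before, "")[[1]]
  a <- strsplit(after, "")[[1]]
  max_common <- min(length(b), length(a))

  # length of the common prefix
  prefix <- 0
  while (prefix < max_common && b[prefix + 1] == a[prefix + 1]) {
    prefix <- prefix + 1
  }

  # length of the common suffix, not overlapping the prefix
  suffix <- 0
  while (suffix < max_common - prefix &&
         b[length(b) - suffix] == a[length(a) - suffix]) {
    suffix <- suffix + 1
  }

  n_removed <- length(b) - prefix - suffix
  n_added   <- length(a) - prefix - suffix
  list(
    position    = prefix,
    removedText = paste(b[seq_len(n_removed) + prefix], collapse = ""),
    addedText   = paste(a[seq_len(n_added) + prefix], collapse = "")
  )
}

# Deleting the "s" from "cats"
d <- simple_text_diff("The cats sat", "The cat sat")
str(d)
```

A pasted block or a spell-checker replacement would show up the same way, as one entry with longer `removedText` and `addedText` strings.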
To be expanded
The following is an example of a "text.change" entry, where the letter 's' at position 51 was removed.
    <observableDatum>
      <sceneId>AnswerQuestions</sceneId>
      <controlId>AnswerQuestions.TextArea</controlId>
      <eventType>text.change</eventType>
      <timestamp>2015-07-27T17:57:10.349Z</timestamp>
      <content>
        <pair>
          <key>position</key>
          <value>51</value>
        </pair>
        <pair>
          <key>removedText</key>
          <value><![CDATA[s]]></value>
        </pair>
        <pair>
          <key>addedText</key>
          <value><![CDATA[]]></value>
        </pair>
      </content>
    </observableDatum>
The `text.change` events are sandwiched between the `text.focus` and `text.blur` events, which mark when the text input area is activated or deactivated (e.g., when the user clicks in and out of the box). We also need to distinguish text input on different scenes, which is why we keep the `nav.scene` event as an input to the `events2df()` function. It is in the `dropEvents` list, meaning that `events2df()` will not output this event.
For analysis, we extract all the `text.*` nodes into a data frame. Note the column `content`, which contains lists of attributes such as `addedText` in the XML above. This gives us the flexibility to store arbitrary information for different events without polluting the data frame structure. We can access these attributes using the `@` operator defined in the `dfat` library. This hopefully achieves a balance between representational flexibility and the convenience of the data frame structure required by most R statistical functions.
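To make the list-column idea concrete, here is a small sketch in base R. It builds a toy data frame by hand (the values are invented for illustration) and pulls one attribute out of `content` with `sapply()`. This is a plain-R stand-in for the kind of access the `@` operator is described as providing, not the `dfat` implementation; `get_attr` is a hypothetical helper.

```r
# Toy events; values are invented for illustration only.
events <- data.frame(
  eventType = c("text.focus", "text.change", "text.change", "text.blur"),
  stringsAsFactors = FALSE
)

# Each row carries its own named list of attributes (possibly empty).
events$content <- list(
  list(),
  list(position = "0", removedText = "", addedText = "a"),
  list(position = "1", removedText = "", addedText = "b"),
  list()
)

# Hypothetical helper: extract one attribute per row, NA where absent.
get_attr <- function(content, name) {
  sapply(content, function(x) {
    if (is.null(x[[name]])) NA_character_ else x[[name]]
  })
}

events$addedText <- get_attr(events$content, "addedText")
events[, c("eventType", "addedText")]
```

Events without the attribute simply yield `NA`, so the data frame keeps one row per event regardless of which attributes each event carries.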
    require(devtools)
    devtools::install_github('garyfeng/dfat')   # overwrites the @ sign
    devtools::install_github('garyfeng/pdata')  # utilities for process data

    # read the data from all XML files under "./data";
    # the filename (without extension) will be in the variable "bookletId"
    data <- readXML(Sys.glob("data/*.XML"), subjIdVar="bookletId")

    # in this case the data frame will have the following variables
    #   bookletId: subj ID
    #   blockId:   the condition the subj is in
    #   sceneId:   the scene in the test
    #   eventType: the event id, for each row
    #   ts:        the timestamp in POSIXct
    #   content:   a list with named members representing extended attributes;
    #              see above XML for an example. The <key> becomes the
    #              attribute name, and <value> becomes the value.

    # get only events starting with "text"
    keystrokes <- subset(data, grepl("^text", data$eventType))
We then analyze the inter-key intervals (IKIs) for various kinds of keystroke events, namely intra-word keystrokes, inter-word keystrokes, spaces, and deletions. We also calculate several other useful variables.
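A toy example may help show how the key types are told apart. The sketch below applies the same kind of index logic to a hand-made `addedText` vector: a letter that directly follows a space counts as inter-word, any other letter as intra-word. The input values are invented, and `NA` stands for events that carry no `addedText` (such as `text.focus`).

```r
# Invented sequence: focus, "a", "b", " ", "c", a deletion, blur
addedText <- c(NA, "a", "b", " ", "c", "", NA)

alphabetKeys <- which(addedText %in% c(letters, LETTERS))  # single letters
del <- which(addedText == "")   # nothing added: treated as a deletion
sp  <- which(addedText == " ")  # a single space

# A letter typed right after a space starts a new word.
inter <- intersect(sp + 1, alphabetKeys)
intra <- setdiff(alphabetKeys, inter)

keyType <- rep(NA_character_, length(addedText))
keyType[sp]    <- "Spaces"
keyType[del]   <- "Deleting"
keyType[intra] <- "Intra-word"
keyType[inter] <- "Inter-word"
keyType
```

Note that `which()` silently drops the `NA` entries, so focus and blur events stay unclassified.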
``` {r, eval=FALSE}
library(dplyr)  # for %>%, group_by(), mutate()

# time since the most recent text.focus event
t0 <- keystrokes$ts
t0[keystrokes$eventType != "text.focus"] <- NA
t0[1] <- keystrokes$ts[1]
keystrokes$durSinceTextFocus <- as.numeric(timeDiff(t0, keystrokes$ts))

# position of each edit, shifted by one
keystrokes$textPos <- as.numeric(keystrokes$content@"position") + 1

# net change in text length per event (0 for events without a position)
keystrokes$editLen <- sapply(keystrokes$content, function(c) {
  ifelse(is.null(c$position), 0, nchar(c$addedText) - nchar(c$removedText))
})

# running text length within each booklet and text control
keystrokes %>%
  group_by(bookletId, controlId) %>%
  mutate(textLen = cumsum(editLen)) -> keystrokes
keystrokes$textLen[is.na(keystrokes$textPos)] <- NA

# indices of the different kinds of keystrokes
addedText <- keystrokes$content@"addedText"
allKeys <- which(!is.na(addedText))
alphabetKeys <- which(addedText %in% c(letters, LETTERS))
del <- which(addedText == "")
sp <- which(addedText == " ")
inter <- intersect(sp + 1, alphabetKeys)  # letters right after a space
intra <- setdiff(alphabetKeys, inter)

keystrokes$keyType <- NA
keystrokes$keyType[sp] <- "Spaces"
keystrokes$keyType[del] <- "Delecting"
keystrokes$keyType[intra] <- "Intra-word"
keystrokes$keyType[inter] <- "Inter-word"

# keep scenes and event types in order of first appearance;
# factor levels must be unique, hence unique()
keystrokes$sceneId <- factor(keystrokes$sceneId, levels = unique(keystrokes$sceneId))
keystrokes$eventType <- factor(keystrokes$eventType, levels = unique(keystrokes$eventType))
```

----

## Text Length and Cursor Position by Scenes and by Condition ##

The "ribbon" plot shows the progression of 2 variables over time:

* the text length
* the cursor position

The difference between the two lines, which appears whenever the test-taker edits at positions other than the end of the text, is shown in the shaded area. In other words, the "flags" signal non-local editing actions.

In the following example, we plot the ribbon plot for each `sceneId`, with time zero starting from the onset of each scene (one textbox per scene at most). We also want to compare across a categorical variable `blockId`, and identify each student (`bookletId`) with color.

```r
library(ggplot2)

# stats of keystrokes
# note: durSinceSceneStart is not computed above; it is assumed to be
# already present in the keystrokes data frame
ggplot() +
  geom_ribbon(data = keystrokes, alpha = 0.5,
              aes(x = as.numeric(durSinceSceneStart) / 60,
                  ymin = as.numeric(textPos),
                  ymax = as.numeric(textLen),
                  color = bookletId, fill = bookletId)) +
  # titles and options
  xlab("Task Time (in min.)") + ylab("Text Length - Cursor Position") +
  theme(legend.position = "none") +
  facet_grid(blockId ~ sceneId, scales = "free_x")
```
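To see where the shaded area comes from, consider a toy edit sequence in base R. The numbers are invented: the writer types three characters at the end of the text, then inserts one character at the very start. Text length is the running sum of edit lengths, while the cursor position is the (assumed 0-based) edit position plus one, so the two diverge only at the non-local edit.

```r
# Invented edits: position (assumed 0-based), text added, text removed
position    <- c(0, 1, 2, 0)
addedText   <- c("a", "b", "c", "X")
removedText <- c("", "", "", "")

editLen <- nchar(addedText) - nchar(removedText)  # net change per edit
textLen <- cumsum(editLen)                        # running text length
textPos <- position + 1                           # 1-based edit position

# gap is the height of the ribbon at each edit
data.frame(textPos, textLen, gap = textLen - textPos)
```

The first three edits happen at the end of the text and leave no gap; the fourth opens a gap of three characters, which is what the ribbon would shade.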
Suppose we want to show the distribution of the inter-key intervals (IKIs) for different types of keystrokes. We can use the `getDistr()` function (part of the `pdata` library) to compute distribution functions, collect several distributions in a single data frame, and plot them using `ggplot`.
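For readers unfamiliar with these quantities, the sketch below computes a binned probability density and hazard rate in base R. It illustrates the general definitions only, with the hazard in a bin taken as the density divided by the share of intervals still unfinished at the start of that bin. It is not the implementation of `getDistr()`, whose internals are not described here; `binned_distr` is a hypothetical helper and the IKIs are simulated.

```r
# Hypothetical helper: binned density and hazard for durations in msec.
binned_distr <- function(x, binwidth = 25) {
  x <- x[!is.na(x)]
  breaks <- seq(0, ceiling(max(x) / binwidth) * binwidth, by = binwidth)
  h <- hist(x, breaks = breaks, plot = FALSE)
  p <- h$counts / sum(h$counts)               # probability mass per bin
  surv <- 1 - c(0, cumsum(p))[seq_along(p)]   # share surviving to bin start
  data.frame(
    time = h$mids,                            # bin midpoints
    pdf  = p / binwidth,                      # density per msec
    haz  = ifelse(surv > 0, p / binwidth / surv, NA)
  )
}

set.seed(1)
iki <- rlnorm(500, meanlog = log(200), sdlog = 0.5)  # simulated IKIs (msec)
distr <- binned_distr(iki, binwidth = 25)
head(distr)
```

Because the surviving share never exceeds one, the hazard in each bin is at least as large as the density, and the two diverge most in the right tail.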
``` {r, eval=FALSE}
library(ggplot2)
library(dplyr)
library(grid)       # textGrob(), gpar()
library(gridExtra)  # grid.arrange()

binwidth = 25

# distribution over all keystrokes, then one per key type;
# dur is multiplied by 1000 to plot in msec
tmp <- getDistr(keystrokes$dur * 1000, binwidth = binwidth) %>%
  mutate(keyType = "All")
for (i in unique(keystrokes$keyType)) {
  if (is.na(i)) next
  getDistr(keystrokes$dur[keystrokes$keyType == i] * 1000, binwidth = binwidth) %>%
    mutate(keyType = i) %>%
    rbind(tmp) -> tmp
}

# one 2x2 panel (pdf and hazard, linear and log-log) per key type
for (i in c("All", "Intra-word", "Inter-word", "Spaces", "Delecting")) {
  if (is.na(i)) next
  pdfLog <- ggplot(data = tmp[tmp$keyType == i, ], aes(x = time, y = pdf, color = sceneId)) +
    geom_line() +
    labs(x = "InterKey Interval (msec)", y = "Prob. Density") +
    xlim(0.001, 1000)

  pdfLogLog <- ggplot(data = tmp[tmp$keyType == i, ], aes(x = time, y = pdf, color = sceneId)) +
    geom_line() +
    labs(x = "InterKey Interval (msec)", y = "Prob. Density") +
    #stat_smooth(se=F) +
    scale_x_log10() + scale_y_log10() +
    theme(legend.position = "none")

  hazLog <- ggplot(data = tmp[tmp$keyType == i, ], aes(x = time, y = haz, color = sceneId)) +
    geom_line() +
    labs(x = "InterKey Interval (msec)", y = "Hazard Rate") +
    xlim(0.001, 1500) +
    theme(legend.position = "none")

  hazLogLog <- ggplot(data = tmp[tmp$keyType == i, ], aes(x = time, y = haz, color = sceneId)) +
    geom_line() +
    labs(x = "InterKey Interval (msec)", y = "Hazard Rate") +
    #stat_smooth(se=F) +
    scale_x_log10() + scale_y_log10() +
    theme(legend.position = "none")

  grid.arrange(pdfLog, hazLog, pdfLogLog, hazLogLog, ncol = 2,
               top = textGrob(paste("KeyType = ", i), gp = gpar(cex = 2)))
}
```