knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Please see the manuscript for a detailed description of the following data. We will load the example data, and you can use ? with the dataset name to learn more about it. This example extends the free recall options to multiple or randomized lists by participant. You will need participant data with a "list" or trial ID, as well as a matching answer key with the same ID.
library(lrd)
data("multi_data")
head(multi_data)
#?multi_data
data("multi_answers")
head(multi_answers)
#?multi_answers
library(ggplot2)
library(reshape)
The participant data should be in long format, with each answer as one row of data. The participant ID will be repeated by participant, while the trial ID will be repeated by trial. Use the repeated argument to indicate the column that denotes how the trials are repeated across participants. Note that in our example, List.Type and List.Number contain the same information.
DF_long <- arrange_data(data = multi_data,
                        responses = "Response",
                        sep = " ",
                        id = "Sub.ID",
                        repeated = "List.Number")
head(DF_long)
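To make the long format concrete, here is a minimal base R sketch of the same idea: splitting each participant's response string on a separator and giving every answer its own row. The toy data frame and the output column names (response, position) are assumptions chosen to mirror the columns used later in this example; this is not the internal code of arrange_data.

```r
# Toy wide-style data: one row per participant and list (made-up values)
raw <- data.frame(
  Sub.ID = c(1, 2),
  List.Number = c("1", "1"),
  Response = c("apple chair", "river"),
  stringsAsFactors = FALSE
)

# Split each response string on a single space
tokens <- strsplit(raw$Response, split = " ", fixed = TRUE)
n_tokens <- lengths(tokens)

# One row per answer; position is the order in which answers were written
long <- data.frame(
  Sub.ID = rep(raw$Sub.ID, times = n_tokens),
  List.Number = rep(raw$List.Number, times = n_tokens),
  response = unlist(tokens),
  position = sequence(n_tokens),
  stringsAsFactors = FALSE
)

print(long)
```

Each participant ID is repeated once per answer, which is the structure the scoring functions below expect.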
Next, we will restructure the answer key to have one column for the answers and one column for the list ID. We will also relabel the columns, since the output of reshape's melt() uses the vague names variable and value. Last, we need to make sure the list ID column matches the column in our participant data.
#this column is only used to reshape
multi_answers$position <- 1:nrow(multi_answers)
#melt every list column into long format, keeping position as the id
answer_long <- melt(multi_answers, id.vars = "position")
#fix columns
colnames(answer_long) <- c("position", "List.ID", "Answer")
#match list id to participant data, which is only numbers
#list IDs can be characters or numbers
answer_long$List.ID <- gsub(pattern = "List",
                            replacement = "",
                            x = answer_long$List.ID)
head(answer_long)
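For readers who prefer not to depend on reshape, the same wide-to-long restructuring can be sketched in base R. The toy answer key and its column names (List1, List2) are made up for illustration and are not the package's multi_answers data.

```r
# Toy answer key in wide format: one column per list (hypothetical values)
answers_wide <- data.frame(
  List1 = c("apple", "chair"),
  List2 = c("river", "stone"),
  stringsAsFactors = FALSE
)

# Stack the columns: one row per answer, tagged with its list and position
answers_long <- data.frame(
  position = rep(seq_len(nrow(answers_wide)), times = ncol(answers_wide)),
  List.ID = rep(names(answers_wide), each = nrow(answers_wide)),
  Answer = unlist(answers_wide, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Strip the "List" prefix so the IDs match a numeric-looking participant column
answers_long$List.ID <- gsub("List", "", answers_long$List.ID)

print(answers_long)
```

Note that the resulting List.ID values are character strings ("1", "2"), so the participant list column should be compared as character as well.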
Scoring in lrd is case sensitive, so we will use tolower() to convert all correct answers and participant answers to lower case. In this particular example, all the spaces in the participant answers were removed, so we will also remove them from the answer key.
DF_long$response <- tolower(DF_long$response)
answer_long$Answer <- tolower(answer_long$Answer)
answer_long$Answer <- gsub(" ", "", answer_long$Answer)
head(DF_long)
head(answer_long)
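A small illustration of why the lower-casing step matters: under a case-sensitive edit distance, a capital letter counts as a one-character difference. This sketch uses base R's utils::adist() with its default settings; whether lrd compares strings in exactly this way is an assumption made here for illustration.

```r
# Case-sensitive edit distance between a typed answer and the key
typed <- "Chair"
key_item <- "chair"

distance_raw <- utils::adist(typed, key_item)[1, 1]
distance_lower <- utils::adist(tolower(typed), tolower(key_item))[1, 1]

# The raw comparison differs by one character; the lower-cased one matches
print(c(raw = distance_raw, lower = distance_lower))
```

With a strict cutoff of 0, the raw comparison would be scored as incorrect even though the participant recalled the right word.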
You should define the following arguments in the call to prop_correct_multiple():
free_output <- prop_correct_multiple(data = DF_long,
                                     responses = "response",
                                     key = answer_long$Answer,
                                     key.trial = answer_long$List.ID,
                                     id = "Sub.ID",
                                     id.trial = "List.Number",
                                     cutoff = 1,
                                     flag = TRUE)
str(free_output)
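The general idea of scoring against a list-specific key can be sketched in base R as follows. This is only an illustration under stated assumptions: a response counts as correct when its edit distance to any key item from the same list is at or below the cutoff, and the proportion is taken out of the number of key items in that list. The toy data and the helper score_one are hypothetical, and this is not the implementation used by prop_correct_multiple.

```r
# Toy long-format responses and answer key (made-up values)
responses <- data.frame(
  Sub.ID = c(1, 1, 1, 2, 2),
  List.Number = c("1", "1", "2", "1", "2"),
  response = c("apple", "chairs", "river", "apple", "rock"),
  stringsAsFactors = FALSE
)
key <- data.frame(
  List.ID = c("1", "1", "2", "2"),
  Answer = c("apple", "chair", "river", "stone"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if the response is within `cutoff` edits of any
# key item belonging to the same list, otherwise 0
score_one <- function(resp, list_id, cutoff) {
  candidates <- key$Answer[key$List.ID == list_id]
  as.integer(min(utils::adist(resp, candidates)) <= cutoff)
}

responses$Scored <- mapply(score_one,
                           responses$response,
                           responses$List.Number,
                           MoreArgs = list(cutoff = 1),
                           USE.NAMES = FALSE)

# Sum correct answers per participant and list, then divide by key length
totals <- aggregate(Scored ~ Sub.ID + List.Number, data = responses, FUN = sum)
key_size <- table(key$List.ID)
totals$Proportion <- totals$Scored /
  as.numeric(key_size[as.character(totals$List.Number)])

print(totals)
```

Here "chairs" is accepted for "chair" because the cutoff of 1 tolerates a single-character difference, while "rock" matches nothing on its list.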
We can use DF_Scored to see the original dataframe with our new scored column, and also to check that our answer key and participant answers matched up correctly. DF_Participant can be used to view a participant-level summary of the data. Last, if a grouping variable is used, we can use DF_Group to see that output.
#Overall
free_output$DF_Scored
#Participant
free_output$DF_Participant
#Group
#free_output$DF_Group
This function prepares the data for a serial position curve analysis or visualization. Please note that it assumes you are using the output from above, but any output with these columns would work. The arguments are roughly the same as those of the overall scoring function. We've also included some ggplot2 code as an example of how you might use our output for plotting. List.ID indicates the list identifier for examining differences in responses between lists.
serial_output <- serial_position_multiple(data = free_output$DF_Scored,
                                          position = "position",
                                          answer = "Answer",
                                          key = answer_long$Answer,
                                          key.trial = answer_long$List.ID,
                                          scored = "Scored",
                                          id.trial = "List.Number")
head(serial_output)
#the y-axis shows the proportion correct at each tested position
ggplot(serial_output, aes(Tested.Position, Proportion.Correct, color = List.ID)) +
  geom_line() +
  geom_point() +
  xlab("Tested Position") +
  ylab("Proportion Correct") +
  theme_bw()
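As a rough illustration of what a serial position summary contains, the following base R sketch computes, for a single toy list, the proportion of participants who recalled the item at each tested position. The data are made up, and the calculation is an assumption about the general technique rather than the code behind serial_position_multiple.

```r
# Answer key for one list, in tested (presentation) order
key <- c("apple", "chair", "river")

# Correctly recalled answers, one row per answer (made-up values)
recalled <- data.frame(
  Sub.ID = c(1, 1, 2),
  Answer = c("apple", "river", "apple"),
  stringsAsFactors = FALSE
)

n_participants <- length(unique(recalled$Sub.ID))

# Map each recalled answer back to its position on the key, then count
tested_position <- match(recalled$Answer, key)
counts <- tabulate(tested_position, nbins = length(key))

serial <- data.frame(
  Tested.Position = seq_along(key),
  Proportion.Correct = counts / n_participants
)

print(serial)
```

With several lists, the same tally would be repeated within each list ID before plotting.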
Conditional response probability is the likelihood of answers given the current answer set. The column participant_lags represents the lag between the written and tested position (e.g., chair was listed second, which represents a lag of -6 from spot number 8 on the answer key list). The column Freq represents the frequency of the lags between listed and shown position, while the Possible.Freq column indicates the number of times that frequency could occur given each answer listed (e.g., given the current answer, a tally of the possible lags that could still occur). The CRP column contains the conditional response probability, which is the frequency column divided by the possible frequencies of lags.
crp_output <- crp_multiple(data = free_output$DF_Scored,
                           key = answer_long$Answer,
                           position = "position",
                           scored = "Scored",
                           answer = "Answer",
                           id = "Sub.ID",
                           key.trial = answer_long$List.ID,
                           id.trial = "List.Number")
head(crp_output)
crp_output$participant_lags <- as.numeric(as.character(crp_output$participant_lags))
ggplot(crp_output, aes(participant_lags, CRP, color = List.Number)) +
  geom_line() +
  geom_point() +
  xlab("Lag Distance") +
  ylab("Conditional Response Probability") +
  theme_bw()
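The division of observed lags by possible lags can be sketched for one participant and one list in base R. This follows the conventional lag-CRP calculation from the free recall literature, where a lag is the difference in key position between successive recalls; how lrd tallies lags internally is not shown here, so treat the details as assumptions. The recall order is made up.

```r
list_length <- 5
# Key positions of the recalled items, in the order they were written
recall_order <- c(2, 3, 1, 5)

# All nonzero lags that can occur in a list of this length
lags <- setdiff(-(list_length - 1):(list_length - 1), 0)
actual <- stats::setNames(integer(length(lags)), lags)
possible <- actual

for (i in seq_len(length(recall_order) - 1)) {
  current <- recall_order[i]
  # Items not yet recalled define which lags are still possible
  remaining <- setdiff(seq_len(list_length), recall_order[seq_len(i)])
  possible_lags <- as.character(remaining - current)
  possible[possible_lags] <- possible[possible_lags] + 1L
  # The lag that was actually made on this transition
  made <- as.character(recall_order[i + 1] - current)
  actual[made] <- actual[made] + 1L
}

# CRP is observed frequency over possible frequency; NA where never possible
crp <- ifelse(possible > 0, actual / possible, NA)

print(data.frame(lag = lags, Freq = actual, Possible.Freq = possible, CRP = crp))
```

In this toy sequence, a lag of +1 was possible twice but made once, giving a CRP of 0.5 at that lag.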
Participant answers are first filtered for their first response, and these are matched to the original order on the answer key list (Tested.Position). Then the frequency (Freq) of each of those answers is tallied and divided by the number of participants overall, or by group if the group.by argument is included (pfr).
pfr_output <- pfr_multiple(data = free_output$DF_Scored,
                           key = answer_long$Answer,
                           position = "position",
                           scored = "Scored",
                           answer = "Answer",
                           id = "Sub.ID",
                           key.trial = answer_long$List.ID,
                           id.trial = "List.Number")
head(pfr_output)
pfr_output$Tested.Position <- as.numeric(as.character(pfr_output$Tested.Position))
ggplot(pfr_output, aes(Tested.Position, pfr, color = List.ID)) +
  geom_line() +
  geom_point() +
  xlab("Tested Position") +
  ylab("Probability of First Response") +
  theme_bw()
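The steps described above (keep each participant's first response, match it to the key, tally, and divide by the number of participants) can be sketched in base R for a single toy list. The data are invented for illustration, and this is a sketch of the general calculation rather than the code inside pfr_multiple.

```r
# Answer key for one list, in tested order
key <- c("apple", "chair", "river")

# Scored answers with their written position (made-up values)
scored <- data.frame(
  Sub.ID = c(1, 1, 2, 2, 3),
  position = c(1, 2, 1, 2, 1),
  Answer = c("river", "apple", "apple", "chair", "river"),
  stringsAsFactors = FALSE
)

# Keep only the first response from each participant
first <- scored[scored$position == 1, ]

# Tally first responses by their position on the key
tested_position <- match(first$Answer, key)
freq <- tabulate(tested_position, nbins = length(key))

# Probability of first response at each tested position
pfr <- freq / length(unique(scored$Sub.ID))

print(data.frame(Tested.Position = seq_along(key), Freq = freq, pfr = pfr))
```

Two of the three toy participants began with the last item on the list, so the probability of first response is highest at position 3.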