knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 5, fig.height = 5 )
This package is used to analyze SlimStampen App data. To use this package you need a .csv file or a set of .xlsx files with the data from the SlimStampen lessons that you want to analyze. Below will be explained how to import data from CSV and Excel files.
The functions in this package are focused on giving a general overview of what the data looks like, not to provide specific statistical analyses.
It should be noted that the calculation of alpha estimates is accurate when each user has at most one session in a lesson, but cannot be guaranteed when when users have several sessions or any lesson reset has been used.
To use this package you first need to call the library()
function to load the package into your RStudio environment.
library(SlimStampeRData)
path <- tempdir()
This vignette is a global overview of how the functions in this package can be used. Every function also has its own page in which all parameters and the input and output of the function are described. To see this page you can enter the function name with a question mark in front of it in the console.
#To see the information page of a function ?individual_RT ?read_dataset #To see the information page of the package ?SlimStampeRData #To see the index page of the package (with all functions listed) help(package = "SlimStampeRData")
You may run into warning or error messages when you run the code in this vignette. At the end of this vignette is a section with common warning and error messages and their solutions.
The read_dataset()
function reads the data from your .csv file into RStudio, so you can use it as a data table. The read_dataset()
function needs the location of the .csv file to be able to read it.
# Not run # Example of possible CSV location data <- read_dataset("C:/Users/YourName/Desktop/data.csv")
When no location is given the example data set is used.
data <- read_dataset()
The read_dataset_excel()
function reads the data from your .xlsx files into RStudio, so you can use it as a data table. The read_dataset_excel()
function needs the location of the .xlsx files to be able to read them.
There are two ways to provide the location of your Excel files, individually or by directory.
If you provide the files individually you must provide the file path to three files: the response entries, the lesson data and the fact data. You must also provide the lesson Id's of the lessons you want to parse.
# Not run # Example of possible individual Excel files locations data <- read_dataset_excel(lessons = c(23, 45), file_response = "C:/Users/YourName/x_response.xlsx", file_lesson = "C:/Users/YourName/x_lesson.xlsx", file_fact = "C:/Users/YourName/x_fact.xlsx")
If you provide a directory the three files will be selected automatically. However, for this to work correctly the files must end on "fact.xlsx", "lesson.xlsx" and "response.xlsx". There must also be only one file present in the directory that ends on each of these strings, otherwise the wrong file may be selected. If the files don't end on these strings or the files are not present in the same directory, then you must provide the file paths separately, as in the example above.
When you provide a directory you must also provide the lesson Id's of the lessons you want to parse.
# Not run # Example of Excel files directory location data <- read_dataset_excel(lessons = c(23, 45), file_dir = "C:/Users/YourName/dataFolder")
If you want to use the example data set you must set the example parameter to TRUE.
# Not run data <- read_dataset_excel(exampleset = TRUE)
It can occur that you will see the message
- Reset entries have been found, presentationStartTime is being estimated. -
in the console after running the read_dataset_excel()
function. This means that there are reset entries in the data set. These entries contain a usual userId, sequence_number and lessonId and also contain a factId of -1, but no other data. These entries are added to the database anytime a user chooses to reset one of the lessons they have done and will reset the alpha, activation and learnt facts for that user.
Since these entries contain no other data the presentationStartTime must be estimated. These estimated presentationStartTime values don't represent the actual time at which the lesson was reset, but are instead fabricated to make sure that the presentationStartTime occurs after all the trials that need to be forgotten and before any trials that were learnt by the user after the reset. For this process that sequence_number of the trials is used to determine the order in which trials and the reset occurred.
The function dataset_stats()
gives a quick summary of the data set. It is easiest to save the output in a variable, so the participant table can be accessed later on. Because this table can be very long it is not printed in the standard output.
x <- dataset_stats(data) x$participants
To view the list of participants in your viewer you can use View(x$participants)
instead.
Most functions in this package give a plot as output. The plot itself or a preview of the plots (if there are multiple) will be printed in the viewer. The plots will also be saved in a PDF in your filesystem. If you don't specify a file path the PDFs will be saved in your current directory.
If you don't know what your current directory is, you can check this by entering getwd()
in the console. If you want to work in another directory you can set this by using setwd("C:/path/to/working/directory")
If you'd like to specify where PDF files should be saved you can do this by choosing either an objective or relative path.
An objective path could be:
path <- "C:/Users/YourName/Documents"
A relative path could be:
path <- "Figures"
This means that the PDF will be saved in a folder names 'Figures' in the current directory. The function does expect this folder to already exist and will give an error if the folder does not exist.
Another example of a relative path:
path <- "../Figures"
This means that the files will be saved in the folder called 'Figures' in the folder above your current working directory.
All functions that print plots in this vignette will specify the parameter 'filepath' as a temporary folder. If you want to use the default save option when you run the function yourself, you can just leave the parameter 'filepath' out of the function call.
There are two functions that print plots for all participants in the data set. This gives an overview of how all participants performed and whether any participants have an unusual performance. The trials in which the participant gave an incorrect answer have a red marker, where the correct answers have a grey marker. It may occur that there is more than one plot for the same user-lesson combination. This can occur when the same user engaged in several sessions within the same lesson.
individual_RT(data, filepath = path)
There is also the option to print only one session for both this RT function and the individual ROF function mentioned below:
individual_RT(data, session = "7f27ee3c-2e6b-42a4-aff1-ba2492008ab6",filepath = path)
Sometimes it is easier to view the data on a logarithmic scale. To do so, you can set the logartihmic parameter to TRUE:
individual_RT(data, session = "7f27ee3c-2e6b-42a4-aff1-ba2492008ab6", filepath = path, logarithmic = TRUE)
individual_ROF(data, filepath = path)
The function above shows an error when you try to run it with the example data set. This is because the example data set does not have a column which shows the alpha for every trial. To remedy this you can run the function calculate_alpha_and_activation()
. Be aware that any existing columns that are called 'alpha' or 'activation' will be re-written.
If your data set already contains a column with the alpha values you can skip this step.
data <- calculate_alpha_and_activation(data)
You could also change the settings, if they were different in the experimental setup. For example:
# Not run data <- calculate_alpha_and_activation(data, minAlpha = 0, maxAlpha = 1, standardAlpha = 0.5, fThreshold = -0.6)
Now, if you run the individual_ROF()
function again, it should run without an error message.
individual_ROF(data, filepath = path)
There are three functions that plot an average for a variable over all participants. For this average only the first session of every participant in every session is used, as to exclude any effects that can stem from the breaks between sessions. These functions give a plot as output in which all lines represent a session. On the x-axis the repetitions of facts is used. This is the amount of times that a fact has been presented to the participant.
These functions give an error if your data set does not have a column with the number of repetition for each trial.
average_accuracy_participants(data, filepath = path)
To solve this you can run the function calculate_repetition()
. Be aware that any existing columns with the name 'repetition' will be re-written.
If your data set already contains a column with the repetition values you can skip this step.
data <- calculate_repetition(data)
Now, if you run one of the average functions, it should do so without an error message.
average_accuracy_participants(data, filepath = path)
average_RT_participants(data, filepath = path)
average_ROF_participants(data, filepath = path)
There is one function that plots the average Rate of Forgetting over all facts. This function gives a plot as output in which all lines represent a fact. Just like in the previous section, this function also uses the first session for every participant in a session to calculate the average.
This function gives an error if your data set does not have a column with the number of repetition for each trial. See the section 'Average of participants' for more information.
average_ROF_facts(data, filepath = path)
The big markers show what the last average alpha is for that fact. These last alpha's are then used to decide to order of the facts in the legend.The upper-left value is the highest last alpha, the lower- right value is the lowest last alpha. If there is more than one column in the legend (if there are too many facts to fit in one column), the columns should be read one-by-one. Be aware that lines/markers may sometimes overlap.
The parameter 'factNames' allows you to choose which column should be used to label the facts. Beware that for some data sets only a limited amount of columns will provide labels for all facts. For example, in a data set in which 'factText' and 'fileName' are both used, these columns are likely to not provide labels for all facts. In this case using the column 'factAnswer' instead may be useful.
#Instead of "factText" you could also use "factAnswer" average_ROF_facts(data, factNames = "factText", filepath = path)
Some of the functions in this package may not work if these columns are not provided. This warning will occur when one of the columns that the package expects to be there is not present. This can occur in several ways:
Column xxx contains missing values. This means that there are either empty cells or cells that contain 'NA'. Check your data to see whether the missing data belongs in the data set or not. Most of the functions in this package will run when missing data is present, but data points that contain missing data may not be printed in plots or they may skew calculated variables (such as the mean alpha/RT/accuracy).
This message usually occurs when there are sessions in the data that contain only one data point. This can lead to the graphs not looking very nice. When this occurs the graphs will still be printed in the PDF file, so there is no need to change anything if the sessions are correct in the data.
Running warnings() will show you the warnings messages that occurred when running the function.
warnings()
There are NA values in your data set. The plots for the individual sessions that contain NA values are still printed, but the rows that contain a NA value are removed and not printed as a data point. The first line that is printed in the console after the function is called should clarify which column contains NA values. Having NA values in your data can also lead to the axis of the plots being printed in unexpected ways.
The place you specified to save your file does not exist or cannot otherwise be accessed. Make sure that you have specified a directory that already exists and that R has access to.
unlink(path)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.