knitr::opts_chunk$set() knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=8, fig.height=7 )
Within synesthesia research, specifically research about grapheme-color synesthesia, consistency tests are often used. Basically, a participant is shown a set of graphemes (e. g. letters, digits, months, or other words) and has to respond by indicating what color, if any, they associate with the grapheme. Each grapheme is repeated a certain number of times, usually three times, throughout the test. If the participant is unusually consistent in the colors they respond with, e. g. choosing a red color every time an 'A' is shown and always choosing a green color for 'H', this indicates that the participant might have synesthetic grapheme-color associations. There might of course be other reasons for consistent response patterns, such as if the participant employed mnemonic strategies, for which reason consistency testing is often combided with other measures for synesthesia classification. Still, consistency testing is widely considered to be an important tool for synesthesia research.
In consistency testing, participants' levels of consistency are estimated by using a score based on response colors' color space distances. This has been called by different names, such as 'color variation score'. Within the synr package, the term 'consistency score' is used to refer to this score.
The aim of this package is to facilitate analysis of consistency test data by providing functionality for rolling all the consistency test data up into one specialized R object. This object, with its linked methods (functions), constitutes an effective interface for:
When using synr, your main interface to the data is the ParticipantGroup class. For information on how to convert raw consistency test data into a ParticipantGroup object, please see the separate tutorial Creating ParticipantGroup objects. Here, it's assumed that the raw data are in 'long format', as briefly shown below:
Since the data are in 'long format', they can be used with
create_participantgroup like this:
pg <- create_participantgroup( raw_df=synr_exampledf_long_small, n_trials_per_grapheme=2, id_col_name="participant_id", symbol_col_name="trial_symbol", color_col_name="response_color", time_col_name="response_time", color_space_spec="Luv" )
Once you have a ParticipantGroup object (simply called 'participantgroup' from here on), you can start using methods with it and accessing its attributes. synr implements this with reference classes, which is an advanced topic. The idea however is that you can learn through synr's articles and help documentation how to use the tools without having to worry about how they work under the hood.
The participantgroup has a nested structure, where the participantgroup has a list of participants, and each participant has a list of graphemes. By using
$ as a separator, you can specify a participant and/or a grapheme to drill down into this nested structure and access data or methods you need.
Let's start by accessing participants' data. Say you want to find what colors the participant with ID '3' used for the symbol 'A'. You can do this by using the syntax
fetched_grapheme_data <- pg$participants[['3']]$graphemes[['A']] fetched_grapheme_data
The response colors are represented by an nx3 matrix, where n is the number of trials per grapheme (2 in this example). Each row corresponds to one response. The three columns correspond to the dimensions of the used color space, in this case 'L', 'u' and 'v' (because of the specification
color_space_spec = "Luv" when creating the participantgroup).
You can access participants by either their row number in the raw data frame, or by their participant ID. In the example, these happen to be the same except that one is of type numeric and the other of type character (the participant on row number 3 of the raw data frame has the ID '3'), so both
pg$participants[]$graphemes[['A']] work. If instead the third participant had the ID 'jane', you could use either
A method is a function that is linked to a particular R object, and synr relies heavily on methods. The syntax for using methods is
<object>$<method_name>(). The examples below illustrate this.
You can calculate the consistency score of a single grapheme by:
# fetching the consistency score of the second participant's grapheme 'A' cscore_p2_A <- pg$participants[]$graphemes[['A']]$get_consistency_score() cscore_p2_A
There are many more grapheme-level methods, but you usually only need the corresponding participant- and participantgroup-level methods. For this reason, no more examples of grapheme-level methods are provided in this tutorial; you can instead read the help documentation if you want (run
You can calculate an individual participant's mean consistency score by:
mean_cscore_p1 <- pg$participants[['1']]$get_mean_consistency_score() mean_cscore_p1
The participantgroup method
get_mean_consistency_scores calculates the mean consistency scores for all participants, producing a numeric vector with the consistency scores:
mean_cscores <- pg$get_mean_consistency_scores() mean_cscores
The order of the mean consistency scores is based on the order of participants in the original raw data frame.
To form a data frame that shows which participant goes with which participant score, the participantgroup method
get_ids comes in handy:
mean_cscores <- pg$get_mean_consistency_scores() p_ids <- pg$get_ids() mean_scores_df <- data.frame(participant_id=p_ids, mean_consistency_score=mean_cscores) mean_scores_df
It often helps to see how many valid color responses participants have provided during the experiment. It's common for consistency tests to provide some kind of 'no color' response. This is usually provided mainly as a tool for people who do have synesthetic associations to use for non-inducing stimuli, but might be 'abused' by people with no synesthetic associations. A mean consistency score is meaningless if a participant has for instance responded with 'no color' for all but two graphemes' responsese, since it's simple to memorize colors for two symbols.
Note that for synr to work, 'no color' responses must be coded as
NA values. For more information about this, please see the article Creating ParticipantGroup objects.
This method returns the number of graphemes that only have non-NA color responses. Thus, if data are from a consistency test with 3 trials/grapheme, the number of graphemes with 3 non-NA responses is returned.
num_onlynonna_p2 <- pg$participants[['2']]$get_number_all_colored_graphemes() num_onlynonna_p2
So, the second participant gave only valid (non-NA) color responses for 3 graphemes.
The rest of the tutorial will focus on participantgroup-level methods. You can find more info about participant-level methods by running
The participantgroup method
get_numbers_all_colored_graphemes produces a numeric vector that holds each participant's number of valid color responses:
num_onlynonna <- pg$get_numbers_all_colored_graphemes() num_onlynonna
All three participants gave only valid (non-NA) color responses for 3 graphemes. The values are in the same order that participants were in in the raw data frame, meaning that the first value corresponds to the first participant, and so on.
Of course, you can combine these values with participant ID's just like we did above with mean consistency scores:
mean_cscores <- pg$get_mean_consistency_scores() num_onlynonna <- pg$get_numbers_all_colored_graphemes() p_ids <- pg$get_ids() ctest_summary_df <- data.frame( participant_id=p_ids, mean_consistency_score=mean_cscores, num_valid_graphemes=num_onlynonna ) head(ctest_summary_df)
synr includes a unique procedure for validating participant data based on estimating how varied participants' color responses are. Detailed information is available in the validation-focused article. A very rough example and explanation follows.
The larger example data frame
synr_exampledf_large (with 3 trials per grapheme) is used in this example:
pg_large <- create_participantgroup( raw_df=synr_exampledf_large, n_trials_per_grapheme=3, id_col_name="participant_id", symbol_col_name="trial_symbol", color_col_name="response_color", color_space_spec="Luv" ) # see separate article for explanation of why 'set.seed' is called set.seed(1) # call validation method val_df <- pg_large$check_valid_get_twcv_scores( min_complete_graphemes = 5, dbscan_eps = 20, dbscan_min_pts = 4, max_var_tight_cluster = 150, max_prop_single_tight_cluster = 0.6, safe_num_clusters = 3, safe_twcv = 250, complete_graphemes_only = TRUE, symbol_filter = LETTERS ) head(val_df)
In the example, we basically ask synr to check for each participant if they have, looking at letters only:
This method, unlike other ones we've seen, returns a data frame. Looking at it, we can see that all data sets except the second one were classified as valid. The second data set was classified as invalid due to 'hi_prop_tight_cluster', which means that the participant responded with roughly the same color for more than 60% of all letter trials. The 'twcv' column gives a summary statistic which roughly describes how much variation there was in the participant's data. The 'num_clusters' column gives an estimate of about how many clearly discernible colors that the participant repeatedly used.
There are various ways to combine the returned data frame with participant ID's, here's one using the R built-in
val_id_df <- cbind( participant_id=pg_large$get_ids(), val_df ) head(val_id_df)
Again, please see the validation-focused article for more information.
Participants who have synesthetic associations might only have those for some of the graphemes used in a test. For instance, a participant might only have synesthetic associations for digits, but not letters, even though both categories are included in the test. synr helps you apply filters to calculate summary statistics for only a subset of graphemes. Filters are applied by passing a character vector of symbols/graphemes to the
symbol_filter= argument, when using participant-level or participantgroup-level methods for summary statistics.
weekdays_filter <- c( 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday' ) # note that the 'large' example data (rolled up in 'pg_large') # are used again here cscores_weekdays <- pg_large$get_mean_consistency_scores(symbol_filter=weekdays_filter) cscores_weekdays
The produced vector holds each participant's (there are 5 participants in this case) mean consistency score, only taking data from trials that had a weekday grapheme into account.
If calculating many statistics with synr, putting them into a single data frame becomes unwieldy. It may help to separate different kinds of summary statistics into different data frames. When data (produced by synr or from other sources) need to be combined, data frames can be combined with the R
merge function. A simple example follows:
pg <- create_participantgroup( raw_df=synr_exampledf_large, n_trials_per_grapheme=3, id_col_name="participant_id", symbol_col_name="trial_symbol", color_col_name="response_color", time_col_name="response_time", color_space_spec="Luv" ) # form first data frame, with consistency scores mean_cscores <- pg$get_mean_consistency_scores() p_ids <- pg$get_ids() cons_df <- data.frame( participant_id=p_ids, mean_consistency_score=mean_cscores ) # form second data frame, with validation-related information val_df <- cbind( participant_id=pg$get_ids(), pg$check_valid_get_twcv_scores() ) # combine the two data frames, by telling R to 'link them up' # based on the 'participant_id' column cons_val_df <- merge(cons_df, val_df, by='participant_id') head(cons_val_df)
They key is to make sure each separate data frame includes the participant ID's and then set
by='participant_id' (or whatever the data frames' participant ID columns are named).
It can often be helpful to get an overview of participants' response colors and each grapheme's consistency score. synr uses ggplot2 to achieve this.
For details on how the
get_plot method works, please have a look at the documentation for the Participant class, by using
help(Participant). There, you can scroll down to the description of
get_plot, under "Methods".
pg_large <- create_participantgroup( raw_df=synr_exampledf_large, n_trials_per_grapheme=3, id_col_name="participant_id", symbol_col_name="trial_symbol", color_col_name="response_color", color_space_spec="Luv" ) # increase grapheme size and angle them slightly to make them easier to see, # and only include digits and letters (excluding the weekday data in this # example) p6_plot <- pg_large$participants[['partA']]$get_plot( grapheme_size=2.2, grapheme_angle=30, symbol_filter = c(0:9, LETTERS) ) p6_plot
On the left side of the plot, you see the graphemes used in the test, colored in the participant's response colors. The bars represent the consistency score for each grapheme.
For details on how the
save_plot method works, have a look again at
help(Participant). Scroll down to the description of
save_plot, under "Methods". What is most essential is that you specify the
save_dir= argument, which is where you want the plot to be saved (including filename at the end):
pg_large$participants[]$save_plot( save_dir='path/to/save/folder/', file_format='png', grapheme_size=2.2, grapheme_angle=30 )
For details on how the
save_plots method works, run
help(ParticipantGroup). Scroll down to the description of
save_plots, under "Methods". What is most essential is again that you specify the
save_dir= argument, which is the directory you want the plots to be saved to, and the
pg_large$save_plots( save_dir='path/to/save/folder', file_format='png', grapheme_size=2.2, grapheme_angle=30 )
There are additional articles which explain synr, including some mentioned throughout this article. To better understand how synr is used in practice, you might want to read Using synr with real data: Coloured vowels.
There is more detailed and technical information about synr that you can find in the help documentation, as mentioned throughout this article.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.