knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
flag_resp() to create and compare flagging strategies

One use case for response quality indicators is to flag responses that are potentially of low quality. resquin provides the function flag_resp() to create a data frame of logical values (TRUE and FALSE) according to user-defined cut-off values on response quality indicators. If a respondent receives a TRUE value, they are flagged as suspicious. If they receive a FALSE value, they are deemed unsuspicious.

The strength of flag_resp() lies in its ability to quickly create and compare multiple flagging strategies, as the following example illustrates.
Suppose we use data on response styles to decide whether respondents are low-quality responders on the 15-item nep scale. We can use resp_styles() to calculate response style indices per respondent.
library(resquin)

nep_resp_styles <- resp_styles(
  x = nep,
  scale_min = 1,           # minimum response option
  scale_max = 5,           # maximum response option
  min_valid_responses = 1  # default, excludes respondents with any missing value
)

summary(nep_resp_styles)
In the first example, we will consider the acquiescence response style (ARS). ARS represents the tendency of respondents to agree with questions regardless of their content. Since the nep scale includes positively and negatively keyed items, we can expect that higher ARS values indeed correspond to this behavior: respondents who are more concerned about nature should choose higher response options on the positively keyed items and lower response options on the negatively keyed items. Choosing high response options on all items is substantively inconsistent response behavior, potentially caused by acquiescence.
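To make the idea concrete, the following base R sketch computes an ARS-like proportion for a few made-up respondents. It assumes that ARS is the share of responses in the agreement categories (4 and 5 on a 1 to 5 scale). The toy data and the helper ars_share() are hypothetical, and resp_styles() may define or handle the index differently.

```r
# Toy responses (rows = respondents, columns = items) on a 1-5 scale.
# These data are made up for illustration and are not the nep data.
toy <- data.frame(
  q1 = c(5, 2, 4),
  q2 = c(5, 4, 5),
  q3 = c(4, 1, 2),
  q4 = c(5, 3, 1),
  q5 = c(5, 5, 2)
)

# Hypothetical helper: share of responses in the agreement categories.
ars_share <- function(x, agree_options = c(4, 5)) {
  rowMeans(sapply(x, function(col) col %in% agree_options))
}

toy_ars <- ars_share(toy)
toy_ars
toy_ars > 0.8  # the same kind of cut-off that is used for flagging
```

Only the first toy respondent, who agrees with every item regardless of keying, exceeds the 0.8 cut-off.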
A first idea could be to flag respondents who give more than 80% of their responses in the ARS categories.
first_flagging <- flag_resp(nep_resp_styles, ARS > 0.8)

summary(first_flagging)
We can see that 33 respondents are flagged as suspicious, as their ARS score is above 0.8.
In a second step, we might also be interested in flagging respondents who choose the same response option repeatedly. We can use resp_patterns() to compute the longest string length indicator. This indicator gives the length of the longest run of identical consecutive responses. We will flag respondents who have a longest string length of 8 or more. We keep the ARS flagging strategy in place to compare it to the new one.
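As an illustration of what a longest string length measures, here is a minimal base R sketch built on rle(). The helper longest_run() and the response vectors are hypothetical, and the sketch assumes there are no missing values; it is not the implementation used by resp_patterns().

```r
# Hypothetical helper: length of the longest run of identical
# consecutive answers in one respondent's response vector.
longest_run <- function(responses) {
  max(rle(responses)$lengths)
}

# Made-up response vectors for two respondents on a 10-item scale.
varied       <- c(1, 2, 2, 3, 5, 4, 4, 4, 2, 1)
straightline <- c(3, 3, 3, 3, 3, 3, 3, 3, 4, 2)

longest_run(varied)
longest_run(straightline)
longest_run(straightline) >= 8  # would be flagged with a cut-off of 8
```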
nep_resp_patterns <- resp_patterns(nep)

nep_resp_patterns_resp_styles <- cbind(nep_resp_styles, nep_resp_patterns[, -1])

second_flagging <- flag_resp(
  nep_resp_patterns_resp_styles,
  ARS > 0.8,
  longest_string_length >= 8
)

summary(second_flagging)
We can see that 19 respondents have a longest string length of 8 or more. The output also contains an agreement matrix between the flagging strategies. In the second row of the first column, we can see that the two flagging strategies agree on 9 flagged respondents. Together, both strategies would flag 33 + 19 - 9 = 43 of the 1222 respondents.
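The arithmetic for the combined count follows from counting overlapping flags only once. The sketch below checks this on two made-up logical vectors; flag_a and flag_b are hypothetical stand-ins for two flagging strategies and are not taken from the nep data.

```r
# Made-up flags for six respondents under two strategies.
flag_a <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
flag_b <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

n_a    <- sum(flag_a)           # flagged by strategy A
n_b    <- sum(flag_b)           # flagged by strategy B
n_both <- sum(flag_a & flag_b)  # flagged by both (the agreement count)
n_any  <- sum(flag_a | flag_b)  # flagged by at least one strategy

# Respondents flagged by both strategies are subtracted once.
n_any == n_a + n_b - n_both
```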
It is also possible to join multiple flagging expressions with an & or | operator.
flag_resp(
  nep_resp_patterns_resp_styles,
  ARS > 0.8,
  longest_string_length >= 8,
  ARS > 0.8 | longest_string_length >= 8
) |>
  summary()
We can also use any vector of logical values (i.e. TRUE and FALSE) that has one element per row of the nep data frame and compare it with the indicators provided by resquin. In the following example, we create a random vector of logical values and add it to the data frame from the last example.
# Random logical vector with one value per respondent
random_vector <- sample(c(FALSE, TRUE), nrow(nep_resp_styles), replace = TRUE)

# Add missing data as in the other data frames
random_vector[is.na(nep_resp_styles$ARS)] <- NA

# Combine the response quality indicators with the external indicator
external_indicator_data <- cbind(
  nep_resp_patterns_resp_styles,
  new_indicator = random_vector
)

flag_resp(
  external_indicator_data,
  ARS > 0.8,
  longest_string_length >= 8,
  new_indicator == TRUE
) |>
  summary()
The new indicator new_indicator is now included in the output of the summary function and can be compared with the other indicators.
The output of flag_resp() can also be used to filter respondents, because it is just a collection of logicals:
flag_df <- flag_resp(
  nep_resp_patterns_resp_styles,
  ARS > 0.8,
  longest_string_length >= 8,
  ARS > 0.8 | longest_string_length >= 8
)

flag_df
We can use these to filter respondents from the original nep dataset. For example, we can exclude the flagged respondents.
# Exclude the 33 flagged respondents with ARS > 0.8
nep[!flag_df$`ARS > 0.8`, ] |>
  na.omit() # exclude respondents with missing values
Alternatively, we can extract only the flagged respondents.
# Extract only the 33 flagged respondents with ARS > 0.8
nep[flag_df$`ARS > 0.8`, ] |>
  na.omit()
Note that you can also use the id column in flag_df to join flag_df to your original data.
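A join by id could look like the following base R sketch. The data frames toy_data and toy_flags and the column high_ars are made up for illustration; the sketch assumes that your original data also carries an id column, which may not be the case for every dataset.

```r
# Made-up stand-ins for the original data and the flag data frame.
# Both are assumed to share an id column.
toy_data  <- data.frame(id = 1:4, q1 = c(5, 2, 4, 5), q2 = c(5, 3, 1, 4))
toy_flags <- data.frame(id = 1:4, high_ars = c(TRUE, FALSE, FALSE, TRUE))

# Join the flags to the responses by id.
toy_joined <- merge(toy_data, toy_flags, by = "id")

# Keep only the respondents who were not flagged.
toy_clean <- toy_joined[!toy_joined$high_ars, ]
toy_clean
```

Joining by id avoids relying on both data frames having the same row order.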