flag_contaminant_rts: Flag contaminant reaction times using mixture modeling

View source: R/helpers-data.R

flag_contaminant_rts    R Documentation

Flag contaminant reaction times using mixture modeling

Description

Identifies contaminant RTs (fast guesses, attention lapses) at the trial level using mixture modeling. For each trial, it computes the posterior probability that the RT is a contaminant under a two-component mixture of a uniform distribution (contaminants) and a parametric RT distribution.

The function takes a numeric vector of RTs and returns a numeric vector of contamination probabilities of the same length, making it compatible with dplyr::mutate() and dplyr::group_by() workflows.

Usage

flag_contaminant_rts(
  rt,
  distribution = c("exgaussian", "lognormal", "invgaussian"),
  contaminant_bound = c("min", "max"),
  init_contaminant = 0.05,
  max_contaminant = 0.5,
  maxit = 100,
  tol = 1e-06
)

Arguments

rt

Numeric vector. Reaction times in seconds. Must be positive.

distribution

Character. RT distribution for the mixture model: "exgaussian" (default), "lognormal", or "invgaussian".

contaminant_bound

Vector of length 2. Bounds [lower, upper] for the uniform contaminant distribution. Each element can be a numeric value or "min"/"max" for data-driven bounds. Default c("min", "max").

init_contaminant

Numeric. Initial contaminant proportion for the EM algorithm. Must be in (0, 1). Default 0.05.

max_contaminant

Numeric. Maximum allowed contaminant proportion. Estimated proportions exceeding this value are clipped to it with a warning. Must be in (0, 1]. Default 0.5.

maxit

Integer. Maximum number of EM iterations. Default 100.

tol

Numeric. Convergence tolerance on the change in log-likelihood between EM iterations. Default 1e-6.
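To show how these arguments interact, here is a minimal sketch of an EM fit for the uniform + RT mixture described under Details. It is not the bmm implementation. It handles only the lognormal case, because that case has a closed-form weighted M-step. It assumes that "min"/"max" resolve to the observed range of rt and that convergence is judged on the absolute change in log-likelihood. The name em_contaminant_sketch is hypothetical.

```r
# Hypothetical sketch, NOT bmm's implementation: EM for
# pi_c * Uniform(a, b) + (1 - pi_c) * Lognormal(mu, sigma).
em_contaminant_sketch <- function(rt, contaminant_bound = c("min", "max"),
                                  init_contaminant = 0.05, max_contaminant = 0.5,
                                  maxit = 100, tol = 1e-6) {
  stopifnot(is.numeric(rt), all(rt > 0),
            init_contaminant > 0, init_contaminant < 1)
  # Assumption: "min"/"max" mean the observed minimum/maximum of rt
  resolve <- function(b) {
    if (identical(b, "min")) min(rt)
    else if (identical(b, "max")) max(rt)
    else as.numeric(b)
  }
  a <- resolve(contaminant_bound[[1]])
  b <- resolve(contaminant_bound[[2]])

  # E-step: posterior P(contaminant | rt) and the mixture log-likelihood
  e_step <- function(pi_c, mu, sigma) {
    d_c <- pi_c * dunif(rt, a, b)
    d_rt <- (1 - pi_c) * dlnorm(rt, mu, sigma)
    list(post = d_c / (d_c + d_rt), loglik = sum(log(d_c + d_rt)))
  }

  pi_c <- init_contaminant
  mu <- mean(log(rt))
  sigma <- sd(log(rt))
  loglik_old <- -Inf
  converged <- FALSE
  clipped <- FALSE
  for (iter in seq_len(maxit)) {
    e <- e_step(pi_c, mu, sigma)
    # Stop once the log-likelihood changes by less than tol (assumed rule)
    if (abs(e$loglik - loglik_old) < tol) {
      converged <- TRUE
      break
    }
    loglik_old <- e$loglik
    # M-step: contaminant share, then weighted lognormal MLE
    w <- 1 - e$post
    pi_c <- mean(e$post)
    mu <- sum(w * log(rt)) / sum(w)
    sigma <- sqrt(sum(w * (log(rt) - mu)^2) / sum(w))
    # Clip the contaminant proportion at max_contaminant
    if (pi_c > max_contaminant) {
      pi_c <- max_contaminant
      clipped <- TRUE
    }
  }
  if (clipped) warning("contaminant proportion clipped to max_contaminant")
  # Posterior probabilities under the final parameters
  e <- e_step(pi_c, mu, sigma)
  structure(e$post,
            diagnostics = data.frame(contaminant_prop = pi_c,
                                     converged = converged,
                                     iterations = iter,
                                     loglik = e$loglik))
}

set.seed(123)
rt <- c(rgamma(150, shape = 5, rate = 10), runif(50, 0.1, 0.2))
p <- em_contaminant_sketch(rt, distribution_note <- NULL)
```

The ex-Gaussian and inverse Gaussian cases would need a numerical optimizer in the M-step (for example stats::optim() on the weighted log-likelihood) instead of the closed-form update shown here.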

Details

Mixture Model

The function fits: f(RT) = pi_c * Uniform(RT | a, b) + (1 - pi_c) * f_RT(RT | theta)

where pi_c is the contaminant proportion, Uniform(RT | a, b) is the uniform contaminant density on [a, b], with a and b given by contaminant_bound, and f_RT is the density of the chosen RT distribution with parameters theta.
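The per-trial probability reported under Value follows from this mixture by Bayes' rule. This assumes the standard two-component posterior, evaluated at the fitted parameters:

```latex
P(\text{contaminant} \mid RT)
  = \frac{\pi_c \,\frac{1}{b-a}\,\mathbb{1}[a \le RT \le b]}
         {\pi_c \,\frac{1}{b-a}\,\mathbb{1}[a \le RT \le b] + (1-\pi_c)\, f_{RT}(RT \mid \theta)}
```

As a worked example with illustrative numbers, take pi_c = 0.05 and [a, b] = [0.1, 1.1], so the uniform density is 1. A trial where f_RT(RT | theta) = 0.5 then gets 0.05 / (0.05 + 0.95 * 0.5) = 0.05 / 0.525 ≈ 0.095.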

Grouping

To fit separate mixtures by condition or response boundary, use dplyr::group_by() before calling this function inside dplyr::mutate().
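Outside dplyr, base R's ave() performs the same split-apply-combine step. It applies a function to the RTs of each grouping combination and writes the results back in the original row order. In the sketch below, toy_flag is a hypothetical stand-in used only so the block runs without bmm installed. It simply flags the fastest 10% of trials within each group.

```r
# Hypothetical stand-in for flag_contaminant_rts(): flags the fastest 10%
# of the RTs it is given (per group when called through ave()).
toy_flag <- function(rt) as.numeric(rt < stats::quantile(rt, 0.1))

set.seed(1)
data <- data.frame(
  rt = rgamma(200, shape = 5, rate = 10),
  subject = rep(1:2, each = 100),
  response = sample(c("upper", "lower"), 200, replace = TRUE)
)

# ave() splits rt by every subject x response combination, calls FUN on
# each subset, and reassembles the results in the original row order
data$contam_prob <- ave(data$rt, data$subject, data$response, FUN = toy_flag)
```

With bmm installed, passing FUN = flag_contaminant_rts should behave the same way, because the function returns one value per input RT.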

Diagnostics

Mixture fit diagnostics (parameters, convergence, log-likelihood) are attached as the "diagnostics" attribute of the returned vector. Access them with attr(result, "diagnostics").

Value

Numeric vector of posterior contamination probabilities P(contaminant | RT), with a "diagnostics" attribute containing a one-row data.frame with columns: mixture_params (list), contaminant_prop, converged, iterations, loglik, n_trials, distribution, method.
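Because mixture_params is a list column, reading it takes one extra indexing step. The sketch below builds a vector with the documented column layout so the access pattern can be tried without fitting anything. Every value, and the parameter names inside mixture_params, are made-up placeholders rather than real output.

```r
# Placeholder diagnostics with the documented columns; all values and the
# parameter names inside mixture_params are invented for illustration.
diag <- data.frame(
  mixture_params = I(list(list(mu = -0.7, sigma = 0.3))),
  contaminant_prop = 0.2,
  converged = TRUE,
  iterations = 12L,
  loglik = 85.4,
  n_trials = 3L,
  distribution = "lognormal",
  method = "EM"
)
probs <- structure(c(0.9, 0.02, 0.01), diagnostics = diag)

d <- attr(probs, "diagnostics")
d$contaminant_prop            # ordinary numeric column
d$mixture_params[[1]]$sigma   # list column: take the single row, then the element
```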

See Also

ezdm_summary_stats() for aggregated RT statistics with contamination handling.

validate_fast_guesses() for testing whether flagged contaminants show random-guessing behavior.

Examples

## Not run: 
# Simulate data with contaminants
library(bmm)
set.seed(123)
rt_clean <- rgamma(150, shape = 5, rate = 10)
rt_contam <- runif(50, 0.1, 0.2)

data <- data.frame(
  rt = c(rt_clean, rt_contam),
  subject = 1,
  response = sample(c("upper", "lower"), 200, replace = TRUE)
)

# Basic usage with mutate
library(dplyr)
data <- data |>
  mutate(contam_prob = flag_contaminant_rts(rt))

# Hard threshold: remove trials with P(contaminant) > 0.5
data_clean <- data |> filter(contam_prob <= 0.5)

# Separate fits by response boundary
data <- data |>
  group_by(subject, response) |>
  mutate(contam_prob = flag_contaminant_rts(rt)) |>
  ungroup()

# Access diagnostics
probs <- flag_contaminant_rts(data$rt)
attr(probs, "diagnostics")

## End(Not run)

bmm documentation built on March 30, 2026, 5:08 p.m.