control_for_euc: Control for Euclidean distance in several numeric variables
In JackEdTaylor/LexOPS: A Package and Shiny App for Generating Matched Stimuli

Description

This function is a wrapper for `control_for_map()` that makes it easy to control for Euclidean distance across several numeric variables.

Usage

control_for_euc(
  x,
  vars,
  tol,
  name = NA,
  scale = TRUE,
  center = TRUE,
  weights = NA,
  standardise_weights = TRUE,
  euc_df = NA,
  standard_eval = FALSE
)

Arguments

`x`	A data frame containing the IV and strings, or a LexOPS_pipeline object resulting from one of `split_by()`, `control_for()`, etc.
`vars`	The columns from which to calculate Euclidean distance.
`tol`	The desired control tolerance, in Euclidean distance (will be interpreted as scaled Euclidean distance if `scale = TRUE`).
`name`	What the output column should be named. If `NA` (default), will automatically assign as `sprintf("control_fun_%i", nr)`, where `nr` is the number of the control function.
`scale`, `center`	How should variables be scaled and/or centred before calculating Euclidean distance? For options, see the `scale` and `center` arguments of `base::scale()`. Default for both is `TRUE`. Scaling can be useful when variables are on different scales.
`weights`	An (optional) list of weights, in the same order as `vars`. After any scaling is applied, the values will be multiplied by these weights. Default is `NA`, meaning all variables are weighted equally.
`standardise_weights`	Logical; should the weights be standardised to average to 1 (i.e., sum to the length of `vars`)? If `TRUE`, `weights=c(1, 3, 6)` will be treated as `weights=c(0.3, 0.9, 1.8)`. Setting `standardise_weights=TRUE` ensures that the overall scale of the space is unchanged when the weights change, so the same tolerance can be reused with different weights.
`euc_df`	The data frame to calculate the Euclidean distance from. By default, the function will use `x`. Giving a different data frame to `euc_df` can be useful in some cases, such as when `x` has been filtered for generating stimuli, but you want to calculate Euclidean distance from the full distribution.
`standard_eval`	Logical; bypasses non-standard evaluation, and allows more standard R objects in `vars` and `tol`. If `TRUE`, `vars` should be a character vector referring to columns in `x` (e.g. `c("Zipf.SUBTLEX_UK", "Length")`), and `tol` should be a vector of length 2, specifying the tolerance (e.g. `c(0, 0.5)`). Default = `FALSE`.
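
The way `scale`, `center`, `weights`, `standardise_weights` and `tol` interact, as described above, can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the toy data frame `words`, the helper `standardise()`, and the choice of the first row as the match target are all hypothetical.

```r
# Hypothetical toy data; not taken from the lexops dataset
words <- data.frame(
  Zipf   = c(4.2, 3.1, 5.0, 4.4),
  Length = c(5, 9, 3, 6)
)

# Hypothetical helper: rescale weights so they average to 1,
# i.e. sum to the number of variables
standardise <- function(w) w * length(w) / sum(w)
standardise(c(1, 3, 6))  # 0.3 0.9 1.8

# Centre and scale each column, as base::scale() does with both set to TRUE
z <- scale(as.matrix(words), center = TRUE, scale = TRUE)

# After scaling, multiply each column by its (standardised) weight
w  <- standardise(c(1, 3))  # 0.5 1.5
zw <- sweep(z, 2, w, `*`)

# Euclidean distance of every row from an assumed target (row 1)
target <- zw[1, ]
dists  <- sqrt(rowSums(sweep(zw, 2, target, `-`)^2))

# Keep rows whose distance falls inside the tolerance, given here in the
# length-2 form used with standard_eval = TRUE
tol <- c(0, 0.5)
within_tol <- dists >= tol[1] & dists <= tol[2]
```

Because the standardised weights always sum to the number of variables, changing their ratio alters the relative contribution of each variable without rescaling the space as a whole, which is why the same `tol` can be kept.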

Value

Returns `x`, with details on the variables to be controlled for added to the attributes. Run `generate()` afterwards to generate the actual stimuli.

Examples


stim <- lexops |>
  split_by(CNC.Brysbaert, 1:2 ~ 4:5) |>
  control_for_euc(c(Zipf.BNC.Written, Length), 0:0.005) |>
  generate(10)

# bypass non-standard evaluation
stim <- lexops |>
  split_by(CNC.Brysbaert, 1:2 ~ 4:5) |>
  control_for_euc(c("Zipf.BNC.Written", "Length"), c(0, 0.005), standard_eval = TRUE) |>
  generate(10)

# generate stimuli from a filtered dataframe, but calculate
# Euclidean distance from an (original) unfiltered dataframe
library(dplyr)
stim <- lexops |>
  filter(
    Zipf.SUBTLEX_UK <= 5,
    between(Length, 3, 12),
    PK.Brysbaert >= 0.9
  ) |>
  split_by(CNC.Brysbaert, 1:2 ~ 4:5) |>
  control_for_euc(
    c(Zipf.SUBTLEX_UK, Length),
    0:0.005,
    name = "Euclidean Distance",
    euc_df = lexops
  ) |>
  generate(10)

JackEdTaylor/LexOPS documentation built on Jan. 18, 2025, 10:37 a.m.