control_for_euc: Control for Euclidean distance in several numeric variables


control_for_euc    R Documentation

Control for Euclidean distance in several numeric variables

Description

This function is a wrapper for control_for_map() that makes it easy to control for Euclidean distance across several numeric variables.

Usage

control_for_euc(
  x,
  vars,
  tol,
  name = NA,
  scale = TRUE,
  center = TRUE,
  weights = NA,
  standardise_weights = TRUE,
  euc_df = NA,
  standard_eval = FALSE
)

Arguments

x

A data frame containing the IV and strings, or a LexOPS_pipeline object resulting from one of split_by(), control_for(), etc.

vars

The columns from which to calculate Euclidean distance.

tol

The desired control tolerance, in Euclidean distance, given as a lower:upper range (e.g. 0:0.005). It is interpreted as scaled Euclidean distance if scale = TRUE.

name

What the output column should be named. If NA (default), will automatically assign as sprintf("control_fun_%i", nr), where nr is the number of the control function.

scale, center

How should variables be scaled and/or centred before calculating Euclidean distance? For options, see the scale and center arguments of base::scale(). Default for both is TRUE. Scaling is useful when variables are measured on different scales, since otherwise the variable with the largest range dominates the distance.
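
As a rough illustration of what scaled Euclidean distance means, the base-R sketch below centres and scales two columns with scale() and computes each row's distance from the first row, then does the same on the raw values for comparison. The data frame words and its values are hypothetical, and this is a sketch of the general idea, not the code LexOPS uses internally.

```r
# Toy data (hypothetical values): two numeric variables on different scales
words <- data.frame(
  freq   = c(4.2, 4.5, 3.1, 5.0),
  length = c(5, 6, 5, 9)
)

# Centre and scale each column to z-scores (base::scale() defaults)
z <- scale(words, center = TRUE, scale = TRUE)

# Euclidean distance of every row from the first row (the "target" word)
target <- z[1, ]
euc_dist <- sqrt(rowSums(sweep(z, 2, target)^2))

# The same distance on the raw values, where the wider-ranging length dominates
raw <- as.matrix(words)
raw_dist <- sqrt(rowSums(sweep(raw, 2, raw[1, ])^2))
```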

weights

An optional numeric vector of weights, in the same order as vars. After any scaling is applied, the values are multiplied by these weights. Default is NA, meaning all variables are weighted equally.
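
A minimal sketch of the weighting step, assuming (as described above) that each scaled column is simply multiplied by its weight before the distance is computed. The data and the weights c(2, 1) are hypothetical.

```r
# Toy data (hypothetical values)
words <- data.frame(
  freq   = c(4.2, 4.5, 3.1, 5.0),
  length = c(5, 6, 5, 9)
)
z <- scale(words)

# Hypothetical weights, in the same order as the columns
w <- c(2, 1)

# Multiply each scaled column by its weight
zw <- sweep(z, 2, w, `*`)

# Euclidean distance from the first row in the weighted space
euc_dist_w <- sqrt(rowSums(sweep(zw, 2, zw[1, ])^2))
```

Because the weights multiply the values rather than the squared differences, a weight of 2 makes that variable's squared difference count four times as much in the distance.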

standardise_weights

Logical; should the weights be standardised to average 1 (i.e., to sum to the length of vars)? If TRUE, weights = c(1, 3, 6) will be treated as weights = c(0.3, 0.9, 1.8). Standardising means that multiplying all weights by a constant has no effect, so the overall scale of the space stays comparable and the same tolerance can be reused when the weights change.
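
The standardisation described above amounts to dividing the weights by their mean. The helper standardise_w() below is a hypothetical name used only for illustration; it reproduces the c(1, 3, 6) example and shows why rescaling all weights leaves the result unchanged.

```r
# Hypothetical helper: rescale weights so they average 1
# (equivalently, so they sum to the number of variables)
standardise_w <- function(w) w / mean(w)

w <- c(1, 3, 6)
standardise_w(w)        # 0.3 0.9 1.8
sum(standardise_w(w))   # 3, the number of variables

# Multiplying every weight by a constant has no effect after standardising
all.equal(standardise_w(w), standardise_w(10 * w))   # TRUE
```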

euc_df

The data frame from which to calculate Euclidean distance. By default, the function uses the data in x. Supplying a different data frame to euc_df can be useful when, for example, x has been filtered for generating stimuli but you want to calculate Euclidean distance relative to the full distribution.
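
To see why the reference data frame matters, the sketch below assumes that centring and scaling use the means and standard deviations of the reference distribution. The same filtered rows then receive different z-scores, and therefore different distances, depending on whether the parameters come from the full data or from the filtered subset. The data frames full and filtered are hypothetical.

```r
# Hypothetical "full" distribution of two variables
full <- data.frame(
  freq   = c(1.5, 2.8, 3.6, 4.2, 4.5, 5.9, 6.3),
  length = c(12, 10, 8, 5, 6, 4, 3)
)

# Hypothetical filtering step, analogous to the dplyr::filter() call in the examples
filtered <- full[full$freq <= 5, ]

# Centre and scale the filtered rows using the full distribution's parameters
z_ref <- scale(filtered, center = colMeans(full), scale = apply(full, 2, sd))

# Centre and scale the filtered rows using only their own parameters
z_own <- scale(filtered)

# Distances from the first row under each reference frame
dist_ref <- sqrt(rowSums(sweep(z_ref, 2, z_ref[1, ])^2))
dist_own <- sqrt(rowSums(sweep(z_own, 2, z_own[1, ])^2))
```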

standard_eval

Logical; bypasses non-standard evaluation, allowing more standard R objects to be passed as vars and tol. If TRUE, vars should be a character vector naming columns in x (e.g. c("Zipf.SUBTLEX_UK", "Length")), and tol should be a numeric vector of length 2 giving the lower and upper tolerance (e.g. c(0, 0.5)). Default = FALSE.
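
Under non-standard evaluation, a tolerance such as 0:0.005 is read as a lower and upper bound rather than evaluated with R's `:` operator, which would return just 0. The hypothetical helper tol_from_expr() below sketches how such an expression could be captured with substitute() and turned into the length-2 vector that standard_eval = TRUE expects; it illustrates the idea and is not LexOPS's own code.

```r
# Hypothetical helper: capture a `lower:upper` expression unevaluated
# and return the bounds as a numeric vector of length 2
tol_from_expr <- function(tol) {
  expr <- substitute(tol)
  # Expect a call to `:` of the form lower:upper
  stopifnot(is.call(expr), identical(expr[[1]], as.name(":")))
  c(eval(expr[[2]], parent.frame()), eval(expr[[3]], parent.frame()))
}

0:0.005                  # evaluated by R's `:`, this is just 0
tol_from_expr(0:0.005)   # c(0, 0.005), the form standard_eval = TRUE expects
```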

Value

Returns x, with details on the variables to be controlled for added to its attributes. Run generate() to then generate the actual stimuli.

Examples


library(LexOPS)

stim <- lexops |>
  split_by(CNC.Brysbaert, 1:2 ~ 4:5) |>
  control_for_euc(c(Zipf.BNC.Written, Length), 0:0.005) |>
  generate(10)

# bypass non-standard evaluation
stim <- lexops |>
  split_by(CNC.Brysbaert, 1:2 ~ 4:5) |>
  control_for_euc(c("Zipf.BNC.Written", "Length"), c(0, 0.005), standard_eval = TRUE) |>
  generate(10)

# generate stimuli from a filtered dataframe, but calculate
# Euclidean distance from an (original) unfiltered dataframe
library(dplyr)
stim <- lexops |>
  filter(
    Zipf.SUBTLEX_UK <= 5,
    between(Length, 3, 12),
    PK.Brysbaert >= 0.9
  ) |>
  split_by(CNC.Brysbaert, 1:2 ~ 4:5) |>
  control_for_euc(
    c(Zipf.SUBTLEX_UK, Length),
    0:0.005,
    name = "Euclidean Distance",
    euc_df = lexops
  ) |>
  generate(10)


JackEdTaylor/LexOPS documentation built on Oct. 11, 2024, 10:38 p.m.