euc_dists: Calculate a word's Euclidean distance from other words
In JackEdTaylor/LexOPS: A Package and Shiny App for Generating Matched Stimuli

euc_dists

R Documentation

Calculate a word's Euclidean distance from other words

Description

Calculates the Euclidean distance of a word from all other words in a data frame, on selected variables.

Usage

euc_dists(
  df = LexOPS::lexops,
  target,
  vars = "all",
  scale = TRUE,
  center = TRUE,
  weights = NA,
  standardise_weights = TRUE,
  id_col = "string",
  standard_eval = FALSE
)

Arguments

`df`	A data frame.
`target`	The target string (word) that Euclidean distances are required for.
`vars`	The variables to be used as the dimensions over which Euclidean distance is calculated. Can be a vector of variable names (e.g. `c(Zipf.SUBTLEX_UK, Length)`), or `"all"` to use all numeric variables in the data frame. The default is `"all"`.
`scale`, `center`	How should variables be scaled and/or centred before calculating Euclidean distance? For options, see the `scale` and `center` arguments of base R's `scale()`. Default for both is `TRUE`. Scaling can be useful when variables are on different scales.
`weights`	An (optional) vector of weights, in the same order as `vars`. After any scaling is applied, the values will be multiplied by these weights. Default is `NA`, meaning no weights are applied.
`standardise_weights`	Logical; should the weights be standardised to average to 1 (i.e., sum to the length of `vars`)? If `TRUE`, `weights=c(1, 3, 6)` will be treated as `weights=c(0.3, 0.9, 1.8)`. Setting `standardise_weights=TRUE` ensures that the space itself is unchanged when weights change. This means, for example, that the same tolerance can be used in `control_for_euc()`.
`id_col`	The column containing the strings (default = `"string"`).
`standard_eval`	Logical; bypasses non-standard evaluation, and allows more standard R objects in `vars`. If `TRUE`, `vars` should be a character vector referring to columns in `df` (e.g. `c("Length", "Zipf.SUBTLEX_UK")`). Default = `FALSE`.
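
As a quick check of the arithmetic behind `standardise_weights`, the sketch below assumes that standardisation means dividing each weight by the mean weight, which is one way to make the weights average to 1. It is an illustration in base R, not code taken from the LexOPS source.

```r
# Raw weights, as in the `standardise_weights` description
w <- c(1, 3, 6)

# Assumed standardisation: divide by the mean so the weights average to 1
w_std <- w / mean(w)
w_std        # 0.3 0.9 1.8
mean(w_std)  # 1
sum(w_std)   # 3, i.e. the number of variables

# Only the relative sizes of the weights matter: multiplying every raw
# weight by the same constant gives the same standardised weights
w_std_10 <- (10 * w) / mean(10 * w)
all.equal(w_std, w_std_10)  # TRUE
```

Because the standardised weights depend only on the ratios between the raw weights, a distance tolerance chosen under one set of weights should stay roughly comparable under another.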

Value

Returns a numeric vector of Euclidean distances, one per row of `df`, in row order.
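
To make the documented steps concrete, here is a minimal base R sketch of the calculation as described under Arguments: scale and/or centre the selected columns, multiply by the weights, then measure each row's distance from the target's row. The function name `sketch_euc_dists`, the toy data frame, and its column names are hypothetical, and this is not the package's actual implementation.

```r
# Toy data frame; the column names are hypothetical, not taken from lexops
toy <- data.frame(
  string = c("cat", "dog", "horse", "ant"),
  Length = c(3, 3, 5, 3),
  Zipf   = c(5.2, 5.4, 4.9, 3.8)
)

# Illustrative re-implementation of the documented steps (not LexOPS source)
sketch_euc_dists <- function(df, target, vars, scale = TRUE, center = TRUE,
                             weights = NA, standardise_weights = TRUE,
                             id_col = "string") {
  # 1. Centre and/or scale each selected column
  m <- base::scale(as.matrix(df[, vars, drop = FALSE]),
                   center = center, scale = scale)
  # 2. Multiply each column by its (optionally standardised) weight
  if (!all(is.na(weights))) {
    if (standardise_weights) weights <- weights / mean(weights)
    m <- sweep(m, 2, weights, `*`)
  }
  # 3. Distance of every row from the target's row (assumes a unique match)
  target_row <- m[which(df[[id_col]] == target)[1], ]
  unname(sqrt(rowSums(sweep(m, 2, target_row, `-`)^2)))
}

d <- sketch_euc_dists(toy, "cat", c("Length", "Zipf"))

# One distance per row, in row order, so it can be attached as a column
toy$ed <- d
toy[order(toy$ed), c("string", "ed")]
```

Since the result is aligned with the rows of the input, the target itself always has a distance of 0 and sorts to the top.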

Examples


# Get the distance of every entry in the `lexops` dataset from the word "thicket".
# (Note: This will be calculated using the dimensions of frequency, arousal, and size)
lexops |>
  euc_dists("thicket", c(Zipf.SUBTLEX_UK, AROU.Warriner, SIZE.Glasgow_Norms))

# no scaling or centering
lexops |>
  euc_dists(
    "thicket",
    c(Zipf.SUBTLEX_UK, AROU.Warriner, SIZE.Glasgow_Norms),
    scale = FALSE,
    center = FALSE
  )

# Add Euclidean distance as new column
# (Also sort in ascending order of distance; "barbara" is at distance 0 from itself, so it comes first)
lexops %>%
  dplyr::mutate(ed = euc_dists(., "barbara", c(Length, Zipf.SUBTLEX_UK, BG.SUBTLEX_UK))) |>
  dplyr::arrange(ed)

# bypass non-standard evaluation
lexops |>
  euc_dists(
    "thicket",
    c("Zipf.SUBTLEX_UK", "AROU.Warriner", "SIZE.Glasgow_Norms"),
    standard_eval = TRUE
  )
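
The examples above do not use `weights`. The following is a hedged sketch of a weighted call that uses only arguments listed under Usage. The particular weights are arbitrary and chosen purely for illustration, and the call is guarded so that the snippet does nothing if LexOPS is not installed.

```r
# weighted distances (illustrative weights; order matches `vars`)
d_weighted <- NULL
if (requireNamespace("LexOPS", quietly = TRUE)) {
  d_weighted <- LexOPS::euc_dists(
    LexOPS::lexops,
    "thicket",
    c("Zipf.SUBTLEX_UK", "AROU.Warriner", "SIZE.Glasgow_Norms"),
    weights = c(1, 3, 6),          # arousal and size count for more than frequency
    standardise_weights = TRUE,    # weights are rescaled to average to 1
    standard_eval = TRUE           # `vars` given as a character vector
  )
}
```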

JackEdTaylor/LexOPS documentation built on Jan. 18, 2025, 10:37 a.m.