match_word: An outdated wrapper for match_item()

View source: R/match_word.R

match_word    R Documentation

An outdated wrapper for match_item()

Description

The match_word() function has been replaced by match_item(). This wrapper has been kept for backward compatibility with code written for older versions of LexOPS. It may be removed in the future, so please update your code to use match_item().
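Updating existing code should usually amount to swapping the function name. The sketch below assumes that match_item() accepts the same arguments as match_word(), which the wrapper relationship suggests but which should be checked against the match_item() help page. It also assumes LexOPS is installed, and is guarded so that it does nothing otherwise.

```r
# Sketch: updating a call from match_word() to match_item().
# Assumption: match_item() takes the same arguments as match_word().
if (requireNamespace("LexOPS", quietly = TRUE)) {
  library(LexOPS)

  # Old code, relying on the outdated wrapper:
  # lexops |> match_word("thicket", Length, Zipf.SUBTLEX_UK = -0.2:0.2)

  # Updated code:
  matches <- lexops |>
    match_item("thicket", Length, Zipf.SUBTLEX_UK = -0.2:0.2)

  head(matches)
}
```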

Usage

match_word(
  df = LexOPS::lexops,
  target,
  ...,
  id_col = "string",
  filter = TRUE,
  standard_eval = FALSE
)

Arguments

df

A data frame to reorder, containing the target string (default = LexOPS::lexops).

target

The target string.

...

The variables to match by and their tolerances, in the form Length = 0:0, Zipf.SUBTLEX_UK = -0.1:0.1, PoS.SUBTLEX_UK. Numeric variables can include tolerances (as elements 2:3 of a vector). Numeric variables with no tolerances will be matched exactly.

id_col

A character vector specifying the column identifying unique observations (e.g. in LexOPS::lexops, the id_col is "string").

filter

Logical. If TRUE, candidates outside the tolerances specified in ... are removed. If FALSE, a new column, matchFilter, is added, indicating whether or not each string is within all variables' tolerances. (Default = TRUE.)

standard_eval

Logical. If TRUE, non-standard evaluation is bypassed and the variables are supplied as a standard R list: ... should be a single list specifying the variables to match by and their tolerances, in the form list("numericVariable1Name", c("numericVariable2Name", -1.5, 3), "characterVariableName"). (Default = FALSE.)
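The list format used with standard_eval = TRUE can be built and inspected in base R. The sketch below only illustrates the structure of such a specification; parse_spec() is a hypothetical helper written for this example, and how LexOPS reads the list internally is not described on this page.

```r
# Sketch: the structure of a standard-evaluation specification.
spec <- list(
  "Length",                         # no tolerance: matched exactly
  c("Zipf.SUBTLEX_UK", -0.2, 0.2),  # numeric variable with tolerances
  "PoS.SUBTLEX_UK"                  # character variable: matched exactly
)

# c() coerces mixed input to character, so the tolerances are stored as
# strings ("-0.2", "0.2") and must be converted back to numbers.
# parse_spec() is a hypothetical helper, not part of LexOPS.
parse_spec <- function(entry) {
  tol <- if (length(entry) == 3) as.numeric(entry[2:3]) else NULL
  list(var = entry[1], tol = tol)
}

parsed <- lapply(spec, parse_spec)
str(parsed)
```

Because each entry is an ordinary character vector, a specification like this can be assembled programmatically, for example from a vector of column names.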

Value

Returns a data frame based on df. If filter == TRUE, it will contain only the matches. If filter == FALSE, it will be the original df object with a new column, "matchFilter".
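The difference between the two return shapes can be illustrated on toy data in base R. This is a minimal sketch, not the LexOPS implementation: toy_match() is a hypothetical helper, the data values are made up, and the target row is kept because it trivially matches itself (how LexOPS treats the target row is not stated on this page).

```r
# Toy data; the columns mimic LexOPS::lexops but the values are made up.
toy <- data.frame(
  string = c("thicket", "blanket", "cricket", "elephant"),
  Length = c(7, 7, 7, 8),
  Zipf   = c(3.0, 4.2, 3.1, 4.5),
  stringsAsFactors = FALSE
)

# toy_match() is a hypothetical helper, not part of LexOPS.
# It matches Length exactly and Zipf within tol (lower, upper) of the target.
toy_match <- function(df, target, tol = c(-0.2, 0.2), filter = TRUE) {
  t_row <- df[df$string == target, ]
  diff <- df$Zipf - t_row$Zipf
  in_tol <- df$Length == t_row$Length & diff >= tol[1] & diff <= tol[2]
  if (filter) {
    df[in_tol, ]             # keep only rows within all tolerances
  } else {
    df$matchFilter <- in_tol # keep every row and flag the matches
    df
  }
}

toy_match(toy, "thicket")                  # only rows within tolerance
toy_match(toy, "thicket", filter = FALSE)  # all rows, plus matchFilter
```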

See Also

lexops for the default data frame and associated variables.

Examples


# Match by number of syllables exactly
lexops |>
  match_word("thicket", Syllables.CMU)

# Match by number of syllables exactly, but keep all entries in the original dataframe
lexops |>
  match_word("thicket", Syllables.CMU, filter = FALSE)

# Match by number of syllables exactly, and rhyme
lexops |>
  match_word("thicket", Syllables.CMU, Rhyme.CMU)

# Match by length exactly, and closely by frequency (within 0.2 Zipf either way)
lexops |>
  match_word("thicket", Length, Zipf.SUBTLEX_UK = -0.2:0.2)

# The syntax makes matching by multiple variables easy and readable
lexops |>
  match_word(
    "elephant",
    BG.SUBTLEX_UK = -0.005:0.005,
    Length = 0:0,
    Zipf.SUBTLEX_UK = -0.1:0.1,
    PoS.SUBTLEX_UK,
    RT.ELP = -10:10
  )

# Match using standard evaluation
lexops |>
  match_word("thicket", list("Length", c("Zipf.SUBTLEX_UK", -0.2, 0.2)), standard_eval = TRUE)

# Find matches within an orthographic Levenshtein distance of 5 from "thicket":
library(dplyr)
library(stringdist)
targ_word <- "thicket"
lexops |>
  mutate(old = stringdist(targ_word, string, method = "lv")) |>
  match_word(targ_word, old = 0:5)

# Find matches within a phonological Levenshtein distance of 2 from "thicket":
# (note that this method requires 1-letter phonological transcriptions)
library(dplyr)
library(stringdist)
targ_word <- "thicket"
targ_word_pronun <- lexops |>
  filter(string == "thicket") |>
  pull(eSpeak.br_1letter)
lexops |>
  mutate(pld = stringdist(targ_word_pronun, eSpeak.br_1letter, method = "lv")) |>
  match_word(targ_word, pld = 0:2)


JackEdTaylor/LexOPS documentation built on July 10, 2024, 6:40 a.m.