match_item | R Documentation |
Suggests items that are suitable matches for a target item, based on selected variables of a data frame. Note that, unlike functions in the generate pipeline (e.g. control_for()), tolerances for multiple variables can be defined in a single function call.
match_item(
  df = LexOPS::lexops,
  target,
  ...,
  id_col = "string",
  filter = TRUE,
  standard_eval = FALSE
)
df | A data frame to reorder, containing the target string (default = LexOPS::lexops). |
target | The target string. |
... | Should specify the variables and tolerances in the form |
id_col | A character vector specifying the column identifying unique observations (e.g. in |
filter | Logical. If TRUE, matches outside the tolerances specified in vars are removed. If FALSE, a new column, matchFilter, is calculated, indicating whether or not the string is within all variables' tolerances. (Default = TRUE.) |
standard_eval | Logical; bypasses non-standard evaluation and allows the variables to be supplied as a standard R list object. If |
Returns a data frame based on df. If filter == TRUE, it will only contain matches. If filter == FALSE, it will be the original df object, with a new column, "matchFilter".
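To make the two return shapes concrete, the following base R sketch imitates the documented behaviour on a toy data frame. It is not the LexOPS implementation: `toy`, `sketch_match()` and its `var` and `tol` arguments are hypothetical names introduced here, only a single numeric variable is handled, and the sketch makes no attempt to treat the target's own row specially.

```r
# Toy stand-in for LexOPS::lexops (hypothetical values, for illustration only)
toy <- data.frame(
  string = c("thicket", "bracket", "cat", "ticket"),
  Length = c(7, 7, 3, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: check whether `var` lies within `tol` (lower, upper)
# of the target's value, then either filter the rows or flag them.
sketch_match <- function(df, target, var, tol = c(0, 0),
                         id_col = "string", filter = TRUE) {
  target_val <- df[[var]][df[[id_col]] == target]
  difference <- df[[var]] - target_val
  within_tol <- difference >= tol[1] & difference <= tol[2]
  if (filter) {
    # Only rows inside the tolerance are returned
    df[within_tol, , drop = FALSE]
  } else {
    # Every row is returned, with the result stored in a new logical column
    df$matchFilter <- within_tol
    df
  }
}

matches <- sketch_match(toy, "thicket", "Length", tol = c(-1, 1))
flagged <- sketch_match(toy, "thicket", "Length", tol = c(-1, 1), filter = FALSE)
```

Here `matches` has fewer rows than `toy`, while `flagged` keeps all rows and gains a `matchFilter` column, mirroring the two cases described above.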
lexops for the default data frame and associated variables.
# Match by number of syllables exactly
lexops |>
  match_item("thicket", Syllables.CMU)
# Match by number of syllables exactly, but keep all entries in the original dataframe
lexops |>
  match_item("thicket", Syllables.CMU, filter = FALSE)
# Match by number of syllables exactly, and rhyme
lexops |>
  match_item("thicket", Syllables.CMU, Rhyme.CMU)
# Match by length exactly, and closely by frequency (within 0.2 Zipf either way)
lexops |>
  match_item("thicket", Length, Zipf.SUBTLEX_UK = -0.2:0.2)
# The syntax makes matching by multiple variables easy and readable
lexops |>
  match_item(
    "elephant",
    BG.SUBTLEX_UK = -0.005:0.005,
    Length = 0:0,
    Zipf.SUBTLEX_UK = -0.1:0.1,
    PoS.SUBTLEX_UK,
    RT.ELP = -10:10
  )
# Match using standard evaluation
lexops |>
  match_item("thicket", list("Length", c("Zipf.SUBTLEX_UK", -0.2, 0.2)), standard_eval = TRUE)
# Find matches within an orthographic Levenshtein distance of 5 from "thicket":
library(dplyr)
library(stringdist)
targ_word <- "thicket"
lexops |>
  mutate(old = stringdist(targ_word, string, method = "lv")) |>
  match_item(targ_word, old = 0:5)
# Find matches within a phonological Levenshtein distance of 2 from "thicket":
# (note that this method requires 1-letter phonological transcriptions)
library(dplyr)
library(stringdist)
targ_word <- "thicket"
# Look up the target's 1-letter phonological transcription
targ_word_pronun <- lexops |>
  filter(string == targ_word) |>
  pull(eSpeak.br_1letter)
lexops |>
  mutate(pld = stringdist(targ_word_pronun, eSpeak.br_1letter, method = "lv")) |>
  match_item(targ_word, pld = 0:2)
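When the variables and tolerances are held in ordinary R objects (for instance, read from a configuration), the list form used in the standard-evaluation example can be assembled programmatically. The sketch below uses only base R; `tolerances` and `as_match_list()` are hypothetical names introduced for illustration, and the final call is left commented out because it needs LexOPS to be installed.

```r
# Hypothetical specification: NULL means "match exactly",
# otherwise a numeric vector of c(lower, upper) tolerances
tolerances <- list(
  Length = NULL,
  Zipf.SUBTLEX_UK = c(-0.2, 0.2)
)

# Hypothetical helper: convert the named specification into the list form
# shown in the standard-evaluation example
as_match_list <- function(spec) {
  unname(lapply(names(spec), function(nm) {
    tol <- spec[[nm]]
    # c() coerces the numeric tolerances to character, exactly as
    # c("Zipf.SUBTLEX_UK", -0.2, 0.2) does when written by hand
    if (is.null(tol)) nm else c(nm, tol)
  }))
}

vars <- as_match_list(tolerances)

# With LexOPS installed, the result could then be passed on, e.g.:
# LexOPS::lexops |>
#   LexOPS::match_item("thicket", vars, standard_eval = TRUE)
```

The resulting `vars` is identical to `list("Length", c("Zipf.SUBTLEX_UK", -0.2, 0.2))`, so it should behave the same as the hand-written list in that example.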