View source: R/utils-embedding-vectors.R
| test_anchors | R Documentation |
This function evaluates how well an anchor set defines a semantic relation using one of two methods: pairdir (which evaluates only semantic directions) or relco (which evaluates semantic directions, semantic centroids, and compound concepts). See Details.
test_anchors(
anchors,
wv,
non_anchors = NULL,
method,
all = FALSE,
type = c("direction", "centroid", "compound"),
conf = 0.95,
dir_method = c("paired", "pooled", "L2", "PCA"),
n_runs = 100,
null = 0,
alpha = 0.5,
seed = NULL,
order_non_anchors = FALSE,
summarize = TRUE
)
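| Argument | Description |
|---|---|
| anchors | A data frame or list of 'anchor' terms. |
| wv | Matrix of word embedding vectors (a.k.a. embedding model) with rows as terms. |
| non_anchors | For 'relco', terms that are not anchors (random, unrelated, or distinctive terms). |
| method | Which metric to use for evaluation: 'pairdir' or 'relco'. |
| all | Logical (default FALSE). For 'pairdir', whether to evaluate all pairwise combinations of terms between the two anchor sets rather than only the specified pairs (see Details). |
| type | For 'relco', the kind of relation to evaluate: "direction", "centroid", or "compound". |
| conf | For 'relco', the confidence level of the interval (default 0.95). |
| dir_method | For 'relco' and type = "direction", the method used to construct the direction: "paired", "pooled", "L2", or "PCA". |
| n_runs | For 'relco', the number of runs (default 100). |
| null | For 'relco', the value under the null hypothesis (default 0). |
| alpha | For 'relco', the significance level. |
| seed | For 'relco', the random seed used for sampling. |
| order_non_anchors | Logical (default FALSE). |
| summarize | Logical (default TRUE). |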
PairDir evaluates how parallel two anchor sets are when used to define a semantic direction. According to Boutyline and Johnston (2023):
"We find that PairDir – a measure of parallelism between the offset vectors (and thus of the internal reliability of the estimated relation) – consistently outperforms other reliability metrics in explaining axis accuracy."
Boutyline and Johnston only consider analyst-specified pairs. If all = TRUE,
however, all pairwise combinations of terms between the two sets are
evaluated. This allows anchor sets of unequal size, but it increases
computational cost considerably.
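The core PairDir computation can be sketched in base R. This sketch assumes PairDir is the mean cosine similarity between every distinct pair of offset vectors, with one offset vector per anchor pair (a minus z). The helper names `row_cosine()` and `pairdir_sketch()` and the toy matrix are hypothetical illustrations; this is not the code behind test_anchors().

```r
# Hypothetical sketch of the PairDir idea, NOT the test_anchors() implementation.
# Assumption: PairDir = mean cosine similarity between distinct offset vectors.

# cosine similarity between all rows of a matrix
row_cosine <- function(m) {
  unit <- m / sqrt(rowSums(m^2))  # divide each row by its L2 norm
  unit %*% t(unit)
}

# hypothetical helper: a and z are character vectors of row names in wv
pairdir_sketch <- function(a, z, wv, all = FALSE) {
  if (all) {
    # pair every term in `a` with every term in `z` (allows unequal sets)
    grid <- expand.grid(a = a, z = z, stringsAsFactors = FALSE)
    a <- grid$a
    z <- grid$z
  } else if (length(a) != length(z)) {
    stop("paired evaluation needs anchor sets of equal length")
  }
  # one offset vector per anchor pair
  offsets <- wv[a, , drop = FALSE] - wv[z, , drop = FALSE]
  sims <- row_cosine(offsets)
  # average over distinct pairs of offsets (upper triangle only)
  mean(sims[upper.tri(sims)])
}

# toy 3-dimensional "embeddings" (hypothetical values)
wv <- rbind(
  rest = c(1, 0, 0),
  stay = c(2, 0, 1),
  move = c(0, 1, 0),
  fast = c(1, 1, 1)
)

pairdir_sketch(c("rest", "stay"), c("move", "fast"), wv)              # approx. 1
pairdir_sketch(c("rest", "stay"), c("move", "fast"), wv, all = TRUE)  # approx. 0.622
```

In the paired case both offsets equal (1, -1, 0), so they are perfectly parallel. With all = TRUE, the cross pairs (rest, fast) and (stay, move) add offsets pointing elsewhere, which lowers the average, and the number of offsets grows to the product of the two set sizes.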
Relco (anchor reliability coefficient) evaluates how well individual anchors
index a given semantic relation in comparison to a set of non-anchor words.
It can be used on semantic directions, semantic centroids, or compound concepts.
See Taylor et al. (2025) for details; see also the CMDist() function.
A data frame or list.
Boutyline, Andrei, and Ethan Johnston. 2023. "Forging Better Axes: Evaluating and Improving the Measurement of Semantic Dimensions in Word Embeddings." doi:10.31235/osf.io/576h3
Taylor, Marshall, et al. 2025. "A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces." doi:10.31235/osf.io/sc2ub_v3
# load example word embeddings
data(ft_wv_sample)
df_anchors <- data.frame(
a = c("rest", "rested", "stay", "stand"),
z = c("coming", "embarked", "fast", "move")
)
# test pairdir
test_anchors(df_anchors, ft_wv_sample, method = "pairdir")
test_anchors(df_anchors, ft_wv_sample, method = "pairdir", all = TRUE)
# test relco
non_anchors <- c("writ", "alloys", "ills", "atlas", "saturn", "cape", "unfolds")
## centroid
test_anchors(df_anchors[, 1], ft_wv_sample, method = "relco",
type = "centroid", non_anchors = non_anchors)
## compound
test_anchors(df_anchors$a, ft_wv_sample, method = "relco",
type = "compound", non_anchors = non_anchors)
## direction
test_anchors(df_anchors, ft_wv_sample, method = "relco",
type = "direction", dir_method = "paired",
non_anchors = non_anchors)
test_anchors(df_anchors, ft_wv_sample, method = "relco",
type = "direction", dir_method = "pooled",
non_anchors = non_anchors)