
View source: R/augmentation.R

evaluate_expansion R Documentation

Use word embedding to automatically expand seed set and compute metrics

Description

This splits the seed terms into training and test sets, uses 'nearest_neighbours' to generate words similar to the training terms, and checks whether the generated words occur in the test set, giving a pseudo-precision and pseudo-recall. Note that both will be lower than the actual precision and recall, since the seed set is assumed to be incomplete (otherwise there would be no reason to expand it): a relevant candidate that happens to be missing from the seed set is counted as a false positive. The metrics can still serve as an indication of how many terms to consider.
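The procedure above can be sketched in base R as follows. This is not the package's implementation: `toy_neighbours()` is a hypothetical stand-in for 'nearest_neighbours' that ranks words by cosine similarity to the centroid of the training terms, `evaluate_sketch()` is a hypothetical name, and the small embedding matrix is made-up illustration data rather than a real vectors object.

```r
# Cosine similarity between vector v and every row of matrix m
cosine_to_rows <- function(v, m) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Hypothetical stand-in for nearest_neighbours(): the k words most similar
# to the centroid of `words`, excluding the input words themselves
toy_neighbours <- function(words, vectors, k = 10) {
  centroid <- colMeans(vectors[words, , drop = FALSE])
  sims <- cosine_to_rows(centroid, vectors)
  names(sims) <- rownames(vectors)
  sims <- sims[!names(sims) %in% words]
  names(sort(sims, decreasing = TRUE))[seq_len(min(k, length(sims)))]
}

# One round: split the seed, expand the training half, score on the test half
evaluate_once <- function(seed, vectors, split = 0.5, k = 10) {
  train <- sample(seed, size = round(length(seed) * split))
  test <- setdiff(seed, train)
  candidates <- toy_neighbours(train, vectors, k)
  hits <- intersect(candidates, test)
  c(precision = length(hits) / length(candidates),
    recall = length(hits) / length(test))
}

# Repeat the random split n times and average the pseudo-metrics
evaluate_sketch <- function(seed, vectors, split = 0.5, n = 10, k = 10) {
  rowMeans(replicate(n, evaluate_once(seed, vectors, split, k)))
}

# Made-up 3-dimensional "embeddings": an economy cluster and a sports cluster
vectors <- rbind(
  tax = c(1, 0.1, 0), taxes = c(0.9, 0.2, 0), budget = c(0.8, 0.3, 0),
  deficit = c(0.85, 0.25, 0.05), spending = c(0.7, 0.4, 0),
  football = c(0, 0.1, 1), goal = c(0.05, 0, 0.9), referee = c(0, 0.2, 0.8)
)
seed <- c("tax", "budget", "deficit", "spending")

set.seed(42)
evaluate_sketch(seed, vectors, split = 0.5, n = 20, k = 3)
# precision ~ 0.667, recall = 1
```

With these toy vectors the three candidates are always 'taxes' plus the two held-out seed words, so pseudo-recall is 1 while pseudo-precision is 2/3: 'taxes' is plausibly relevant, but it counts against precision because it is not in the seed set.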

Usage

evaluate_expansion(seed, vectors, split = 0.5, n = 10)

Arguments

vectors

A vectors object, e.g. as returned by load_fasttext

split

Fraction of the seed terms to use as training data

n

Number of times to resample/repeat the evaluation

dictionary

a character vector of words containing wildcards

ft_model

a FastTextR model

words

a character vector of words

vocabulary

If given, limit results to words from this vocabulary (e.g. only words occurring in the target corpus)

k

Number of candidates to investigate
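The 'vocabulary' and 'k' arguments suggest a filter-then-truncate step when selecting candidates. A minimal sketch, assuming similarity scores are already available as a named numeric vector; the helper name `top_k_in_vocabulary()` and the scores are hypothetical, and the package may order these steps differently.

```r
# Hypothetical helper: keep the k most similar words, optionally restricted
# to a vocabulary (e.g. words that occur in the target corpus)
top_k_in_vocabulary <- function(sims, k = 10, vocabulary = NULL) {
  # Filter first, so that up to k in-vocabulary words can be returned
  if (!is.null(vocabulary)) sims <- sims[names(sims) %in% vocabulary]
  ranked <- sort(sims, decreasing = TRUE)
  names(ranked)[seq_len(min(k, length(ranked)))]
}

# Made-up similarity scores for illustration
sims <- c(taxes = 0.99, taxation = 0.95, taxs = 0.94, budget = 0.90)
top_k_in_vocabulary(sims, k = 2)
# "taxes" "taxation"
top_k_in_vocabulary(sims, k = 2, vocabulary = c("taxes", "budget"))
# "taxes" "budget"
```

Filtering before truncating means that out-of-vocabulary words such as 'taxation' or the misspelling 'taxs' cannot take up any of the k slots; truncating first and filtering afterwards could return fewer than k words.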

Details

Note that the seed set is passed through 'expand_wildcards' after the train/test split. If you use wildcards, it is better to provide the seed set before expansion; otherwise very similar terms (e.g. several expansions of the same wildcard) may end up in both the training and the test set.
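The base R sketch below illustrates why splitting before expansion matters. `expand_wildcards_sketch()` is a hypothetical helper, not the package's 'expand_wildcards', and it assumes the terms are plain words in which '*' is the only special character; the vocabulary and seed are made-up examples.

```r
# Hypothetical helper: expand '*' wildcards against a vocabulary.
# Assumes '*' is the only special character in the terms.
expand_wildcards_sketch <- function(dictionary, vocabulary) {
  patterns <- paste0("^", gsub("*", ".*", dictionary, fixed = TRUE), "$")
  unique(unlist(lapply(patterns, grep, x = vocabulary, value = TRUE)))
}

vocabulary <- c("economy", "economic", "economics", "tax", "taxes", "budget")
seed <- c("econom*", "tax*", "budget")

# Expanding first yields six terms; a random split of these could put
# "economy" in the training set and "economic" in the test set, making
# the test terms trivially easy to retrieve
expanded <- expand_wildcards_sketch(seed, vocabulary)

# Splitting the unexpanded seed first keeps each wildcard family on one side
set.seed(1)
train_seed <- sample(seed, 2)
train_terms <- expand_wildcards_sketch(train_seed, vocabulary)
test_terms <- expand_wildcards_sketch(setdiff(seed, train_seed), vocabulary)
```

Whichever two seed entries are sampled, all three 'econom*' words land together in either the training or the test terms, which mirrors the split-then-expand order described above.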

Value

A tibble containing terms and (pseudo-)metrics


vanatteveldt/CAVA documentation built on June 4, 2022, 1:20 p.m.