expand_terms: Use word embedding to automatically expand seed set and...


View source: R/dictools.R

Description

This splits the seed terms into training and test sets, uses 'nearest_neighbours' to generate words similar to the training terms, and tests whether the generated words occur in the test set, giving a pseudo-precision and pseudo-recall. Note that both will be lower than the actual precision and recall, because we assume the seed set is incomplete (why else expand it?), but they can serve as an indication of how many terms to consider.
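The evaluation idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `pseudo_metrics` is a hypothetical helper, the term list is made up, and the ranked `candidates` vector stands in for what an embedding model's nearest-neighbour lookup would return for the training terms.

```r
# Hypothetical helper (not part of the package): pseudo-precision and
# pseudo-recall at each cutoff of a ranked candidate list, judged against
# the held-out seed terms.
pseudo_metrics <- function(candidates, test_terms) {
  hit <- candidates %in% test_terms   # is each candidate a held-out term?
  found <- cumsum(hit)                # number of hits among the top-n candidates
  data.frame(
    n = seq_along(candidates),
    term = candidates,
    precision = found / seq_along(candidates),
    recall = found / length(test_terms)
  )
}

# Toy seed set (made up for illustration), split into training and test parts
terms <- c("good", "great", "nice", "fine", "superb", "lovely")
split <- 0.5
set.seed(1)
train_idx <- sample(length(terms), size = floor(split * length(terms)))
train <- terms[train_idx]
test <- terms[-train_idx]

# Assumption: a stand-in for the ranked neighbours a real model would
# generate from `train`; two of the four candidates are held-out terms.
candidates <- c(test[1], "table", test[2], "window")
metrics <- pseudo_metrics(candidates, test)
print(metrics)
```

Reading down the rows shows how the two measures trade off as more candidates are accepted, which is the kind of evidence that can guide the choice of how many terms to consider.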

Usage

expand_terms(ft_model, terms, vocabulary = NULL, split = 0.5, k = 1000)

Arguments

ft_model

a FastTextR model

vocabulary

If given, limit results to words from this vocabulary (e.g. only words occurring in the target corpus)

split

Fraction of the terms to use as training data

k

Number of candidates to investigate

terms

a character vector of seed terms

Value

A tibble containing terms and (pseudo-)metrics


vanatteveldt/lexpander documentation built on Jan. 21, 2022, 7:18 p.m.