dictionary_meta: Assess Dictionary Categories Within a Latent Semantic Space

View source: R/dictionary_meta.R

dictionary_meta    R Documentation

Description

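Assesses dictionary categories within a latent semantic space: expands fuzzy terms, summarizes each category, calculates similarities among matched terms, and optionally suggests additions to each category.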

Usage

dictionary_meta(dict, space = "auto", n_spaces = 5, suggest = FALSE,
  suggestion_terms = 10, suggest_stopwords = FALSE,
  suggest_discriminate = TRUE, pretrim_space = FALSE,
  expand_cutoff_freq = 0.98, expand_cutoff_spaces = 10,
  dimension_prop = 1, pairwise = TRUE, glob = TRUE,
  space_dir = getOption("lingmatch.lspace.dir"), verbose = TRUE)

Arguments

dict

A vector of terms, list of such vectors, or a matrix-like object to be categorized by read.dic.

space

A vector space used to calculate similarities between terms. Can be the names of spaces (see select.lspace), a matrix with terms as row names, or "auto" to auto-select a space based on matched terms. This can also be "multi" to use multiple spaces, which are combined after similarities are calculated.

n_spaces

Number of spaces to draw from if space is "multi".

suggest

Logical; if TRUE, will search space for other terms to suggest for possible inclusion in each category.

suggestion_terms

Number of terms to use when selecting suggested additions.

suggest_stopwords

Logical; if TRUE, will suggest function words.

suggest_discriminate

Logical; if TRUE, will adjust for similarity to other categories when finding suggestions.

pretrim_space

Logical; if TRUE and pairwise is TRUE, will remove terms with a similarity of 0 or less with all category centroids, prior to calculating similarities for suggestions.

expand_cutoff_freq

Proportion of mapped terms to include when expanding dictionary terms. Applies when space is a character (referring to a space to be loaded).

expand_cutoff_spaces

Number of spaces in which a term has to appear to be considered for expansion. Applies when space is a character (referring to a space to be loaded).

dimension_prop

Proportion of dimensions to use when searching for suggested additions; values less than 1 will calculate similarities to the category core using fewer dimensions of the space.

pairwise

Logical; if FALSE, will compare candidate suggestion terms with a single, averaged category vector rather than all category terms separately.

glob

Logical; if TRUE, converts globs (asterisk wildcards) to regular expressions.

space_dir

Directory from which space should be loaded.

verbose

Logical; if FALSE, will not show status messages.
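The difference between the two settings of pairwise can be illustrated in base R. This is only a rough sketch, not the package's implementation: the toy space and its values are made up, cosine() is a hypothetical helper, and averaging the pairwise similarities is an assumption about how separate comparisons might be summarized.

```r
# Toy 3-dimensional space with terms as row names (values are made up).
space <- rbind(
  table = c(0.90, 0.10, 0.00),
  chair = c(0.80, 0.20, 0.10),
  desk  = c(0.70, 0.00, 0.20),
  stool = c(0.85, 0.15, 0.05),
  happy = c(0.00, 0.90, 0.30)
)

# Hypothetical helper: cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

category <- c("table", "chair", "desk")
candidates <- setdiff(rownames(space), category)

# Pairwise: compare each candidate with every category term separately,
# then average those similarities (the averaging is an assumption).
pairwise_sim <- vapply(candidates, function(term) {
  mean(vapply(category, function(member) {
    cosine(space[term, ], space[member, ])
  }, 0))
}, 0)

# Centroid: compare each candidate with a single averaged category vector.
centroid <- colMeans(space[category, ])
centroid_sim <- vapply(candidates, function(term) {
  cosine(space[term, ], centroid)
}, 0)

round(rbind(pairwise = pairwise_sim, centroid = centroid_sim), 3)
```

In this toy case both approaches rank "stool" above "happy" for the furniture-like category; with real spaces the two can diverge when a category's terms are spread widely.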

Value

A list:

  • expanded: A version of dict with fuzzy terms expanded.

  • summary: A summary of each dictionary category.

  • terms: Match (expanded term) similarities within terms and categories.

  • suggested: If suggest is TRUE, a list with suggested additions for each dictionary category. Each entry is a named numeric vector with similarities for each suggested term.
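To give a sense of what expanding fuzzy terms means for the expanded entry, glob expansion can be sketched in base R. This is an illustration under stated assumptions, not the package's own conversion: the vocabulary is hypothetical (standing in for the terms of a loaded space), and utils::glob2rx is used here in place of whatever dictionary_meta does internally when glob is TRUE.

```r
# Hypothetical vocabulary standing in for the terms of a loaded space.
vocabulary <- c("desk", "desks", "desktop", "disk", "sofa", "sofas", "couch")

# Dictionary entries written as globs (asterisk wildcards).
globs <- c("desk*", "sofa*")

# glob2rx() converts each glob to an anchored regular expression.
patterns <- vapply(globs, utils::glob2rx, "")

# Collect every vocabulary term matched by each pattern.
expanded <- lapply(patterns, function(p) grep(p, vocabulary, value = TRUE))
expanded
```

Each element of the resulting list is named by its original glob and holds the vocabulary terms it matched, so "disk" and "couch" are left out here.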

Examples

dict <- list(
  furniture = c("table", "chair", "desk*", "couch*", "sofa*"),
  well_adjusted = c("happy", "bright*", "friend*", "she", "he", "they")
)
dictionary_meta(dict)

miserman/lingmatch documentation built on April 30, 2024, 5:53 a.m.