Source: R/embed_target.R

embed_target    R Documentation

Embed target using either: (a) a la carte OR (b) simple (untransformed) averaging of context embeddings

Description

For a vector of contexts (typically the context variable in the get_context output), return the additive embeddings, either transformed with the 'a la carte' matrix or left untransformed, and either aggregated into a single embedding or kept one per instance, together with the local vocabulary. The function also records which contexts were embedded and which were excluded.
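The computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: tokens are split on whitespace, a context's untransformed embedding is the mean of the pre-trained vectors of all its tokens that appear in rownames(pre_trained), and the 'a la carte' step left-multiplies that mean by the D x D transformation matrix. The toy matrices and the helper average_context() are hypothetical.

```r
# Hypothetical toy pre-trained embeddings: F = 4 features, D = 2 dimensions
pre_trained <- matrix(c(1, 0,
                        0, 1,
                        1, 1,
                        2, 2),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(c("border", "policy", "reform", "law"), NULL))

# Hypothetical toy D x D 'a la carte' transformation matrix
transform_matrix <- matrix(c(2, 0,
                             0, 0.5),
                           ncol = 2, byrow = TRUE)

# Hypothetical helper (assumes whitespace tokenization): mean of the
# pre-trained vectors of every token in the context that has an embedding
average_context <- function(context, pre_trained) {
  tokens <- strsplit(context, "\\s+")[[1]]
  tokens <- tokens[tokens %in% rownames(pre_trained)]  # drop unknown tokens
  colMeans(pre_trained[tokens, , drop = FALSE])
}

ctx <- "the border policy reform"

# Analogous to transform = FALSE: simple average of context embeddings
untransformed <- average_context(ctx, pre_trained)

# Analogous to transform = TRUE: assumed form A %*% u of the 'a la carte' step
transformed <- as.vector(transform_matrix %*% untransformed)
```

Here "the" has no embedding, so the untransformed result is the mean of the vectors for border, policy and reform, c(2/3, 2/3), and the transformed result is c(4/3, 1/3).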

Usage

embed_target(
  context,
  pre_trained,
  transform = TRUE,
  transform_matrix,
  aggregate = TRUE,
  verbose = TRUE
)

Arguments

context

(character) vector of texts, typically the context variable in the get_context output.

pre_trained

(numeric) an F x D matrix of pre-trained embeddings, where F = number of features and D = embedding dimensions. rownames(pre_trained) = the set of features for which there is a pre-trained embedding.

transform

(logical) if TRUE (default), apply the 'a la carte' transformation; if FALSE, output untransformed averaged embeddings.

transform_matrix

(numeric) a D x D 'a la carte' transformation matrix. D = dimensions of pretrained embeddings.

aggregate

(logical) if TRUE (default), return a single embedding (i.e., averaged over all instances of the target); if FALSE, return one embedding per instance.

verbose

(logical) if TRUE (default), report the observations that had no overlap with the provided pre-trained embeddings.
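To illustrate the aggregate argument, the sketch below builds one embedding per context (the shape returned with aggregate = FALSE) and then collapses them into a single row (the shape returned with aggregate = TRUE). It is a minimal base-R sketch, not the package's code: it assumes whitespace tokenization and assumes the aggregated embedding is the mean of the per-instance embeddings. The toy data are hypothetical, and the transformation step is omitted for brevity.

```r
# Hypothetical toy pre-trained embeddings: F = 3 features, D = 2 dimensions
pre_trained <- rbind(border = c(1, 0),
                     policy = c(0, 1),
                     reform = c(1, 1))

# Hypothetical contexts (instances of some target)
contexts <- c("border policy", "reform reform", "policy")

# One row per instance: mean of the vectors of the instance's known tokens
instance_embeddings <- t(vapply(contexts, function(ctx) {
  tokens <- strsplit(ctx, "\\s+")[[1]]
  tokens <- tokens[tokens %in% rownames(pre_trained)]
  colMeans(pre_trained[tokens, , drop = FALSE])
}, numeric(ncol(pre_trained))))

# aggregate = FALSE shape: an N x D matrix (N = number of instances)
dim(instance_embeddings)

# aggregate = TRUE shape (assumed: mean over instances): a single 1 x D row
aggregated_embedding <- matrix(colMeans(instance_embeddings), nrow = 1)
```

The three instance rows are c(0.5, 0.5), c(1, 1) and c(0, 1), so their mean is c(0.5, 2.5/3).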

Details

Required packages: quanteda

Value

A list with three elements:

target_embedding

The target embedding(s). Dimensions depend on aggregate (a single embedding if TRUE, one embedding per included context if FALSE), and values depend on transform.

local_vocab

(character) vocabulary that appears in the set of contexts provided.

obs_included

(integer) rows of the context vector that were included in the computation. A row (context) is excluded when none of the words in the context are present in the pre-trained embeddings provided.
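The bookkeeping behind obs_included can be sketched as follows. This is a minimal base-R illustration, assuming whitespace tokenization: a context is kept only if at least one of its tokens has a pre-trained embedding, and the positions of the kept contexts are returned as integers. The toy vocabulary and the report message are hypothetical; they mimic, rather than reproduce, what verbose = TRUE prints.

```r
# Hypothetical toy pre-trained embeddings covering only two features
pre_trained <- rbind(border = c(1, 0),
                     policy = c(0, 1))

contexts <- c("border policy", "completely unrelated words", "policy")

# TRUE when the context shares at least one token with rownames(pre_trained)
has_overlap <- vapply(contexts, function(ctx) {
  any(strsplit(ctx, "\\s+")[[1]] %in% rownames(pre_trained))
}, logical(1), USE.NAMES = FALSE)

obs_included <- which(has_overlap)  # integer positions of embedded contexts
excluded <- which(!has_overlap)     # contexts with no overlap

# Hypothetical report, analogous in spirit to verbose = TRUE
if (length(excluded) > 0) {
  message("Contexts with no overlap in pre_trained: ",
          paste(excluded, collapse = ", "))
}
```

The second context contains no word with a pre-trained embedding, so obs_included is c(1L, 3L).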

Examples

library(conText)

# find contexts for the term 'immigration'
context_immigration <- get_context(x = cr_sample_corpus, target = 'immigration',
                                   window = 6, valuetype = "fixed", case_insensitive = TRUE,
                                   hard_cut = FALSE, verbose = FALSE)

# compute one 'a la carte' embedding per context (aggregate = FALSE)
contexts_vectors <- embed_target(context = context_immigration$context,
                                 pre_trained = cr_glove_subset,
                                 transform = TRUE, transform_matrix = cr_transform,
                                 aggregate = FALSE, verbose = FALSE)

conText documentation built on Feb. 16, 2023, 7:32 p.m.