compute_transform: Compute transformation matrix A
In conText: 'a la Carte' on Text (ConText) Embedding Regression

compute_transform

R Documentation

Compute transformation matrix A

Description

Computes a transformation matrix, given a feature-co-occurrence matrix and corresponding pre-trained embeddings.

Usage

compute_transform(x, pre_trained, weighting = 500)

Arguments

x

a (quanteda) fcm-class object.

pre_trained

(numeric) an F x D matrix of pre-trained embeddings, usually trained on the same corpus as the one used for x. F = number of features and D = embedding dimensions. rownames(pre_trained) = set of features for which there is a pre-trained embedding.

weighting

(character or numeric) weighting options:

1: no weighting.
"log": weight by the log of the frequency count.
numeric: threshold-based weighting (= 1 if the token count meets the threshold, 0 otherwise).

Recommended: use "log" for small corpora and a numeric threshold for larger corpora.
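
As a rough illustration of the three options (not the package's internal code), the base R sketch below maps a vector of feature counts to weights. The helper toy_weights and the counts in toy_counts are hypothetical, and reading "meets the threshold" as >= is an assumption.

```r
# Hypothetical helper (not part of conText): turn feature counts into
# weights under each of the documented weighting options.
toy_weights <- function(counts, weighting = 500) {
  if (identical(weighting, "log")) {
    # "log": weight by the log of the frequency count
    w <- log(counts)
  } else if (is.numeric(weighting) && weighting == 1) {
    # 1: no weighting, every feature counts equally
    w <- rep(1, length(counts))
  } else if (is.numeric(weighting)) {
    # numeric threshold: 1 if the count meets the threshold, 0 otherwise
    # (assumption: "meets" means greater than or equal to)
    w <- as.numeric(counts >= weighting)
  } else {
    stop("weighting must be 1, \"log\", or a numeric threshold")
  }
  stats::setNames(w, names(counts))
}

# Made-up counts, for illustration only
toy_counts <- c(the = 1200, senate = 640, filibuster = 12)

toy_weights(toy_counts, 1)
toy_weights(toy_counts, "log")
toy_weights(toy_counts, 500)
```

With a threshold, rare features drop out of the fit entirely, whereas "log" keeps them with a small weight, which is one way to read the recommendation for small corpora.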

Value

a dgTMatrix-class D x D non-symmetrical matrix (D = dimensions of pre-trained embedding space) corresponding to an 'a la carte' transformation matrix. This matrix is optimized for the corpus and pre-trained embeddings employed.
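
To give a sense of what a D x D transformation of this kind can look like, the sketch below follows the general 'a la carte' idea: regress each feature's pre-trained embedding on the average embedding of its context words, using weighted least squares. This is a toy base R illustration under stated assumptions, not the package's implementation; all toy_* objects are made up, and the use of row sums as a stand-in for feature counts is an assumption.

```r
set.seed(1)
n_feat <- 50  # F: number of features (toy value)
n_dim  <- 4   # D: embedding dimensions (toy value)

# Toy stand-ins for pre_trained (F x D) and a dense F x F co-occurrence matrix
toy_pre_trained <- matrix(rnorm(n_feat * n_dim), n_feat, n_dim)
toy_fcm <- matrix(rpois(n_feat * n_feat, 2) + 1, n_feat, n_feat)

# Average context embedding of each feature: co-occurrence-weighted mean
# of the embeddings of the words it co-occurs with (row i divided by rowSums[i])
toy_context <- (toy_fcm %*% toy_pre_trained) / rowSums(toy_fcm)

# Log weights, using row sums as a proxy for feature counts (assumption)
toy_w <- log(rowSums(toy_fcm))

# Weighted least squares: find A minimizing
#   sum_i w_i * || v_i - A u_i ||^2
# where v_i is the pre-trained embedding and u_i the context average.
toy_wu <- toy_context * toy_w  # scale row i by w_i
toy_A <- t(solve(crossprod(toy_context, toy_wu),
                 crossprod(toy_wu, toy_pre_trained)))

dim(toy_A)          # D x D
isSymmetric(toy_A)  # generally FALSE for a regression-based transform
```

Applying toy_A to an averaged context embedding (toy_A %*% u) maps it back toward the pre-trained embedding space, which is the role the transformation matrix plays downstream.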

Examples


library(quanteda)
library(conText)

# note: cr_sample_corpus is too small to produce sensible word vectors

# tokenize
toks <- tokens(cr_sample_corpus)

# construct feature-co-occurrence matrix
toks_fcm <- fcm(toks, context = "window", window = 6,
                count = "weighted", weights = 1 / (1:6), tri = FALSE)

# you will generally want to estimate a new (corpus-specific)
# GloVe model; here we use cr_glove_subset instead.
# See the Quick Start Guide for a full example.

# estimate transform
local_transform <- compute_transform(x = toks_fcm,
                                     pre_trained = cr_glove_subset,
                                     weighting = "log")

conText documentation built on Feb. 16, 2023, 7:32 p.m.