transform_dfm_boe: Generate a document-feature matrix using word embeddings

View source: R/rectr.R

transform_dfm_boe    R Documentation

Description

This function generates a document-feature matrix (dfm) from a multilingual corpus, using either BERT or fastText word embeddings depending on mode.

Usage

transform_dfm_boe(
  corpus,
  emb = NULL,
  .progress = TRUE,
  mode = "bert",
  noise = FALSE,
  remove_stopwords = TRUE,
  bert_sentence_tokenization = TRUE,
  envname = "rectr_condaenv",
  path = "./"
)

Arguments

corpus

a multilingual corpus generated by create_corpus()

emb

a list of word embeddings loaded with read_ft(); defaults to NULL

.progress

boolean, whether to display a progress bar

mode

character, either 'bert' (the default) or 'fasttext'

noise

boolean, whether to print messages so that you know the transformation is progressing

remove_stopwords

boolean, whether to remove stopwords

Value

a rectr_dfm object
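
Examples

The following is a minimal sketch of how the pieces named above might fit together in fastText mode, not a tested recipe. It assumes that rectr is installed, that create_corpus() takes a character vector of texts followed by a vector of language codes, that read_ft() takes a vector of language codes, and that the corresponding fastText models are already available for read_ft() to load. The helper build_fasttext_dfm() is hypothetical and is not part of rectr.

```r
# Hypothetical helper (not part of rectr) wrapping the documented pipeline.
# Assumes rectr is installed and that the fastText models for every language
# in `langs` are already available for read_ft() to load.
build_fasttext_dfm <- function(texts, langs) {
  # Assumed signature: create_corpus(<character vector of texts>, <language codes>)
  corpus <- rectr::create_corpus(texts, langs)

  # Assumed signature: read_ft(<character vector of language codes>)
  emb <- rectr::read_ft(unique(langs))

  # Argument names below are taken from the Usage section;
  # mode must be set explicitly because the default is "bert".
  rectr::transform_dfm_boe(
    corpus,
    emb = emb,
    mode = "fasttext",
    .progress = FALSE
  )
}
```

If those assumptions hold, a call such as build_fasttext_dfm(c("Hello world", "Hallo Welt"), c("en", "de")) should return a rectr_dfm object.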


chainsawriot/rectr documentation built on July 30, 2023, 2:30 p.m.