vectorize.words: Word vectorization
In fdm2id: Data Mining and R Programming for Beginners

vectorize.words

R Documentation

Word vectorization

Description

Learn word vectors (embeddings) from a corpus of documents by fitting a GloVe model.

Usage

vectorize.words(
  corpus = NULL,
  ndim = 50,
  maxwords = NULL,
  mincount = 5,
  minphrasecount = NULL,
  window = 5,
  maxcooc = 10,
  maxiter = 10,
  epsilon = 0.01,
  lang = "en",
  stopwords = lang,
  ...
)

Arguments

`corpus`	The corpus of documents (a character vector).
`ndim`	The number of dimensions of the vector space.
`maxwords`	The maximum number of words to keep.
`mincount`	Minimum number of occurrences for a word to be considered frequent.
`minphrasecount`	Minimum number of occurrences for a collocation of words (a phrase) to be considered frequent.
`window`	Window size used to build the term co-occurrence matrix.
`maxcooc`	Maximum number of co-occurrences to use in the weighting function.
`maxiter`	The maximum number of iterations used to fit the GloVe model.
`epsilon`	Defines the early-stopping strategy when fitting the GloVe model.
`lang`	The language of the documents (NULL if no stemming).
`stopwords`	Stopwords, or the language of the documents. NULL if stop words should not be removed.
`...`	Other parameters.
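
For reference, the weighting function that `maxcooc` refers to can be sketched as follows. This assumes the standard GloVe formulation, with `maxcooc` playing the role of the cap $x_{\max}$; the exponent $\alpha$ is not an argument of this function, and the value 3/4 is the one suggested in the original GloVe paper, not something stated on this page.

```latex
f(x) =
\begin{cases}
  \left( \dfrac{x}{x_{\max}} \right)^{\alpha} & \text{if } x < x_{\max}, \\[6pt]
  1 & \text{otherwise,}
\end{cases}
\qquad \alpha = \tfrac{3}{4}
```

Here $x$ is the co-occurrence count of a word pair. Pairs that co-occur at least $x_{\max}$ times all receive the same weight, so very frequent pairs do not dominate the fit.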

Value

The vectorized words.

Examples

## Not run: 
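# Load the text8 corpus
text = loadtext ("http://mattmahoney.net/dc/text8.zip")
# Fit the word vectors; collocations seen at least 50 times are treated as frequent
words = vectorize.words (text, minphrasecount = 50)
# Analogy queries: start from 'origin', subtract 'sub', add 'add'
query.words (words, origin = "paris", sub = "france", add = "germany")
query.words (words, origin = "berlin", sub = "germany", add = "france")
# Query a collocation (note the underscore joining the two words)
query.words (words, origin = "new_zealand")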

## End(Not run)
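
The analogy queries above rely on vector arithmetic over the learned embeddings. The following base-R sketch illustrates that idea on made-up 4-dimensional vectors. The matrix `emb` and the helpers `cosine` and `analogy` are hypothetical names introduced for illustration only; this is not the implementation of `query.words`, and real embeddings returned by `vectorize.words` will not have such clean values.

```r
# Toy embeddings (made-up values): columns loosely stand for
# "French", "German", "British" and "capital city".
emb <- rbind(
  paris   = c(1, 0, 0, 1),
  france  = c(1, 0, 0, 0),
  berlin  = c(0, 1, 0, 1),
  germany = c(0, 1, 0, 0),
  london  = c(0, 0, 1, 1),
  uk      = c(0, 0, 1, 0)
)

# Cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hypothetical helper: rank the remaining words by similarity to
# origin - sub + add, excluding the query words themselves
analogy <- function(emb, origin, sub, add) {
  target <- emb[origin, ] - emb[sub, ] + emb[add, ]
  candidates <- setdiff(rownames(emb), c(origin, sub, add))
  sims <- sapply(candidates, function(w) cosine(emb[w, ], target))
  sort(sims, decreasing = TRUE)
}

res <- analogy(emb, origin = "paris", sub = "france", add = "germany")
print(names(res)[1])
```

With these toy vectors, the word closest to `paris - france + germany` is `berlin`, which mirrors the first query in the example above.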

fdm2id documentation built on July 9, 2023, 6:05 p.m.