map_to_dtm: Create a document term matrix using a vectoriser


View source: R/map-to-dtm.R

Description

This function uses a vectoriser created with the text2vec package to map a new piece of text, or a vector of texts, onto a document-term matrix (dtm). The vectoriser carries a vocabulary: a set of tokens that determines the columns of the resulting document-term matrix. Any term that does not match a token in the vocabulary is ignored. Optionally, the document-term matrix can be weighted by a term frequency-inverse document frequency (tfidf) object created with the text2vec::TfIdf function.
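The idea of mapping text onto a fixed vocabulary can be illustrated without text2vec. The following is a simplified base-R sketch, not text2vec's implementation: the helpers count_terms, fit_idf and apply_tfidf are hypothetical, tokenisation is a naive split on non-letters, and the weighting (l1-normalised term frequency times log(N / df)) is one common tf-idf variant that need not match text2vec::TfIdf exactly.

```r
# Simplified base-R illustration (hypothetical helpers, not text2vec's code).

# Count vocabulary terms in each document: rows = documents, columns = vocab
count_terms <- function(x, vocab) {
  # Naive tokeniser (assumption): lower-case, then split on runs of non-letters
  tokens <- strsplit(tolower(x), "[^a-z]+")
  rows <- lapply(tokens, function(tok) {
    # factor() maps tokens outside `vocab` to NA and table() drops NA,
    # so out-of-vocabulary terms are ignored
    as.integer(table(factor(tok, levels = vocab)))
  })
  matrix(unlist(rows), nrow = length(x), byrow = TRUE,
         dimnames = list(NULL, vocab))
}

# Learn idf weights, log(N / df), from a training document-term matrix
fit_idf <- function(dtm) {
  doc_freq <- colSums(dtm > 0)
  log(nrow(dtm) / pmax(doc_freq, 1))  # pmax() avoids dividing by df = 0
}

# Weight a new document-term matrix: l1-normalised tf multiplied by idf
apply_tfidf <- function(dtm, idf) {
  tf <- dtm / pmax(rowSums(dtm), 1)  # documents with no known terms stay zero
  sweep(tf, 2, idf, `*`)             # multiply each column by its idf weight
}

vocab <- c("cat", "dog", "sat")
train <- c("the cat sat", "the dog sat", "a cat")
idf <- fit_idf(count_terms(train, vocab))

new_dtm <- count_terms("The cat chased the mouse", vocab)
new_dtm                    # only "cat" is counted; the other words are ignored
apply_tfidf(new_dtm, idf)
```

The idf weights are learned once from the training matrix and then reused for new text, so the columns and their weights stay fixed regardless of what the new documents contain.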

Usage

map_to_dtm(
  x,
  vectoriser = ModelAsAPackage::vectoriser,
  tfidf = ModelAsAPackage::tfidf
)

Arguments

x

A character vector (a single string or several), usually sentences, paragraphs or similar pieces of natural language.

vectoriser

A vectoriser constructed with the text2vec package. By default, the vectoriser bundled with the built package is used.

tfidf

A tfidf object constructed with the text2vec package. By default, the tfidf object bundled with the built package is used. If tfidf is NULL, an unweighted document-term matrix is returned.
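A minimal text2vec sketch of how the vectoriser and tfidf arguments, including the NULL case, might fit together. This is an illustration under stated assumptions, not the package's source: the training corpus, the tokeniser (word_tokenizer with a tolower preprocessor) and the function name map_to_dtm_sketch are hypothetical, and the objects shipped as ModelAsAPackage::vectoriser and ModelAsAPackage::tfidf may have been built differently.

```r
library(text2vec)

# Hypothetical training corpus; the package's shipped objects come from its own data
train <- c("the cat sat on the mat", "the dog sat on the log", "a cat and a dog")

# Build a vocabulary-based vectoriser (assumed tokeniser: word_tokenizer)
it_train <- itoken(train, preprocessor = tolower,
                   tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it_train)
vectoriser <- vocab_vectorizer(vocab)

# Fit tf-idf weights on the training document-term matrix
dtm_train <- create_dtm(it_train, vectoriser)
tfidf <- TfIdf$new()
invisible(tfidf$fit_transform(dtm_train))

# Hypothetical stand-in for map_to_dtm (not the package's actual source)
map_to_dtm_sketch <- function(x, vectoriser, tfidf = NULL) {
  it <- itoken(x, preprocessor = tolower,
               tokenizer = word_tokenizer, progressbar = FALSE)
  dtm <- create_dtm(it, vectoriser)  # terms outside the vocabulary are dropped
  if (is.null(tfidf)) {
    return(dtm)                      # unweighted counts
  }
  tfidf$transform(dtm)               # apply the idf weights fitted above
}

new_text <- c("The cat chased a mouse", "Nothing here matches")
map_to_dtm_sketch(new_text, vectoriser)         # raw counts
map_to_dtm_sketch(new_text, vectoriser, tfidf)  # tf-idf weighted
```

Calling transform rather than fit_transform on new text keeps the weights learned from the training data, so new documents are scored on the same scale as the training set.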

Value

A document-term matrix with one row per element of x and one column per token in the vocabulary used to build the given vectoriser.


mdneuzerling/ModelAsAPackage documentation built on Feb. 1, 2020, 12:57 a.m.