map_to_dtm: Create a document term matrix using a vectoriser


View source: R/map-to-dtm.R

Description

This function uses a vectoriser created with the text2vec package to map a new piece of text, or a vector of texts, onto a document term matrix (dtm). The vectoriser carries a vocabulary: a set of tokens that determines the columns of the resulting document term matrix. Any term that does not match a token in the vocabulary is ignored. Optionally, the document term matrix can be weighted by a term frequency-inverse document frequency (tfidf) object created with text2vec::TfIdf.
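The mapping described above can be sketched with standard text2vec calls. This is a minimal sketch, not the package's own source: the function name `map_to_dtm_sketch` and the toy corpus are hypothetical, and it assumes texts are lower-cased and split with `text2vec::word_tokenizer`, which must match the preprocessing used when the vocabulary was built.

```r
library(text2vec)
library(Matrix)

# Hypothetical sketch; the real map_to_dtm may preprocess text differently.
map_to_dtm_sketch <- function(x, vectoriser, tfidf = NULL) {
  # Assumed preprocessing: lower-case, then split into words.
  it <- itoken(x, preprocessor = tolower, tokenizer = word_tokenizer,
               progressbar = FALSE)
  # Columns are fixed by the vectoriser's vocabulary; unseen terms are dropped.
  dtm <- create_dtm(it, vectoriser)
  # Optionally reweight with a TfIdf model that has already been fitted.
  if (!is.null(tfidf)) {
    dtm <- tfidf$transform(dtm)
  }
  dtm
}

# Build a vocabulary and vectoriser from a toy (hypothetical) training corpus.
train <- c("The cat sat on the mat", "The dog sat on the log")
it_train <- itoken(train, preprocessor = tolower, tokenizer = word_tokenizer,
                   progressbar = FALSE)
vectoriser <- vocab_vectorizer(create_vocabulary(it_train))

# "bird" is not in the vocabulary, so it has no column in the result.
dtm_new <- map_to_dtm_sketch("The bird sat", vectoriser)
print(as.matrix(dtm_new))
```

The result has one column for each of the seven vocabulary terms, and only "the" and "sat" are counted for the new sentence.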

Usage

map_to_dtm(x, vectoriser, tfidf = NULL)

Arguments

x

A character string or character vector, usually containing sentences, paragraphs or similar pieces of natural language.

vectoriser

A vocabulary-based vectoriser constructed with the text2vec package (for example, with text2vec::vocab_vectorizer).

tfidf

A tfidf object constructed with the text2vec package. If tfidf is NULL (the default), an unweighted document term matrix is returned.
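A tfidf object suitable for this argument is typically fitted once on the training document term matrix and then reused on new text. The sketch below assumes a hypothetical toy corpus and the same preprocessing as the vocabulary (the helper `prep` is also hypothetical), and uses text2vec's `TfIdf` class with its default settings.

```r
library(text2vec)
library(Matrix)

train <- c("The cat sat on the mat", "The dog sat on the log")

# Hypothetical helper; preprocessing must match the one used for the vocabulary.
prep <- function(txt) {
  itoken(txt, preprocessor = tolower, tokenizer = word_tokenizer,
         progressbar = FALSE)
}
vectoriser <- vocab_vectorizer(create_vocabulary(prep(train)))

# Fit the TfIdf model on the training dtm; this learns the idf weights.
tfidf <- TfIdf$new()
dtm_train_weighted <- tfidf$fit_transform(create_dtm(prep(train), vectoriser))

# Reuse the fitted model on new text without refitting it.
dtm_new <- create_dtm(prep("The cat sat"), vectoriser)
dtm_new_weighted <- tfidf$transform(dtm_new)
```

Weighting changes the values but not the shape: terms absent from the new text stay at zero.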

Value

A document term matrix with one row per element of x and one column per token in the vocabulary used to build the given vectoriser.


mdneuzerling/DrakeModelling documentation built on June 26, 2020, 1:25 p.m.