create_dtm: Create a document-term matrix

Description

Creates a document-term matrix from a corpus. It is a convenience function built around tidytext::cast_dtm. The returned data frame is intended to be suitable as input to machine learning tasks.
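
As a rough illustration of what a wrapper around tidytext::cast_dtm can look like, the sketch below tokenizes the text, optionally filters words, counts terms per document, and casts the counts to a document-term matrix. It is not the tidygramr source: the name sketch_dtm, the NULL default for filterwords, the assumption that the text column is named "text", and the reading of stop = FALSE as "keep only the filter words" are all assumptions made for this example. It needs the dplyr, tidytext, and tm packages.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for create_dtm(); written for illustration only.
sketch_dtm <- function(corpus, filterwords = NULL, stop = TRUE,
                       doc_title = "title") {
  # Give the title column a fixed name so the later steps can refer to it.
  names(corpus)[names(corpus) == doc_title] <- "document"

  # One row per (document, word) pair; assumes the text column is "text".
  tokens <- unnest_tokens(corpus, word, text)

  if (!is.null(filterwords)) {
    if (stop) {
      # Assumed meaning of stop = TRUE: drop the filter words.
      tokens <- anti_join(tokens, filterwords, by = "word")
    } else {
      # Assumed meaning of stop = FALSE: keep only the filter words.
      tokens <- semi_join(tokens, filterwords, by = "word")
    }
  }

  # Term counts per document, cast to a tm DocumentTermMatrix.
  counts <- count(tokens, document, word)
  dtm <- cast_dtm(counts, document, word, n)

  # Dense data frame: one row per document, one column per term.
  as.data.frame(as.matrix(dtm))
}
```

With a corpus shaped like the one in the Examples section, sketch_dtm(books, filterwords = stop_words) would return one row per title. Converting to a dense data frame is convenient for small corpora but can use a lot of memory for large vocabularies.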

Usage

create_dtm(corpus, filterwords, stop = TRUE, doc_title = "title")

Arguments

corpus

A data frame containing columns for title and text

filterwords

A data frame containing words to filter on

stop

A boolean denoting whether to use filterwords as stop words

doc_title

The name of the column containing the document title

Value

A data frame representing the document-term matrix.

Examples

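## Not run: 
library(tidytext)

# A small corpus: one row per document, with title and text columns
books <- data.frame(title = c("Book A", "Book B", "Book C"),
                    text = c("Once upon a time", "A long time ago",
                             "In a land far away"),
                    stringsAsFactors = FALSE)

# Document-term matrix without a filter word list
book_dtm <- create_dtm(books)

# Document-term matrix using tidytext's stop_words as filter words
book_dtm_1 <- create_dtm(books, filterwords = stop_words)

## End(Not run)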

cldatascience/tidygramr documentation built on May 10, 2019, 1:09 a.m.