tidygramr

tidygramr is a collection of utility functions built on the tidytext package for cleaning text and preparing tidy n-gram models. Most of the functions are adapted from examples in the tidytext package and its related documentation.

License: MIT

Installation

You can install tidygramr from GitHub using devtools:

library(devtools)
install_github("cldatascience/tidygramr")

Examples

The following basic examples show how to create n-gram models from Jane Austen's works (see janeaustenr). They replicate examples from the book Tidy Text Mining with R, but use tidygramr's utility functions to obtain the same results.

Create n-gram models:

library(janeaustenr)
library(tidygramr)
unigrams <- create_ngrams(austen_books(), "unigram")
bigrams <- create_ngrams(austen_books(), "bigram")
trigrams <- create_ngrams(austen_books(), "trigram")
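
For comparison, here is a minimal sketch of how a bigram model can be built with tidytext alone, following the approach in Tidy Text Mining with R. It assumes that create_ngrams() wraps a similar unnest_tokens() call, which this README does not confirm; the object name austen_bigrams and the column name bigram are chosen here for illustration and may differ from what tidygramr returns.

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

# Plain tidytext sketch of a bigram model: one row per two-word sequence.
# The column name `bigram` is an assumption for illustration only.
austen_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))  # lines too short to form a bigram can yield NA

head(austen_bigrams)
```

Setting n = 1 or n = 3 in the same call would give the unigram and trigram analogues.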

Create a table of bigram frequencies (stop words removed):

library(tidytext)
library(janeaustenr)
library(tidygramr)
bigrams <- create_ngrams(austen_books(), "bigram", stopwords=stop_words)
bigram_freqs <- count_ngrams(bigrams, doc_title="book")
head(bigram_freqs)
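
As a point of reference, the stop word filtering and counting can be sketched in plain tidytext, dplyr, and tidyr, in the style of Tidy Text Mining with R. This is an illustration of the technique, not tidygramr's actual implementation; the names bigram_counts, word1, and word2 are hypothetical and the output columns may not match those of count_ngrams().

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(janeaustenr)

bigram_counts <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  # Split each bigram so both words can be checked against the stop word list
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # Rejoin the two words and count bigrams per book, most frequent first
  unite(bigram, word1, word2, sep = " ") %>%
  count(book, bigram, sort = TRUE)

head(bigram_counts)
```

A bigram is dropped if either of its words is a stop word, which is the convention used in the book.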

Calculate tf-idf of bigrams (stop words removed):

library(tidytext)
library(janeaustenr)
library(tidygramr)
bigrams <- create_ngrams(austen_books(), "bigram", stopwords=stop_words)
bigram_tfidf <- create_tfidf(bigrams, doc_title="book")
head(bigram_tfidf)
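
The same calculation can be sketched directly with tidytext's bind_tf_idf(), which computes term frequency as a bigram's count divided by the total count in its book, and inverse document frequency as the natural log of the number of books divided by the number of books containing the bigram. This assumes create_tfidf() performs an equivalent computation, which this README does not state; the name bigram_tfidf_sketch and the intermediate columns are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(janeaustenr)

bigram_tfidf_sketch <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  # Remove bigrams in which either word is a stop word
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  unite(bigram, word1, word2, sep = " ") %>%
  # Count per book, then add tf, idf, and tf_idf columns
  count(book, bigram) %>%
  bind_tf_idf(bigram, book, n) %>%
  arrange(desc(tf_idf))

head(bigram_tfidf_sketch)
```

Bigrams that appear in every book get an idf of zero, so the highest-ranked rows are those distinctive to a single novel.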

For more information on tidy text mining, please see the excellent Tidy Text Mining with R.



cldatascience/tidygramr documentation built on May 10, 2019, 1:09 a.m.