tidygramr is a collection of utility functions built on the tidytext package. Its goal is to clean text and prepare tidy n-gram models. The functions are mainly adapted from examples in the tidytext package and related documentation.
License: MIT
You can install tidygramr from GitHub using devtools:
library(devtools)
install_github("cldatascience/tidygramr")
Here are some basic examples showing how to create n-gram models from Jane Austen's works (see janeaustenr). They replicate examples from the book Tidy Text Mining with R, but use the utility functions in tidygramr to obtain the same results.
Create n-gram models:
library(janeaustenr)
library(tidygramr)
# Build one model per n-gram size from the full Austen corpus
unigrams <- create_ngrams(austen_books(), "unigram")
bigrams <- create_ngrams(austen_books(), "bigram")
trigrams <- create_ngrams(austen_books(), "trigram")
Create a table of bigram frequencies (stop words removed):
library(tidytext)    # provides the stop_words data set
library(janeaustenr)
library(tidygramr)
bigrams <- create_ngrams(austen_books(), "bigram", stopwords = stop_words)
bigram_freqs <- count_ngrams(bigrams, doc_title = "book")
head(bigram_freqs)
Calculate tf-idf of bigrams (stop words removed):
library(tidytext)    # provides the stop_words data set
library(janeaustenr)
library(tidygramr)
bigrams <- create_ngrams(austen_books(), "bigram", stopwords = stop_words)
bigram_tfidf <- create_tfidf(bigrams, doc_title = "book")
head(bigram_tfidf)
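For comparison, the bigram tf-idf workflow can be written with tidytext, dplyr, and tidyr alone, along the lines of the examples in Tidy Text Mining with R. This is a minimal sketch, not a description of tidygramr's internals: it is only an assumption that create_ngrams() and create_tfidf() wrap steps like these, and the object names (bigrams_manual, bigrams_filtered, bigram_tfidf_manual) are illustrative.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(janeaustenr)

# Tokenize each line of text into bigrams; lines with fewer than
# two words yield NA, so those rows are dropped
bigrams_manual <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram))

# Remove bigrams in which either word is a stop word
bigrams_filtered <- bigrams_manual %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  unite(bigram, word1, word2, sep = " ")

# Count bigrams per book, then add tf, idf, and tf_idf columns
bigram_tfidf_manual <- bigrams_filtered %>%
  count(book, bigram, sort = TRUE) %>%
  bind_tf_idf(bigram, book, n) %>%
  arrange(desc(tf_idf))

head(bigram_tfidf_manual)
```

Here each book is treated as one document, which corresponds to the doc_title = "book" argument used in the tidygramr calls.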
For more information on tidy text mining, please see the excellent Tidy Text Mining with R.