In cldatascience/tidygramr: Clean Text and Create Tidy n-grams Using Tools such as 'tidytext'


tidygramr

tidygramr is a collection of utility functions built on the tidytext package. Its goal is to clean text and prepare tidy n-gram models. The package is mainly based on examples from the tidytext package and related documentation.

License: MIT

Installation

You can install tidygramr from GitHub using devtools:

# install.packages("devtools")  # run this first if devtools is not installed
devtools::install_github("cldatascience/tidygramr")

Examples

Here are some basic examples showing how to create n-gram models from Jane Austen's works (see janeaustenr). They replicate examples from the book Tidy Text Mining with R, but use the utility functions in tidygramr to obtain the same results.

Create n-gram models:

library(janeaustenr)
library(tidygramr)
unigrams <- create_ngrams(austen_books(), "unigram")
bigrams <- create_ngrams(austen_books(), "bigram")
trigrams <- create_ngrams(austen_books(), "trigram")
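For comparison, here is a minimal sketch of the plain tidytext tokenization shown in Tidy Text Mining with R. It is an assumption that create_ngrams wraps something similar; the object names `unigrams_tt` and `bigrams_tt` and the output columns `word` and `bigram` follow the book's conventions and may not match what tidygramr returns.

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

books <- austen_books()  # tibble with columns `text` and `book`

# One row per single word (lowercased, punctuation stripped by default)
unigrams_tt <- books %>%
  unnest_tokens(word, text)

# One row per overlapping two-word sequence
bigrams_tt <- books %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)
```

Trigrams follow the same pattern with `n = 3`.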

Create a table of bigram frequencies (stop words removed):

library(tidytext)
library(janeaustenr)
library(tidygramr)
bigrams <- create_ngrams(austen_books(), "bigram", stopwords=stop_words)
bigram_freqs <- count_ngrams(bigrams, doc_title="book")
head(bigram_freqs)
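Removing stop words from bigrams means dropping any bigram in which either word is a stop word. The sketch below shows how the book does this with tidytext, tidyr, and dplyr directly. It is illustrative only: the names `bigram_counts_tt`, `word1`, and `word2` are not part of tidygramr, and count_ngrams may structure its output differently.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(janeaustenr)

bigram_counts_tt <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%                      # lines too short to form a bigram
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,             # drop if the first word is a stop word
         !word2 %in% stop_words$word) %>%         # ... or the second
  count(book, word1, word2, sort = TRUE)          # frequency per book, descending

head(bigram_counts_tt)
```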

Calculate tf-idf of bigrams (stop words removed):

library(tidytext)
library(janeaustenr)
library(tidygramr)
bigrams <- create_ngrams(austen_books(), "bigram", stopwords=stop_words)
bigram_tfidf <- create_tfidf(bigrams, doc_title="book")
head(bigram_tfidf)
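To see what a tf-idf table contains, the sketch below applies tidytext's bind_tf_idf to a small hypothetical count table. The data and the names `toy_counts` and `toy_tfidf` are made up for illustration, and it is an assumption that create_tfidf performs an equivalent step internally.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy counts: one row per (document, bigram) pair
toy_counts <- tibble::tibble(
  book   = c("A", "A", "B", "B"),
  bigram = c("miss crawford", "sir thomas", "sir thomas", "captain wentworth"),
  n      = c(3, 1, 2, 2)
)

# tf  = count of the bigram / total bigram count in that book
# idf = ln(number of books / number of books containing the bigram)
toy_tfidf <- toy_counts %>%
  bind_tf_idf(bigram, book, n) %>%
  arrange(desc(tf_idf))

toy_tfidf
```

A bigram that appears in every document ("sir thomas" here) gets an idf of zero, so its tf-idf is zero regardless of how often it occurs; bigrams unique to one document rank highest.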

For more information on tidy text mining, please see the excellent Tidy Text Mining with R.

cldatascience/tidygramr documentation built on May 10, 2019, 1:09 a.m.
