knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
set.seed(0)
library(dplyr)
library(tidytext)
library(tibble)

Text Sampling

Author: Nicolas Pröllochs
License: MIT

The textsampler R package samples texts from a predefined text source. The implementation follows tidy data principles and works with existing packages such as tm, tidytext, and rvest. In addition, it ships with several built-in text datasets for hassle-free sampling of words, sentences, and texts.

Installation

You can easily install the latest development version of textsampler via GitHub.

# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("nproellochs/textsampler")

Usage

This section shows how to sample text from a predefined text source. First, load the textsampler package.

library(textsampler)

Quick demonstration

The following example shows how to sample sentences from a built-in database of texts. The result is a data frame containing five random sentences.

# Sample five sentences
sample_text(n = 5, type = "sentences")

Example: Sampling text from built-in text source

The following example shows how to sample words from a built-in text source ("english_words"). The result is a data frame containing five random words.

# Sample five words from english_words
sample_text(n = 5, type = "words", source = "english_words")

Example: Sampling text from website

The textsampler R package works with tidy tools and can easily be combined with existing packages such as rvest. The following example shows how to sample texts from a website. Specifically, it samples 15 famous quotes by Julius Caesar.

library(rvest)

# Scrape the quote links, convert them to a tibble, and sample 15 quotes
read_html("https://www.brainyquote.com/authors/julius-caesar-quotes/") %>%
  html_nodes(xpath = ".//a[contains(@class, 'b-qt qt_')]") %>%
  html_text() %>%
  enframe() %>%  # character vector -> tibble with columns `name` and `value`
  sample_text(n = 15, source = ., input = "value", min_length = 1, max_length = 40,
              shuffle = FALSE, clean = TRUE)

Example: Sampling text from vector source

The textsampler R package can also sample text from a vector source. The following example samples five random sentences from a book downloaded with the gutenbergr R package.

library(gutenbergr)
full_text <- gutenberg_download(5314)

textsampler::sample_text(n = 5, source = full_text$text[1:1000], type = "sentences", shuffle = TRUE)
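
To make the idea of sampling sentences from a character vector more concrete, here is a minimal base R sketch. It is not how textsampler is implemented; the input `lines` is a made-up stand-in for a vector such as `full_text$text`, and the sentence-splitting rule (split after `.`, `!`, or `?`) is a simplifying assumption.

```r
# Hypothetical input: lines of text that break mid-sentence,
# as lines of a downloaded book often do.
lines <- c(
  "The ship left the harbour at dawn. The wind",
  "was strong and the sea was rough. Nobody",
  "spoke for hours. Was the captain worried?",
  "He did not say."
)

# Join the lines so that sentences spanning several lines stay intact.
text <- paste(lines, collapse = " ")

# Split on whitespace that follows sentence-ending punctuation.
sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))

# Draw three sentences at random, without replacement.
set.seed(0)
sampled <- sample(sentences, size = 3)

# Return the sample as a data frame, one sentence per row.
result <- data.frame(id = seq_along(sampled), text = sampled)
print(result)
```

A simple punctuation rule like this one mis-splits abbreviations such as "Dr." or "e.g.", which is one reason to rely on a dedicated tokenizer for real text.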

Example: Sampling text data with specific text characteristics

The textsampler R package can sample texts with specific text characteristics. The following example samples five sentences from Amazon reviews, each of which is at most 5 words long and contains the word 'great'.

sample_text(n = 5, source = "amazon_sentences", type = "sentences", 
            max_length = 5, word_list = c("great"))
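
The kind of filtering described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the example sentences are invented, and whether textsampler counts words or matches `word_list` entries in exactly this way (lower-cased, punctuation removed) is not confirmed by the package documentation shown here.

```r
# Hypothetical review sentences (not taken from the package's datasets).
sentences <- c(
  "Great product overall.",
  "This is a great phone.",
  "I would not buy this again because it broke.",
  "Works fine.",
  "Great value for the money, highly recommended to everyone."
)

max_length <- 5
word_list <- c("great")

# Tokenise: lower-case, strip punctuation, split on whitespace.
tokens <- strsplit(tolower(gsub("[[:punct:]]", "", sentences)), "\\s+")

# Number of words per sentence, and whether any listed word occurs in it.
n_words <- lengths(tokens)
has_word <- vapply(tokens, function(tok) any(tok %in% word_list), logical(1))

# Keep only sentences that satisfy both constraints, then sample from them.
candidates <- sentences[n_words <= max_length & has_word]
set.seed(0)
sampled <- sample(candidates, size = min(2, length(candidates)))
print(sampled)
```

Filtering before sampling means the requested sample size can exceed the number of matching sentences, so the sketch caps `size` at the number of candidates.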

Contributing

If you experience any difficulties with the package, or have suggestions, or want to contribute directly, you have the following options:

License

textsampler is released under the MIT License.

Copyright (c) 2019 Nicolas Pröllochs



nproellochs/textsampler documentation built on Nov. 4, 2019, 10:10 p.m.