knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
set.seed(0)
library(dplyr)
library(tidytext)
library(tibble)
Author: Nicolas Pröllochs
License: MIT
The textsampler R package samples texts from a predefined text source. The implementation follows tidy data principles and works with existing packages such as tm, tidytext, and rvest. It also ships with multiple built-in text datasets for hassle-free sampling of words, sentences, and texts.
You can easily install the latest development version of textsampler via GitHub.
# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("nproellochs/textsampler")
This section shows the basic functionality of how to sample text from a predefined text source. First, load the corresponding package textsampler.
library(textsampler)
The following example shows how to sample sentences from a built-in database of texts. The result is a data frame containing five random sentences.
# Sample five sentences
sample_text(n = 5, type = "sentences")
The following example shows how to sample words from a built-in text source ("english_words"). The result is a data frame containing five random words.
# Sample five words from english_words
sample_text(n = 5, type = "words", source = "english_words")
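Because the draws are random, results differ between runs unless the random seed is fixed. The following base R sketch illustrates the idea with `sample()` on a made-up word vector; it assumes (not confirmed by this README) that `sample_text()` draws from R's standard random number generator, in which case calling `set.seed()` beforehand would make its output reproducible in the same way.

```r
# Toy word list, made up for illustration (not the "english_words" dataset)
words <- c("apple", "river", "stone", "cloud", "lantern", "meadow")

# Fixing the seed before each draw yields the same sample both times
set.seed(0)
first_draw <- sample(words, size = 3)

set.seed(0)
second_draw <- sample(words, size = 3)

identical(first_draw, second_draw)
```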
The textsampler R package works with tidy tools and can easily be combined with existing packages such as rvest. The following example shows how to sample texts from a website. Specifically, it samples 15 famous quotes by Julius Caesar.
library(rvest)
read_html("https://www.brainyquote.com/authors/julius-caesar-quotes/") %>%
  html_nodes(xpath = ".//a[contains(@class, 'b-qt qt_')]") %>%
  html_text() %>%
  enframe() %>%
  sample_text(n = 15, source = ., input = "value", min_length = 1,
              max_length = 40, shuffle = FALSE, clean = TRUE)
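The pipeline above passes `input = "value"` because `tibble::enframe()` turns a character vector into a two-column tibble whose columns are named `name` and `value` by default. A minimal sketch of that step in isolation, using two placeholder strings rather than scraped quotes:

```r
library(tibble)

# Placeholder strings standing in for the scraped quote texts
quotes <- c("First placeholder quote.", "Second placeholder quote.")

# enframe() stores the positions in `name` and the strings in `value`
quote_tbl <- enframe(quotes)

names(quote_tbl)
quote_tbl$value
```

The `value` column is therefore the one that holds the text to sample from.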
The textsampler R package can also sample text from a vector source. The following example samples five random sentences from a book downloaded with the gutenbergr R package.
library(gutenbergr)
full_text <- gutenberg_download(5314)
textsampler::sample_text(n = 5, source = full_text$text[1:1000],
                         type = "sentences", shuffle = TRUE)
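A book downloaded this way arrives as a character vector of wrapped lines, so sentences can span several elements. The base R sketch below shows one way such a vector could be turned into sentences before sampling. It is only an illustration of the idea with made-up lines and a naive punctuation-based split; it is not the tokenizer that textsampler uses internally.

```r
# Made-up wrapped lines standing in for full_text$text
book_lines <- c(
  "The ship left the harbour at dawn. The wind",
  "was steady from the west. Nobody spoke",
  "until the coast had disappeared."
)

# Join the wrapped lines into one string
full <- paste(book_lines, collapse = " ")

# Naive split: break on whitespace that follows ., ! or ?
sentences <- unlist(strsplit(full, "(?<=[.!?])\\s+", perl = TRUE))

# Draw two of the resulting sentences at random
set.seed(0)
sample(sentences, size = 2)
```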
The textsampler R package can also sample texts with specific characteristics. The following example samples five sentences from Amazon reviews, each of which is at most five words long and contains the word 'great'.
sample_text(n = 5, source = "amazon_sentences", type = "sentences", max_length = 5, word_list = c("great"))
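To make the two constraints concrete, the following base R sketch applies the same kind of filter by hand: a maximum word count and a required word. The sentences are made up for illustration, and the whitespace-based word count and case-insensitive matching are assumptions; the package's own counting and matching rules may differ.

```r
# Toy sentences, made up for illustration (not taken from "amazon_sentences")
sentences <- c(
  "Great product overall.",
  "This is a great little gadget for the price I paid.",
  "Arrived late and broken.",
  "Works great every time.",
  "Not great, not terrible."
)

# Count whitespace-separated words in each sentence
n_words <- lengths(strsplit(sentences, "\\s+"))

# Flag sentences that contain "great" as a whole word (case-insensitive)
has_word <- grepl("\\bgreat\\b", sentences, ignore.case = TRUE, perl = TRUE)

# Keep sentences with at most five words that contain the word
candidates <- sentences[n_words <= 5 & has_word]

# Sample from the remaining candidates
set.seed(0)
sample(candidates, size = min(3, length(candidates)))
```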
If you experience any difficulties with the package, or have suggestions, or want to contribute directly, you have the following options:
textsampler is released under the MIT License.
Copyright (c) 2019 Nicolas Pröllochs