sample_text: Sample texts from a predefined text source


Description

Samples texts from a user-supplied or predefined text source. The input data must be raw texts.

Usage

sample_text(n = 1, source = "yelp_sentences", type = "sentences",
  sub_token = "words", max_length = 50, min_length = 1,
  word_list = NULL, shuffle = T, input = NULL, tbl = T,
  clean = T, ...)

Arguments

n

Number of texts to be sampled. n is an integer greater than 0. By default, n is set to 1.

source

Text source. A character vector, a data.frame, or an object of type Corpus. Alternatively, a predefined dataset can be loaded by specifying its name as a string. In that case, possible values are imdb_sentences, amazon_sentences, yelp_sentences, and english_words.

type

Type of texts to be sampled. Possible values are texts, paragraphs, sentences, words, and characters.

sub_token

A string specifying the text unit for filtering texts by length via min_length and max_length. Possible values are texts, paragraphs, sentences, words, and characters.

max_length

Maximum length of the texts to be sampled. max_length is an integer greater than 0. By default, max_length is set to 50.

min_length

Minimum length of the texts to be sampled. min_length is an integer greater than 0. By default, min_length is set to 1.

word_list

A word list.

shuffle

If true, the text samples are returned in random order. Default is true.

input

A string defining the name of the column in source that holds the raw text data. The value is ignored if source is not a data.frame.

tbl

If true, the output is returned as a tibble. Default is true.

clean

If true, the texts are cleaned before text sampling. Default is true.

...

Additional parameters passed on to other functions, e.g. for preprocessing.
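The interplay of sub_token, min_length, and max_length can be illustrated with a minimal base R sketch. It assumes sub_token = "words" and a simple whitespace tokenizer; filter_by_word_count is a hypothetical helper written for illustration and is not part of textsampler, whose internal filtering may differ.

```r
# Hypothetical helper, NOT part of textsampler: keep only the texts whose
# word count lies within [min_length, max_length].
filter_by_word_count <- function(texts, min_length = 1, max_length = 50) {
  # Trim surrounding whitespace, split on runs of whitespace, count tokens.
  n_words <- lengths(strsplit(trimws(texts), "\\s+"))
  texts[n_words >= min_length & n_words <= max_length]
}

texts <- c("Great food.", "The service was slow but friendly.", "Okay")

# Keep texts that contain between 2 and 5 words.
kept <- filter_by_word_count(texts, min_length = 2, max_length = 5)
print(kept)
```

With these made-up inputs, only the two-word text passes the filter; the six-word and one-word texts fall outside the range.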

Value

An object of class data.frame, returned as a tibble when tbl is true.

Examples

# Sample three sentences from Yelp reviews.
sample_text(n = 3, source = "yelp_sentences", type = "sentences")
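A second sketch shows how a user-supplied data.frame might be passed as source, with input naming the text column. The reviews data and the column name text are made up for illustration, and the call is guarded so that it only runs when textsampler is installed; the exact output depends on the package version.

```r
# Made-up review data; the column name "text" is an assumption for illustration.
reviews <- data.frame(
  text = c("The pasta was excellent. I would come back.",
           "Long wait. Cold fries. Friendly staff."),
  stringsAsFactors = FALSE
)

# Only call sample_text() when the textsampler package is available.
if (requireNamespace("textsampler", quietly = TRUE)) {
  # Sample two sentences of 2 to 10 words from the "text" column.
  textsampler::sample_text(n = 2, source = reviews, input = "text",
                           type = "sentences", sub_token = "words",
                           min_length = 2, max_length = 10)
}
```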

nproellochs/textsampler documentation built on Nov. 4, 2019, 10:10 p.m.