Bootstrap_Data_Frame: A function for bootstrapping textual data so that all levels...

View source: R/Personal_Functions.R

Description

This function takes a corpus and a set of labels and uses Bootstrap_Vocab to add bootstrapped documents to each label until all labels are the same size. Stop words are not bootstrapped.

Usage

Bootstrap_Data_Frame(text, tags, stopwords, min_length = 7, max_length = 15)

Arguments

text

The collection of textual data to bootstrap.

tags

The collection of tags used for bootstrapping. There should be one tag for every entry in 'text'. Tags do not have to be unique.

stopwords

Stop words to exclude from the bootstrapping process. It is advised to eliminate the most common words. See Stop_Word_Maker().

min_length

The shortest allowable length for bootstrapped words.

max_length

The longest allowable length for bootstrapped words.

Details

Most of the bootstrapped words will be nonsensical. The intention of this package is not to create new sentences, but to trick your model into thinking its levels are equally sized. This method is meant for bag-of-words-style models.
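
The balancing idea described above can be illustrated with a minimal sketch in base R. This is not the package's implementation and it does not reproduce Bootstrap_Vocab: it substitutes plain resampling of each label's own words, and the function name balance_by_resampling, its n_words argument, and the output column names are hypothetical, introduced only for illustration. It assumes every under-represented label has at least one word that is not a stop word.

```r
# Illustrative only: plain word resampling, NOT the Bootstrap_Vocab algorithm.
# balance_by_resampling and n_words are hypothetical names for this sketch.
balance_by_resampling <- function(text, tags, stopwords = character(0),
                                  n_words = 3) {
  # Count documents per label and take the largest label as the target size.
  tab <- table(tags)
  counts <- setNames(as.integer(tab), names(tab))
  target <- max(counts)

  new_text <- character(0)
  new_tags <- character(0)

  for (lab in names(counts)) {
    deficit <- target - counts[[lab]]
    if (deficit == 0) next

    # Pool the words of this label, leaving stop words out.
    words <- unlist(strsplit(text[tags == lab], "\\s+"))
    words <- words[!(words %in% stopwords)]

    # Build one synthetic (usually nonsensical) document per missing slot.
    for (i in seq_len(deficit)) {
      doc <- paste(sample(words, n_words, replace = TRUE), collapse = " ")
      new_text <- c(new_text, doc)
      new_tags <- c(new_tags, lab)
    }
  }

  # Original documents first, synthetic ones appended; tags in column 2.
  data.frame(text = c(text, new_text),
             tags = c(tags, new_tags),
             stringsAsFactors = FALSE)
}

test_set <- c('I like cats', 'I like dogs', 'we love animals', 'I am a vet',
              'US politics bore me', 'I dont like to vote',
              'The rainbow looked nice today dont you think tommy')
test_tags <- c('animals', 'animals', 'animals', 'animals',
               'politics', 'politics',
               'misc')

set.seed(1)
balanced <- balance_by_resampling(test_set, test_tags, c("I", "we"))
table(balanced$tags)
```

Because only word counts matter to a bag-of-words model, the synthetic documents need not be grammatical; the sketch only aims to give every label the same number of rows.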

Value

A data frame containing your original documents together with the bootstrapped ones (column 1) and their tags (column 2).

Author(s)

Travis Barton

Examples

library(LilRhino)

test_set = c('I like cats', 'I like dogs', 'we love animals', 'I am a vet',
             'US politics bore me', 'I dont like to vote',
             'The rainbow looked nice today dont you think tommy')
test_tags = c('animals', 'animals', 'animals', 'animals',
             'politics', 'politics',
             'misc')

Bootstrap_Data_Frame(test_set, test_tags, c("I", "we"), min_length = 3, max_length = 8)

LilRhino documentation built on Oct. 31, 2019, 4:59 p.m.