Bootstrap_Data_Frame: A function for bootstrapping textual data so that all levels...

View source: R/Personal_Functions.R

Bootstrap_Data_Frame    R Documentation

A function for bootstrapping textual data so that all levels have the same number of entries.

Description

This function takes a corpus and a set of labels and uses Bootstrap_Vocab to add bootstrapped documents to each label until every label has the same number of entries. Stop words are not bootstrapped.
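To make the balancing goal concrete, the following base R sketch counts the documents under each label and works out how many extra entries each label would need. It assumes the target is the size of the largest label, which this page does not state explicitly; the variable names are illustrative and are not part of LilRhino.

```r
# Hypothetical labels, one per document (illustrative only).
tags <- c("animals", "animals", "animals", "animals",
          "politics", "politics", "misc")

# Count the documents under each label (named integer vector).
counts <- vapply(split(tags, tags), length, integer(1))

# Assumption: every label is grown to the size of the largest one.
target <- max(counts)

# Extra documents each label would need to reach the target.
deficit <- target - counts
print(deficit)
```

Under this assumption, labels that are already the largest need no additions, and the smallest labels receive the most bootstrapped documents.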

Usage

Bootstrap_Data_Frame(text, tags, stopwords, min_length = 7, max_length = 15)

Arguments

text

The collection of textual data to bootstrap.

tags

The collection of tags that will be used to bootstrap. There should be one tag for every entry in 'text'. Tags do not have to be unique.

stopwords

Stop words to exclude from the bootstrapping process. It is advised to eliminate the most common words. See Stop_Word_Maker().

min_length

The shortest length allowable for bootstrapped words.

max_length

The longest length allowable for bootstrapped words.

Details

Most of the bootstrapped words will be nonsensical. The intention of this package is not to create new sentences, but to trick your model into thinking its levels are all the same size. This method is meant for bag-of-words style models.
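The sketch below illustrates why the output reads as nonsense yet still suits a bag-of-words model: it pools the words of one label, drops the stop words, and samples with replacement. This is a simplified stand-in written for this page, not the package's Bootstrap_Vocab. The helper name sketch_bootstrap_doc is hypothetical, and treating min_length and max_length as word counts per synthetic document is an assumption.

```r
# Hypothetical helper, not part of LilRhino: builds one synthetic document
# by sampling words with replacement from a label's pooled vocabulary.
sketch_bootstrap_doc <- function(docs, stopwords, min_length = 3, max_length = 8) {
  # Split every document on whitespace and pool the words.
  words <- unlist(strsplit(docs, "\\s+"))
  # Drop stop words so they are never sampled.
  words <- words[!(words %in% stopwords)]
  # Pick a length between the bounds (assumed here to be word counts).
  n <- min_length + sample.int(max_length - min_length + 1, 1) - 1
  # Word order is random, so the result is rarely a real sentence.
  paste(sample(words, n, replace = TRUE), collapse = " ")
}

set.seed(1)
animal_docs <- c("I like cats", "I like dogs", "we love animals", "I am a vet")
new_doc <- sketch_bootstrap_doc(animal_docs, stopwords = c("I", "we"))
print(new_doc)
```

Because a bag-of-words model only sees word counts, a shuffled document like this still carries the label's vocabulary even though it is not a sentence.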

Value

A data frame containing your original documents together with the bootstrapped ones (column 1) and their tags (column 2).

Author(s)

Travis Barton

Examples

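# Seven short documents spread unevenly across three labels
test_set <- c('I like cats', 'I like dogs', 'we love animals', 'I am a vet',
              'US politics bore me', 'I dont like to vote',
              'The rainbow looked nice today dont you think tommy')
# One tag per document, in the same order as test_set
test_tags <- c('animals', 'animals', 'animals', 'animals',
               'politics', 'politics',
               'misc')

# Bootstrap each label up to the same size, leaving "I" and "we" out
Bootstrap_Data_Frame(test_set, test_tags, c("I", "we"), min_length = 3, max_length = 8)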

LilRhino documentation built on April 28, 2022, 1:06 a.m.