View source: R/Personal_Functions.R
Bootstrap_Data_Frame | R Documentation |
This function takes a corpus and a set of labels and uses Bootstrap_Vocab to add bootstrapped documents to each label until all labels are the same size. Stop words are not bootstrapped.
Bootstrap_Data_Frame(text, tags, stopwords, min_length = 7, max_length = 15)
text |
The collection of text documents to bootstrap. |
tags |
tags are the collection of tags that will be used to bootstrap. There should be one for every entry in 'text'. They do not have to be unique. |
stopwords |
Stop words to exclude from the bootstrapping process. It is advised to exclude the most common words. See Stop_Word_Maker(). |
min_length |
The shortest length allowable for bootstrapped words |
max_length |
The longest length allowable for bootstrapped words |
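Before calling the function, it can help to check the inputs against the argument descriptions above. The following is a minimal base-R sketch, not part of the package: it confirms that there is one tag per document and picks the most frequent words as stop-word candidates. The variable names (`docs`, `doc_tags`, `candidate_stopwords`) and the choice of the top two words are assumptions made for illustration. Stop_Word_Maker() is not used here because its signature is not shown on this page.

```r
# Example corpus and tags, copied from this page's example
docs <- c("I like cats", "I like dogs", "we love animals", "I am a vet",
          "US politics bore me", "I dont like to vote",
          "The rainbow looked nice today dont you think tommy")
doc_tags <- c("animals", "animals", "animals", "animals",
              "politics", "politics", "misc")

# 'tags' needs one entry for every entry in 'text'
stopifnot(length(docs) == length(doc_tags))

# Split on whitespace and count how often each word occurs (case kept as-is)
tokens <- unlist(strsplit(docs, "\\s+"))
word_counts <- sort(table(tokens), decreasing = TRUE)

# Assumption for illustration: treat the two most frequent words as stop words
candidate_stopwords <- names(word_counts)[1:2]
print(candidate_stopwords)
```

The resulting vector could then be passed as the stopwords argument. How many words to exclude is a judgment call that depends on the corpus.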
Most of the bootstrapped words will be nonsensical. The intention of this function is not to create new sentences, but to trick your model into thinking its levels are all the same size. This method is meant for bag-of-words style models.
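To see the imbalance this is meant to correct, the tags can be counted with base R. This sketch assumes that "size" means the number of documents per tag, which this page does not spell out. The names `tag_counts` and `deficit` are illustrative and do not come from the package.

```r
# Tags copied from this page's example
doc_tags <- c("animals", "animals", "animals", "animals",
              "politics", "politics", "misc")

# Number of documents under each tag
tag_counts <- table(doc_tags)

# Documents each tag would need to gain to match the largest tag
# (assumes balancing targets the size of the largest level)
deficit <- max(tag_counts) - tag_counts
print(deficit)
```

Under that assumption, 'animals' already has the most documents, while 'misc' and 'politics' are the levels that would receive bootstrapped entries.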
A data frame containing your original documents together with the bootstrapped ones (column 1) and their tags (column 2).
Travis Barton
test_set = c('I like cats', 'I like dogs', 'we love animals', 'I am a vet',
             'US politics bore me', 'I dont like to vote',
             'The rainbow looked nice today dont you think tommy')
test_tags = c('animals', 'animals', 'animals', 'animals',
              'politics', 'politics', 'misc')
Bootstrap_Data_Frame(test_set, test_tags, c("I", "we"), min_length = 3, max_length = 8)