CorpusStudio: CorpusStudio
In DecisionScients/NLPStudio: Natural Language Processing in Objected Oriented R Environment


CorpusStudio creates a Corpus object, then prepares it for downstream cross-validation.

CorpusStudio

`x`	A series of character vectors, each containing the text for a single document; a FileSet object containing .txt files; a character string naming the directory that holds .txt files; a quanteda corpus object; or a tm VCorpus or SimpleCorpus object.
`name`	Character string containing the name to assign to the final CVSet or CVSetKFold object.
`cv`	The type of cross-validation product to deliver. Valid values are c('standard', 'kFold'). The default is 'standard'; one-letter abbreviations are accepted.
`textConfig`	A TextConfig object that encapsulates the text cleaning configuration.
`n`	Numeric parameter used by the sample method. It contains the number of samples to obtain from the Corpus or the proportion of the Corpus to sample prior to splitting into cross-validation set(s).
`k`	Numeric. If 'cv' is 'kFold', this number indicates the number of folds to produce.
`stratify`	Logical. If TRUE (default), splits and sampling will be stratified.
`replace`	Logical. If TRUE, sampling is conducted with replacement. The default is FALSE.
`train`	Numeric indicating the proportion of the Corpus to allocate to the training set. Acceptable values are between 0 and 1. The total of the values for the train, validation and test parameters must equal 1.
`validation`	Numeric indicating the proportion of the Corpus to allocate to the validation set. Acceptable values are between 0 and 1. The total of the values for the train, validation and test parameters must equal 1.
`test`	Numeric indicating the proportion of the Corpus to allocate to the test set. Acceptable values are between 0 and 1. The total of the values for the train, validation and test parameters must equal 1.
`seed`	Numeric used to initialize a pseudorandom number generator.
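
The constraint on `train`, `validation` and `test` can be illustrated with a small base R check. This is only a sketch: `check_split` is a hypothetical helper, not an NLPStudio function, and the source does not show how the package itself validates these arguments.

```r
# Hypothetical helper (not part of NLPStudio): validates split proportions.
check_split <- function(train, validation, test) {
  props <- c(train = train, validation = validation, test = test)

  # Each proportion must lie between 0 and 1
  if (any(props < 0 | props > 1)) {
    stop("Each proportion must be between 0 and 1.")
  }

  # Compare with a tolerance, because sums such as 0.7 + 0.2 + 0.1
  # are not exactly 1 in floating-point arithmetic
  if (!isTRUE(all.equal(sum(props), 1))) {
    stop("train, validation and test must sum to 1.")
  }

  props
}

# A split with a validation set, and one without
with_validation    <- check_split(train = 0.7, validation = 0.2, test = 0.1)
without_validation <- check_split(train = 0.8, validation = 0, test = 0.2)
```

Using `all.equal` rather than `==` is a deliberate choice here: an exact comparison would reject some valid-looking inputs because of rounding error.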

An object of class R6ClassGenerator of length 24.

Class responsible for creating, cleaning, sampling, splitting and constructing the cross-validation object that will be used by downstream modeling classes. This is performed in five stages.

The first stage builds the Corpus object from one of several sources: a directory source, a FileSet object, a tm corpus object, or a quanteda corpus object. The second stage is optional and reshapes the Corpus object into word, sentence or paragraph units. The third stage, the sampling stage, takes a stratified or non-stratified sample from the Corpus object. The fourth stage, the cross-validation stage, produces one of two cross-validation objects: a CVSet object, which comprises a training set, a test set and an optional validation set, or a CVSetKFold object, which contains k folds, each comprising a training set and a test set. The cross-validation product is the final product and forms the data basis for the modeling phase.
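
The two cross-validation products described above can be sketched in base R on toy data. This is an illustration of the general technique only: the data frame `docs` and the functions `stratified_split` and `stratified_folds` are hypothetical names, and nothing here reflects how NLPStudio actually builds a CVSet or CVSetKFold object.

```r
# Toy data standing in for a corpus: 20 documents in two classes.
# All names in this block are hypothetical, not part of NLPStudio.
docs <- data.frame(
  id    = 1:20,
  label = rep(c("news", "blogs"), each = 10),
  stringsAsFactors = FALSE
)

set.seed(42)  # plays the role of the `seed` argument

# Stratified train/validation/test assignment: split within each label
stratified_split <- function(labels, train, validation, test) {
  out <- character(length(labels))
  for (idx in split(seq_along(labels), labels)) {
    idx     <- idx[sample.int(length(idx))]   # shuffle within the stratum
    n_train <- floor(train * length(idx))
    n_val   <- floor(validation * length(idx))
    sets    <- rep("test", length(idx))       # remainder goes to test
    sets[seq_len(n_train)]         <- "train"
    sets[n_train + seq_len(n_val)] <- "validation"
    out[idx] <- sets
  }
  out
}

# Stratified k-fold assignment: deal fold numbers 1..k within each label
stratified_folds <- function(labels, k) {
  fold <- integer(length(labels))
  for (idx in split(seq_along(labels), labels)) {
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(k), length(idx))
  }
  fold
}

docs$set  <- stratified_split(docs$label, train = 0.6, validation = 0.2, test = 0.2)
docs$fold <- stratified_folds(docs$label, k = 5)

# For k-fold, fold i serves as the test set and the other folds as training
folds <- lapply(seq_len(5), function(i) {
  list(train = docs$id[docs$fold != i], test = docs$id[docs$fold == i])
})
```

Because assignment happens inside each label group, every set and every fold keeps the class balance of the original data, which is the point of stratification.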

John James, jjames@dataScienceSalon.org

Other CorpusStudio Family of Classes: KFold, Sample0, Sample, Segment, Split, TokenizerNLP, TokenizerQ, Tokenizer, Token

DecisionScients/NLPStudio documentation built on May 15, 2019, 12:51 p.m.
