create_container: creates a container for training, classifying, and analyzing...



Description

Given a DocumentTermMatrix from the tm package and corresponding document labels, creates a container of class matrix_container-class that can be used for training and classification (i.e. with train_model, train_models, classify_model, and classify_models).

Usage

create_container(matrix, labels, trainSize=NULL, testSize=NULL, virgin)

Arguments

matrix

A document-term matrix of class DocumentTermMatrix or TermDocumentMatrix from the tm package, or generated by create_matrix.

labels

A factor or vector of labels corresponding to each document in the matrix.

trainSize

A range of document indices (e.g. 1:1000) specifying which documents to use for training the models. Can be left blank for classifying corpora using saved models that don't need to be trained.

testSize

A range of document indices (e.g. 1:1000) specifying which documents to use for classification. Can be left blank for training on all data in the matrix.

virgin

A logical (TRUE or FALSE) specifying whether to treat the classification data as virgin data, i.e. data whose true labels are unknown.
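
The following base R sketch illustrates how the trainSize and testSize ranges can be read, assuming they act as row indices into the matrix and the label vector (as the 1:75 and 76:100 values in the Examples suggest). The toy_matrix and toy_labels objects are hypothetical stand-ins, not RTextTools objects, and this is not the package's implementation.

```r
# Hypothetical stand-ins for a document-term matrix (10 documents, 2 terms)
# and its labels; real input would come from create_matrix.
toy_matrix <- matrix(1:20, nrow = 10, ncol = 2,
                     dimnames = list(paste0("doc", 1:10), c("termA", "termB")))
toy_labels <- factor(rep(c("x", "y"), times = 5))

train_idx <- 1:7    # plays the role of trainSize
test_idx  <- 8:10   # plays the role of testSize

# Each range selects rows of the matrix and the matching labels.
train_rows   <- toy_matrix[train_idx, , drop = FALSE]
test_rows    <- toy_matrix[test_idx, , drop = FALSE]
train_labels <- toy_labels[train_idx]
test_labels  <- toy_labels[test_idx]

# Keep the ranges disjoint if the test documents are meant to be unseen.
stopifnot(length(intersect(train_idx, test_idx)) == 0)
```

Under this reading, a range such as 1:1000 names which documents are used rather than merely how many.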

Value

A container of class matrix_container-class that can be passed into other functions such as train_model, train_models, classify_model, classify_models, and create_analytics.
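
As a rough sketch of how the returned container might be passed on, the code below builds the same container as the package example and hands it to train_model and classify_model. The choice of the "SVM" algorithm and the variable names model and results are assumptions made for illustration; results will vary from run to run because the documents are sampled at random.

```r
library(RTextTools)
data(NYTimes)

# Build a container from a random sample of 100 documents.
data <- NYTimes[sample(1:3100, size = 100, replace = FALSE), ]
matrix <- create_matrix(cbind(data["Title"], data["Subject"]), language = "english",
                        removeNumbers = TRUE, stemWords = FALSE,
                        weighting = tm::weightTfIdf)
container <- create_container(matrix, data$Topic.Code, trainSize = 1:75,
                              testSize = 76:100, virgin = FALSE)

# Train a single model on documents 1:75 (algorithm choice is an assumption).
model <- train_model(container, "SVM")

# Classify documents 76:100 with the trained model.
results <- classify_model(container, model)
head(results)
```

The same container can be reused with train_models and classify_models when several algorithms are to be compared.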

Author(s)

Timothy P. Jurka <tpjurka@ucdavis.edu>, Loren Collingwood <loren.collingwood@gmail.com>

Examples

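library(RTextTools)
data(NYTimes)
# Randomly sample 100 rows (from rows 1 to 3100) of the NYTimes data.
data <- NYTimes[sample(1:3100,size=100,replace=FALSE),]
# Build a TF-IDF weighted document-term matrix from the Title and Subject columns.
matrix <- create_matrix(cbind(data["Title"],data["Subject"]), language="english",
                        removeNumbers=TRUE, stemWords=FALSE, weighting=tm::weightTfIdf)
# Train on documents 1:75 and test on 76:100; labels are known, so virgin=FALSE.
container <- create_container(matrix,data$Topic.Code,trainSize=1:75, testSize=76:100,
                              virgin=FALSE)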

Example output

Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

RTextTools documentation built on April 26, 2020, 9:05 a.m.