create_matrix: creates a document-term matrix.


Description

Creates an object of class DocumentTermMatrix from tm.

Usage

create_matrix(textColumns, language="english", minDocFreq=1, minWordLength=3,
              removeNumbers=TRUE, removePunctuation=TRUE, removeSparseTerms=0,
              removeStopwords=TRUE, stemWords=FALSE, stripWhitespace=TRUE,
              toLower=TRUE, weighting=weightTf)

Arguments

textColumns

Either a character vector (e.g. data$Title) or a cbind() of columns to use for training the algorithms (e.g. cbind(data$Title, data$Subject)).

language

The language to be used for stemming the text data and for removing stopwords.

minDocFreq

The minimum number of times a word should appear in a document for it to be included in the matrix. See package tm for more details.

minWordLength

The minimum number of letters a word should contain to be included in the matrix. See package tm for more details.

removeNumbers

A logical parameter to specify whether to remove numbers.

removePunctuation

A logical parameter to specify whether to remove punctuation.

removeSparseTerms

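A sparsity threshold for dropping rarely used terms, named after removeSparseTerms() in tm, where it is a value between 0 and 1. See package tm for more details.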

removeStopwords

A logical parameter to specify whether to remove stopwords using the language specified in language.

stemWords

A logical parameter to specify whether to stem words using the language specified in language.

stripWhitespace

A logical parameter to specify whether to strip whitespace.

toLower

A logical parameter to specify whether to make all text lowercase.

weighting

Either weightTf or weightTfIdf. See package tm for more details.
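The two accepted forms of textColumns can be illustrated with base R alone. The sketch below uses a hypothetical data frame standing in for `data`. The final step, collapsing each row into one string, is an assumption about how multiple columns are combined into a single document, not something this page confirms.

```r
# Hypothetical data frame standing in for `data` in the argument description.
data <- data.frame(
  Title   = c("Budget vote delayed", "Storm hits the coast"),
  Subject = c("politics", "weather"),
  stringsAsFactors = FALSE
)

# Form 1: a single character vector, one element per document.
single <- data$Title

# Form 2: several columns bound together, one row per document.
combined <- cbind(data$Title, data$Subject)

# Assumption: each row is collapsed into one string before tokenizing.
merged <- apply(combined, 1, paste, collapse = " ")
print(merged)
```

Either `single` or `combined` could then be supplied as the textColumns argument.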

Author(s)

Timothy P. Jurka <tpjurka@ucdavis.edu>

Examples

library(sentiment)

# DEFINE THE DOCUMENTS
documents <- c("I am very happy, excited, and optimistic.",
               "I am very scared, annoyed, and irritated.",
               "Iraq's political crisis entered its second week one step closer to the potential
               dissolution of the government, with a call for elections by a vital coalition partner
               and a suicide attack that extended the spate of violence that has followed the withdrawal
               of U.S. troops.",
               "With nightfall approaching, Los Angeles authorities are urging residents to keep their
               outdoor lights on as police and fire officials try to catch the person or people responsible
               for nearly 40 arson fires in the last three days.")

# CREATE THE DOCUMENT-TERM MATRIX
matrix <- create_matrix(documents, language="english", removeNumbers=TRUE,
                        stemWords=FALSE, weighting=weightTfIdf)
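
Since the result is a DocumentTermMatrix from tm, the usual tm accessors should apply to it. The following sketch builds a comparable matrix with tm directly so that it runs without the sentiment package. The control options are chosen to mirror the create_matrix arguments and are an assumption, not the package's actual implementation.

```r
library(tm)

docs <- c("I am very happy, excited, and optimistic.",
          "I am very scared, annoyed, and irritated.")

# VCorpus keeps the example independent of the sentiment package.
corpus <- VCorpus(VectorSource(docs))

# Control options picked to resemble the create_matrix defaults (assumption).
dtm <- DocumentTermMatrix(corpus, control = list(
  tolower           = TRUE,
  removeNumbers     = TRUE,
  removePunctuation = TRUE,
  stopwords         = TRUE,
  stemming          = FALSE,
  wordLengths       = c(3, Inf),
  weighting         = weightTf
))

nDocs(dtm)      # number of documents (rows)
Terms(dtm)      # vocabulary (columns)
as.matrix(dtm)  # dense view, suitable only for small examples
```

The same accessors can be called on the object returned by create_matrix in the example above.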

abhy/sentiment documentation built on May 10, 2019, 4:10 a.m.