classify.dtm: Creates a Document/Term Matrix


View source: R/classify.R

Description

A utility function that creates a document/term matrix using the tm package.
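To make the term concrete, here is a minimal base-R sketch of what a document/term matrix holds: one row per document, one column per term, and raw term counts in the cells. It is an illustration only and not how classify.dtm is implemented (that relies on tm). The names `sentences`, `tokens`, `vocab` and `dtm` are chosen for this example.

```r
# Illustrative only: a tiny document/term matrix built with base R.
# This is NOT the package's implementation, which relies on tm.
sentences <- c("I am happy", "I am sad", "The weather is good today")

# Lowercase each sentence and split it on whitespace.
tokens <- strsplit(tolower(sentences), "\\s+")

# Vocabulary: every distinct term across all documents.
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per term; cells hold raw term counts.
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
dimnames(dtm) <- list(paste0("doc", seq_along(sentences)), vocab)

print(dtm)
```

No filtering is applied in this sketch, so short words and stopwords such as "am" and "the" stay in the vocabulary. The arguments documented below control that kind of preprocessing in classify.dtm.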

Usage

classify.dtm(sentences, language = "english", minDocFreq = 1,
  minWordLength = 4, removeNumbers = TRUE, removePunctuation = TRUE,
  removeStopwords = TRUE, stemWords = FALSE, stripWhitespace = TRUE,
  toLower = TRUE, weighting = weightTf)

Arguments

sentences

a character vector of sentences to use for training

language

the language to use for word stemming and stopword removal

minDocFreq

the minimum number of times a word must appear in a document to be included in the analysis

minWordLength

the minimum word length for inclusion in the analysis

removeNumbers

a boolean specifying whether numbers should be removed

removeStopwords

a boolean specifying whether stopwords in the specified language should be removed

stemWords

a boolean specifying whether words should be stemmed to their root form in the language specified

stripWhitespace

a boolean indicating whether whitespace should be stripped

toLower

a boolean indicating whether all words should be transformed to their lowercase representations

weighting

either weightTf or weightTfIdf (see the package tm for details)

removePunctuation

a boolean specifying whether to remove punctuation

Examples

classify.dtm(c("I am happy", "I am sad", "I am miserable", "The weather is good today"))
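A further sketch that overrides a few of the defaults listed under Usage. It assumes the sentR and tm packages are installed and that classify.dtm is exported from sentR. The input vector is made up for illustration, and the call is skipped when either package is missing.

```r
# Hypothetical input data for illustration.
sentences <- c("I am happy", "I am sad", "The weather is good today")

# Run only when both packages are available (assumption: classify.dtm
# is exported by sentR, as the Usage section suggests).
if (requireNamespace("sentR", quietly = TRUE) &&
    requireNamespace("tm", quietly = TRUE)) {
  dtm <- sentR::classify.dtm(
    sentences,
    minWordLength = 3,           # lower the minimum word length from 4 to 3
    removeStopwords = FALSE,     # keep stopwords in the matrix
    weighting = tm::weightTfIdf  # tf-idf instead of raw term frequency
  )
  print(dim(dtm))
}
```

Passing `tm::weightTfIdf` explicitly avoids relying on tm being attached to the search path.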

mananshah99/sentR documentation built on May 21, 2019, 11:23 a.m.