formatProcureText: Formats procurement text into a document-term matrix


View source: R/formatProcureText.R

Description

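Formats procurement text into a document-term matrix. The text in the selected column(s) goes through these steps:
- extra whitespace and punctuation are stripped;
- the text is lower-cased;
- English stopwords and numbers are removed;
- the result is tokenized into unigrams and bigrams.

The raw term frequencies are returned as a tm DocumentTermMatrix.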

Usage

formatProcureText(procure, text.var)

Arguments

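procure: a data frame (or matrix) of procurement records.
text.var: name(s) or index(es) of the column(s) of procure that hold the text. With a single column, each row becomes one document. With several columns, each column is collapsed into a single document.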
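The role of `text.var` follows from the line `mapply(paste, procure[, text.var], collapse = " ")` in the definition. This base-R sketch uses a hypothetical `procure` data frame with made-up `title` and `details` columns. It shows how the number of documents depends on how many columns are selected.

```r
# Hypothetical data frame with two text columns and three rows
procure <- data.frame(
  title   = c("Office chairs", "Printer toner", "Desks"),
  details = c("Ergonomic, 20 units", "Black cartridges", "Oak, 5 units"),
  stringsAsFactors = FALSE
)

# One column: procure[, "title"] is a character vector, so mapply() pastes
# each element on its own, giving one document per row
one_col <- mapply(paste, procure[, "title"], collapse = " ")
length(one_col)  # 3

# Two columns: procure[, c(...)] stays a data frame, so mapply() iterates
# over its columns and collapses each column into a single document
two_col <- mapply(paste, procure[, c("title", "details")], collapse = " ")
length(two_col)  # 2
two_col[["title"]]  # "Office chairs Printer toner Desks"
```

If one document per record is wanted across several columns, the columns would need to be pasted row-wise before calling the function.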

Examples

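## The function is defined as follows (library() calls and comments added)
library(tm)     # stripWhitespace, removePunctuation, VCorpus, tm_map, ...
library(RWeka)  # NGramTokenizer, Weka_control

formatProcureText <- function(procure, text.var) {
  # Despite its name, this tokenizer emits unigrams and bigrams (min = 1, max = 2)
  TrigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 1,
      max = 2))
  # Paste the selected text into character strings, one per document
  text <- mapply(paste, procure[, text.var], collapse = " ")
  # Clean the raw strings before building the corpus
  text <- stripWhitespace(text)
  text <- removePunctuation(text)
  text <- tolower(text)
  # VCorpus rather than Corpus: in tm 0.7 and later, Corpus() on a VectorSource
  # returns a SimpleCorpus, which can ignore a custom tokenizer
  text <- VCorpus(VectorSource(text))
  # Drop English stopwords ("the" is also listed explicitly) and digits
  text <- tm_map(text, removeWords, c("the", stopwords("english")))
  text <- tm_map(text, removeNumbers)
  # Raw term frequencies over unigrams and bigrams
  dtm <- DocumentTermMatrix(text, control = list(weighting = weightTf,
      tokenize = TrigramTokenizer))
  return(dtm)
}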
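The steps in formatProcureText() can be mimicked in base R to show what the resulting matrix contains. This needs neither tm nor RWeka, which in turn needs Java. It is a rough sketch and not the package's implementation. The following names are all hypothetical:
- the data frame `toy_procure` and its `description` column;
- the short stopword list `toy_stopwords`, standing in for `stopwords("english")`;
- the helpers `clean_text()` and `ngrams12()`.

Tokens are also split on whitespace rather than by Weka's `NGramTokenizer`.

```r
# Hypothetical procurement records (illustrative data only)
toy_procure <- data.frame(
  description = c("Supply of 20 office chairs.",
                  "The supply of office desks and chairs"),
  stringsAsFactors = FALSE
)

# Hypothetical stopword list standing in for tm::stopwords("english")
toy_stopwords <- c("the", "of", "and")

# Lower-case, strip punctuation and digits, split on whitespace, drop stopwords
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("[[:digit:]]", "", x)
  words <- strsplit(x, "[[:space:]]+")[[1]]
  words[words != "" & !(words %in% toy_stopwords)]
}

# Unigrams plus bigrams, mirroring Weka_control(min = 1, max = 2)
ngrams12 <- function(words) {
  if (length(words) < 2) return(words)
  c(words, paste(head(words, -1), tail(words, -1)))
}

tokens <- lapply(toy_procure$description,
                 function(x) ngrams12(clean_text(x)))
terms <- sort(unique(unlist(tokens)))

# Document-term matrix of raw counts: one row per document, one column per term
dtm <- t(vapply(tokens,
                function(tk) as.integer(table(factor(tk, levels = terms))),
                integer(length(terms))))
colnames(dtm) <- terms
print(dtm)
```

Each row of `dtm` is a document and each column is a unigram or bigram. For example, `supply office` appears in both rows because the stopword "of" is removed before bigrams are formed, as in formatProcureText().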

jon-mellon/procureClassify documentation built on May 19, 2019, 7:26 p.m.