dc: Document classifier

View source: R/Doccat.R

Description

Classify documents using a trained model.

Usage

dc_(model, documents, output = NULL)

dc(model, documents)

Arguments

model

Model to use, generally returned by dc_train.

documents

Documents to classify.

output

Full path to output file.
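
Because output must be a full path, it can help to build it explicitly rather than rely on a relative file name. The following is a minimal sketch using base R only; the file name classified.txt is hypothetical, and the commented dc_ call assumes that model and documents have already been prepared in whatever form dc_ expects.

```r
# Build an absolute path for the output file using base R only.
# "classified.txt" is a hypothetical file name chosen for illustration.
output <- file.path(normalizePath(getwd(), winslash = "/"), "classified.txt")

# Hypothetical call (not run here), assuming `model` and `documents`
# are already available in the form dc_ expects:
# dc_(model, documents, output = output)
```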

Examples

## Not run: 
# get working directory
# need to pass full path
wd <- getwd()

data <- data.frame(
  class = c("Sport", "Business", "Sport", "Sport", "Business", "Politics", "Politics", "Politics"),
  doc = c("Football, tennis, golf and, bowling and, score.",
          "Marketing, Finance, Legal and, Administration.",
          "Tennis, Ski, Golf and, gym and, match.",
          "football, climbing and gym.",
          "Marketing, Business, Money and, Management.",
          "This document talks politics and Donald Trump.",
          "Donald Trump is the President of the US, sadly.",
          "Article about politics and president Trump.")
)

# enlarge the training set: draw 3 random rows, 20 times (60 rows)
# obviously, do not do this in the real world
data <- do.call("rbind", replicate(20, data[sample(nrow(data), 3),],
                                   simplify = FALSE))

# train model
model <- dc_train(paste0(wd, "/classifier.bin"), "en", data)

# create documents to classify
documents <- data.frame(
  docs = c("This discusses golf which is a sport.",
           "This document is about business administration.",
           "This is about people who do sport, go to the gym and play tennis.",
           "Some play tennis and work in Finance",
           "This document discusses finance and money management.")
)

# classify documents
classified <- dc(model, documents)
cat(classified)

## End(Not run)

news-r/decipher documentation built on July 19, 2019, 5:58 p.m.