dc_train: Train Document Classifier
In news-r/decipher: Natural Language Processing tools for R


View source: R/Doccat.R

Train document classifier.

dc_train(model, lang, data)

`model`	Full path to the output model file.
`lang`	Language of the documents being processed.
`data`	A data.frame of classified documents, see details and examples.

data is a data.frame of 2 columns:

class - the document class
document - the document

Note that you need about 5,000 classified documents to train a decent model. The examples below only demonstrate how to run the code.
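Before training, it can help to check that the data has the expected two-column shape and to see how many documents each class has. The following is a minimal sketch in base R; `check_training_data` and the `toy` data are hypothetical names introduced for illustration and are not part of decipher.

```r
# Hypothetical helper (not part of decipher): sanity-check a training set
# before passing it to dc_train().
check_training_data <- function(data) {
  # expect a data.frame with exactly two columns: class, then document
  stopifnot(is.data.frame(data), ncol(data) == 2)

  # drop rows whose document is missing or blank
  docs <- as.character(data[[2]])
  keep <- !is.na(docs) & nzchar(trimws(docs))
  data <- data[keep, , drop = FALSE]

  # count the remaining documents per class
  per_class <- table(as.character(data[[1]]))

  list(data = data, n = nrow(data), per_class = per_class)
}

# toy data, for illustration only
toy <- data.frame(
  class = c("Sport", "Business", "Sport", "Politics"),
  document = c("Tennis and golf.", "Marketing and finance.",
               "", "An election report."),
  stringsAsFactors = FALSE
)

res <- check_training_data(toy)
res$n          # number of usable documents
res$per_class  # documents per class
```

A very low count for any class, or a total far below the figure mentioned above, suggests the training set is too small to produce a useful model.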

## Not run: 
# get working directory
# need to pass full path
wd <- getwd()

data <- data.frame(
  class = c("Sport", "Business", "Sport", "Sport", "Business", "Politics", "Politics", "Politics"),
  doc = c("Football, tennis, golf and, bowling and, score.",
          "Marketing, Finance, Legal and, Administration.",
          "Tennis, Ski, Golf and, gym and, match.",
          "football, climbing and gym.",
          "Marketing, Business, Money and, Management.",
          "This document talks politics and Donald Trump.",
          "Donald Trump is the President of the US, sadly.",
          "Article about politics and president Trump.")
)

# Error: not enough data
# model <- dc_train(model = file.path(wd, "model.bin"), data = data, lang = "en")

# repeat data 50 times
# Obviously do not do that in the real world
data <- do.call("rbind", replicate(50, data[sample(nrow(data), 4),],
                                   simplify = FALSE))

# train model
model <- dc_train(model = file.path(wd, "model.bin"), data = data, lang = "en")

## End(Not run)