Train document classifier.
model | Full path to the output model file.
lang | Language of the documents being processed.
data | A data.frame of classified documents; see details and examples.
data is a data.frame with 2 columns:
class - the document class
document - the document
Note that you need about 5,000 classified documents to train a decent model. The examples below only demonstrate how to run the code.
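The two-column layout described above can be checked before training. The following is a minimal sketch in base R, not part of the package: `check_training_data` and its `min_docs` argument are hypothetical names introduced here for illustration. It checks columns by position, on the assumption that column order rather than column name is what matters, since the description calls the second column `document` while the example uses `doc`.

```r
# Hypothetical helper (not part of the package): checks that `data` has the
# two-column shape described above and warns when it is small.
check_training_data <- function(data, min_docs = 5000) {
  stopifnot(is.data.frame(data), ncol(data) == 2)

  # Convert every column to character so factors are handled as plain text
  data[] <- lapply(data, as.character)

  # Keep only rows where both the class and the document are present
  has_class <- !is.na(data[[1]]) & nzchar(trimws(data[[1]]))
  has_doc   <- !is.na(data[[2]]) & nzchar(trimws(data[[2]]))
  data <- data[has_class & has_doc, , drop = FALSE]

  # min_docs is an assumed threshold taken from the note above
  if (nrow(data) < min_docs) {
    warning("only ", nrow(data), " classified documents; the trained model may be poor")
  }
  data
}

toy <- data.frame(
  class = c("Sport", "Business", NA),
  doc = c("Football and tennis.", "Marketing and finance.", "Unlabelled text."),
  stringsAsFactors = FALSE
)

# The toy data is far below the threshold, so the warning is silenced here
checked <- suppressWarnings(check_training_data(toy))

# Number of documents per class after cleaning
table(checked$class)
```

Counting documents per class with `table()` is a quick way to spot classes that are too rare to learn from.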
## Not run:
# get working directory
# need to pass full path
wd <- getwd()
data <- data.frame(
class = c("Sport", "Business", "Sport", "Sport", "Business", "Politics", "Politics", "Politics"),
doc = c("Football, tennis, golf and, bowling and, score.",
"Marketing, Finance, Legal and, Administration.",
"Tennis, Ski, Golf and, gym and, match.",
"football, climbing and gym.",
"Marketing, Business, Money and, Management.",
"This document talks politics and Donal Trump.",
"Donald Trump is the President of the US, sadly.",
"Article about politics and president Trump.")
)
# Error: not enough data
# model <- dc_train(model = paste0(wd, "/model.bin"), data = data, lang = "en")
# sample 4 rows from the data 50 times and stack the results (200 rows)
# Obviously do not do that in the real world
data <- do.call("rbind", replicate(50, data[sample(nrow(data), 4),],
simplify = FALSE))
# train model
model <- dc_train(model = paste0(wd, "/model.bin"), data = data, lang = "en")
## End(Not run)
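The example builds the full model path with `paste0()`. As an alternative sketch using only base R, `file.path()` joins path components with the platform's separator and `normalizePath()` returns the absolute form. The variable name `model_path` is introduced here for illustration only.

```r
# Join the working directory and the file name into one path
model_path <- file.path(getwd(), "model.bin")

# Resolve to an absolute path; mustWork = FALSE because the model file
# does not exist until training has written it
model_path <- normalizePath(model_path, mustWork = FALSE)

print(model_path)
```

The resulting string can then be passed as the `model` argument in place of `paste0(wd, "/model.bin")`.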