View source: R/dataset_ag_news.R
dataset_ag_news: R Documentation
The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, for a total of 120,000 training samples and 7,600 testing samples. Version 3, updated 09/09/2015.
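The totals follow from 4 classes × 30,000 = 120,000 training rows and 4 × 1,900 = 7,600 testing rows. The sketch below checks a loaded split against those numbers. `check_ag_news_counts()` is a hypothetical helper, not part of the package, and it assumes the class labels are stored in a column named `class`.

```r
# Hypothetical helper (not part of the package): check that a loaded
# AG News split has the row and per-class counts stated above.
# Assumes the class labels are stored in a column named `class`.
check_ag_news_counts <- function(df, split = c("train", "test")) {
  split <- match.arg(split)
  # Per-class sample counts from the dataset description
  per_class <- if (split == "train") 30000L else 1900L
  n_classes <- 4L
  counts <- table(df$class)
  # Expected totals: 4 * 30000 = 120000 (train), 4 * 1900 = 7600 (test)
  length(counts) == n_classes &&
    all(counts == per_class) &&
    nrow(df) == n_classes * per_class
}
```

For example, `check_ag_news_counts(test, "test")` should return `TRUE` for a complete test split.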
dataset_ag_news(
dir = NULL,
split = c("train", "test"),
delete = FALSE,
return_path = FALSE,
clean = FALSE,
manual_download = FALSE
)
dir: Character, path to directory where data will be stored. If NULL, user_cache_dir() is used to determine the path.
split: Character. Return training ("train") or testing ("test") data. Defaults to "train".
delete: Logical, set TRUE to delete the dataset. Defaults to FALSE.
return_path: Logical, set TRUE to return the file path of the data. Defaults to FALSE.
clean: Logical, set TRUE to remove intermediate files. Defaults to FALSE.
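manual_download: Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE. Defaults to FALSE.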
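A character-vector default such as `split = c("train", "test")` is conventionally resolved in R with `match.arg()`, which returns the first choice when the argument is omitted and raises an error for values outside the set. The sketch below illustrates that idiom with a hypothetical `pick_split()` function; whether `dataset_ag_news()` uses `match.arg()` internally is an assumption.

```r
# Hypothetical function illustrating the match.arg() idiom behind
# `split = c("train", "test")`; not the package's implementation.
pick_split <- function(split = c("train", "test")) {
  # With no argument supplied, match.arg() returns the first choice;
  # a value outside the choices raises an error.
  match.arg(split)
}

pick_split()                   # "train"
pick_split("test")             # "test"
try(pick_split("validation"))  # raises an error
```

This is why the usage line lists both values while the documented default is "train".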
The classes in this dataset are:
World
Sports
Business
Sci/Tech
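If the raw files store the class as an integer code from 1 to 4 in the order listed above (an assumption about the upstream CSV files, not stated on this page), indexing a character vector of class names converts codes to labels. The object names below are hypothetical.

```r
# Class names in the order listed above; assumes the raw data codes
# World = 1, Sports = 2, Business = 3, Sci/Tech = 4.
ag_news_classes <- c("World", "Sports", "Business", "Sci/Tech")

# Hypothetical integer codes as they might appear in a raw CSV file
raw_codes <- c(3L, 1L, 4L, 2L, 3L)

# Vector indexing maps each code to its label
labels <- ag_news_classes[raw_codes]
labels  # "Business" "World" "Sci/Tech" "Sports" "Business"

# A factor with explicit levels keeps all four classes, even if absent
label_factor <- factor(labels, levels = ag_news_classes)
```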
A tibble with 120,000 or 7,600 rows for "train" and "test" respectively and 3 variables:
Character, denoting news class
Character, title of article
Character, description of article
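Because each article has both a title and a description, a common preprocessing step for topic classification is to join the two into one text field. The sketch below does this with base R on a small hypothetical data frame. The column names `class`, `title`, and `description` are assumptions, so check `names()` on the returned tibble before adapting it.

```r
# Small hypothetical stand-in for the returned tibble; the column names
# class, title and description are assumed, not taken from this page.
news <- data.frame(
  class = c("Sports", "Business", "Sports"),
  title = c("Late goal wins final", "Shares slip", "Record lap"),
  description = c("A stoppage-time header.", "Markets fell on Monday.",
                  "A new track best."),
  stringsAsFactors = FALSE
)

# Join title and description into a single text column for modelling
news$text <- paste(news$title, news$description, sep = ". ")

# Number of articles per class
class_counts <- table(news$class)
class_counts
```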
http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html
https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz
Other topic: dataset_dbpedia(), dataset_trec()
## Not run:
dataset_ag_news()
# Custom directory
dataset_ag_news(dir = "data/")
# Deleting dataset
dataset_ag_news(delete = TRUE)
# Returning filepath of data
dataset_ag_news(return_path = TRUE)
# Access both the training and testing datasets
train <- dataset_ag_news(split = "train")
test <- dataset_ag_news(split = "test")
## End(Not run)
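After loading both splits as in the last example, they can be stacked into one data frame with a column recording where each row came from, which is convenient for shared preprocessing. `stack_splits()` is a hypothetical helper, and the tiny data frames stand in for the real results. It uses base `rbind()`, which requires both splits to have the same column names.

```r
# Hypothetical helper: stack the train and test splits, tagging each row
stack_splits <- function(train, test) {
  train$split <- "train"
  test$split <- "test"
  rbind(train, test)  # columns must match by name
}

# Tiny stand-ins for the objects returned by dataset_ag_news()
train_demo <- data.frame(class = c("World", "Sports"), title = c("a", "b"))
test_demo <- data.frame(class = "Business", title = "c")

all_news <- stack_splits(train_demo, test_demo)
table(all_news$split)
```

Filtering on `split` afterwards recovers the original partition.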