dataset_dbpedia: DBpedia Ontology Dataset
In textdata: Download and Load Various Text Datasets

dataset_dbpedia

R Documentation

DBpedia Ontology Dataset

Description

The DBpedia ontology classification dataset. It contains 560,000 training samples and 70,000 testing samples in total, drawn evenly from 14 nonoverlapping classes from DBpedia (40,000 training and 5,000 testing samples per class).

Usage

dataset_dbpedia(
  dir = NULL,
  split = c("train", "test"),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)
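The `split = c("train", "test")` default in the signature is a common R idiom for "one of these choices, the first by default". The sketch below shows that idiom in isolation. It assumes the function resolves the argument with `match.arg()`, which this page does not confirm, and `resolve_split` is a hypothetical stand-in, not part of textdata.

```r
# Hypothetical stand-in that mimics only the `split` argument handling.
resolve_split <- function(split = c("train", "test")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and raises an error for values outside the listed choices.
  match.arg(split)
}

resolve_split()        # "train"
resolve_split("test")  # "test"
```

Under this assumption, calling `dataset_dbpedia()` with no `split` would behave the same as `split = "train"`, which matches the documented default.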

Arguments

`dir`	Character, path to the directory where data will be stored. If `NULL`, `user_cache_dir` is used to determine the path.
`split`	Character. Return training ("train") data or testing ("test") data. Defaults to "train".
`delete`	Logical, set `TRUE` to delete dataset.
`return_path`	Logical, set `TRUE` to return the path of the dataset.
`clean`	Logical, set `TRUE` to remove intermediate files. This can greatly reduce the size of the stored data. Defaults to `FALSE`.
`manual_download`	Logical, set `TRUE` if you have manually downloaded the file and placed it in the folder designated by running this function with `return_path = TRUE`.
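The manual-download workflow described for `manual_download` can be sketched as two small wrappers. The helper names `dbpedia_target_dir` and `dbpedia_load_manual` are hypothetical and not part of textdata. The sketch only uses arguments documented above and assumes textdata is installed when the helpers are actually called.

```r
# Hypothetical helper: report the folder where the dataset file is expected.
# Uses only the documented `dir` and `return_path` arguments.
dbpedia_target_dir <- function(dir = NULL) {
  textdata::dataset_dbpedia(dir = dir, return_path = TRUE)
}

# Hypothetical helper: load a split from a manually placed file.
# Uses only the documented `dir`, `split` and `manual_download` arguments.
dbpedia_load_manual <- function(split = "train", dir = NULL) {
  textdata::dataset_dbpedia(dir = dir, split = split, manual_download = TRUE)
}

# Intended order of use (not run here):
# 1. target <- dbpedia_target_dir()
# 2. Outside R, download the archive listed under Source and place it in `target`.
# 3. train <- dbpedia_load_manual("train")
```

Keeping the two steps in separate functions leaves room for the out-of-R download in between. Whether the file must be placed as downloaded or extracted first is not stated on this page.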

Details

The classes are:

Company
EducationalInstitution
Artist
Athlete
OfficeHolder
MeanOfTransportation
Building
NaturalPlace
Village
Animal
Plant
Album
Film
WrittenWork

Value

A tibble with 560,000 or 70,000 rows for "train" and "test" respectively and 3 variables:

class: Character, denoting the class label
title: Character, title of article
description: Character, description of article
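A quick sanity check on the returned tibble is to count rows per class. The sketch below uses base R only. `count_classes` is a hypothetical helper, and `toy` is a made-up stand-in with the same three columns, not real DBpedia records. On the full training split, a balanced dataset would give 560000 / 14 = 40000 rows per class.

```r
# Hypothetical helper: count rows per class in a data frame that has a
# `class` column, as the returned tibble does.
count_classes <- function(data) {
  counts <- table(data$class)
  data.frame(
    class = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up placeholder rows with the documented column names.
toy <- data.frame(
  class = c("Film", "Album", "Film"),
  title = c("Title A", "Title B", "Title C"),
  description = c("Text A", "Text B", "Text C"),
  stringsAsFactors = FALSE
)

count_classes(toy)

# Row totals implied by a balanced split over 14 classes.
stopifnot(560000 / 14 == 40000, 70000 / 14 == 5000)
```

With the real data, the same helper could be applied as `count_classes(dataset_dbpedia(split = "train"))`.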

Source

https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf

https://www.dbpedia.org/

https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz

Examples

## Not run: 
dataset_dbpedia()

# Custom directory
dataset_dbpedia(dir = "data/")

# Deleting dataset
dataset_dbpedia(delete = TRUE)

# Returning filepath of data
dataset_dbpedia(return_path = TRUE)

# Access both training and testing dataset
train <- dataset_dbpedia(split = "train")
test <- dataset_dbpedia(split = "test")

## End(Not run)

textdata documentation built on May 29, 2024, 2:57 a.m.