HF_load_dataset: Load_dataset
In fastai: Interface to 'fastai'

HF_load_dataset

R Documentation

Load_dataset

Description

Load a dataset from the Hugging Face 'datasets' library.

Usage

HF_load_dataset(
  path,
  name = NULL,
  data_dir = NULL,
  data_files = NULL,
  split = NULL,
  cache_dir = NULL,
  features = NULL,
  download_config = NULL,
  download_mode = NULL,
  ignore_verifications = FALSE,
  save_infos = FALSE,
  script_version = NULL,
  ...
)

Arguments

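`path`	path to the dataset loading script, or the name of a dataset in the Hugging Face 'datasets' library
`name`	name of the dataset configuration
`data_dir`	directory containing the data files of the dataset configuration
`data_files`	path(s) to the source data file(s)
`split`	which split(s) of the data to load; `NULL` (the default) loads all splits
`cache_dir`	directory in which cached data is read and written
`features`	feature types to use for the dataset
`download_config`	download configuration
`download_mode`	download mode, which controls how already downloaded and cached data is handled
`ignore_verifications`	whether to skip verification of the downloaded data; default `FALSE`
`save_infos`	whether to save the dataset information; default `FALSE`
`script_version`	version of the dataset loading script to use
`...`	additional arguments passed on to the underlying loader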
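
The arguments above can be combined as in the following minimal sketch. It assumes that the 'fastai' R package is installed, that the Python 'datasets' module is reachable through 'reticulate', and that network access is available. The dataset "glue" with configuration "mrpc" and the cache location under `tempdir()` are illustrative choices, not values taken from this page. Newer releases of the Python library may have renamed or dropped some of the arguments listed above, so the call may need adjusting.

```r
# Arguments for the call; the dataset, configuration and cache path are
# illustrative assumptions.
args <- list(
  path      = "glue",
  name      = "mrpc",
  split     = "train",
  cache_dir = file.path(tempdir(), "hf_datasets")
)

# Only attempt the call when both the R package and the Python module exist.
ready <- requireNamespace("fastai", quietly = TRUE) &&
  reticulate::py_module_available("datasets")

train <- NULL
if (ready) {
  train <- tryCatch(
    do.call(fastai::HF_load_dataset, args),
    error = function(e) {
      message("Loading failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Print a summary of the returned Python object, if the load succeeded.
if (!is.null(train)) print(train)
```

Building the argument list first and passing it through `do.call()` keeps the configuration in one place, which is convenient when the same settings are reused for several splits.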

Details

This function does the following under the hood:

1. Download the dataset loading script from `path` and import it into the library, if it is not already cached there. Loading scripts are small Python scripts that define the citation, info and format of the dataset, contain the URL of the original data files, and hold the code that loads examples from those files. Some of the scripts are available at https://github.com/huggingface/datasets/datasets, and you can upload your own to share them using the `datasets-cli` command-line tool.
2. Run the dataset loading script, which will:
   * Download the dataset files from the original URL (see the script), if they are not already downloaded and cached.
   * Process the dataset and cache it in typed Arrow tables. Arrow tables are arbitrarily long, typed tables that can store nested objects and be mapped to numpy/pandas/Python standard types. They can be accessed directly from disk, loaded into RAM, or even streamed over the web.
3. Return a dataset built from the splits requested in `split` (default: all).
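
The effect of `split` on the returned object can be sketched as follows, under the same assumptions as any call to this function: the 'fastai' R package, the Python 'datasets' module and network access must all be available. The helper `load_or_null()` is hypothetical and exists only to keep the sketch from stopping on an error. The dataset "glue" with configuration "mrpc" is an illustrative choice, and the slice syntax `"train[:1%]"` comes from the Python 'datasets' library and is not described on this page.

```r
# Hypothetical helper: call HF_load_dataset and return NULL instead of failing.
load_or_null <- function(...) {
  tryCatch(
    fastai::HF_load_dataset(...),
    error = function(e) {
      message("Loading failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Split specification asking for the first 1% of the training split.
slice_spec <- "train[:1%]"

ready <- requireNamespace("fastai", quietly = TRUE) &&
  reticulate::py_module_available("datasets")

all_splits <- NULL
small <- NULL
if (ready) {
  # split = NULL (the default): every available split is returned.
  all_splits <- load_or_null("glue", name = "mrpc")
  # A specific split, here restricted to a small slice of the training data.
  small <- load_or_null("glue", name = "mrpc", split = slice_spec)
}

if (!is.null(all_splits)) print(all_splits)
if (!is.null(small)) print(small)
```

Because the processed data is cached, the second call should normally reuse the files downloaded by the first rather than fetching them again.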