flickr_caption_dataset: Flickr Caption Datasets
In torchvision: Models, Datasets and Transformations for Images

Description

Flickr8k and Flickr30k image captioning datasets.

Usage

flickr8k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

flickr30k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

Arguments

`root`	Character. Root directory under which the dataset is stored, in a dataset-specific subdirectory such as `root/flickr30k`. Default is `tempdir()`.
`train`	Logical. If `TRUE`, loads the training set; if `FALSE`, loads the test set. Default is `TRUE`.
`transform`	Optional function to transform input images after loading. Default is `NULL`.
`target_transform`	Optional function to transform the target (the captions). Default is `NULL`.
`download`	Logical. Whether to download the dataset if not found locally. Default is `FALSE`.
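
As a rough illustration of what a `target_transform` might look like, the sketch below defines a plain R function that reduces the five captions to a single normalized string. The name `first_caption_clean` is hypothetical and not part of torchvision, and the captions are made up so that the code runs without downloading anything.

```r
# Hypothetical helper (not part of torchvision): keep the first caption,
# lower-case it, and strip punctuation.
first_caption_clean <- function(captions) {
  caption <- tolower(captions[[1]])
  caption <- gsub("[[:punct:]]", "", caption)
  trimws(caption)
}

# Made-up captions standing in for the `y` field of one dataset element
captions <- c(
  "A dog runs across the grass.",
  "A brown dog is running outside.",
  "The dog sprints through a field.",
  "A dog plays in a park.",
  "Dog running on green grass."
)

cleaned <- first_caption_clean(captions)
print(cleaned)
```

A function of this kind could presumably be supplied as `target_transform = first_caption_clean` when constructing either dataset.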

Details

The Flickr8k and Flickr30k collections are image captioning datasets composed of roughly 8,000 and 30,000 color images respectively, each paired with five human-annotated captions. The images are RGB with varying spatial resolutions. Both datasets are widely used for training and evaluating vision-language models.
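
Because each image carries five captions, training code often expands the data into one row per image-caption pair. The following is a minimal base R sketch of that expansion under stated assumptions: `mock_items` is made-up data that only imitates the list structure with `x` and `y` fields, and no torchvision function is called.

```r
# Mock elements imitating the list structure with `x` (image) and `y` (captions).
# Real elements would come from the dataset; these values are made up.
mock_items <- list(
  list(x = array(0L, dim = c(2, 3, 3)), y = paste("caption", 1:5, "for image A")),
  list(x = array(0L, dim = c(4, 2, 3)), y = paste("caption", 1:5, "for image B"))
)

# Build one row per (image, caption) pair; the scalar index is recycled
# across the five captions of each image.
pairs <- do.call(rbind, lapply(seq_along(mock_items), function(i) {
  data.frame(
    image_index = i,
    caption = mock_items[[i]]$y,
    stringsAsFactors = FALSE
  )
}))

print(nrow(pairs))
```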

Value

A torch dataset of class `flickr8k_caption_dataset` or `flickr30k_caption_dataset`, depending on the constructor called. Each element is a named list:

`x`: an H x W x 3 integer array representing an RGB image.
`y`: a character vector containing all five captions associated with the image.
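
Many torch models expect channel-first input, so an H x W x 3 array may need its axes reordered. The sketch below shows one way to do that with base R's `aperm()` on a made-up array; the dimensions and pixel values are illustrative assumptions, not data from the dataset.

```r
# Mock image with the documented H x W x 3 layout (values are made up)
height <- 4L
width <- 5L
x <- array(sample(0:255, height * width * 3L, replace = TRUE),
           dim = c(height, width, 3L))

# Move the channel axis to the front: H x W x 3 -> 3 x H x W
x_chw <- aperm(x, c(3, 1, 2))

print(dim(x))
print(dim(x_chw))
```

The permutation only reorders axes, so `x_chw[c, h, w]` holds the same value as `x[h, w, c]`.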

Examples

## Not run: 
# Load the Flickr8k caption dataset
flickr8k <- flickr8k_caption_dataset(download = TRUE)

# Access the first item
first_item <- flickr8k[1]
first_item$x  # image array with shape H x W x 3
first_item$y  # character vector containing five captions.

# Load the Flickr30k caption dataset
flickr30k <- flickr30k_caption_dataset(download = TRUE)

# Access the first item
first_item <- flickr30k[1]
first_item$x  # image array with shape H x W x 3
first_item$y  # character vector containing five captions.

## End(Not run)

torchvision documentation built on Nov. 6, 2025, 9:07 a.m.