coco_caption_dataset: COCO Caption Dataset

coco_caption_dataset R Documentation

COCO Caption Dataset

Description

Loads the MS COCO dataset for image captioning.

Usage

coco_caption_dataset(
  root = tempdir(),
  train = TRUE,
  year = c("2014"),
  download = FALSE,
  transform = NULL,
  target_transform = NULL
)

Arguments

root

Root directory where the dataset is stored or will be downloaded to.

train

Logical. If TRUE, loads the training split; otherwise, loads the validation split.

year

Character. Dataset version year. Currently only "2014" is supported.

download

Logical. If TRUE, downloads the dataset if it's not already present in the root directory.

transform

Optional transform function applied to the image.

target_transform

Optional transform function applied to the target (the image caption).
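
As an illustration of the two transform arguments, the sketch below defines one function for each and tries them on synthetic data. It assumes the image reaches transform as an (H, W, C) array and the caption reaches target_transform as a single character string. The helper names flip_horizontal and clean_caption are hypothetical and not part of torchvision.

```r
# Hypothetical image transform: mirror an (H, W, C) array left to right.
flip_horizontal <- function(img) {
  img[, rev(seq_len(dim(img)[2])), , drop = FALSE]
}

# Hypothetical caption transform: trim whitespace and lower-case the text.
clean_caption <- function(caption) {
  tolower(trimws(caption))
}

# Synthetic stand-ins for one image and one caption (not real COCO data).
fake_x <- array(1:24, dim = c(2, 4, 3))
fake_y <- "  A Cat Sitting on a Couch. "

flipped <- flip_horizontal(fake_x)
cleaned <- clean_caption(fake_y)

# With torchvision installed, the helpers could be passed like this:
# ds <- coco_caption_dataset(
#   train = FALSE,
#   transform = flip_horizontal,
#   target_transform = clean_caption
# )
```

Because both helpers are plain R functions, they can be checked on small arrays and strings before being attached to the dataset.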

Value

An object of class coco_caption_dataset. Each item is a list:

  • x: an (H, W, C) numeric array containing the RGB image.

  • y: a character string with the image caption.
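
The item structure described above can be checked with a small helper. This is a minimal sketch on a synthetic item built by hand. describe_item is a hypothetical name, and the code relies only on the x and y fields documented here.

```r
# Synthetic stand-in for one dataset item (not real COCO data).
item <- list(
  x = array(runif(4 * 6 * 3), dim = c(4, 6, 3)),
  y = "A synthetic caption."
)

# Hypothetical helper: validate an item and summarise its size and caption.
describe_item <- function(item) {
  stopifnot(length(dim(item$x)) == 3, is.character(item$y))
  d <- dim(item$x)
  sprintf("%d x %d image, %d channels: %s", d[1], d[2], d[3], item$y)
}

summary_line <- describe_item(item)
print(summary_line)
```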

See Also

Other caption_dataset: flickr_caption_dataset

Examples

## Not run: 
ds <- coco_caption_dataset(
  train = FALSE,
  download = TRUE
)
example <- ds[1]

# Access image and caption
x <- example$x
y <- example$y

# Convert the image to a plain numeric array, keeping its (H, W, C) dimensions
image_array <- as.numeric(x)
dim(image_array) <- dim(x)

plot(as.raster(image_array))
title(main = y, col.main = "black")

## End(Not run)
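
The plotting step relies on as.raster(), which by default expects numeric values in [0, 1]. The sketch below is one possible way to guard against images stored on a 0-255 scale, using the max argument of as.raster(). Whether the dataset returns values in [0, 1] or in [0, 255] is not stated on this page, so the range check is an assumption, and to_raster is a hypothetical helper name.

```r
# Hypothetical helper: build a raster from an (H, W, C) array, choosing the
# colour scale from the data. Assumes values lie in [0, 1] or in [0, 255].
to_raster <- function(x) {
  arr <- as.numeric(x)
  dim(arr) <- dim(x)
  scale_max <- if (max(arr) > 1) 255 else 1
  as.raster(arr, max = scale_max)
}

# Synthetic images on both scales (not real COCO data).
img_255 <- array(seq(0, 255, length.out = 4 * 5 * 3), dim = c(4, 5, 3))
img_01 <- img_255 / 255

r_255 <- to_raster(img_255)
r_01 <- to_raster(img_01)

# plot(r_255) would then draw the image as in the example above.
```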

torchvision documentation built on Aug. 8, 2025, 7:27 p.m.