flickr_caption_dataset | R Documentation
Flickr8k Dataset
flickr8k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

flickr30k_caption_dataset(
  root = tempdir(),
  train = TRUE,
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)
root
Character. Root directory where the dataset will be stored under
train
: If
transform
Optional function to transform input images after loading. Default is NULL.
target_transform
Optional function to transform labels. Default is NULL.
download
Logical. Whether to download the dataset if not found locally. Default is FALSE.
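The transform and target_transform arguments each take an ordinary R function. The sketch below is illustrative only: to_channels_first and first_caption are hypothetical helper names that are not part of the package, and the sketch assumes that transform receives the H x W x 3 image array and target_transform receives the character vector of captions. It runs on stand-in data, so nothing is downloaded.

```r
# Hypothetical helpers; the names are illustrative and not part of the package.

# Reorder an H x W x 3 array to 3 x H x W (channels first).
to_channels_first <- function(img) {
  aperm(img, c(3, 1, 2))
}

# Keep only the first caption of the character vector.
first_caption <- function(captions) {
  captions[[1]]
}

# Stand-in data with the documented element structure (no download needed).
img <- array(0L, dim = c(4, 6, 3))   # H = 4, W = 6, 3 channels
caps <- paste("caption", 1:5)

chw <- to_channels_first(img)
dim(chw)              # 3 4 6
first_caption(caps)   # "caption 1"

# Assumed usage, following the signature shown under Usage:
# ds <- flickr8k_caption_dataset(
#   transform = to_channels_first,
#   target_transform = first_caption,
#   download = TRUE
# )
```

Passing plain functions keeps the preprocessing separate from the dataset, so the same helpers could be reused with flickr30k_caption_dataset().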
The Flickr8k and Flickr30k collections are image captioning datasets composed of 8,000 and 30,000 color images respectively, each paired with five human-annotated captions. The images are in RGB format with varying spatial resolutions, and both datasets are widely used for training and evaluating vision-language models.
A torch dataset of class flickr8k_caption_dataset.
Each element is a named list:
x
: an H x W x 3 integer array representing an RGB image.
y
: a character vector containing all five captions associated with the image.
A torch dataset of class flickr30k_caption_dataset.
Each element is a named list:
x
: an H x W x 3 integer array representing an RGB image.
y
: a character vector containing all five captions associated with the image.
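To make the element structure concrete, the following sketch builds a mock item with the same shape as the one described above and inspects it. The pixel values, the image size, and the caption strings are placeholders invented for illustration (including the assumed 0 to 255 value range); they are not taken from either dataset.

```r
# Mock element mirroring the documented structure; all values are placeholders.
item <- list(
  x = array(sample(0:255, 4 * 6 * 3, replace = TRUE), dim = c(4, 6, 3)),
  y = c(
    "placeholder caption one",
    "placeholder caption two",
    "placeholder caption three",
    "placeholder caption four",
    "placeholder caption five"
  )
)

# Image dimensions: height, width, channels.
height   <- dim(item$x)[1]
width    <- dim(item$x)[2]
channels <- dim(item$x)[3]

# Number of captions attached to the image.
n_captions <- length(item$y)

# Pick a single caption, for example for a one-caption-per-image setup.
one_caption <- item$y[[1]]
```

Because image sizes vary across the datasets, height and width would generally differ from item to item, while the channel count and the number of captions stay fixed.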
Other caption_dataset:
coco_caption_dataset()
## Not run:
# Load the Flickr8k caption dataset
flickr8k <- flickr8k_caption_dataset(download = TRUE)
# Access the first item
first_item <- flickr8k[1]
first_item$x # image array with shape H x W x 3
first_item$y # character vector containing five captions.
# Load the Flickr30k caption dataset
flickr30k <- flickr30k_caption_dataset(download = TRUE)
# Access the first item
first_item <- flickr30k[1]
first_item$x # image array with shape H x W x 3
first_item$y # character vector containing five captions.
## End(Not run)