spod_available_data: Get available data list

spod_available_data    R Documentation

Description

[Stable]

Get a table with links to the available data files for the specified data version. Optionally (see the check_local_files argument), check the file size and availability of data files previously downloaded into the cache directory, which is specified either by the SPANISH_OD_DATA_DIR environment variable (set with spod_set_data_dir()) or by a custom path passed in the data_dir argument. By default, the list is fetched from the Amazon S3 bucket where the data is stored. If that fails, the function falls back to downloading an XML file from the Spanish Ministry of Transport website. You can control this behaviour with the use_s3 argument.
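The primary-source-then-fallback behaviour described above can be sketched in base R roughly as follows. This is only an illustration of the pattern, not the package's actual implementation: fetch_primary(), fetch_fallback() and get_metadata() are hypothetical names, and the primary fetcher is made to fail on purpose so that the fallback path runs.

```r
# Hypothetical stand-ins for the two metadata sources (not spanishoddata code).
fetch_primary <- function() stop("S3 listing unavailable")  # always fails here
fetch_fallback <- function() "metadata from XML"

# Try the primary source first; on any error, fall back to the second source.
# use_primary = FALSE skips the primary source entirely (assumed to mirror
# the role of the use_s3 argument).
get_metadata <- function(use_primary = TRUE) {
  if (!use_primary) {
    return(fetch_fallback())
  }
  tryCatch(
    fetch_primary(),
    error = function(e) {
      message("Primary source failed, falling back: ", conditionMessage(e))
      fetch_fallback()
    }
  )
}

result <- get_metadata()
```

In this sketch the caller never sees the primary error; it is only reported as a message, which a quiet-style flag could suppress.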

Usage

spod_available_data(
  ver = 2,
  check_local_files = FALSE,
  quiet = FALSE,
  data_dir = spod_get_data_dir(),
  use_s3 = TRUE,
  force = FALSE
)

Arguments

ver

Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021; v2 covers 2022 onwards. For more details, see the codebooks with spod_codebook().

check_local_files

Logical. Whether to check if the local files exist and get the file size. Defaults to FALSE.

quiet

A logical value indicating whether to suppress messages. Default is FALSE.

data_dir

The directory where the data is stored. Defaults to the value returned by spod_get_data_dir().

use_s3

[Experimental] Logical. If TRUE, use Amazon S3 to get the available data list. This avoids downloading the XML file and caching it locally, and may be slightly faster. If FALSE, use the XML file to get the available data list.

force

Logical. If TRUE, force a re-download of the metadata: with Amazon S3, this queries the S3 bucket; with the XML file, it re-downloads the file. If FALSE, only update the available data list if it is older than 1 day.
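The "older than 1 day unless forced" rule for the force argument can be illustrated with a small base R helper. This is a sketch under assumptions, not the package's code: needs_refresh() and its arguments are hypothetical, and the timestamps are fixed so the result does not depend on the current time.

```r
# Hypothetical helper (not part of spanishoddata): decide whether cached
# metadata should be refreshed, given the time it was cached.
needs_refresh <- function(cached_at, force = FALSE, now = Sys.time(),
                          max_age_days = 1) {
  # Refresh unconditionally when forced or when nothing is cached yet.
  if (force || is.na(cached_at)) {
    return(TRUE)
  }
  age_days <- as.numeric(difftime(now, cached_at, units = "days"))
  age_days > max_age_days
}

# Fixed reference time for a reproducible illustration.
now <- as.POSIXct("2025-06-16 12:00:00", tz = "UTC")

fresh_cache  <- needs_refresh(now - 2 * 3600, now = now)   # cached 2 hours ago
stale_cache  <- needs_refresh(now - 48 * 3600, now = now)  # cached 2 days ago
forced_cache <- needs_refresh(now - 2 * 3600, force = TRUE, now = now)
```

Under these assumptions only the two-hour-old cache is kept as-is; the stale and the forced cases both trigger a refresh.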

Value

A tibble with links to the data files, their release dates, the dates of data coverage, local paths to the files, and the download status. It has the following columns:

target_url

character. The URL link to the data file.

pub_ts

POSIXct. The timestamp of when the file was published.

file_extension

character. The file extension of the data file (e.g., 'tar', 'gz').

data_ym

Date. The year and month of the data coverage, if available.

data_ymd

Date. The specific date of the data coverage, if available.

study

factor. Study category derived from the URL (e.g., 'basic', 'complete', 'routes').

type

factor. Data type category derived from the URL (e.g., 'number_of_trips', 'origin-destination', 'overnight_stays', 'data_quality', 'metadata').

period

factor. Temporal granularity category derived from the URL (e.g., 'day', 'month').

zones

factor. Geographic zone classification derived from the URL (e.g., 'districts', 'municipalities', 'large_urban_areas').

local_path

character. The local file path where the data is (or will be) stored.

downloaded

logical. Indicator of whether the data file has been downloaded locally. This is only available if check_local_files is TRUE.
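Because the result is a regular table, the columns listed above can be used to narrow down the file list. The sketch below uses a small hand-made data frame as a stand-in for the real output, so it runs without the package or network access; the rows and example.com URLs are invented for illustration and do not reflect actual files.

```r
# Hypothetical stand-in for the table returned by spod_available_data();
# all rows and URLs are invented for illustration only.
available <- data.frame(
  target_url = c(
    "https://example.com/districts/2022-01-01.csv.gz",
    "https://example.com/districts/2022-01-02.csv.gz",
    "https://example.com/municipalities/2022-01.tar"
  ),
  file_extension = c("gz", "gz", "tar"),
  data_ymd = as.Date(c("2022-01-01", "2022-01-02", NA)),
  zones = factor(c("districts", "districts", "municipalities")),
  period = factor(c("day", "day", "month")),
  downloaded = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Daily district-level files that are not yet in the local cache.
missing_daily <- subset(
  available,
  zones == "districts" & period == "day" & !downloaded
)

missing_daily$target_url
```

With the real function, the downloaded column is only present when check_local_files = TRUE, so a filter like this one assumes that option was set.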

Examples




# Set data dir for file downloads
spod_set_data_dir(tempdir())

# Get available data list for v1 (2020-2021) data
spod_available_data(ver = 1)

# Get available data list for v2 (2022 onwards) data
spod_available_data(ver = 2)

# Get available data list for v2 (2022 onwards) data
# while also checking for local files that are already downloaded
spod_available_data(ver = 2, check_local_files = TRUE)



spanishoddata documentation built on June 16, 2025, 1:07 a.m.