get_stats19: Download, read and format STATS19 data in one function.

View source: R/get.R

get_stats19 R Documentation

Description

Download, read and format STATS19 data in one function.

Usage

get_stats19(
  year = NULL,
  type = "collision",
  data_dir = get_data_directory(),
  file_name = NULL,
  format = TRUE,
  ask = FALSE,
  silent = FALSE,
  output_format = "tibble",
  engine = "readr",
  where = NULL,
  ...
)

Arguments

year

Single year for which data are to be read

type

One of 'collision', 'casualty', 'vehicle'; defaults to 'collision'.

data_dir

Directory in which sets of downloaded data are stored and looked up. Defaults to the value returned by get_data_directory().

file_name

Character string of a specific STATS19 CSV filename to download/read. If NULL, filenames are inferred from year and type.

format

Whether to format the data after reading. If FALSE, the data are returned raw, as read from the file. Default is TRUE.

ask

Should you be asked whether or not to download the files? FALSE by default.

silent

Boolean. If FALSE (default value), display useful progress messages on the screen.

output_format

A string that specifies the desired output format. The default value is "tibble". Other possible values are "data.frame", "sf" and "ppp", which return objects of class data.frame, sf::sf and spatstat.geom::ppp, respectively. Any other string is ignored and a tibble is returned. See details and examples.

engine

CSV reader backend. Defaults to "readr". Set to "duckdb" to query files via DuckDB before loading into R.

where

Optional SQL predicate appended to the WHERE clause when engine = "duckdb", e.g. "longitude > -1.9 AND longitude < -1.2". Ignored when engine = "readr".

...

Other arguments to be passed to the format_sf() or format_ppp() functions. Read and run the examples.
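As an illustration of the where argument described above, the following minimal sketch builds a SQL predicate from a longitude range. The bounds reuse the values from the argument description; the variable names lon_range and where_clause are hypothetical, and the get_stats19() call is left commented out because it needs an internet connection and DuckDB support.

```r
# Hypothetical longitude bounds (same values as in the argument description)
lon_range = c(min = -1.9, max = -1.2)

# Build the predicate string that would be passed as `where`
where_clause = sprintf(
  "longitude > %s AND longitude < %s",
  lon_range[["min"]], lon_range[["max"]]
)
where_clause

# The predicate is only used with the DuckDB engine (not run here):
# col_box = get_stats19(year = 2022, type = "collision",
#                       engine = "duckdb", where = where_clause)
```

Building the string in R keeps the bounds in one place. With engine = "readr" the predicate is ignored, so any filtering would have to happen after the data are loaded.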

Details

This function gets STATS19 data. Behind the scenes it uses dl_stats19() and read_* functions, returning a tibble (default), data.frame, sf or ppp object, depending on the output_format parameter.

By default, stats19 downloads files to a temporary directory. You can change this behavior to save the files in a permanent directory. This is done by setting the STATS19_DOWNLOAD_DIRECTORY environment variable. A convenient way to do this is by adding STATS19_DOWNLOAD_DIRECTORY=/path/to/a/dir to your .Renviron file, which can be opened with usethis::edit_r_environ().
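For a single session, the same variable can also be set from R with Sys.setenv(). The sketch below is only an illustration: the directory name stats19_data is hypothetical, and it is created under tempdir() so the example is safe to run; a permanent path would be used in practice. It assumes that get_data_directory() reads the variable at call time.

```r
# Hypothetical directory; replace with a permanent path in real use
stats19_dir = file.path(tempdir(), "stats19_data")
dir.create(stats19_dir, showWarnings = FALSE, recursive = TRUE)

# Set the variable for the current R session only
Sys.setenv(STATS19_DOWNLOAD_DIRECTORY = stats19_dir)

# Check what value is now visible to R
Sys.getenv("STATS19_DOWNLOAD_DIRECTORY")
```

Unlike the .Renviron approach, a value set this way is lost when the session ends.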

The function returns data for a specific year (e.g. year = 2022).

Note: for years before 2016 the function may return data from more years than are requested due to the nature of the files hosted at data.gov.uk.
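If only one year is wanted from such a multi-year result, the rows can be subset after reading. The sketch below uses a toy data frame in place of real STATS19 output. The column name accident_year is an assumption and may differ between releases of the data, so check names() on the real object first.

```r
# Toy stand-in for a multi-year result (not real STATS19 data).
# The column name `accident_year` is an assumption.
multi_year = data.frame(
  accident_index = c("a1", "a2", "a3", "a4"),
  accident_year = c(2013, 2014, 2014, 2015)
)

requested_year = 2014

# Keep only the rows for the requested year
single_year = multi_year[multi_year$accident_year == requested_year, ]
nrow(single_year)
```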

As this function uses the dl_stats19() function, it can download many MB of data, so ensure you have sufficient disk space.

If output_format is "data.frame", "sf" or "ppp", the output data is converted into a data.frame, sf or ppp object using the as.data.frame(), format_sf() or format_ppp() functions, respectively, as shown in the examples.
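To give a rough idea of what a conversion to sf involves, the sketch below turns a toy data frame into an sf object with sf::st_as_sf(). It is not the implementation of format_sf(): the column names location_easting_osgr and location_northing_osgr and the use of EPSG:27700 (British National Grid) are assumptions about the collision data, and the coordinates are made up.

```r
# Toy data with assumed coordinate columns (values are made up)
toy = data.frame(
  id = 1:2,
  location_easting_osgr = c(430046, 431000),
  location_northing_osgr = c(433577, 434000)
)

if (requireNamespace("sf", quietly = TRUE)) {
  # Point geometries in British National Grid (EPSG:27700, assumed)
  toy_sf = sf::st_as_sf(
    toy,
    coords = c("location_easting_osgr", "location_northing_osgr"),
    crs = 27700
  )
  # Reproject to longitude/latitude (EPSG:4326)
  toy_lonlat = sf::st_transform(toy_sf, 4326)
  class(toy_sf)
}
```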

See Also

dl_stats19()

read_collisions()

Examples


if(curl::has_internet()) {
col = get_stats19(year = 2022, type = "collision")
cas = get_stats19(year = 2022, type = "casualty")
veh = get_stats19(year = 2022, type = "vehicle")
class(col)
# data.frame output
x = get_stats19(2022, silent = TRUE, output_format = "data.frame")
class(x)

# # Get 5 years' worth of data (commented out due to large response size):
# col_5 = get_stats19(year = 5, type = "collision")
# cas_5 = get_stats19(year = 5, type = "casualty")
# veh_5 = get_stats19(year = 5, type = "vehicle")


# Run tests only if endpoint is alive:
if(nrow(x) > 0) {

# use duckdb engine
col_duck = get_stats19(year = 2022, type = "collision", engine = "duckdb")

# use duckdb with where clause
col_where = get_stats19(year = 2022, type = "collision", engine = "duckdb",
                       where = "speed_limit = 30")

# sf output
x_sf = get_stats19(2022, silent = TRUE, output_format = "sf")

# sf output with lonlat coordinates
x_sf = get_stats19(2022, silent = TRUE, output_format = "sf", lonlat = TRUE)
sf::st_crs(x_sf)

if (requireNamespace("spatstat.geom", quietly = TRUE)) {
# ppp output
x_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp")

# We can use the window parameter of the format_ppp function to keep only the
# events that occurred in a specific area. For example, we can create a new
# bbox of 5km around the city center of Leeds

leeds_window = spatstat.geom::owin(
xrange = c(425046.1, 435046.1),
yrange = c(428577.2, 438577.2)
)

leeds_ppp = get_stats19(2022, silent = TRUE, output_format = "ppp", window = leeds_window)
spatstat.geom::plot.ppp(leeds_ppp, use.marks = FALSE, clipwin = leeds_window)
}
}
}


stats19 documentation built on March 18, 2026, 5:08 p.m.