spod_get | R Documentation |
This function creates a DuckDB lazy table connection object from the specified type and zones. It checks for missing data and downloads it if necessary. The connnection is made to the raw CSV files in gzip archives, so analysing the data through this connection may be slow if you select more than a few days. You can manipulate this object using dplyr
functions such as select, filter, mutate, group_by, summarise, etc. In the end of any sequence of commands you will need to add collect to execute the whole chain of data manipulations and load the results into memory in an R data.frame
/tibble
. See codebooks for v1 and v2 data in vignettes with spod_codebook(1) and spod_codebook(2).
If you want to analyse longer periods of time (especiially several months or even the whole data over several years), consider using the spod_convert and then spod_connect.
If you want to quickly get the origin-destination data with flows aggregated for a single day at municipal level and without any extra socio-economic variables, consider using the spod_quick_get_od function.
spod_get(
type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
dates = NULL,
data_dir = spod_get_data_dir(),
quiet = FALSE,
max_mem_gb = max(4, spod_available_ram() - 4),
max_n_cpu = max(1, parallelly::availableCores() - 1),
max_download_size_gb = 1,
duckdb_target = ":memory:",
temp_path = spod_get_temp_dir(),
ignore_missing_dates = FALSE
)
type |
The type of data to download. Can be |
zones |
The zones for which to download the data. Can be |
dates |
A The possible values can be any of the following:
|
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
max_mem_gb |
The maximum memory to use in GB. A conservative default is 3 GB, which should be enough for resaving the data to |
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
duckdb_target |
(Optional) The path to the duckdb file to save the data to, if a convertation from CSV is reuqested by the |
temp_path |
The path to the temp folder for DuckDB for intermediate spilling in case the set memory limit and/or physical memory of the computer is too low to perform the query. By default this is set to the |
ignore_missing_dates |
Logical. If |
A DuckDB lazy table connection object of class tbl_duckdb_connection
.
# create a connection to the v1 data
spod_set_data_dir(tempdir())
dates <- c("2020-02-14", "2020-03-14", "2021-02-14", "2021-02-14", "2021-02-15")
nt_dist <- spod_get(type = "number_of_trips", zones = "distr", dates = dates)
# nt_dist is a table view filtered to the specified dates
# for advanced users only
# access the source connection with all dates
# list tables
DBI::dbListTables(nt_dist$src$con)
# disconnect
spod_disconnect(nt_dist)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.