knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(eudata)
library(dplyr)
library(purrr)
library(ggplot2)
The data on GISCO is divided into topics.
get_topics()
Select the topics that you are interested in. Within each topic there are numerous files. These may differ in the year they are associated with, spatial resolution, coordinate reference system, and data format, among other things.
This package provides easy access to the latest files. The example below selects the highest-resolution files (01M) that use the usual lat/long coordinate system (EPSG:4326).
api <- get_topic("NUTS")
file_list <- get_latest_files(api)$gpkg |>
  grep(pattern = "01M_.*_4326_", value = TRUE)
file_list
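To see what the pattern keeps and drops, here is a minimal sketch that applies the same regular expression to a few made-up file names. The names only imitate the naming scheme suggested by the pattern above; they are assumptions for illustration and were not fetched from GISCO.

```r
# Made-up file names (assumed for illustration, not retrieved from GISCO)
files <- c(
  "NUTS_RG_01M_2021_4326_LEVL_3.gpkg",  # 01M resolution, EPSG:4326
  "NUTS_RG_60M_2021_4326_LEVL_3.gpkg",  # coarser resolution
  "NUTS_RG_01M_2021_3035_LEVL_3.gpkg",  # different coordinate system
  "NUTS_BN_01M_2021_4326_LEVL_0.gpkg"   # 01M resolution, EPSG:4326
)

# Keep names containing "01M_", then anything, then "_4326_"
selected <- grep(pattern = "01M_.*_4326_", files, value = TRUE)
selected
```

Only the first and the last name pass the filter, because both the resolution tag and the coordinate system code must be present in that order.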
Be aware that these files can be huge. The get_content_length function returns the size of a file without downloading it. It is not vectorized, so you have to use a map-like construct if you have a list of files.
# Turn a named vector into a two-column tibble
to_tibble <- function(x, column_name = "value") {
  tibble::tibble(names = names(x), !!column_name := x)
}

file_sizes <- map_int(file_list, get_content_length, api = api) |>
  to_tibble(column_name = "size") # see also tibble::as_tibble_col()

file_sizes |>
  knitr::kable(
    format.args = list(big.mark = "_", scientific = FALSE)
  )
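The map-like pattern itself does not depend on purrr. The sketch below uses base R's vapply instead, with a hypothetical stand-in function (fake_content_length, not part of any package) that, like a non-vectorized size lookup, accepts exactly one file name at a time. The value it returns is a placeholder, not a real file size.

```r
# Hypothetical stand-in for a non-vectorized lookup: accepts exactly one name
fake_content_length <- function(file) {
  stopifnot(length(file) == 1)
  nchar(file)  # placeholder value, NOT a real file size
}

files <- c("a.gpkg", "bb.gpkg", "ccc.gpkg")  # made-up names

# Call the function once per element; vapply checks that each result is one integer
# and, for character input, uses the input values as names of the result
sizes <- vapply(files, fake_content_length, FUN.VALUE = integer(1))

# Collect names and values in a two-column data frame
sizes_df <- data.frame(names = names(sizes), size = unname(sizes))
sizes_df
```

Calling fake_content_length(files) directly would fail on the length check, which is why the per-element loop is needed.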
Suppose we have selected a file to download. You can save it to a local file using the get_content function. It also saves a copy in a cache under your cache folder. The location of this folder is OS dependent; use rappdirs::user_cache_dir("eudata") to locate it.
If you do not specify a dest file, the data will be downloaded into a temporary file. The path to this file is the body element of the result of the call.
file_to_download <- grep(pattern = "RG.*LEVL_3", file_list, value = TRUE)
file_to_download

result <- get_content(
  api = api,
  end_point = file_to_download,
  save_to_file = TRUE
)
result
The selected data format, gpkg (GeoPackage), can be read into memory with the sf package. To start, only the first five records are read.
db_file <- result$body
layers <- sf::st_layers(db_file)
layers

# Use the name of the first layer in the SQL queries
layer <- layers$name[1]
sample <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" limit 5")
)
sample
Once you have the structure of the database, it is easy to filter, for example, for Hungarian data only.
hu_data <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" where CNTR_CODE = 'HU'")
)
hu_data |>
  knitr::kable()
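The queries above are plain strings, so they can be assembled and inspected before being passed to sf::st_read. Below is a minimal base R sketch of such a query builder. The helper names (quote_ident, quote_literal, build_query) and the layer name are hypothetical and not part of any package. It follows the SQL convention that double quotes mark identifiers and single quotes mark string values.

```r
# Quote an SQL identifier: wrap in double quotes, doubling any embedded ones
quote_ident <- function(x) paste0('"', gsub('"', '""', x, fixed = TRUE), '"')

# Quote an SQL string value: wrap in single quotes, doubling any embedded ones
quote_literal <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

# Hypothetical helper: "select *" with an optional equality filter and row limit
build_query <- function(layer, column = NULL, value = NULL, limit = NULL) {
  query <- paste("select * from", quote_ident(layer))
  if (!is.null(column) && !is.null(value)) {
    query <- paste(query, "where", quote_ident(column), "=", quote_literal(value))
  }
  if (!is.null(limit)) {
    query <- paste(query, "limit", as.integer(limit))
  }
  query
}

layer_name <- "NUTS_RG_01M_2021_4326_LEVL_3"  # assumed layer name, illustration only
q_sample <- build_query(layer_name, limit = 5)
q_hu <- build_query(layer_name, column = "CNTR_CODE", value = "HU")
q_sample
q_hu
```

Printing the query first makes quoting mistakes easy to spot, which helps when the layer name contains characters that need escaping.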
A map with ggplot2:
hu_data |> ggplot() + geom_sf()
Another example, now for postal codes.
api <- get_topic("Postal")
file_to_download <- grep("_4326", get_latest_files(api)$gpkg, value = TRUE)
result <- get_content(api, file_to_download, save_to_file = TRUE)
result
db_file <- result$body
layers <- sf::st_layers(db_file)
layers

# Use the name of the first layer in the SQL queries
layer <- layers$name[1]
sample <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" limit 5")
)
sample

hu_data <- sf::st_read(
  db_file,
  query = glue::glue("select * from \"{layer}\" where CNTR_ID = 'HU'")
)
hu_data |>
  select(POSTCODE, LAU_NAT)
# EOV: the Hungarian national projection (EPSG:23700)
EOV <- "EPSG:23700"
hu_data |>
  filter(grepl("^Gyöngyös$", LAU_NAT)) |>
  ggplot() +
  geom_sf() +
  coord_sf(crs = EOV, datum = EOV)
Cities with the highest number of associated postal codes:
hu_data |>
  sf::st_drop_geometry() |>
  count(LAU_NAT) |>
  arrange(desc(n)) |>
  filter(n > 1)
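The same count, sort and filter logic can be tried without downloading anything. The sketch below uses base R (table and order rather than dplyr) on a small made-up data frame. The town names and postal codes are invented for illustration and do not come from the GISCO data.

```r
# Invented data: one row per postal code, with the town it belongs to
postcodes <- data.frame(
  POSTCODE = c("0001", "0002", "0003", "0004", "0005", "0006"),
  LAU_NAT  = c("Town A", "Town A", "Town A", "Town B", "Town B", "Town C")
)

# Count rows per town (the base R counterpart of count(LAU_NAT))
counts <- as.data.frame(
  table(LAU_NAT = postcodes$LAU_NAT),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Keep towns with more than one postal code, largest count first
counts <- counts[counts$n > 1, ]
counts <- counts[order(-counts$n), ]
counts
```

With this toy input, Town C drops out because it has a single postal code, and Town A is listed before Town B.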