openes_load: Extract data and metadata from a given data set of <URL:...


View source: R/openes_load.R

Description

Extract data and metadata from a given data set of https://datos.gob.es/

Usage

openes_load(x, encoding = "UTF-8", guess_encoding = TRUE, ...)

Arguments

x

A tibble returned by openes_keywords containing exactly one dataset (1 row), or the end path of a dataset, such as 'l01280148-seguridad-ciudadana-actuaciones-de-seccion-del-menor-en-educacion-vial-20141' from https://datos.gob.es/es/catalogo/l01280148-seguridad-ciudadana-actuaciones-de-seccion-del-menor-en-educacion-vial-20141.

encoding

The encoding passed to read (all) the files. Most cases should be resolved with either 'UTF-8', 'latin1' or 'ASCII'.

guess_encoding

A logical stating whether to guess the encoding. This is set to TRUE by default. Whenever guess_encoding is set to TRUE, the 'encoding' argument is ignored. If guess_encoding fails to guess the encoding, openes_load falls back to the encoding argument.
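The interaction between encoding and guess_encoding can be sketched as below. This only illustrates the rule described above and is not the package's internal code: pick_encoding and its guessed argument are hypothetical names, with guessed standing in for whatever the guessing step produced (NA when it fails).

```r
# Hypothetical helper illustrating the documented rule; not part of the package.
# `guessed` stands in for the result of the guessing step (NA if it failed).
pick_encoding <- function(guessed, encoding = "UTF-8", guess_encoding = TRUE) {
  guess_ok <- guess_encoding && length(guessed) == 1 && !is.na(guessed)
  if (guess_ok) guessed else encoding
}

# Guessing enabled and successful: the 'encoding' argument is ignored
pick_encoding("ISO-8859-1", encoding = "ASCII")

# Guessing enabled but failed: fall back to the 'encoding' argument
pick_encoding(NA_character_, encoding = "latin1")

# Guessing disabled: the 'encoding' argument is always used
pick_encoding("ISO-8859-1", encoding = "ASCII", guess_encoding = FALSE)
```

In practice this means that forcing a specific encoding requires setting guess_encoding = FALSE as well.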

...

Arguments passed to read_csv and the other related read_* functions from readr. Internally, openes_load determines the delimiter of the file being read and chooses the read_* function accordingly. Because these functions share practically the same arguments, anything passed through ... will work regardless of which delimiter is detected.

Details

openes_load can return two possible outcomes: either an empty list or a list with a slot called metadata and another slot called data. Whenever the x argument is an invalid dataset path, it will return an empty list. When x is a valid dataset path, openes_load will return a list with the two slots described above.
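A caller can tell the two outcomes apart before touching the slots. The sketch below uses a hypothetical helper, is_loaded, and hand-built lists in place of real openes_load results, so it runs without network access.

```r
# Hypothetical helper; `res` is whatever openes_load() returned.
is_loaded <- function(res) {
  is.list(res) && length(res) > 0 && all(c("metadata", "data") %in% names(res))
}

# Stand-in for the result of an invalid dataset path: an empty list
is_loaded(list())

# Stand-in for the result of a valid dataset path: both slots present
is_loaded(list(metadata = data.frame(), data = list()))
```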

For the metadata slot, openes_load returns a tibble with most available metadata of the dataset. The columns are:

The metadata of the API can sometimes be returned in an incorrect order. For example, when several languages are available, the descriptions are sometimes not in the same order as the languages. If you find any of these errors, try raising the issue directly with https://datos.gob.es/, as the package extracts all metadata in the order in which it is returned.

Whenever the metadata is available in several languages, the resulting tibble will have as many rows as there are languages. Each row contains the texts in one language, and information that is the same across languages (such as the dates, which are language agnostic) is repeated in every row.
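To keep a single language, the metadata can be subset by row. The example below is a sketch on a hand-made data frame: the column names language, description and date_issued and all values are assumptions for illustration, not the columns the package is guaranteed to return.

```r
# Mock metadata with one row per language; names and values are illustrative.
meta <- data.frame(
  language    = c("es", "en"),
  description = c("Texto en castellano", "Text in English"),
  date_issued = c("2014-01-01", "2014-01-01"),  # repeated: language agnostic
  stringsAsFactors = FALSE
)

# Keep only the Spanish row
meta_es <- meta[meta$language == "es", , drop = FALSE]
meta_es
```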

In case the API returns empty requests, both data and metadata will be empty tibbles with the same column names.

For the data slot, openes_load returns a list containing at least one tibble. If the dataset being requested has file formats that openes_load can read (see permitted_formats), it will read those files. If that dataset has several files, it will return a list with as many elements as there are files, where each element is a tibble with the data. If for some reason any of the files cannot be read, openes_load has a fallback mechanism that returns the format it attempted to read together with the URL, so that the user can try to read the file directly. In any case, the result will always be a list of tibbles where each one is either the requested data (success) or a tibble with the format and URL of the file that could not be read (failure).
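Since successes and failures share the same list, it can help to separate them after loading. The sketch below works on a hand-built stand-in for the data slot. The fallback column names ("format" and "URL"), the file names and the values are all assumptions made for illustration; check the names of a real failed element before relying on them.

```r
# Mock `data` slot built by hand; real results come from openes_load().
# The fallback column names ("format", "URL") are assumptions.
mock_data <- list(
  "file_2011.csv" = data.frame(district = c("A", "B"), total = c(10, 20)),
  "file_2012.xls" = data.frame(format = "xls",
                               URL = "https://example.org/file_2012.xls")
)

# Flag an element as a fallback when its columns are exactly the assumed pair
is_fallback <- function(df, cols = c("format", "URL")) {
  setequal(names(df), cols)
}

failed <- vapply(mock_data, is_fallback, logical(1))

names(mock_data)[!failed]  # files that were read
names(mock_data)[failed]   # files to read manually from their URL
```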

Inside the data slot, each element of the list is named after the dataset that was read. When there is more than one dataset, the user can visit the website in the url column of the metadata slot to see the names of all the datasets. This is handy, for example, when the same dataset is repeated across time and we want to figure out which element of the list holds which data.

The API of https://datos.gob.es/ is not completely homogeneous because it is an aggregator of many different APIs from different cities and provinces of Spain. openes_load can only read a limited number of file formats, but this number will keep increasing as the package evolves. You can check the available file formats in permitted_formats. If the file format of the requested x is not readable, openes_load will return a list with only one tibble inside the data slot, containing all available formats with their respective data URLs, so that users can read them manually.

Similarly, in order to provide the safest behavior, openes_load is very conservative about which publishers it reads from https://datos.gob.es/. Because some publishers do not have standardized datasets, reading from many different publishers can become very messy. openes_load currently reads files only from selected publishers that offer standardized datasets, which makes them safer to read. As the package evolves and the data quality improves across publishers, the package will include more publishers. See the publishers that the package can read in publishers_available.

Value

If x is a valid dataset path, a list with two slots: metadata and data. Each slot contains tibbles that hold either the metadata or the data itself. If x is not a valid dataset path, it returns an empty list. See the details section for some caveats.

Examples

# For a dataset with only one file to read

example_id <- 'l01080193-fecundidad-en-madres-de-15-a-19-anos-de-la-ciudad-de-barcelona1'
some_data <- openes_load(example_id)

# Print the file to get some useful information
some_data

# Access the metadata
some_data$metadata

# Access the data. Note that the name of the dataset is in the list slot. Whenever
# there are different files being read, you might want to visit the homepage
# of the dataset in datos.gob.es with some_data$metadata$url or go directly
# to the homepage of the dataset at the publisher's website with
# some_data$metadata$publisher_data_url
some_data$data


# For a dataset with many files

## Not run: 
example_id <- 'l01080193-domicilios-segun-nacionalidad'
res <- openes_load(example_id)

# Note that you can see how many files were read in '# of files read'
res

# See how all datasets were read but we're not sure what each one means.
# Check the metadata and read the description. If that doesn't do it,
# go to the URL of the dataset from the metadata.
res$data

# Also note that not all of the datasets were read correctly. For example,
# some of these datasets were read with more columns or more rows. This is left
# to the user to fix. We could've added new arguments to the `...` but that would
# apply to ALL datasets and it then becomes too complicated.


# Encoding problems

long <- "l01080193-descripcion-de-la-causalidad-de-los-accidentes"
string <- "-gestionados-por-la-guardia-urbana-en-la-ciudad-de-barcelona"

id <- paste0(long, string)
pl <- openes_load(id)

# The dataset is read successfully but printing it raises an error
pl$data
# $`2011_ACCIDENTS_CAUSES_GU_BCN_.csv`
# Error in nchar(x[is_na], type = "width") :
#   invalid multibyte string, element 1

# This error is due to an encoding problem.
# We can use readr::guess_encoding to determine the encoding and reread

# This suggests an ASCII encoding
library(readr)
guess_encoding(pl$data[[1]])

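# guess_encoding must be FALSE, otherwise the encoding argument is ignored
pl <- openes_load(id, encoding = 'ASCII', guess_encoding = FALSE)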

# Success
pl$data


# For exploring datasets with openes_keywords and piping to openes_load
library(dplyr)

kw <- openes_keywords("turismo", "l01080193") # Ayuntamiento de Barcelona
kw

dts <-
 kw %>%
 filter(is_readable == TRUE,
        grepl("Tipos de propietarios", description)) %>%
 openes_load()

dts$metadata

dts$data

## End(Not run)

cimentadaj/datos_gob documentation built on April 16, 2021, 11:47 a.m.