The Canary Islands Statistics Institute (Instituto Canario de Estadística, istacbase)^[http://www.gobiernodecanarias.org/istacbase/] is the central organ of the autonomous statistical system and the official research center of the Government of the Canary Islands. It was created and is regulated by Law 1/1991 of 28 January, on Statistics of the Autonomous Community of the Canary Islands (CAC), which assigns it, among others, the following functions:
Providing statistical information: The istacbase aims to provide, with technical and professional independence, statistical information of interest to the CAC, taking into account the fragmentation of the territory and its singularities, and complying with the principles established in the European Statistics Code of Practice.
Coordinating public statistical activity: The istacbase is the body responsible for promoting, managing and coordinating the public statistical activity of the CAC, assuming the exercise of the statutory competence provided for in Article 30, paragraph 23, of the Statute of Autonomy of the Canary Islands.
To help provide access to this rich source of information, istac itself provides a well-structured API^[https://es.slideshare.net/ISTAC/guia-de-uso-api-de-acceso-a-istac-base]. While this API is very useful for integration into web services and other high-level applications, it quickly becomes overwhelming for researchers who have neither the time nor the expertise to develop software to interface with it. This leaves researchers relying on manual bulk downloads of spreadsheets of the data they are interested in. This too can quickly become overwhelming, as the work is manual, time-consuming, and not easily reproducible. The goal of the istacbaser R-package is to provide a bridge between these alternatives and allow researchers to focus on their research questions rather than on accessing the data. The istacbaser R-package allows researchers to quickly search and download the data of their particular interest in a programmatic and reproducible fashion; this facilitates seamless integration into their workflow and allows analyses to be quickly rerun on different areas of interest with real-time access to the latest available data.
The istacbaser R-package provides POSIXct dates for easy integration into plotting and time-series analysis techniques, and grep-style searching for data descriptions and names.
The first step is to search for the data you are interested in. istacbase_search() provides grep-style searching of all available indicators from the istacbase API and returns the indicator information that matches your query.
cache
For performance and ease of use, a cached version of useful information is provided with the istacbaser R-package. This data is called cache and provides a snapshot of available islands, indicators, and other relevant information. By default, cache is the source that istacbase_search() and istacbase() use to find matching information. The structure of cache is as follows:
```r
library(istacbaser)
str(cache, max.level = 1)
```
istacbase_search()
istacbase_search() searches through the cache data frame to find indicators that match a search pattern. An example of the structure of this data frame is shown below:
```r
knitr::kable(head(istacbaser::cache[4310:4311, ]))
```
By default the search is done over the titulo field and returns all the columns of the matching rows. The ID values are inputs into istacbase(), the function for downloading the data. To return the key columns ID and titulo for the cache data frame, you can set extra = TRUE.
```r
library(istacbaser)
busqueda <- istacbase_search(pattern = "parado")
head(busqueda)
```
Other fields can be searched by simply changing the fields parameter. For example:
```r
library(istacbaser)
EPA_busqueda <- istacbase_search(pattern = "Encuesta de Población Activa", fields = "encuesta")
head(EPA_busqueda)
```
Regular expressions are also supported.
```r
library(istacbaser)
# 'pobreza' OR 'parados' OR 'trabajador'
popatr_busqueda <- istacbase_search(pattern = "pobreza|parados|trabajador")
head(popatr_busqueda)
```
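To see what the alternation pattern does on its own, the following minimal sketch applies the same regular expression with base R's grepl() to a small made-up data frame. The object toy_cache and its rows are hypothetical stand-ins for cache, and case-insensitive matching is an assumption of this sketch, not a confirmed behavior of istacbase_search().

```r
# Hypothetical stand-in for the cache data frame; all rows are invented.
toy_cache <- data.frame(
  ID = c("a.1", "b.2", "c.3", "d.4"),
  titulo = c("Parados registrados por islas",
             "Tasa de pobreza relativa",
             "Superficie por municipios",
             "Trabajadores afiliados"),
  stringsAsFactors = FALSE
)

# '|' means OR: a row matches if any of the alternatives occurs in titulo.
patron <- "pobreza|parados|trabajador"

# ignore.case = TRUE is an assumption made for this sketch only.
coincide <- grepl(patron, toy_cache$titulo, ignore.case = TRUE)
resultado <- toy_cache[coincide, ]
print(resultado$ID)
```

Only the rows whose titulo contains at least one of the three alternatives are kept, which is the behavior the pattern above relies on.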
istacbase()
Once you have found the set of indicators that you would like to explore further, the next step is downloading the data with istacbase(). The following examples are meant to highlight the different ways in which istacbase() can be used and demonstrate the major optional parameters.
The default value for the islas parameter is the special value all which, as you might expect, returns data on the selected ID for every available island or region, where available.
```r
library(istacbaser)
pop_data <- istacbase(istacbase_table = "dem.pob.exp.res.40")
head(pop_data)
```
If you are interested in only a subset of islands, you can pass the specific islands to the islas parameter. The values that can be passed to the islas parameter are all, Canarias, Lanzarote, Fuerteventura, Gran Canaria, Tenerife, La Gomera, La Palma or El Hierro.
```r
library(istacbaser)
# Population for a subset of islands
pop_data <- istacbase(islas = c("Lanzarote", "Fuerteventura"),
                      istacbase_table = "dem.pob.exp.res.40")
head(pop_data)
```
POSIXct
Setting the parameter POSIXct = TRUE gives you the possibility to work with dates. You can set a startdate and an enddate.
```r
library(istacbaser)
# islas values: all, Canarias, Lanzarote, Fuerteventura, Gran Canaria,
# Tenerife, La Gomera, La Palma or El Hierro
pop_data <- istacbase(islas = c("Lanzarote", "Fuerteventura"),
                      istacbase_table = "dem.pob.exp.res.40")
head(pop_data)
```
POSIXct = TRUE
The default format for the Periodo or Años column is not conducive to sorting or plotting, especially when downloading sub-annual data, such as monthly or quarterly data. To address this, if TRUE, the POSIXct parameter adds the additional columns fecha and periodicidad. fecha converts the default date into a POSIXct. periodicidad denotes the time resolution that the date represents. This option requires the package lubridate (>= 1.5.0). If POSIXct = TRUE and lubridate (>= 1.5.0) is not available, a warning is produced and the option is ignored.
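The sorting problem with text periods can be illustrated with a minimal base R sketch. The label format "2016M10" is invented for illustration and may not match what the API returns, and the conversion below uses as.POSIXct() directly rather than lubridate, so it is not the package's own implementation.

```r
# Hypothetical monthly labels; the real Periodo format may differ.
periodo <- c("2016M10", "2016M2", "2016M1")

# As plain text, "2016M10" sorts before "2016M2", which is wrong in time.
orden_texto <- sort(periodo)

# Split each label into year and month, then build a POSIXct date.
anio <- substr(periodo, 1, 4)
mes <- as.integer(sub(".*M", "", periodo))
fecha <- as.POSIXct(sprintf("%s-%02d-01", anio, mes), tz = "UTC")

# Tag the time resolution of each value, mirroring the periodicidad idea.
periodicidad <- rep("mensual", length(periodo))

datos <- data.frame(periodo, fecha, periodicidad, stringsAsFactors = FALSE)
datos <- datos[order(datos$fecha), ]
print(datos)
```

Once the dates are real POSIXct values, ordering and plotting follow calendar time instead of alphabetical order.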
startdate and enddate must be in the format YYYY.
```r
library(istacbaser)
pop_data <- istacbase(istacbase_table = "dem.pob.exp.res.40",
                      POSIXct = TRUE, startdate = 2010, enddate = 2016)
head(pop_data)
```
The POSIXct = TRUE option makes plotting and sorting dates much easier.
```r
library(istacbaser)
library(ggplot2)

pop_data <- istacbase(islas = "Canarias", istacbase_table = "dem.pob.exp.res.40",
                      POSIXct = TRUE, startdate = 2010, enddate = 2016)

ggplot(pop_data, aes(x = fecha, y = valor, colour = Nacionalidades)) +
  geom_line(size = 1) +
  labs(title = "Población según sexos y nacionalidades. Islas de Canarias y años",
       x = "Fecha", y = "Habitantes") +
  theme(legend.position = "bottom") +
  facet_wrap(~ Sexos)
```
mrv
If you do not know the latest date for which an indicator you are interested in is available, you can use mrv instead of startdate and enddate. mrv stands for most recent value and takes an integer corresponding to the number of most recent values you wish to return.
```r
library(istacbaser)
pop_data <- istacbase(istacbase_table = 'dem.pob.exp.res.40', POSIXct = TRUE, mrv = 1)
head(pop_data)
```
You can increase this value, and the function will return no more than mrv values. However, if mrv is greater than the number of available data points, it will return all data instead.
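The selection rule described above can be sketched with a small helper on a made-up data frame. The function ultimos_valores() and the data are hypothetical and only illustrate the idea; they are not the package's implementation.

```r
# Hypothetical helper illustrating "most recent values".
ultimos_valores <- function(datos, mrv) {
  fechas <- sort(unique(datos$fecha), decreasing = TRUE)
  # head() never returns more elements than exist, so an mrv larger
  # than the number of available dates keeps every row.
  recientes <- head(fechas, mrv)
  datos[datos$fecha %in% recientes, ]
}

# Invented yearly observations.
datos <- data.frame(
  fecha = as.POSIXct(c("2014-01-01", "2015-01-01", "2016-01-01"), tz = "UTC"),
  valor = c(10, 20, 30)
)

print(ultimos_valores(datos, mrv = 1)$valor)
print(nrow(ultimos_valores(datos, mrv = 10)))
```

With mrv = 1 only the latest observation survives, while an mrv larger than the number of dates simply returns the whole data frame.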
freq
If the data is available at several granularities, you can select one of them with the freq parameter. Possible values are anual, semestral, trimestral, mensual, quincenal and semanal. If the selected granularity is not available, all granularities will be shown.
```r
library(istacbaser)
soc_data <- istacbase(istacbase_table = 'soc.sal.est.ser.1625', POSIXct = TRUE, freq = "semestral")
head(soc_data)
```
startdate, enddate, mrv, freq with POSIXct = FALSE
If you make a query with istacbase() and POSIXct = FALSE, the startdate, enddate, mrv and freq parameters are ignored and the function issues a warning.
```r
library(istacbaser)
soc_data <- istacbase(istacbase_table = 'soc.sal.est.ser.1625', POSIXct = FALSE, freq = "semestral")
head(soc_data)
```