# Below is the code to run to set up the data
library(pxweb)

# Get PXWEB levels
px_levels <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/")
px_levels
save(px_levels, file = "vignettes/px_levels_example.rda")

# Get PXWEB metadata about a table
px_meta <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy")
px_meta
save(px_meta, file = "vignettes/px_meta_example.rda")

# Example Download
pxweb_query_list <- list(
  "Civilstand" = c("*"), # Use "*" to select all
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)
pxq <- pxweb_query(pxweb_query_list)
pxd <- pxweb_get(
  "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy",
  pxq
)
save(pxd, file = "vignettes/pxd_example.rda")

# Same query, returned as JSON-stat
pxq$response$format <- "json-stat"
pxjstat <- pxweb_get(
  "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy",
  pxq
)
save(pxjstat, file = "vignettes/pxjstat_example.rda")

# Same query, returned as a px file
pxq$response$format <- "px"
pxfp <- pxweb_get(
  "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy",
  pxq
)
save(pxfp, file = "vignettes/pxfp_example.rda")

# Capture the citation output; it is loaded later from pxweb_cite_example.rda
pxweb_cite_example <- capture.output(pxweb_cite(pxd))
save(pxweb_cite_example, file = "vignettes/pxweb_cite_example.rda")
This R package provides tools to access the PX-WEB API. Contributions, bug reports, and other feedback are welcome!
We can find more information on the PX-Web/PC-Axis API here.
PXWEB is an API structure developed by Statistics Sweden and other national statistical institutions (NSI) to disseminate public statistics in a structured way. The API makes it possible to download and use data from statistical agencies directly over HTTP/HTTPS, without a web browser.
The pxweb R package connects any PXWEB API to R and facilitates the access, use and referencing of data from PXWEB APIs.
A number of organizations use PXWEB to distribute hierarchical data. You can browse the available data sets at:
The data in PXWEB APIs consists of a metadata part and a data part. Metadata is structured as a hierarchical node tree, where each node contains information about its subnodes. The leaf nodes describe which dimensions are available for the data at that leaf node.
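To make the node-tree idea concrete, here is a minimal sketch in base R. The nested list and the helper `leaf_paths()` are hypothetical illustrations written for this example; they are not the structure that pxweb or a PXWEB API actually returns.

```r
# Hypothetical node tree: inner nodes list their subnodes,
# leaf nodes (tables) list the dimensions available for their data.
tree <- list(
  id = "BE", children = list(
    list(id = "BE0101", children = list(
      list(id = "BE0101A", children = list(
        list(
          id = "BefolkningNy",
          dimensions = c("Civilstand", "Kon", "ContentsCode", "Tid")
        )
      ))
    ))
  )
)

# Walk the tree recursively and return the path to every leaf node.
leaf_paths <- function(node, prefix = character(0)) {
  path <- c(prefix, node$id)
  if (is.null(node$children)) {
    return(paste(path, collapse = "/"))
  }
  unlist(lapply(node$children, leaf_paths, prefix = path))
}

leaf_paths(tree)
```

Each returned path plays the role of the tail of a table URL, which is why navigating the tree ends at a node that describes dimensions rather than further subnodes.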
To install the latest stable release version from CRAN, just use:
install.packages("pxweb")
To install the latest stable release version from GitHub, just use:
library("remotes")
remotes::install_github("ropengov/pxweb")
Test the installation by loading the library:
library(pxweb)
A tutorial is included with the package. Open it with:
vignette(topic="pxweb")
We also recommend setting the UTF-8 encoding, since each API may use locale-specific characters:
Sys.setlocale(locale = "UTF-8")
There are two ways of using the pxweb R package to access data: interactively, or through the core functions. Accessing data requires two parts: a URL to the data table in the API and a query specifying which data is of interest.
The simplest way of using pxweb is to use it interactively, navigate the API to the data of interest, and then set up the query of interest.
# Navigate through all pxweb APIs in the R package API catalogue
d <- pxweb_interactive()

# Get data from SCB (Statistics Sweden)
d <- pxweb_interactive("api.scb.se")

# Fetching data from statfi (Statistics Finland)
d <- pxweb_interactive("pxnet2.stat.fi")

# Fetching data from StatBank (Statistics Norway)
d <- pxweb_interactive("data.ssb.no")

# To see all available PXWEB APIs use
pxweb_apis <- pxweb_api_catalogue()
In the example above, we use the interactive functionality from the PXWEB API root, but we could use any path to the API.
# Start with a specific path.
d <- pxweb_interactive("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A")
This functionality also means that we can navigate any PXWEB API, regardless of whether it is part of the R package API catalogue. Just supply a URL to somewhere in the API and then navigate the API from there.
Due to new CRAN policies, it is not possible to use an R function to edit the API catalogue of the R package, but the catalogue can be edited quickly from R using file.edit().
file.edit(pxweb_api_catalogue_path())
However, if pxweb is installed again, the old API catalogue is overwritten. The more durable option is therefore to add the PXWEB API to the global catalogue by opening a pull request on the pxweb GitHub page.
Under the hood, the pxweb package uses the pxweb_get() function to access data from the PXWEB API. It also keeps track of the API's time limits and splits big queries into optimal downloadable chunks. If we use pxweb_get() without a query, the function returns either a PXWEB LEVELS object or a PXWEB METADATA object, depending on whether the URL points to a table in the API. Here is an example of a PXWEB LEVELS object.
# Get PXWEB levels
px_levels <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/")
px_levels

load("px_levels_example.rda")
px_levels
And if we use pxweb_get() for a table, a PXWEB METADATA object is returned.
# Get PXWEB metadata about a table
px_meta <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy")
px_meta

load("px_meta_example.rda")
px_meta
To download data, we need both the URL to the table and a query specifying which parts of the table are of interest. A URL to a table is a URL that returns a metadata object when no query is supplied. Creating a query can be done in three main ways. The first and most straightforward approach is to use pxweb_interactive() to explore the table URL and create a query interactively.
d <- pxweb_interactive("https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy")
The interactive function will return the query and the URL, even if the data is not downloaded.
# save(d, file = "d_example.rda")
load("d_example.rda")

d$url
d$query
We can also turn the query into a JSON query that we can use outside R.
pxweb_query_as_json(d$query, pretty = TRUE)
The second approach is to specify the query either as an R list or as a JSON object. Some statistical agencies, such as Statistics Sweden, supply queries as JSON objects on their web pages, and we can use these queries directly. Below is another example of a JSON query for the table above. For details on setting up a JSON query, see the PXWEB API documentation.
{
  "query": [
    {
      "code": "Civilstand",
      "selection": {
        "filter": "item",
        "values": ["OG", "G", "ÄNKL", "SK"]
      }
    },
    {
      "code": "Kon",
      "selection": {
        "filter": "item",
        "values": ["1", "2"]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": ["BE0101N1"]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": ["2015", "2016", "2017"]
      }
    }
  ],
  "response": {
    "format": "json"
  }
}
To use this JSON query, we store it in a file and supply the path to that file to the pxweb_query() function.
pxq <- pxweb_query("path/to/the/json/query.json")
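As a rough sketch of that workflow, the snippet below writes a shortened version of the JSON query above to a temporary file and passes the path to pxweb_query(). The variable names `json_query`, `query_path` and `pxq_from_file` are made up for this example, and the pxweb call is guarded so the snippet also runs where the package is not installed.

```r
# A shortened version of the JSON query shown above, kept as a string.
json_query <- '{
  "query": [
    {"code": "Kon", "selection": {"filter": "item", "values": ["1", "2"]}},
    {"code": "ContentsCode", "selection": {"filter": "item", "values": ["BE0101N1"]}},
    {"code": "Tid", "selection": {"filter": "item", "values": ["2015", "2016", "2017"]}}
  ],
  "response": {"format": "json"}
}'

# Store the query in a temporary file.
query_path <- tempfile(fileext = ".json")
writeLines(json_query, query_path)

# Build the query object from the file path, if pxweb is available.
if (requireNamespace("pxweb", quietly = TRUE)) {
  pxq_from_file <- pxweb::pxweb_query(query_path)
  print(pxq_from_file)
}
```

Keeping queries in files like this makes it easy to reuse a query copied from an agency's web page without retyping it as an R list.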
Finally, we can create a PXWEB query from an R list, where each list element is named after a variable and contains the selected values for that variable.
pxweb_query_list <- list(
  "Civilstand" = c("*"), # Use "*" to select all
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
)
pxq <- pxweb_query(pxweb_query_list)
pxq
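To show how such a list relates to the JSON query format shown earlier, here is a small base R sketch. The helper `list_to_query_structure()` is hypothetical and is not how pxweb builds queries internally; it assumes that a selection of "*" corresponds to the PXWEB filter "all" and that anything else is an "item" selection.

```r
# Hypothetical helper: map a named list of selections to the nested
# structure of a PXWEB JSON query ("*" is mapped to the filter "all").
list_to_query_structure <- function(selections, format = "json") {
  query <- lapply(names(selections), function(code) {
    values <- selections[[code]]
    filter <- if (identical(values, "*")) "all" else "item"
    list(code = code, selection = list(filter = filter, values = values))
  })
  list(query = query, response = list(format = format))
}

q <- list_to_query_structure(list(
  "Civilstand" = c("*"),
  "Kon" = c("1", "2"),
  "ContentsCode" = c("BE0101N1"),
  "Tid" = c("2015", "2016", "2017")
))

# Inspect the first variable's selection.
str(q$query[[1]])
```

The nested list mirrors the JSON layout one to one: a `query` array with one `code` and `selection` entry per variable, plus a `response` element holding the format.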
We can validate the query against the metadata object to check that the query can be used. This validation is done automatically when the data is fetched with pxweb_get() but can also be done manually.
pxweb_validate_query_with_metadata(pxq, px_meta)
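The idea behind such a validation can be sketched in base R as follows. This is an illustration only, not the checks that pxweb_validate_query_with_metadata() performs: the `meta` list is hand-written, and `check_query()` is a hypothetical helper that only verifies variable codes and value codes.

```r
# Hypothetical, hand-written metadata: one entry per variable
# holding the value codes that are valid for it.
meta <- list(
  Kon = c("1", "2"),
  Tid = c("2015", "2016", "2017", "2018")
)

# Return a character vector of problems; an empty vector means
# every variable and value in the query exists in the metadata.
check_query <- function(selections, meta) {
  problems <- character(0)
  for (code in names(selections)) {
    if (!code %in% names(meta)) {
      problems <- c(problems, sprintf("unknown variable '%s'", code))
      next
    }
    # "*" is treated as "select all" and is always accepted.
    bad <- setdiff(selections[[code]], c(meta[[code]], "*"))
    if (length(bad) > 0) {
      problems <- c(problems, sprintf(
        "variable '%s': unknown value(s) %s",
        code, paste(bad, collapse = ", ")
      ))
    }
  }
  problems
}

check_query(list(Kon = c("1", "3"), Tid = "*"), meta)
```

Catching a mistyped code locally in this way avoids spending an API call on a query that the server would reject anyway.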
When we have the URL to a data table and a query, we can download the data with pxweb_get(). The function returns a pxweb_data object that contains the downloaded data.
pxd <- pxweb_get(
  "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy",
  pxq
)
pxd

load("pxd_example.rda")
pxd
If we instead want a JSON-stat object, we change the response format of the query to "json-stat", and pxweb_get() returns a JSON-stat object.
pxq$response$format <- "json-stat"
pxjstat <- pxweb_get(
  "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy",
  pxq
)
pxjstat

load("pxjstat_example.rda")
pxjstat
Some return formats return files. These responses are stored in the R tempdir() folder, and the file paths are returned by pxweb_get(). Currently, the px and sdmx formats can be downloaded as files; file an issue if you need other response formats.
pxq$response$format <- "px"
pxfp <- pxweb_get(
  "https://api.scb.se/OV0104/v1/doris/en/ssd/BE/BE0101/BE0101A/BefolkningNy",
  pxq
)
pxfp

load("pxfp_example.rda")
pxfp
If a query is large (contains more values than the PXWEB API's maximum allowed values), it is split into optimal chunks that are downloaded sequentially. The resulting PXWEB data objects are then combined into one large PXWEB data object, while JSON-stat objects are returned as a list of JSON-stat objects, and other files are stored in tempdir() as separate files.
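The chunking idea can be sketched in base R as below. This is a simplified illustration, not pxweb's actual algorithm: `chunk_query()` is a hypothetical helper that splits only the largest dimension, assumes every selection lists its values explicitly (no "*"), and can still exceed the limit when the remaining dimensions alone are too large.

```r
# Illustrative only: split one dimension so that each chunk stays
# within a maximum number of values. Not pxweb's actual algorithm.
chunk_query <- function(selections, max_values) {
  sizes <- lengths(selections)
  # The number of values in a query is the product of the dimension sizes.
  if (prod(sizes) <= max_values) {
    return(list(selections))
  }
  # Split the largest dimension; the other dimensions stay intact.
  split_dim <- names(which.max(sizes))
  other_cells <- prod(sizes[names(sizes) != split_dim])
  per_chunk <- max(1, floor(max_values / other_cells))
  values <- selections[[split_dim]]
  groups <- split(values, ceiling(seq_along(values) / per_chunk))
  lapply(unname(groups), function(v) {
    chunk <- selections
    chunk[[split_dim]] <- v
    chunk
  })
}

chunks <- chunk_query(
  list(Kon = c("1", "2"), Tid = as.character(2010:2017)),
  max_values = 6
)
length(chunks)
```

Each element of `chunks` is a complete selection list in its own right, so every chunk could be sent as a separate query and the results combined afterwards.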
For more advanced connections to the API, pxweb_advanced_get() gives the flexibility to access the underlying HTTP calls made with httr and to log the HTTP calls for debugging.
We can then convert the downloaded PXWEB data objects to a data.frame or to a character matrix. The character matrix contains the "raw" data, while the data.frame conversion returns an R data.frame in a tidy format. In this conversion, missing values (such as "..") are converted to NA in the data.frame. Using the arguments variable.value.type and column.name.type, we can choose whether we want the codes or the texts for the variable values and the column names.
pxdf <- as.data.frame(pxd, column.name.type = "text", variable.value.type = "text")
head(pxdf)

pxdf <- as.data.frame(pxd, column.name.type = "code", variable.value.type = "code")
head(pxdf)
Similarly, we can access the raw data as a character matrix with as.matrix.
pxmat <- as.matrix(pxd, column.name.type = "code", variable.value.type = "code")
head(pxmat)
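The difference between the raw character matrix and the tidy data.frame can be illustrated with a toy example in base R. The matrix below is invented for this sketch (the numbers are not real statistics), and the conversion is written by hand rather than taken from pxweb.

```r
# Invented raw character matrix, with ".." marking a missing value.
raw <- matrix(
  c(
    "1", "2015", "4951",
    "2", "2015", "..",
    "1", "2016", "5013"
  ),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Kon", "Tid", "Population"))
)

# Convert to a data.frame: the dimension columns stay as text,
# ".." becomes NA, and the value column becomes numeric.
tidy <- as.data.frame(raw, stringsAsFactors = FALSE)
tidy$Population[tidy$Population == ".."] <- NA
tidy$Population <- as.numeric(tidy$Population)
tidy
```

Replacing ".." before calling as.numeric() avoids the coercion warning that a direct conversion of the raw strings would raise.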
In addition to the data, the PXWEB DATA object may also contain comments for the data. These can be accessed with the pxweb_data_comments() function.
pxdc <- pxweb_data_comments(pxd)
pxdc
In this case, we did not have any comments. When comments are present, we can turn them into a data.frame with one comment per row.
as.data.frame(pxdc)
Finally, if we use the data, we can easily create a citation for a pxweb_data object using the pxweb_cite() function. For full reproducibility, please also cite the package.
pxweb_cite(pxd)
load("pxweb_cite_example.rda")
cat(pxweb_cite_example, sep = "\n")
See TROUBLESHOOTING.md for a list of current known issues.
This work can be freely used, modified and distributed under the open license specified in the DESCRIPTION file.
We created this vignette with
sessionInfo()