In evanodell/dwpstat: Access 'Stat-Xplore' data on the UK benefits system

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Package Info

The UK Department for Work and Pensions (DWP) operates the Stat-Xplore database, containing official statistics on benefit claims, caseloads, and other relevant data. The DWP also operates a public API service, which dwpstat wraps. Full documentation for the DWP's API is available (here)[https://stat-xplore.dwp.gov.uk/webapi/online-help/Open-Data-API.html].

API key and rate limits

Users need to sign up for a Stat-Xplore account and create an API key to access the API. The API key can be set using dwp_api_key(), which checks for an DWP_API_KEY environmental variable, and allows users to set the key for a given session.

The API is rate limited, a limit that resets every hour. You can use dwp_rate_limit() to check current usage rates.

library(dwpstat)
dwp_rate_limit()
# A tibble: 1 x 3
  remaining reset               limit
      <int> <dttm>              <int>
      9997 2018-10-01 14:18:42 10000

Identify Available Data

Use dwp_schema() to identify available datasets, and the variables within those datasets. dwp_schema() accepts ID variables at any level and returns all metadata at the level below. If dwp_schema() is empty, returns all available data folders at the root level of the API.

Get a tibble of all available folders in the API:

library(tibble)

a <- dwp_schema()

glimpse(a)

Observations: 23
Variables: 4
$ id       <chr> "str:folder:faa", "str:folder:fbc", "str:folder:fbencom", "str:folder:fbb", "str:folder:fca...
$ label    <chr> "Attendance Allowance", "Benefit Cap", "Benefit Combinations", "Bereavement Benefits", "Car...
$ location <chr> "http://stat-xplore.dwp.gov.uk/webapi/rest/v1/schema/str:folder:faa", "http://stat-xplore.d...
$ type     <chr> "FOLDER", "FOLDER", "FOLDER", "FOLDER", "FOLDER", "FOLDER", "FOLDER", "FOLDER", "FOLDER", "...

Get a tibble of metadata on all databases in the ESA folder

b <- dwp_schema("str:folder:fesa")

glimpse(b)

Observations: 1
Variables: 4
$ id       <chr> "str:database:ESA_Caseload"
$ label    <chr> "ESA Caseload"
$ location <chr> "http://stat-xplore.dwp.gov.uk/webapi/rest/v1/schema/str:database:ESA_Caseload"
$ type     <chr> "DATABASE"

Get a tibble of all variables on all databases in the ESA folder:

c <- dwp_schema("str:database:ESA_Caseload")

glimpse(c)

Observations: 15
Variables: 4
$ id       <chr> "str:count:ESA_Caseload:V_F_ESA", "str:measure:ESA_Caseload:V_F_ESA:CAWKLYAMT", "str:group:...
$ label    <chr> "Employment and Support Allowance Caseload", "Weekly Award Amount", "Admin-use only", "Quar...
$ location <chr> "http://stat-xplore.dwp.gov.uk/webapi/rest/v1/schema/str:count:ESA_Caseload:V_F_ESA", "http...
$ type     <chr> "COUNT", "MEASURE", "GROUP", "FIELD", "GROUP", "FIELD", "FIELD", "FIELD", "FIELD", "FIELD",...

Given their ID, you can use dwp_scheme to return the names of levels in group and field variables

d <- dwp_schema("str:field:ESA_Caseload:V_F_ESA:CTDURTN")

glimpse(d)

Observations: 1
Variables: 4
$ id       <chr> "str:valueset:ESA_Caseload:V_F_ESA:CTDURTN:C_ESA_DURATION"
$ label    <chr> "Duration of current claim"
$ location <chr> "http://stat-xplore.dwp.gov.uk/webapi/rest/v1/schema/str:valueset:ESA_Caseload:V_F_ESA:CTDU...
$ type     <chr> "VALUESET"

Returns a tibble of levels for the duration options for ESA caseloads

e <- dwp_schema("str:valueset:ESA_Caseload:V_F_ESA:CTDURTN:C_ESA_DURATION")

glimpse(e)

Observations: 7
Variables: 4
$ id       <chr> "str:value:ESA_Caseload:V_F_ESA:CTDURTN:C_ESA_DURATION:1", "str:value:ESA_Caseload:V_F_ESA:...
$ label    <chr> "Up to 3 months", "3 months up to 6 months", "6 months up to 1 year", "1 year and up to 2 y...
$ location <chr> "http://stat-xplore.dwp.gov.uk/webapi/rest/v1/schema/str:value:ESA_Caseload:V_F_ESA:CTDURTN...
$ type     <chr> "VALUE", "VALUE", "VALUE", "VALUE", "VALUE", "VALUE", "VALUE"

Get data

Due to the structure of the API, actual data is returned in a list with six levels. The process for converting that data to a data frame or tibble varies considerably depending on the column, row and wafer fields used, and so no generic function for conversion is provided at this time.

The data structure can be difficult to work with in R; it is often unclear how to match together data labels, in the fields list level, with the actual data, in the cubes level. The example below shows a relatively simple process of assigning names to a query.

library(purrr)
pip <- dwp_get_data(database = "str:database:PIP_Monthly",
                  measures = "str:count:PIP_Monthly:V_F_PIP_MONTHLY",
                  column = "str:field:PIP_Monthly:V_F_PIP_MONTHLY:SEX",
                  row = "str:field:PIP_Monthly:F_PIP_DATE:DATE2")

pip2 <- as.data.frame(map(pip$cubes, "values"))

pip_names <- pip$fields$items

names(pip2) <- pip_names[[2]]$labels

pip2$date <- pip_names[[1]]$labels

glimpse(pip2)
Observations: 64
Variables: 4
$ Male                 <dbl> 12, 124, 304, 920, 1710, 2714, 4291, 6580, 8922, 12354, 15710, 20238, 25162, 30...
$ Female               <dbl> 17, 118, 303, 940, 1746, 2756, 4579, 7280, 10087, 14275, 18384, 23902, 30006, 3...
$ `Unknown or missing` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ date                 <list> ["201304 (Apr-13)", "201305 (May-13)", "201306 (Jun-13)", "201307 (Jul-13)", "...