knitr::opts_chunk$set(echo = TRUE)
library(ckanr)
library(tidyverse)
library(jsonlite)
library(dplyr)
library(hdxr)

hdxr

hdxr is a package for interacting with the Humanitarian Data Exchange (HDX) from R, making workflows that use humanitarian data more reproducible.

Currently it has functions that read information about HDX datasets and their resources (CSV files and zipped shapefiles).

Installation:

library(devtools)

install_github("callumgwtaylor/hdxr")

Load:

library(hdxr)

Connecting to HDX

HDX uses CKAN to store data, and to connect to its server you first use:

hdx_connect()

This simply calls the ckanr function ckanr_setup(), pointing it at the HDX server.
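
For readers curious about what that amounts to, here is a minimal sketch of the equivalent ckanr call. The URL below is an assumption (the public HDX address, not taken from the hdxr source), and hdx_connect() may set further options.

```r
library(ckanr)

# Assumption: HDX's CKAN instance is served from this address.
hdx_url <- "https://data.humdata.org"

# Point all subsequent ckanr calls at HDX instead of ckanr's default server.
# This only stores the setting; it does not contact the server.
ckanr_setup(url = hdx_url)

# Check which server ckanr will now query.
get_default_url()
```

In normal use hdx_connect() is all you need; the sketch only makes the underlying configuration visible.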

Datasets

Listing Datasets

To see what datasets are available on HDX, use:

hdx_dataset_list()

This returns a dataframe with a column containing the titles of the first 5000 datasets available. Use the limit argument to retrieve more than 5000.

Retrieving Datasets

To download further information about a dataset, use:

hdx_dataset_search(term = "blah")

This will retrieve a dataframe of the first 10 datasets that have titles similar to the provided search term.

To return more than 10 results use the rows argument.

When you know the exact title and want only that result, set the exact argument to TRUE:

hdx_dataset_search(term = "afghanistan-roads", exact = TRUE)

Resources

Extracting Resources

Once you've selected the dataset you want using the hdx_dataset_search() function, you can extract information about the resources (the data files themselves) using:

hdx_resource_list()

This can be piped, in a situation like:

hdx_dataset_search(term = "afghanistan-roads", exact = TRUE) %>%
  hdx_resource_list()

This returns a dataframe with one row for each resource found, joined to the initial information about the datasets.
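
Because the result is an ordinary dataframe, it can be filtered with dplyr before anything is downloaded. The sketch below uses a small hand-made dataframe as a stand-in for the output of hdx_resource_list(): only the `format` column name comes from this document, and every other column and value is hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the output of hdx_resource_list().
# Only the `format` column name is taken from this document.
resources <- tibble(
  title  = c("afghanistan-roads", "afghanistan-roads", "afghanistan-roads"),
  name   = c("roads.zip", "roads.csv", "readme.pdf"),
  format = c("zipped shapefile", "CSV", "PDF")
)

# Keep only CSV resources. The comparison is case-insensitive because the
# capitalisation of `format` values may differ between datasets.
csv_resources <- resources %>%
  filter(toupper(format) == "CSV")

csv_resources
```

With real data, `resources` would instead come from hdx_dataset_search() piped into hdx_resource_list().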

Loading CSV

The hdx_resource_csv() function takes the results of hdx_resource_list(), selects the resources that are CSV files, and downloads them into the environment as a nested dataframe. You can use this function to load several CSV files at once.

Once they are downloaded, you can unnest the CSV file you want to explore further:

afghanistan <- hdx_dataset_search(term = "ocha-afghanistan-topline-figures") %>%
  hdx_resource_list() %>%
  hdx_resource_csv()

afghanistan[1,] %>%
  unnest()
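
Recent versions of tidyr (1.0.0 and later) warn when unnest() is called without naming the list-column to expand. The sketch below shows the explicit form on a hand-made nested dataframe standing in for the output of hdx_resource_csv(). The list-column name `csv` and all values are assumptions for illustration; check names() on your own result for the real column name.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the output of hdx_resource_csv(): one row per
# resource, with each downloaded table stored in a list-column.
# The column name `csv` is an assumption, not the package's documented name.
nested <- tibble(
  name = c("first.csv", "second.csv"),
  csv  = list(
    tibble(id = c(1, 2), value = c("a", "b")),
    tibble(id = 3, value = "c")
  )
)

# Select the first resource and expand its table into ordinary rows,
# naming the list-column explicitly.
first_table <- nested %>%
  slice(1) %>%
  unnest(cols = csv)

first_table
```

The resource-level columns (here `name`) are repeated on every row of the expanded table.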

Loading Shapefiles

The hdx_resource_shapefile() function will take the results of hdx_resource_list(), select the resources that are zipped shapefiles, and download the first one. It will unzip it into its own folder, then load the shapefiles into the environment using simple features.

afghanistan <- hdx_dataset_search(term = "afghanistan-roads") %>%
  hdx_resource_list() %>%
  hdx_resource_shapefile()

Example Usage

hdx_connect()

# Avoid naming the result `list`, which would mask base R's list()
datasets <- hdx_dataset_list(500)

dim(datasets)

head(datasets)

# Search for a specific dataset, then filter by title using dplyr
dataset <- hdx_dataset_search("ACLED Conflict Data") %>%
  filter(title == "ACLED Conflict Data for Algeria")

# Or if you know the exact name, search directly for that

dataset <- hdx_dataset_search("acled-conflict-data-for-algeria", exact = TRUE)

names(dataset)

# Expand the resources section from the search you just performed
resources <- hdx_resource_list(dataset)

# The results can also be piped:
resources <- hdx_dataset_search("acled-conflict-data-for-algeria", exact = TRUE) %>%
  hdx_resource_list()

names(resources)

# You can use column `format` to filter for the type of data you want
resources$format

# You can use column `hdx_rel_url` to find the address of the resource you want
resources$hdx_rel_url

# You can use hdx_resource_csv() to download CSV files from the server into a nested dataframe.
# This can all be done in a pipeline.

kenya <- hdx_dataset_search("kenya-conflict-2012-2014", exact = TRUE) %>%
  hdx_resource_list() %>%
  hdx_resource_csv()

kenya

kenya %>%
  unnest()

# You can use hdx_resource_shapefile() to download the first shapefile from the server as a simple features object.
# This can all be done in a pipeline.

afghanistan <- hdx_dataset_search("afghanistan-roads", exact = TRUE) %>%
  hdx_resource_list() %>%
  hdx_resource_shapefile()

afghanistan 


callumgwtaylor/hdxr documentation built on May 14, 2019, 9:22 p.m.