With proper authorization, you can add and update datasets and resources to the Data Catalog through ddhconnect
.
The World Bank Data Catalog is built using the DKAN platform. In line with DKAN terminology, the Data Catalog content is organized in datasets and resources.
For more information, please refer to the DKAN documentation
First, set up your connection and authenticate your credentials using dkanr::dkanr_setup()
. You can contact the World Bank's IT Services to obtain login information to access the API's extended services, provided that you are authorized for these credentials.
library(dkanr) library(ddhconnect) dkanr_setup( url = 'http://ddh1stg.prod.acquia-sites.com', username = Sys.getenv("ddh_username"), password = Sys.getenv("ddh_stg_password") )
library(dkanr) library(ddhconnect) dkanr_setup( url = 'https://datacatalog.worldbank.org', username = "your_username", password = "your_password" )
To add content to the Data Catalog, you'll need to
Consider the case where we want to add a geospatial dataset, and attach a zip file containing the data as a resource.
To add a new dataset to the Data Catalog, you need to provide a minimum of information about it. You can find the required metadata fields for a given data type using get_required_fields()
. In our example, we want to add a geospatial dataset.
get_required_fields("Geospatial")
The metadata should be organized into a named list with the field names and corresponding values. The create_json_body()
will then take care of the formatting and generate the required JSON body for you.
The required JSON body is different for dataset and resource. Set the node_type
argument in create_json_body()
to generate the correct JSON body.
metadata = list("title" = "My first dataset", "body" = "A short description: What my dataset is about.", "field_topic" = "Poverty", "field_wbddh_data_class" = "Public", "field_wbddh_data_type" = "Microdata", "field_wbddh_languages_supported" = "English", "field_frequency" = "Periodicity not specified", "field_wbddh_economy_coverage" = "Blend", "field_wbddh_country" = c("Afghanistan","Albania","Algeria"), "field_license_wbddh" = "Custom License") json_dataset <- create_json_dataset(values = metadata)
Many of the fields here (Ex: field_topic
, field_wbddh_data_class
) take a controlled list of values as input. To view the fields that take a controlled list of values, use get_lov_fields()
.
get_lov_fields()
You can view the valid list of values for these controlled fields using get_lovs()
. The list_value_name
column in the dataframe shows the valid values:
lovs <- get_lovs() head(lovs)
Add the dataset to the Data Catalog by passing the formatted JSON to create_dataset()
.
resp_dataset <- create_dataset(body = json_dataset)
create_dataset()
prints the ID number (node ID) of the newly created dataset. You can view the metadata using get_metadata()
.
resp <- get_metadata(resp_dataset$nid) resp
Once the dataset is created, you can add the data files, links and supporting documents as resources. Creating a resource is very similar to creating a dataset:
create_json_body()
to generate the required JSON bodycreate_resource()
.resource_metadata <- c("title" = "My geospatial file", "field_wbddh_data_class" = "Public", "field_wbddh_resource_type" = "Download") json_resource <- create_json_resource(values = resource_metadata, dataset_nid = resp_dataset$nid) resp_resource <- create_resource(body = json_resource)
If your resource is a local file, you can upload it to the Data Catalog using attach_file_to_resource()
.
attach_file_to_resource(resource_nid = resp_resource$nid, file_path = "./path/to/your_file.zip")
If your resource is a link to an external page, you can add the link as part of the resource metadata.
resource_metadata <- c("title" = "Small Area Estimation of Poverty", "field_wbddh_data_class" = "Public", "field_wbddh_resource_type" = "Related Material", "field_link_api" = "http://www.nsb.gov.bt/publication/files/pub9zo1561nu.pdf") json_resource <- create_json_resource(values = resource_metadata, dataset_nid = "111") resp_resource <- create_resource(body = json_resource)
To update content in the Data Catalog, you need to know the node ID of the dataset you wish you edit. To obtain the node id, you can search the catalog by the title of the dataset. For example, to get the node id of the "Test Poverty Map" dataset that we just created, you can perform a search using search_catalog()
.
search_results <- search_catalog(fields = c("nid", "title"), filters = c("title" = "My first dataset")) poverty_map_nid <- search_results[[1]]$nid
The process of updating a dataset is similar to the process of adding a new dataset. Create a named vector with the updated metadata fields and the correspodning new values. Format the metadata using create_json_body()
and pass it to update_dataset()
to update the dataset.
update_dataset_metadata <- c("field_wbddh_country" = "Colombia", "field_tags" = "Poverty", "workflow_status" = "published") update_dataset_json <- create_json_body(values = update_dataset_metadata) update_dataset(nid = poverty_map_nid, body = update_dataset_json)
Note that you need to set the workflow_status to "published" every time you update the node.
Use the same process as above to update the resource metadata.
Get the resource node id
Format the new metadata using create_json_body()
* Update the resource using update_resource()
.
update_resource_metadata <- c("field_format" = "CSV") update_resource_json <- create_json_resource(values = update_resource_metadata, dataset_nid = poverty_map_nid) resource_nid <- get_resource_nids(poverty_map_nid) update_resource(nid = resource_nid, body = resource_json)
Once you're done using the API, use the logout_ddh()
function to end the connection:
logout_ddh()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.