knitr::opts_chunk$set( collapse = TRUE, warning = FALSE, message = FALSE, eval = FALSE, fig.width = 10, fig.path = "man/figures", comment = "#> " )
`ridl` is an R client for the UNHCR Raw Internal Data Library (RIDL) platform.
The RIDL platform is UNHCR's internal platform for storing, finding, and analyzing raw data.
To use the `ridl` package effectively, it's important to understand some key concepts of this platform. See the RIDL documentation for more details.
Container
A container is a placeholder where we can share data on RIDL. A container is represented in the `ridl` package as a `RIDLContainer` object and can hold zero or more datasets.

Most functions are prefixed by `ridl_container` or `rc`:

- `ridl_container_show` or `rc_show`
- `ridl_container_list` or `rc_list`
Dataset
A dataset is a placeholder where we can share data files (resources). A dataset page carries metadata that gives you enough context and information to properly store the data files and use them. A data file, e.g. an Excel file, is called a resource, and many of them can be shared on a dataset page. In the `ridl` package, a `RIDLDataset` object is used to represent a dataset.

Most functions are prefixed by `ridl_dataset` or `rd`:

- `ridl_dataset_show` or `rd_show`
- `ridl_dataset_list` or `rd_list`
- `ridl_dataset_exist` or `rd_exist`
- `ridl_dataset_search` or `rd_search`
- the container object in which you have the dataset: `ridl_datasets_container_get` or `rd_container_get`
Resource
A resource is a file shared on a dataset page; it includes microdata and supporting documents like reports or survey forms. The `RIDLResource` class implements all the logic needed to manipulate a RIDL resource.

Most functions are prefixed by `ridl_resource` or `rr`:

- `ridl_resource_show` or `rr_show`
This package is not yet on CRAN; to install it, you will need the `remotes` package.
You can get `ridl` from GitLab or GitHub (mirror):
```r
## install.packages("remotes")
remotes::install_gitlab("dickoa/ridl")

library("ridl")
```
The `ridl` package requires you to add your API token and store it for further use. This is the preferred option; you no longer need to use the API key.
To get an API token, generate one at the following URL: "ridl-server-url/user/@your-user-name/api-tokens".
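As a rough illustration, the address of the token page can be assembled from the server URL and your user name. This sketch assumes that `@your-user-name` in the pattern above is a placeholder for the plain user name; the helper `api_token_url()` and the user name `jane.doe` are hypothetical and not part of the `ridl` package.

```r
# Hypothetical helper (not part of ridl): build the address of the API token page.
# Assumption: "@your-user-name" stands for the plain user name.
api_token_url <- function(server_url, user_name) {
  server_url <- sub("/+$", "", server_url)  # drop any trailing slash
  sprintf("%s/user/%s/api-tokens", server_url, user_name)
}

token_page <- api_token_url("https://ridl.unhcr.org/", "jane.doe")
print(token_page)
```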
Give the token a name and generate it. Once generated, you can store it in your `.Renviron` file, which is automatically read by R on startup.
You can either edit the `.Renviron` file directly or open it by calling `usethis::edit_r_environ()` (assuming you have the `usethis` package installed) and entering:
```{bash, eval=FALSE, engine="sh"}
RIDL_API_TOKEN=xxxxxxxxxxxxxxxxxx
```

Once the environment variable is set, you will need to restart your session.

```r
library("ridl")
ridl_config_get()
## <RIDL Configuration>
##   RIDL site url: https://ridl.unhcr.org
##   RIDL API token: xxxxxxxxxxxxxxxxxx
```
If you plan to use the RIDL testing environment (https://ridl-uat.unhcr.org), you'll also need to set up the `RIDL_UAT_API_TOKEN` variable.
```{bash, eval=FALSE, engine="sh"}
RIDL_UAT_API_TOKEN=xxxxxxxxxxxxxxxxxx
```

You can also configure the `ridl` package directly using the `ridl_config_setup` function and check the configuration using `ridl_config_get`, but this setup is not persistent: it is lost when you close your session.

```r
ridl_config_setup(site = "test", token = "xxxxxxxxxxxxxxxxxx")
ridl_config_get()
## <RIDL Configuration>
##   RIDL site: https://ridl-uat.unhcr.org/
##   RIDL API token: xxxxxxxxxxxxxxxxxx
```
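If the configuration does not show the token you expect, it can help to check which of the two environment variables the current R session actually sees. The following is a minimal sketch using base R only; it does not call the `ridl` package, and the labels `prod` and `test` are just illustrative names for the two variables mentioned above.

```r
# Report whether each token variable is visible to this R session.
# An unset or empty variable yields FALSE; the token itself is never printed.
token_vars <- c(prod = "RIDL_API_TOKEN", test = "RIDL_UAT_API_TOKEN")
token_set <- vapply(token_vars, function(v) nzchar(Sys.getenv(v)), logical(1))
print(token_set)
```

A `FALSE` entry usually means the variable is missing from `.Renviron` or the session was not restarted after editing the file.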
Now that we are connected to RIDL, we can search for datasets using `ridl_dataset_search`.
```r
ridl_config_setup(site = "prod")
## search internally public dataset in RIDL, limit the results to two rows
ridl_dataset_search("mali", visibility = "public", rows = 2)
## [[1]]
## <RIDL Dataset> 6f37029d-0ec2-4322-88ed-6447b2eebf3a
##   Title: Socio-economic assessment of Malian refugees in Burkina Faso 2016
##   Name: unhcr-bfa-2016-sea-1-1
##   Visibility: public
##   Resources (up to 5): DDI XML, DDI RDF, UNHCR_BFA_2016_SEA_household_v1_1, UNHCR_BFA_2016_SEA_individual_v1_1, UNHCR_BFA_2016_final report
## [[2]]
## <RIDL Dataset> 59573073-aef6-42c1-a9db-efae3f95051c
##   Title: Socio-economic assessment of refugees in Mauritania's Mberra camp 2017
##   Name: unhcr-mrt-2017-sea-1-1
##   Visibility: public
##   Resources (up to 5): DDI XML, DDI RDF, UNHCR_MRT_2017_SEA_household_v1_1, UNHCR_MRT_2017_SEA_individual_v1_1, UNHCR_MRT_2017_SEA_questionnaire
## attr(,"class")
## [1] "ridl_datasets_list"
```
We can select a particular dataset from the list of datasets (a `ridl_datasets_list` is a plain list) using the usual R functions for accessing list elements (e.g. `[[`). In this example, we can use either `purrr::pluck` or `dplyr::nth`, since they both play well with the pipe operator (`|>` or `%>%`). Once the dataset is selected, it's possible to list all its resource objects using `ridl_resource_list`.
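Because the search result is an ordinary list, base R indexing works on it as well. The sketch below uses a stand-in list rather than a real search result: the two entries are the dataset names from the output above, stored as plain strings, whereas a real result holds dataset objects.

```r
# Stand-in for a search result (not fetched from RIDL): a plain list of dataset
# names carrying the same class attribute as the printed result.
results <- structure(
  list("unhcr-bfa-2016-sea-1-1", "unhcr-mrt-2017-sea-1-1"),
  class = "ridl_datasets_list"
)

first_dataset <- results[[1]]      # base R counterpart of nth(1) or pluck(1)
n_results <- length(results)       # number of datasets returned
is_plain_list <- is.list(results)  # the class attribute does not change the type
```

`[[` returns the element itself, while `[` would return a one-element list, which is why the former is the one to use before passing a dataset to another function.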
```r
library(tidyverse)
ridl_dataset_search("mali", visibility = "public", rows = 2) |>
  nth(1) |>
  ridl_resource_list(format = "stata")
## [[1]]
## <RIDL Resource> 026f9547-d7b2-4ec3-bbaa-5096837b1f01
##   Name: UNHCR_BFA_2016_SEA_household_v1_1
##   Description: BFA SEA household level data
##   Type: microdata
##   Size: 1278720
##   Format: Stata
## [[2]]
## <RIDL Resource> 30ab9f7a-9b84-4695-88ba-7504a4aed9e2
##   Name: UNHCR_BFA_2016_SEA_individual_v1_1
##   Description: BFA SEA individual data
##   Type: microdata
##   Size: 143744
##   Format: Stata
## attr(,"class")
## [1] "ridl_resource_list"
```
A `ridl_resource_list` is a simple R list and can be manipulated using `purrr::pluck` or `dplyr::nth` to select the resource you want to read into your R session or download.
```r
library(tidyverse)
ridl_dataset_search("mali", visibility = "public", rows = 2) |>
  nth(1) |>
  ridl_dataset_resource_get_all(format = "stata") |>
  nth(1) |>
  ridl_resource_read()
## # A tibble: 1,690 x 459
##     hhid   q002a    q006    q008  q102  q113    q200    q201
##    <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl> <dbl> <dbl+l> <dbl+l>
##  1 10004 1 [Cam… 1 [Oui] 1 [Pré…     3     0 2 [Dou… 1 [For…
##  2 10008 1 [Cam… 1 [Oui] 1 [Pré…     3     1 2 [Dou… 1 [For…
##  3 10012 1 [Cam… 1 [Oui] 1 [Pré…     7     1 2 [Dou… 1 [For…
##  4 10016 1 [Cam… 1 [Oui] 1 [Pré…     2     1 2 [Dou… 1 [For…
##  5 10020 1 [Cam… 1 [Oui] 1 [Pré…     6     1 2 [Dou… 1 [For…
##  6 10024 1 [Cam… 1 [Oui] 1 [Pré…     3     1 2 [Dou… 1 [For…
##  7 10028 1 [Cam… 1 [Oui] 1 [Pré…     5     1 2 [Dou… 1 [For…
##  8 10032 1 [Cam… 1 [Oui] 1 [Pré…     7     1 2 [Dou… 1 [For…
##  9 10036 1 [Cam… 1 [Oui] 1 [Pré…     4     3 2 [Dou… 1 [For…
## 10 10040 1 [Cam… 1 [Oui] 1 [Pré…     2     1 2 [Dou… 1 [For…
## # … with 1,680 more rows, and 451 more variables:
## #   q202 <dbl+lbl>, q203 <dbl>, q204 <dbl+lbl>, q205 <dbl+lbl>,
## #   q206_1 <dbl+lbl>, q206_2 <dbl+lbl>, q206_3 <dbl+lbl>,
## #   q206_4 <dbl+lbl>, q206_5 <dbl+lbl>, q206_6 <dbl+lbl>,
## #   q207 <dbl+lbl>, q208 <dbl+lbl>, q209 <dbl+lbl>, q210 <dbl>,
## #   q211 <dbl+lbl>, q21201 <dbl+lbl>, q21202 <dbl+lbl>,
## #   q21203 <dbl+lbl>, q213 <dbl+lbl>, q214 <dbl>,
## #   q215 <dbl+lbl>, q216 <dbl>, q217 <dbl+lbl>, q218 <dbl>,
## #   q219 <dbl+lbl>, q220 <dbl+lbl>, q221 <dbl+lbl>,
## #   q222 <dbl+lbl>, q223 <dbl+lbl>, q224 <dbl+lbl>, q225 <dbl>,
## #   q226 <dbl+lbl>, q227 <dbl>, q22801 <dbl+lbl>,
## #   q22802 <dbl+lbl>, q22803 <dbl+lbl>, q22804 <dbl+lbl>,
## #   q22805 <dbl+lbl>, q22806 <dbl+lbl>, q22807 <dbl+lbl>,
## #   q22808 <dbl+lbl>, q22809 <dbl+lbl>, q22810 <dbl+lbl>,
## #   q22811 <dbl+lbl>, q229 <dbl+lbl>, q230 <dbl>,
## #   q231 <dbl+lbl>, q232 <dbl>, q23301 <dbl+lbl>,
## #   q23302 <dbl+lbl>, q23303 <dbl+lbl>, q23304 <dbl+lbl>,
## #   q23305 <dbl+lbl>, q23306 <dbl+lbl>, q23307 <dbl+lbl>,
## #   q23308 <dbl+lbl>, q23309 <dbl+lbl>, q23310 <dbl+lbl>,
## #   q23311 <dbl+lbl>, q234 <dbl+lbl>, q23501 <dbl+lbl>,
## #   q23502 <dbl+lbl>, q23503 <dbl+lbl>, q23504 <dbl+lbl>,
## #   q23505 <dbl+lbl>, q23506 <dbl+lbl>, q23507 <dbl+lbl>,
## #   q23508 <dbl+lbl>, q23509 <dbl+lbl>, q23510 <dbl+lbl>,
## #   q23511 <dbl+lbl>, q23512 <dbl+lbl>, q23513 <dbl+lbl>,
## #   q23514 <dbl+lbl>, q23515 <dbl+lbl>, q23516 <dbl+lbl>,
## #   q23517 <dbl+lbl>, q23518 <dbl+lbl>, q23601 <dbl+lbl>,
## #   q23602 <dbl+lbl>, q23603 <dbl+lbl>, q23604 <dbl+lbl>,
## #   q23605 <dbl+lbl>, q23606 <dbl+lbl>, q23607 <dbl+lbl>,
## #   q23608 <dbl+lbl>, q23609 <dbl+lbl>, q23610 <dbl+lbl>,
## #   q23611 <dbl+lbl>, q23612 <dbl+lbl>, q23613 <dbl+lbl>,
## #   q23614 <dbl+lbl>, q237 <dbl+lbl>, q238 <dbl+lbl>,
## #   q23901 <dbl+lbl>, q23902 <dbl+lbl>, q23903 <dbl+lbl>,
## #   q23904 <dbl+lbl>, q23909 <dbl+lbl>, q240 <dbl+lbl>, …
```
Reading will not work with all resources in RIDL; so far the following formats are supported: `csv`, `xlsx`, `xls`, `dta` (Stata).
I will consider adding more data types in the future. Feel free to file an issue if it doesn't work as expected or if you want support for a new format.
For Excel files (`xlsx` and `xls`), you can also use `get_sheets` to list the available sheets and the `sheet` parameter of `read` to specify the sheet you want to read (the default is to read the first sheet).
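Before trying to read a resource, one can check its file name against the supported formats listed above. This is a minimal sketch in base R; `is_readable_format()` and the example file names are hypothetical and not part of the `ridl` package, and the check looks only at the file extension.

```r
# Formats listed above as supported for reading.
supported_formats <- c("csv", "xlsx", "xls", "dta")

# Hypothetical helper (not part of ridl): TRUE when the file extension,
# compared case-insensitively, is one of the supported formats.
is_readable_format <- function(file_name) {
  tolower(tools::file_ext(file_name)) %in% supported_formats
}

readable <- is_readable_format(c("household.dta", "report.pdf", "figures.XLSX"))
print(readable)
```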
We can also use `ridl_dataset_show` to directly read and access a dataset object.
```r
dataset_name <- "official-cross-border-figures-of-venezuelan-individuals"
rd_show(dataset_name) |>
  rd_resource_get_all() |>
  nth(1) |>
  rr_read()
## Reading sheet: VEN_Official Borders Figures
## # A tibble: 1,314 x 5
##    Country `Mov Type`  `Border Point` Month_Year Total_individua…
##    <chr>   <chr>       <chr>          <chr>                 <dbl>
##  1 Ecuador Entry from… Aeropuerto In… January-20                0
##  2 Ecuador Entry from… Aeropuerto In… February-…                1
##  3 Ecuador Entry from… Aeropuerto In… March-20                  0
##  4 Ecuador Entry from… Aeropuerto In… April-20                  0
##  5 Ecuador Entry from… Aeropuerto In… May-20                    0
##  6 Ecuador Entry from… Aeropuerto In… June-20                   2
##  7 Ecuador Entry from… Aeropuerto In… July-20                   2
##  8 Ecuador Entry from… Aeropuerto In… August-20                 2
##  9 Ecuador Entry from… Aeropuerto In… September…               NA
## 10 Ecuador Entry from… Aeropuerto In… January-20                0
## # … with 1,304 more rows
```
If you know the id of a RIDL resource, you can also use `ridl_resource_show` to access the desired resource directly.
```r
rd_show(dataset_name) |>
  rd_resource_get_all() |>
  nth(1)
## <RIDL Resource> 68e39d44-88ae-49f9-b492-3635341c92be
##   Name: VEN_OfficialFiguresBorders
##   Description: Compilation of official figures on Venezuelan population per month per entry-exit point.
##   Type: microdata
##   Size: 39998
##   Format: XLSX

ridl_resource_show("68e39d44-88ae-49f9-b492-3635341c92be") |>
  ridl_resource_read()
## Reading sheet: VEN_Official Borders Figures
## # A tibble: 1,314 x 5
##    Country `Mov Type`  `Border Point` Month_Year Total_individua…
##    <chr>   <chr>       <chr>          <chr>                 <dbl>
##  1 Ecuador Entry from… Aeropuerto In… January-20                0
##  2 Ecuador Entry from… Aeropuerto In… February-…                1
##  3 Ecuador Entry from… Aeropuerto In… March-20                  0
##  4 Ecuador Entry from… Aeropuerto In… April-20                  0
##  5 Ecuador Entry from… Aeropuerto In… May-20                    0
##  6 Ecuador Entry from… Aeropuerto In… June-20                   2
##  7 Ecuador Entry from… Aeropuerto In… July-20                   2
##  8 Ecuador Entry from… Aeropuerto In… August-20                 2
##  9 Ecuador Entry from… Aeropuerto In… September…               NA
## 10 Ecuador Entry from… Aeropuerto In… January-20                0
## # … with 1,304 more rows
```
```r
ct <- ridl_container_list(sort = "package_count")
head(ct)
## [1] "ethiopia-sens"    "data-deposit"     "kenya-sens"
## [4] "afghanistan"      "bangladesh-sens"  "south-sudan-sens"

grep("niger-", ct, ignore.case = TRUE, value = TRUE)
## [1] "niger-protection" "niger-sens"

ridl_container_show("niger-protection")
## <RIDL Container> d341942e-547e-404b-bcdf-c72b2cd85530
##   Name: niger-protection
##   Display name: Niger: Protection
##   No. Datasets: 5
##   No. Members: 3

ridl_container_show("niger-protection") |>
  ridl_dataset_list()
## [1] "enrolement-pdi-tillaberi-tillaberi-niger-2020"
## [2] "identify-asylum-seekers-in-migration-flow-agadez-niger-2018-2019-2020"
## [3] "monitoring-the-migration-flow-1-agadez-niger-2019-2020"
## [4] "enrolement-pdi-tahoua-aout-2020-tahoua-niger-2020"
## [5] "enrolement-pdi-maradi-maradi-niger-2020"
```
It's possible to create a `RIDLDataset` object that we can manipulate and upload to the RIDL platform.
```r
ridl_dataset(name = "test-dataset-pen",
             title = "Test Dataset PEN",
             notes = "Some description",
             owner_org = "africa",
             data_collector = "unhcr",
             keywords = list(3, 4),
             unit_of_measurement = "kg",
             data_collection_technique = "f2f",
             archived = FALSE,
             visibility = "restricted",
             external_access_level = "data_enclave")
## <RIDL Dataset>
##   Title: Test Dataset PEN
##   Name: test-dataset-pen
##   Visibility: restricted
##   Container: Africa
##   Resources (up to 5):
```
```r
ds <- ridl_dataset(name = "test-dataset",
                   title = "Test Dataset",
                   notes = "An example dataset",
                   owner_org = "west-africa",
                   data_collector = "ACF, UNHCR",
                   keywords = list(3, 4),
                   unit_of_measurement = "individual",
                   data_collection_technique = "f2f",
                   sampling_procedure = "nonprobability",
                   operational_purpose_of_data = "cartography",
                   archived = "False",
                   visibility = "restricted",
                   external_access_level = "open_access")
ds
## <RIDL Dataset>
##   Title: Test Dataset
##   Name: test-dataset
##   Visibility: restricted
##   Resources (up to 5):
```
`ridl_resource` can also be used to create a `RIDLResource`.
```r
rs <- ridl_resource(name = "Test resource",
                    type = "data",
                    format = "CSV",
                    file_type = "microdata",
                    identifiability = "anonymized_public",
                    date_range_start = "2018-01-01",
                    date_range_end = "2019-01-01",
                    process_status = "anonymized",
                    visibility = "public",
                    version = 1L)
rs
## <RIDL Resource>
##   Name: Test resource
##   Description:
##   Type: microdata
##   Size:
##   Format: CSV
```
We can add the resource to the dataset and upload it to the RIDL platform.
```r
ds |>
  ridl_dataset_resource_add(rs)
ds
## <RIDL Dataset>
##   Title: Test Dataset
##   Name: test-dataset
##   Visibility: restricted
##   Resources (up to 5): Test resource
```