The goal of {tidyndr} is to provide specialized, simple and easy-to-use
functions that wrap around existing R functions for manipulating the NDR
patient line-list file, allowing the user to focus on the tasks to be
completed rather than the code/formula details.
The functions presented are similar to the PEPFAR MER indicators and are currently grouped into four categories:
The read_ndr() function for reading the patient-level line-list downloaded from the front-end of the NDR in ‘csv’ format.
The PEPFAR treatment group of indicators that can be computed from the NDR line-list.
The ‘Viral Load’ indicators (tx_vl_eligible(), tx_pvls_den(), tx_pvls_num() and tx_vl_unsuppressed()).
The summary functions (summarise_ndr() and disaggregate()), which provide a tabular summary of the tasks that have been completed using any of the functions above.
You can install the released version of tidyndr from CRAN with:
install.packages("tidyndr")
Or the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("stephenbalogun/tidyndr",
  build_vignettes = TRUE)
library(tidyndr)
#> Attaching package: 'tidyndr'
#> A package for analysis of the front-end patient-level data from the Nigeria National Data Repository.
read_ndr() reads the downloaded “.csv” file into R using vroom::vroom()
behind the scenes, passing appropriate column types to the col_types
argument. It also formats the variable names in snake case style.
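The name formatting described above can be pictured with a small base R sketch. This is not the code read_ndr() uses internally; to_snake_case() is a hypothetical helper and the raw column headers are made up for illustration.

```r
# Hypothetical helper (not part of tidyndr): convert headers to snake case.
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x) # split camelCase boundaries
  x <- gsub("[^A-Za-z0-9]+", "_", x)           # spaces/punctuation -> underscore
  x <- gsub("^_+|_+$", "", x)                  # trim leading/trailing underscores
  tolower(x)
}

# Made-up headers, assumed to resemble a raw export
raw_names <- c("Patient Identifier", "ART Start Date",
               "Last Drug Pickup date", "DatimCode")

clean_names <- to_snake_case(raw_names)
print(clean_names)
```

With these inputs the result lines up with names seen in the printed tibbles, such as patient_identifier and art_start_date.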
## read from a local file path (not run)
# file_path <- system.file("extdata", "ndr_example.csv", package = "tidyndr")
# read_ndr(file_path, time_stamp = "2021-02-15")
### read line-list available on the internet
path <- "https://raw.githubusercontent.com/stephenbalogun/example_files/main/ndr_example.csv"
ndr_example <- read_ndr(path, time_stamp = "2021-02-20")
#>
#> Three new variables created:
#> [1] `date_lost`
#> [2] `appointment_date
#> [2] `current_status
The functions included in this group are:
tx_new()
tx_curr()
tx_ml() and tx_ml_outcomes()
tx_rtt()
Other supporting functions are tx_mmd(), tx_regimen() and tx_appointment().
## Subset "TX_NEW"
tx_new(ndr_example, from = "2021-07-01", to = "2021-09-30")
#> Warning: One or more parsing issues, see `problems()` for details
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> # datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> # hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> # current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> # last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> # last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> # last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...
## Generate line-list of clients with medication refill in October 2021
ndr_example %>%
  tx_appointment(
    from = "2021-10-01",
    to = "2021-10-31"
  )
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> # datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> # hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> # current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> # last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> # last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> # last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...
## Generate list of clients who were active at the beginning of October 2021 but became inactive at the end of December 2021.
tx_ml(
  new_data = ndr_example,
  from = "2021-10-01",
  to = "2021-12-31"
)
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> # datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> # hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> # current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> # last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> # last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> # last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...
The tx_vl_eligible(), tx_pvls_den() and tx_pvls_num() functions come in
handy when you need to generate the line-list of clients who are eligible
for a viral load test at a given point for a given facility/state, those
who have a valid viral load result (not more than 1 year old for people
aged 20 years and above, and not more than 6 months old for paediatrics
and adolescents aged 19 years or less), and those who are virally
suppressed (out of those with valid viral load results). When the
sample = TRUE argument is supplied to tx_vl_eligible(), it generates the
line-list of only those who are due for a viral load test out of all
those who are eligible.
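The validity rule stated above (a 1-year window for clients aged 20 and above, a 6-month window for those aged 19 or less) can be sketched in base R. This is only an illustration of the rule, not the implementation of tx_pvls_den(): has_valid_vl(), the data frame and its column names are hypothetical, and the windows are approximated as 365 and 183 days.

```r
# Hypothetical helper: TRUE when the last viral load result is still valid
# at the reference date. Windows are approximated in days (an assumption).
has_valid_vl <- function(age, vl_date, ref) {
  days_since <- as.numeric(as.Date(ref) - as.Date(vl_date))
  limit <- ifelse(age >= 20, 365, 183) # adults: ~1 year; 19 or less: ~6 months
  !is.na(days_since) & days_since >= 0 & days_since <= limit
}

# Made-up clients for illustration
clients <- data.frame(
  age = c(35, 35, 12, 12, 40),
  last_vl_date = as.Date(c("2021-03-15", "2020-11-30",
                           "2021-08-01", "2021-02-01", NA))
)

clients$valid_vl <- has_valid_vl(clients$age, clients$last_vl_date,
                                 ref = "2021-12-31")
print(clients)
```

Here the first and third clients have a valid result; the second and fourth fall outside their window, and the fifth has no result at all.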
## Generate list of clients who are eligible for VL (i.e. expected to have a documented VL result)
ndr_example %>%
  tx_vl_eligible(ref = "2021-12-31")
#> # A tibble: 27,020 x 52
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M State 3001 0001
#> 2 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 1002 0001
#> 3 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 3003 0001
#> 4 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M State 1003 0002
#> 5 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 2004 0002
#> 6 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 3005 0001
#> 7 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 2005 0001
#> 8 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M State 1004 0003
#> 9 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 3007 0002
#> 10 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 1005 0004
#> # ... with 27,010 more rows, and 44 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
## Generate list of clients that will be expected to have a viral load test done by March 2022
ndr_example %>%
  tx_vl_eligible("2022-03-31", sample = TRUE)
#> # A tibble: 27,020 x 52
#> ip state lga facility datim_code sex patient_identif~ hospital_number
#> <fct> <fct> <fct> <fct> <fct> <fct> <chr> <chr>
#> 1 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M State 3001 0001
#> 2 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 1002 0001
#> 3 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 3003 0001
#> 4 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M State 1003 0002
#> 5 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 2004 0002
#> 6 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 3005 0001
#> 7 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 2005 0001
#> 8 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M State 1004 0003
#> 9 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 3007 0002
#> 10 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F State 1005 0004
#> # ... with 27,010 more rows, and 44 more variables: date_of_birth <date>,
#> # age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> # art_start_date_source <fct>, last_drug_pickup_date <date>,
#> # last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> # last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> # last_regimen <fct>, last_clinic_visit_date <date>,
#> # days_of_arv_refill <dbl>, pregnancy_status <fct>, ...
### Calculate the Viral Load Coverage as of December 2021
no_of_vl_results <- tx_pvls_den(ndr_example, ref = "2021-12-31") %>%
  nrow()
no_of_vl_eligible <- tx_vl_eligible(ndr_example, ref = "2021-12-31") %>%
  nrow()
vl_coverage <- scales::percent(no_of_vl_results / no_of_vl_eligible)
print(vl_coverage)
#> [1] "2%"
For all the ‘Treatment’ and ‘Viral Suppression’ indicators (except
tx_ml_outcomes(), which should be used with tx_ml()), you have control
over the level of action (state or facility) by supplying the values of
interest to the states and/or facilities arguments. For more than one
state or facility, combine the values with c(), e.g.
## subset clients that have a medication appointment between January and March 2022
## and are also due for viral load
ndr_example %>%
  tx_appointment(
    from = "2022-01-01",
    to = "2022-03-31"
  ) %>%
  tx_vl_eligible(sample = TRUE)
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> # datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> # hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> # current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> # last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> # last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> # last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...
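Restricting a line-list to several states with c() amounts to membership filtering on the state column. The base R sketch below shows that idea on a made-up data frame; it does not call tidyndr, and the object names are hypothetical.

```r
# Made-up line-list; the `state` column name mirrors the printed output above.
line_list <- data.frame(
  patient_identifier = sprintf("P%03d", 1:6),
  state = c("State 1", "State 2", "State 3", "State 1", "State 3", "State 2"),
  stringsAsFactors = FALSE
)

# Several values of interest are combined with c()
states_of_interest <- c("State 1", "State 3")

# Keep only the rows whose state is one of the values of interest
subset_list <- line_list[line_list$state %in% states_of_interest, ]
print(subset_list)
```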
You might want to generate a summary table of all the indicators you
have pulled out. summarise_ndr() (or summarize_ndr()) allows you to do
this with ease. It accepts all the line-lists you want summarised, the
level at which the summary should be created (country/ip, state or
facility), and the names to give to each of the summary columns.
## generates line-list of TX_NEW between July and December 2021
new <- tx_new(ndr_example, from = "2021-07-01", to = "2021-12-31")
## generates line-list of currently active clients
curr <- tx_curr(ndr_example)
## generates line-list of clients who were active at the beginning of October but inactive at the end of December 2021
ml <- tx_ml(new_data = ndr_example, from = "2021-10-01", to = "2021-12-31")
summarise_ndr(new, curr, ml,
  level = "state",
  names = c("tx_new", "tx_curr", "tx_ml")
)
#> # A tibble: 4 x 5
#> ip state tx_new tx_curr tx_ml
#> <chr> <chr> <int> <int> <int>
#> 1 Total - 0 27020 0
#> 2 IP_name State 1 0 5645 0
#> 3 IP_name State 2 0 7929 0
#> 4 IP_name State 3 0 13446 0
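The shape of such a summary (one row per state, one column per line-list, plus a total row) can be reproduced in base R. The sketch below is an assumption about the general idea, not how summarise_ndr() is implemented; count_by_state() and the three small line-lists are hypothetical.

```r
# Made-up line-lists; only the `state` column matters for the counts.
states <- c("State 1", "State 2", "State 3")
new_list <- data.frame(state = c("State 1", "State 3", "State 3"),
                       stringsAsFactors = FALSE)
curr_list <- data.frame(state = c("State 1", "State 1", "State 2",
                                  "State 3", "State 3", "State 3"),
                        stringsAsFactors = FALSE)
ml_list <- data.frame(state = character(0), stringsAsFactors = FALSE)

# Hypothetical helper: rows per state, keeping states with zero rows
count_by_state <- function(line_list, levels) {
  as.integer(table(factor(line_list$state, levels = levels)))
}

line_lists <- list(tx_new = new_list, tx_curr = curr_list, tx_ml = ml_list)
counts <- as.data.frame(lapply(line_lists, count_by_state, levels = states))

# One row per state, preceded by a total row
by_state <- data.frame(state = states, counts, stringsAsFactors = FALSE)
totals <- data.frame(state = "Total", as.list(colSums(counts)),
                     stringsAsFactors = FALSE)
summary_tbl <- rbind(totals, by_state)
print(summary_tbl)
```

Using a factor with fixed levels keeps states that have no rows in a given line-list, which is why the empty tx_ml list still yields a column of zeros.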
disaggregate() allows you to summarise an indicator of interest into
finer details based on “current_age”, “sex”, “pregnancy_status”,
“art_duration”, “months_dispensed (of ARV)” or “age_sex”. These are
supplied to the by parameter of the function. By default, it
disaggregates the variable of interest at the level of “states”, but it
can also do this at “country/ip”, “lga” or “facility” level when any of
these is supplied to the level parameter.
## generates line-list of TX_NEW between July and September 2021
new_clients <- tx_new(ndr_example, from = "2021-07-01", to = "2021-09-30")
disaggregate(new_clients, by = "current_age", pivot_wide = FALSE)
#> # A tibble: 65 x 4
#> ip state current_age number
#> <chr> <chr> <chr> <int>
#> 1 IP_name State 1 <1 0
#> 2 IP_name State 1 1-4 0
#> 3 IP_name State 1 5-9 0
#> 4 IP_name State 1 10-14 0
#> 5 IP_name State 1 15-19 0
#> 6 IP_name State 1 20-24 0
#> 7 IP_name State 1 25-29 0
#> 8 IP_name State 1 30-34 0
#> 9 IP_name State 1 35-39 0
#> 10 IP_name State 1 40-44 0
#> # ... with 55 more rows
## disaggregate 'TX_CURR' by sex
ndr_example %>%
  tx_curr() %>%
  disaggregate(by = "sex")
#> # A tibble: 5 x 5
#> ip state Male Female unknown
#> <chr> <chr> <int> <int> <int>
#> 1 IP_name "State 1" 1661 3984 0
#> 2 IP_name "State 2" 2335 5594 0
#> 3 IP_name "State 3" 5894 7552 0
#> 4 IP_name "" 0 0 0
#> 5 Total "-" 9890 17130 0
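Disaggregating by age comes down to binning ages into bands and counting. The base R sketch below illustrates this with cut(); the lower bands follow the labels visible in the output above, while the bands from 45 upwards and the sample ages are assumptions made for illustration. It is not the code disaggregate() runs.

```r
# Band edges: the bands up to "40-44" mirror the printed output;
# "45-49" and "50+" are assumed for this sketch.
age_breaks <- c(0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, Inf)
age_labels <- c("<1", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29",
                "30-34", "35-39", "40-44", "45-49", "50+")

# Made-up ages in years
ages <- c(0.5, 3, 9, 17, 22, 22, 38, 51, 67)

# right = FALSE makes each band closed on the left, e.g. [20, 25) is "20-24"
bands <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)
band_counts <- table(bands)
print(band_counts)
```

Because the labels are factor levels, empty bands such as "10-14" still appear with a count of zero, matching the zero rows shown in the long-format output above.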
Please note that the tidyndr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.