knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
# Install the latest CRAN release install.packages("diyar") # Or, install the development version from GitHub install.packages("devtools") devtools::install_github("OlisaNsonwu/diyar")
diyar
is an R
package for linking records with shared characteristics.
The linked records represent an entity, which depending on the context of the analysis can be unique patients,
infection episodes, overlapping periods of care, clusters or other occurrences as defined by a case definition. This makes it useful in ordinarily complex analyses such as record linkage,
contact or network analyses e.t.c.
The main functions are links()
, episodes()
and partitions()
.
They are flexible in regards to how they compare records, as well as what are considered matches.
Their functionality can sometimes overlap however, each is better suited to particular use cases:
links()
- link records with no relevance to an index record. For example, deterministic record linkageepisodes()
- link records in relation to an index record. For example, contact and network analysis. partitions()
- link records in relation to a fixed interval.Key features;
library(diyar) data(missing_staff_id) dfr_stages <- missing_staff_id[c("age", "hair_colour", "branch_office")] priority_order_1 <- c("hair_colour", "branch_office") priority_order_2 <- c("branch_office", "hair_colour") dfr_stages$id.1 <- links(criteria = as.list(dfr_stages[priority_order_1])) dfr_stages$id.2 <- links(criteria = as.list(dfr_stages[priority_order_2]))
sub_criteria()
.sub.cri.1 <- sub_criteria( hair.color = dfr_stages$hair_colour, age = dfr_stages$age, match_funcs = c( "exact" = exact_match, "age.range" = range_match) ) last_word_wf <- function(x) tolower(gsub("^.* ", "", x)) last_word_cmp <- function(x, y) last_word_wf(x) == last_word_wf(y) not_equal <- function(x, y) x != y sub.cri.2 <- sub_criteria( dfr_stages$branch_office, dfr_stages$age, match_funcs = c( "last.word" = last_word_cmp, "not.equal" = not_equal) ) sub.cri.3 <- sub_criteria(sub.cri.1, sub.cri.2, operator = "and") sub.cri.1 sub.cri.2 sub.cri.3 dfr_stages$id.3 <- links( criteria = "place_holder", sub_criteria = list("cr1" = sub.cri.3) ) dfr_stages
There are variations of links()
like links_wf_probabilistic()
and links_af_probabilistic()
for specific use cases such as probabilistic record linkage.
Key features;
dfr_2 <- data.frame(date = as.Date("2020-01-01") + c(1:5, 10:15, 20:25)) dfr_2$id.1 <- episodes( date = dfr_2$date, case_length = 2, episodes_max = 1)
dfr_2$pref <- c(rep(2, 8), 1, rep(2, 8)) dfr_2$id.2 <- episodes( date = dfr_2$date, case_length = number_line(-2, 2), episodes_max = 1, custom_sort = dfr_2$pref)
dfr_2$id.3 <- episodes( date = dfr_2$date, case_length = number_line(-2, 2), episode_type = "rolling", recurrence_length = 1, episodes_max = 1, rolls_max = 1)
dfr_2$period <- number_line(dfr_2$date, dfr_2$date + 5) dfr_2$id.4 <- episodes( date = dfr_2$period, case_length = index_window(dfr_2$period), episodes_max = 1) dfr_2
There are variations of episodes()
like episodes_wf_splits()
for specific use cases such as more efficient handling of duplicate records.
Key features;
dfr_3 <- dfr_2["date"] dfr_3$id.1 <- partitions( date = dfr_3$date, window = number_line(as.Date(c("2020-01-10", "2020-01-17")), as.Date(c("2020-01-12", "2020-01-24"))) )
dfr_3$id.2 <- partitions(date = dfr_3$date, by = 3, separate = TRUE) dfr_3$id.3 <- partitions(date = dfr_3$date, length.out = 3, separate = TRUE) dfr_3
Find out more!
vignette("number_line")
vignette("episodes")
vignette("links")
vignette("panes")
Please report any bug or issues with using this package here.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.