Relation Types and Restrictions

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette explains how the relation_type and restrictions arguments work to ensure predictable mappings and check for correct relations between variables in your data. Examples also demonstrate how and when to use the arguments atomic, heterogenous_outputs, handle_duplicate_mappings, report_properties, and map_error_response.

Restrictions

It is sometimes important to ensure mappings have certain properties. For example, in a database, you will likely find it necessary that every possible user ID maps to no more than one username. relatable functions can be used to enforce this restriction with either relation_type (explained below), or a list of restrictions, which are used if relation_type = NULL. The following restrictions can be applied:

Thus for our ID to username mapping, we might want to create a function like this:

library(relatable)
valid_ids <- 10:99 # All possible ID numbers
usernames <- # List of usernames in order of entry
  c("Leonardo", "Michelangelo", "Raphael", "Donatello") 

get_user_from_id <- relation(
  A = valid_ids,
  B = usernames,
  default = "No username found",
  relation_type = NULL,
  restrictions = list(max_one_y_per_x = TRUE, max_one_x_per_y = TRUE),
  map_error_response = "throw"  # If restrictions are violated, return an error instead
)                               # of a warning

get_user_from_id(11)

Now, if the function is later updated to include a username that is already taken, an error will be thrown.

Relation types

Relation types translate to predetermined sets of restrictions, and are applied with a short string rather than a list. The following relation types are possible:

| |min_one_y_per_x |min_one_x_per_y |max_one_y_per_x |max_one_x_per_y | |:------------|:---------------|:---------------|:---------------|:---------------| |one_to_one |FALSE |FALSE |TRUE |TRUE | |many_to_many |FALSE |FALSE |FALSE |FALSE | |one_to_many |FALSE |FALSE |FALSE |TRUE | |many_to_one |FALSE |FALSE |TRUE |FALSE | |func |TRUE |FALSE |TRUE |FALSE | |injection |TRUE |FALSE |TRUE |TRUE | |surjection |TRUE |TRUE |TRUE |FALSE | |bijection |TRUE |TRUE |TRUE |TRUE |

Examples

Enforce properties of relations between vectors

By default, the relation_type is assumed to be a funtcion ("func"), meaning that each input maps to one and only one output. These restrictions can be tightened or loosened depending on your needs. Many-to-many relations have loose restrictions allowing multiple outputs from a single input:

## Authors have a many-to-many relation with books:
## a book can have multiple authors and authors can write multiple books
my_library <- tibble::tribble(
  ~author,              ~work,
  "Arendt",             "The Human Condition",
  "Austen-Smith",       "Social Choice and Voting Models",
  "Austen-Smith",       "Positive Political Theory",
  "Banks",              "Positive Political Theory",
  "Camus",              "The Myth of Sisyphus",
  "Camus",              "The Rebel",
  "Arendt",             "The Origins of Totalitarianism",
  "Dryzek",             "Theories of the Democratic State",
  "Dunleavy",           "Theories of the Democratic State"
)

relate(
  X = c("Arendt", "Austen-Smith", "Banks", "Dryzek", "Dunleavy"),
  A = my_library$author,
  B = my_library$work,
  atomic = FALSE, # relations with multiple outputs must return lists
  named = TRUE,
  relation_type = "many_to_many"
)

You may want exactly one unique output for each input (a bijection), but have duplicate mappings in your input vectors:

## Duplicate mappings usually return twice, but this can be changed...
relate(
  X = 1:3,
  A = c(1, 2, 2, 3, 4, 5),
  B = c('a', 'b', 'b', 'c', 'd', 'e'),
  relation_type = "many_to_many",
  atomic = FALSE
)

## Use relate or relation with handle_duplicate_mappings = TRUE to avoid errors resulting
## from duplicate mappings to and from the same inputs. Bijections ensure that each input
## has exactly one unique output.
relate(
  X = 1:3,
  A = c(1, 2, 2, 3, 4, 5),
  B = c('a', 'b', 'b', 'c', 'd', 'e'),
  relation_type = "bijection",
  handle_duplicate_mappings = TRUE
)

Determine relation properties for safer mappings

To illustrate some of the more advanced ways relatable can help ensure safer data manipulation, we will use the emperors data set of Ancient Roman Emperors assembled by github user Zoni Nation.

report_properties can give you useful information about the relation between two vectors:

emperors <- read.csv(
  "https://raw.githubusercontent.com/zonination/emperors/master/emperors.csv",
  stringsAsFactors = FALSE
)
colnames(emperors)

## Suppose we want a function to map each emperor to the time of their reign.
## First, let's see that a unique mapping from either name or name.full is possible by
## using relation's report properties argument:
relation(emperors$name.full, emperors$reign.start,
  relation_type = NULL,
  atomic = FALSE,
  report_properties = TRUE)

relation(emperors$name, emperors$reign.start,
  relation_type = NULL,
  atomic = FALSE,
  report_properties = TRUE)

## Neither mapping fulfils the criterion of max_one_y_per_x, but this is not a problem: in
## the later years of the Roman Empire, some emperors were co-rulers whose reigns began at
## the same time.
relate(c("0305-05-01", "0337-05-22"), emperors$reign.start, emperors$name,
  named = TRUE, relation_type = NULL, atomic = FALSE)

## However, we can infer from max_one_y_per_x = FALSE that some elements of name.full are
## non-unique. This is because both Vespasian and his eldest son and successor Titus took
## the same imperial title.
relate(c("Vespasian", "Titus"), emperors$name, emperors$name.full,
  named = TRUE)

## Hence we can determine that name and not name.full is a better choice for our mapping
## function.
reign_start <- relation(emperors$name, emperors$reign.start)
reign_start("Constantine the Great")

## Repeating the vector A can let us return multiple variables at once to return an n-tuple
nice_date <- function(s) {
  d <- as.Date(s, "%Y-%m-%d")
  return(format.Date(d, "%d %B, %Y AD"))
}

reign_duration <- relation(
  rep(emperors$name, 2),
  nice_date(c(emperors$reign.start, emperors$reign.end)),
  relation_type = NULL,
  atomic = FALSE, named = TRUE
)
reign_duration(c("Vespasian", "Titus", "Domitian"))

## Or just for fun...
obituary <- with(
  emperors,
  relation(
    A = rep(name, 3),
    B = c(
      paste0("Born in ", birth.cty, ", ", birth.prv, " on ", nice_date(birth)),
      paste0("Came to power by ", rise, " on ", nice_date(reign.start)),
      paste0("Died from ", cause, " by ", killer, " on ", nice_date(death))
    ),
    relation_type = NULL,
    atomic = FALSE, named = TRUE
  )
)

obituary(
  c("Marcus Aurelius", "Commodus", "Pertinax", "Didius Julianus", "Septimus Severus", "Caracalla")
)


Try the relatable package in your browser

Any scripts or data that you put into this service are public.

relatable documentation built on May 2, 2019, 8:30 a.m.