The teal.data
package provides a way to define primary keys for a dataset and establish relationships with other datasets.
Each dataset joining keys can be characterized by:
SQL
foreign keys.Typically, an application developer specifies these keys manually.
However, for datasets following the ADaM standard, teal.data
can automatically assign keys using the default_cdisc_join_keys
object.
Refer to the section "Joining Keys with ADaM Datasets" for details on using this object to select specific datasets.
join_keys
class in teal applicationsThe primary function of the join_keys
class in teal
applications is to facilitate the seamless merging of datasets using teal.transform
.
Additionally, it plays a role in the data filtering using the Filter Panel in a teal
application.
The filters applied to a (parent) dataset are also applied to their children.
join_keys
The join_keys
object contains information about the foreign/primary keys of multiple datasets.
Each key is represented by a pair of datasets (by name reference) and a named character vector that encodes the column name mapping between the two datasets.
In addition, a foreign key may also contain a parent-child attribute that is used in the "Filter Panel" as we mentioned above.
A new join keys can be created as an empty object, or by defining an initial set of primary and foreign keys.
That initial object can be extended by adding/modifying/removing keys and by establishing parent-child relationships between datasets.
join_keys(...)
: Join Keys Constructor / Getter / SetterConvenient function that is used both as the constructor and as the getter for join_keys objects.
As the Getter it is used to retrieve the joining keys that are contained in other objects, such as a teal_data
object.
As the Constructor it is used to specify a collection of multiple individual keys (via join_key
function described below).
join_key(dataset_1, dataset_2, key, parent)
: Single Join Key ConstructorSpecifies a primary key or a relationship between two datasets.
dataset_1
, dataset_2
: names of the datasets (if dataset_2
is the same as dataset_1
or is omitted, it creates a primary key);key
: named character
vector of column name mapping between datasets (unnamed vector assumes column names between datasets are the same);directed
(optional): flag that indicates whether there is a parent-child directed relationship between dataset_2
to dataset_1
(latter as a parent).Note that join keys are assumed to be symmetric, i.e., join_key("ds1", "ds2", c("ds1_col" = "ds2_col"))
establishes a relationship from "x" to "y" and vice versa.
By default, the new joining key will set the dataset_1
as the parent.
# nolint start: commented_code_linter.
library(teal.data) jk <- join_keys( join_key("ds1", keys = "col_1"), # ds1: [col_1] join_key("ds2", keys = c("col_1", "col_2")), # ds2: [col_1, col_2] join_key("ds3", keys = c("col_1", "col_3")), # ds3: [col_1, col_3] join_key("ds1", "ds2", keys = "col_1"), # ds1 <-- ds2 join_key("ds1", "ds3", keys = "col_1"), # ds1 <-- ds3 join_key("ds4", "ds5", keys = c("col_4" = "col_5"), directed = FALSE) # ds4 <--> ds5 ) # The parent-child relationships are created automatically (unless 'parent' parameter is "none") jk
# nolint end: commented_code_linter.
| Output of print(jk)
| Output annotation |
| ---------------------------------- |:----------------------------------------:|
| ## A join_keys object containing foreign keys between 3 datasets:
| Title |
| ## ds1: [col_1]
| Primary keys
\## <-- ds2: [col_1]
| **Foreign keys**
<--
denotes ds1
is the parent of ds2
) |
| ## <-- ds3: [col_1]
| |
| ## ds2: [col_1, col_2]
| |
| ## --> ds1: [col_1]
| arrow -->
denotes ds2
is a child of ds1
|
| ## --* (implicit via parent with): ds3
| Implicit relationship between ds2
& ds3
(given that they share common keys with same parent) |
| ## ds3: [col_1, col_3]
| |
| ## --> ds1: [col_1]
| |
| ## --* (implicit via parent with): ds2
| |
| ## ds4: [no primary keys]
| |
| ## <-> ds5: [col_5]
| Foreign keys
(arrow <->
denotes no parent definition between datasets)_ |
| ## ds5: [no primary keys]
| |
| ## <-> ds4: [col_4]
| |
The subset operator with 2 indices (x[i, j]
) is used to retrieve the primary/foreign keys.
Both indices must be a string denoting the dataset name.
# Using the jk object defined in "Anatomy of Join Keys" jk # Getting primary key of "ds1" jk["ds1", "ds1"] # Getting foreign keys between "ds4" and "ds5" jk["ds4", "ds5"]
Note that there is a symmetry in the keys between ds4
and ds5
relationship:
jk["ds5", "ds4"] jk["ds5", "ds4"]
When only 1 argument is used this operator will return a join_keys
object that is filtered accordingly.
# Using the jk object defined in "Anatomy of Join Keys" jk # Getting primary key of "ds1" jk["ds1", "ds1"] # Getting keys of "ds1" and "ds2" jk[c("ds1", "ds2")]
Modifying or adding a key uses the same notation with the assignment operator <-
.
A symmetric relationship will be created automatically, where the parent (by default) will be the dataset defined in the first index.
Assigning NULL
value will delete the relationship.
# Adding a new ds5 <-- ds1 key jk["ds1", "ds5"] <- "a_column" # Removing an existing key jk["ds4", "ds5"] <- NULL
The combining of multiple joining keys is achieved using the c()
generic function, which generates a symmetric and valid join_keys
object. When encountering identical relationship pairs, this operation retains the last occurrence of the pair within the specified argument order.
For added convenience, the function also accommodates join_key_set
objects created through the join_key
function. These objects can be provided as the initial argument or in any other position as needed.
jk1 <- join_keys(join_key("ds1", "ds1", "col_1")) jk2 <- join_keys(join_key("ds2", "ds2", "col_1"), join_key("ds1", "ds2", "col_1")) # Merging c(jk1, jk2) # Keeping last occurence c(jk1, jk2, join_keys(join_key("ds2", "ds2", "col_2"), join_key("ds1", "ds2", c("col_1" = "col_2")))) # Merges join_key and join_key_set objects (from join_key function) c(jk1, join_key("ds3", "ds3", "col_3"))
There are 2 types of relationships encoded with joining keys that are described in the following sections. The primary and foreign keys are created explicitly using the constructor for individual keys (join_key
).
Additionally, the join_keys
object infers implicit relationships when two datasets share foreign keys to a parent dataset, but not between themselves. These implicit relationships are available just like another foreign key and can be used to merge datasets, despite not being defined by the user.
teal_data
When using the teal_data
function, the simplest method to define the join keys is to use the join_keys
argument.
We can specify the column(s) of the dataset that (together) uniquely identify rows in the dataset.
# nolint start: commented_code_linter.
library(teal.data) td_pk <- within( teal_data(), ds1 <- transform(iris, id = seq_len(nrow(iris))) ) datanames(td_pk) <- "ds1" join_keys(td_pk) <- join_keys(join_key("ds1", keys = "id")) join_keys(td_pk)
We can extend the previous example and define primary keys for multiple datasets:
td_pk <- within( td_pk, { ds2 <- data.frame(W = 10:1, V = 5:14, M = rep(1:5, 2)) ds3 <- data.frame(V = 5:14, N = 4) } ) datanames(td_pk) <- c(datanames(td_pk), "ds2", "ds3") join_keys(td_pk)["ds2", "ds2"] <- c("V", "W") join_keys(td_pk)["ds3", "ds3"] <- c("V", "W") join_keys(td_pk)
# nolint end: commented_code_linter.
teal_data
When passing multiple datasets to the teal_data
function, dataset relationships are set using join_keys
and join_key
functions, which then can be used to merge datasets together within teal
apps.
For users familiar with SQL
database schema, these relationships are symmetric and not as strict as SQL
foreign key relationships as teal
does not validate whether the values defined as foreign key columns are present in the table.
For example:
library(teal.data) td_fk <- within( teal_data(), { ds1 <- data.frame(X = 1:10, Y = 21:30, Z = 1:10) ds2 <- data.frame(W = 10:1, V = 5:14, M = rep(1:5, 2)) ds3 <- data.frame(V = 5:14, N = 4) } ) datanames(td_fk) <- c("ds1", "ds2", "ds3") join_keys(td_fk) <- join_keys( # Primary keys join_key("ds1", keys = c("X")), join_key("ds2", keys = c("V", "W")), join_key("ds3", keys = c("V")), # Foreign keys join_key("ds1", "ds2", c("X" = "W")), join_key("ds2", "ds3", c("V" = "V")) ) join_keys(td_fk)
Two datasets that share common foreign keys to the same parent dataset have an implicit relationship between them that is modeled and accessible in joining keys.
This is a special relationship that is inferred from existing foreign keys. It does not need to be explicitly defined but it can be accessed and overwritten just as any other foreign key.
These implicit relationships can be used to merge 2 datasets together, just as if they were defined manually.
library(teal.data) td <- within( teal_data(), { ds1 <- data.frame(X = 1:10, Y = 21:30, Z = 1:10) ds2 <- data.frame(W = 10:1, V = 5:14, M = rep(1:5, 2)) ds3 <- data.frame(V = 5:14, N = 4) ds4 <- data.frame(V = 5:14, R = rnorm(10)) } ) datanames(td) <- c("ds1", "ds2", "ds3", "ds4") join_keys(td) <- join_keys( # Primary keys join_key("ds1", keys = c("X")), join_key("ds2", keys = c("V", "W")), join_key("ds3", keys = c("V")), join_key("ds4", keys = c("V")), # Foreign keys join_key("ds1", "ds2", c("X" = "W")), join_key("ds2", "ds3", c("V" = "V")), join_key("ds1", "ds4", c("X" = "B")) ) join_keys(td) join_keys(td)["ds2", "ds4"]
Note that the definition above contains no join_key
for "ds2" <-> "ds4"
teal.data
provides a set of default join keys (primary and foreign) for datasets named according to the ADaM standard. They are stored in default_cdisc_join_keys
.
names(default_cdisc_join_keys) |> sort()
When not all default keys are required users can select a smaller set of datasets by subsetting (for example: default_cdisc_join_keys[c("ADSL", "ADTTE")]
).
Please note, that for every element that is selected, its parent will also be returned (if it has one) as well as any pair-wise relationships in the resulting selection.
default_cdisc_join_keys default_cdisc_join_keys["ADSL"] default_cdisc_join_keys["ADTTE"] default_cdisc_join_keys[c("ADSL", "ADTTE", "ADRS")]
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.