This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model. There are 6 methods:
dm_rows_insert()
- adds new rowsdm_rows_update()
- changes values in rowsdm_rows_patch()
- fills in missing valuesdm_rows_upsert()
- adds new rows or changes values if
pre-existingdm_rows_delete()
- deletes rowsdm_rows_truncate()
- removes all rows, leaving table
structure intactAll six methods take the same arguments and using them follows the same process:
in_place = FALSE
to double-checkin_place = TRUE
.To start, a dm object is created containing the tables, and rows, that
you want to change. This changeset dm is then copied into the same
source as the dm you want to modify. With the dm in the same RDBMS as
the destination dm, you call the appropriate method, such as
dm_rows_insert()
, to make your planned changes, along with an argument
of in_place = FALSE
so you can confirm you achieve the changes that
you want.
This verification can be done visually, looking at row counts and the
like, or using {dm}’s constraint checking method,
dm_examine_constraints()
. The biggest danger is damaging key relations
between data spread across multiple tables by deleting or duplicating
rows and their keys. dm_examine_constraints()
will catch errors where
primary keys are duplicated or foreign keys do not have a matching
primary key (unless the foreign key value is NA
).
With the changes confirmed, you execute the method again, this time with
the argument in_place = TRUE
to make the changes permanent. Note that
in_place = FALSE
is the default: you must opt in to actually change
data on the database.
Each method has its own requirements in order to maintain database consistency. These involve constraints on primary key values as they are how rows are identified.
| Method | Requirements |
|----------------------|----------------------------------------------------------------------------------------------------|
| dm_rows_insert()
| The primary keys must differ from existing records. |
| dm_rows_update()
| Primary keys must match for all records to be updated. |
| dm_rows_patch()
| Updates missing values in existing records. Primary keys must match for all records to be patched. |
| dm_rows_upsert()
| Updates existing records and adds new records, based on the primary key. |
| dm_rows_delete()
| Removes matching records based on the primary key. |
| dm_rows_truncate()
| Removes all records, only for tables in the changeset dm. |
To ensure the integrity of all relations during the process, all methods
automatically determine the correct processing order for the tables
involved. For operations that create records, parent tables are
processed before child tables. For dm_rows_delete()
and
dm_rows_truncate()
, child tables are processed before their parent
tables. For more details on this see vignette("howto-dm-theory")
and
vignette("howto-dm-db")
.
To demonstrate the use of these table modifying methods we will create a
simple dm object with two tables linked by a foreign key. Note the
foreign key of NA
in the child
table.
library(tidyverse)
#> ── Attaching packages ────────────────────────────────── tidyverse 1.3.1 ──
#> ✔ ggplot2 3.3.3 ✔ purrr 0.3.4
#> ✔ tibble 3.1.1 ✔ dplyr 1.0.5
#> ✔ tidyr 1.1.3 ✔ stringr 1.4.0
#> ✔ readr 1.4.0 ✔ forcats 0.5.1
#> ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
#> ✖ purrr::%@%() masks dm::%@%()
#> ✖ purrr::as_function() masks dm::as_function()
#> ✖ tidyr::extract() masks dm::extract()
#> ✖ dplyr::filter() masks dm::filter(), stats::filter()
#> ✖ purrr::flatten() masks dm::flatten()
#> ✖ purrr::flatten_chr() masks dm::flatten_chr()
#> ✖ purrr::flatten_dbl() masks dm::flatten_dbl()
#> ✖ purrr::flatten_int() masks dm::flatten_int()
#> ✖ purrr::flatten_lgl() masks dm::flatten_lgl()
#> ✖ purrr::flatten_raw() masks dm::flatten_raw()
#> ✖ purrr::invoke() masks dm::invoke()
#> ✖ purrr::is_null() masks dm::is_null(), testthat::is_null()
#> ✖ dplyr::lag() masks dm::lag(), stats::lag()
#> ✖ purrr::list_along() masks dm::list_along()
#> ✖ dplyr::matches() masks tidyr::matches(), dm::matches(), testthat::matches()
#> ✖ purrr::modify() masks dm::modify()
#> ✖ purrr::prepend() masks dm::prepend()
#> ✖ purrr::splice() masks dm::splice()
library(dm)
parent <- tibble(value = c("A", "B", "C"), pk = 1:3)
parent
#> # A tibble: 3 x 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
child <- tibble(value = c("a", "b", "c"), pk = 1:3, fk = c(1, 1, NA))
child
#> # A tibble: 3 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
demo_dm <-
dm(parent = parent, child = child) %>%
dm_add_pk(parent, pk) %>%
dm_add_pk(child, pk) %>%
dm_add_fk(child, fk, parent)
demo_dm %>%
dm_draw(view_type = "all")
{dm} doesn’t check your key values when you create a dm, we add this check:[1]
dm_examine_constraints(demo_dm)
#> ℹ All constraints satisfied.
Then we copy demo_dm
into an SQLite database. Note: the default for
the method used, copy_dm_to()
, is to create temporary tables that will
be automatically deleted when your session ends. As demo_sql
will be
the destination dm for the examples, the argument temporary = FALSE
is
used to make this distinction apparent.
library(DBI)
sqlite_db <- DBI::dbConnect(RSQLite::SQLite())
demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE)
demo_sql
#> ── Table source ───────────────────────────────────────────────────────────
#> src: sqlite 3.35.5 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `parent`, `child`
#> Columns: 5
#> Primary keys: 2
#> Foreign keys: 1
{dm}’s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required. This is a common use case for {dm} – building by hand a sequence of operations using temporary results until it is complete and correct, then committing the result.
dm_rows_insert()
To demonstrate dm_rows_insert()
we create a dm with tables containing
the rows to insert and copy it to sqlite_db
, the same source as
demo_sql
. For all of the dm_rows_*
methods the source and
destination dm objects must be in the same RDBMS. You will get an error
message if this is not the case.
The code below adds parent
and child
table entries for the letter
“D”. First, the changeset dm is created and temporarily copied to the
database:
new_parent <- tibble(value = "D", pk = 4)
new_parent
#> # A tibble: 1 x 2
#> value pk
#> <chr> <dbl>
#> 1 D 4
new_child <- tibble(value = "d", pk = 4, fk = 4)
new_child
#> # A tibble: 1 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 d 4 4
dm_insert_in <-
dm(parent = new_parent, child = new_child) %>%
copy_dm_to(sqlite_db, ., temporary = TRUE)
The changeset dm is then used as an argument to dm_rows_insert()
.
dm_insert_out <-
demo_sql %>%
dm_rows_insert(dm_insert_in)
#> Not persisting, use `in_place = FALSE` to turn off this message.
This gives us a warning that changes will not be persisted. Inspecting
the child
table of the resulting dm_insert_out
and demo_sql
, we
can see that’s exactly what happened. {dm} returned to us a dm object
with our inserted rows in place, but the underlying database has not
changed.
dm_insert_out$child
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.35.5 []
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
#> 4 d 4 4
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.35.5 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
We repeat the operation, this time with the argument in_place = TRUE
and the changes now persist in demo_sql
.
dm_insert_out <-
demo_sql %>%
dm_rows_insert(dm_insert_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.35.5 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 1
#> 3 c 3 NA
#> 4 d 4 4
dm_rows_update()
dm_rows_update()
works the same as dm_rows_insert()
. We create the
dm object and copy it to the same source as the destination. Here we
will change the foreign key for the row in child
containing “b” to
point to the correct row in parent
. And we will persist the changes.
updated_child <- tibble(value = "b", pk = 2, fk = 2)
updated_child
#> # A tibble: 1 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 b 2 2
dm_update_in <-
dm(child = updated_child) %>%
copy_dm_to(sqlite_db, ., temporary = TRUE)
dm_update_out <-
demo_sql %>%
dm_rows_update(dm_update_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.35.5 []
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
#> 4 d 4 4
dm_rows_delete()
dm_rows_delete()
is not currently implemented to work with an RDBMS,
so we will shift our demonstrations back to the local R environment.
We’ve made changes to demo_sql
so we use collect()
to copy the
current tables out of SQLite. Note that persistence is not a concern
with local dm objects. Every operation returns a new dm object
containing the changes made.
local_dm <- collect(demo_sql)
local_dm$parent
#> # A tibble: 4 x 2
#> value pk
#> <chr> <int>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
local_dm$child
#> # A tibble: 4 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
#> 4 d 4 4
dm_deleted <-
dm(parent = new_parent, child = new_child) %>%
dm_rows_delete(local_dm, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
#> Ignoring extra columns: value, fk
#> Ignoring extra columns: value
dm_deleted$child
#> # A tibble: 3 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 NA
dm_rows_patch()
dm_rows_patch()
updates missing values in existing records. We use it
here to fix the missing foreign key in the child
table.
patched_child <- tibble(value = "c", pk = 3, fk = 3)
patched_child
#> # A tibble: 1 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 c 3 3
dm_patched <-
dm(child = patched_child) %>%
dm_rows_patch(dm_deleted, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
dm_patched$child
#> # A tibble: 3 x 3
#> value pk fk
#> <chr> <int> <dbl>
#> 1 a 1 1
#> 2 b 2 2
#> 3 c 3 3
dm_rows_upsert()
dm_rows_upsert()
updates rows with supplied values if they exist or
inserts the supplied values as new rows if they don’t. In this example
we add the letter “D” back to our dm, and update the foreign key for
“b”.
upserted_parent <- tibble(value = "D", pk = 4)
upserted_parent
#> # A tibble: 1 x 2
#> value pk
#> <chr> <dbl>
#> 1 D 4
upserted_child <- tibble(value = c("b", "d"), pk = c(2, 4), fk = c(3, 4))
upserted_child
#> # A tibble: 2 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 b 2 3
#> 2 d 4 4
dm_upserted <-
dm(parent = upserted_parent, child = upserted_child) %>%
dm_rows_upsert(dm_patched, .)
#> Not persisting, use `in_place = FALSE` to turn off this message.
dm_upserted$parent
#> # A tibble: 4 x 2
#> value pk
#> <chr> <dbl>
#> 1 A 1
#> 2 B 2
#> 3 C 3
#> 4 D 4
dm_upserted$child
#> # A tibble: 4 x 3
#> value pk fk
#> <chr> <dbl> <dbl>
#> 1 a 1 1
#> 2 b 2 3
#> 3 c 3 3
#> 4 d 4 4
dm_rows_truncate()
dm_rows_truncate()
deletes all the rows in a table while leaving all
other related information intact, including column names, column types,
and key relations. The function derives its name from the SQL
TRUNCATE TABLE
statement, so we will return to our SQLite database to
demonstrate its use. The example below truncates only the child
table.
Note how a modified version of the destination dm is used as “changeset
dm”: the rows in the changeset dm do not matter here.
dm_trunc_in <-
demo_sql %>%
dm_select_tbl(child)
dm_trunc_in
#> ── Table source ───────────────────────────────────────────────────────────
#> src: sqlite 3.35.5 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `child`
#> Columns: 3
#> Primary keys: 1
#> Foreign keys: 0
dm_trunc_out <-
demo_sql %>%
dm_rows_truncate(dm_trunc_in, in_place = TRUE)
demo_sql$child
#> # Source: table<`child`> [?? x 3]
#> # Database: sqlite 3.35.5 []
#> # … with 3 variables: value <chr>, pk <int>, fk <dbl>
The dm_rows_*
methods give you row-level granularity over the
modifications you need to make to your relational model. By using the
in_place
argument they all share you can construct and verify your
modifications before committing them. There are a few limitations, as
mentioned in the tutorial, but these will be addressed in future updates
to {dm}.
If this tutorial answered some questions, but opened others, these resources might be of assistance.
Is your data in an RDBMS? vignette("howto-dm-db")
offers a detailed
look at working with an existing relational data model.
If your data is in data frames, then you may want to read
vignette("howto-dm-df")
next.
If you would like to know more about relational data models in order
to get the most out of dm, check out vignette("howto-dm-theory")
.
If you’re familiar with relational data models but want to know how to
work with them in dm, then any of vignette("tech-dm-join")
,
vignette("tech-dm-filter")
, or vignette("tech-dm-zoom")
is a good
next step.
[1] Be aware that when using dm_examine_constraints()
NULL (NA
)
foreign keys are allowed and will be counted as a match. In some cases
this doesn’t make sense and non-NULL columns should be enforced by the
RDBMS. Currently {dm} does not specify or check non-NULL constraints for
columns.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.