knitr::opts_chunk$set(echo = TRUE)

Sample dataframes

Firstly, we create two sample dataframes.

dtf01 <- data.frame(
  ID = base::seq(1, 100),
  NAME = base::sample(x = c("John", "Steven", "Helena", "Adele", "JJ", "Amy"),
                      size = 100,
                      replace = TRUE),
  AGE = base::as.integer(stats::runif(n = 100, min = 21, max = 30)),
  RANDOM = stats::runif(n = 100, min = 0, max = 1),
  stringsAsFactors = FALSE
)

dtf02 <- data.frame(
  ID = base::seq(1, 100),
  CITY = base::sample(x = c("Turin", "New York", "Milan", "Shanghai", "Paris", "Boston"),
                      size = 100,
                      replace = TRUE),
  stringsAsFactors = FALSE
)

These two dataframes look like:

knitr::kable(utils::head(dtf01))
knitr::kable(utils::head(dtf02))

Now, we convert them into two dfi objects.

library(dtlng)
dfi01 <- asDfi(dtf01, str_name = "dfi01")
dfi02 <- asDfi(dtf02, str_name = "dfi02")

We are ready for some dplyr-style dataframe manipulations:

dfi03 <- dfi01 %>%
  select_("-RANDOM") %>%
  filter_(~(ID > 50)) %>%
  name("dfi03")

dfi04 <- dfi02 %>%
  filter_(~(ID > 20)) %>%
  name("dfi04")

dfi05 <- dfi03 %>%
  inner_join(dfi04, by = c("ID" = "ID")) %>%
  filter_(~(ID <= 90)) %>%
  name("dfi05")

Then, we can generate the data linea tree:

dtf_tree <- treeDtf()
knitr::kable(utils::head(dtf_tree))

and plot the data lineage for the dataframes

showLineage(dtf_tree = dtf_tree, str_type = "dataframes")

and, of course, the data lineage for every single column

showLineage(dtf_tree = dtf_tree, str_type = "columns")


ngshya/dtlng documentation built on May 23, 2019, 3:05 p.m.