knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(canny.tudor)

Can you find the types of a tibble::tibble, plural: one for each column of the table? Asking base::typeof for the type of the whole tibble only answers "list".
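A quick check makes the point concrete. The sketch below uses a base data frame as a stand-in, an assumption made only to keep the example free of dependencies; a tibble is likewise built on a list, so typeof reports the same thing for it.

```r
# A data frame is a list of equal-length column vectors, so typeof()
# describes the container rather than the columns inside it.
df <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)

container_type <- typeof(df)
print(container_type)  # "list"
print(is.list(df))     # TRUE
```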

Take an example

Two columns: one of int, the other of chr.

tibble::tibble(a = 1:10, b=letters[1:10])

You can get the column names using base::names, then apply each name to the table using the [[ subsetting operator, which extracts each column in turn as its underlying vector. The tibble package recommends [[ in newer code.

Finally, base::typeof tells you the vector type.

Sketch implementation

types.of <- function(x)
  sapply(names(x), function(y) typeof(x[[y]]))

It answers a named character vector: the type of each column, named after that column.

library(canny.tudor)
types.of(tibble::tibble(a=1:10, b=letters[1:10]))
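For comparison, here is a minimal sketch of a stricter variant built on vapply. The name types_of_strict is hypothetical and this is not the canny.tudor implementation; it only illustrates how a result template can pin the return type.

```r
# Hypothetical variant for illustration; not part of canny.tudor.
types_of_strict <- function(x) {
  # vapply() insists that typeof() yields exactly one string per column,
  # so the result is always a character vector, even with zero columns.
  vapply(x, typeof, character(1))
}

example <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)
result <- types_of_strict(example)
print(result)
```

With sapply, a frame that has no columns yields an empty list rather than an empty character vector; the character(1) template guards against that case.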

Data frames as well?

Does the same approach work on non-tibble data frames? It does, because it relies only on the names and [[ subsetting operators, which ordinary data frames support too.

df <- data.frame(x = 1.1, y = I(letters[1:10]))
canny.tudor::types.of(df)
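One caveat is worth a small illustration: typeof reports how a column is stored, not its class. The sketch below uses only base R and made-up column names (f, d and s) to contrast the two for a factor, a date and an I()-protected character column.

```r
# typeof() reports how a column is stored; class() reports how it behaves.
df <- data.frame(
  f = factor(c("low", "high", "low")),
  d = as.Date(c("2022-10-15", "2022-10-16", "2022-10-17")),
  s = I(c("x", "y", "z"))
)

# Storage types: factors are integer codes, dates are doubles.
storage <- vapply(df, typeof, character(1))
print(storage)

# A class can have more than one element, hence a list from lapply().
print(lapply(df, class))
```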

Database frames

And finally, what about data frames lazily loaded from a database? Do the underlying column types manifest themselves, and do they appear before or after query execution collects the data? The simple answer is no, not without first collecting some rows. Collecting no rows at all also counts, as when limiting the row count to zero.

You might expect this result since the data frame does not yet exist, strictly speaking; only the promise exists until some SQL runs.

Try it

First create an SQLite database and set up a table called df, short for data frame. Fill it with some test data.

db <- DBI::dbConnect(RSQLite::SQLite())
DBI::dbWriteTable(db, "df", data.frame(a=1:10, b=letters[1:10]))
dplyr::tbl(db, "df")
dplyr::tbl(db, "df") |> types.of()

The above uses the pipe operator built into R 4.1 and later, along with dplyr to read back the table lazily and derive the types. No joy. You cannot access the columns of a lazy table as you would those of an in-memory data frame.

Subsetting a column answers NULL because the frame is not yet available locally.

dplyr::tbl(db, "df")[["a"]]
sapply(dplyr::tbl(db, "df") |> class(), function(x) methods(class = x))

However, you can select no rows at all to derive an empty data frame: one empty vector for each column, each carrying the appropriate column type.

dplyr::tbl(db, "df") |>
  head(0) |>
  dplyr::collect() |>
  canny.tudor::types.of()
DBI::dbDisconnect(db)

Joy!
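The same zero-row idea can also be sketched with DBI alone, without dplyr. The helper name db_types_of is hypothetical and not part of canny.tudor. The sketch assumes a SQL dialect that accepts LIMIT 0, as SQLite does, and whether an empty result carries useful column types depends on the driver.

```r
# Hypothetical helper, not part of canny.tudor: column types of a database
# table, derived by fetching zero rows through DBI.
db_types_of <- function(con, table) {
  # Quote the table name so that unusual identifiers survive.
  sql <- paste("SELECT * FROM", DBI::dbQuoteIdentifier(con, table), "LIMIT 0")
  empty <- DBI::dbGetQuery(con, sql)
  # One storage type per column of the empty result.
  vapply(empty, typeof, character(1))
}

# Try it only when the optional packages are installed.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "df", data.frame(a = 1:10, b = letters[1:10]))
  print(db_types_of(con, "df"))
  DBI::dbDisconnect(con)
}
```

Dialects without LIMIT, which use TOP or FETCH FIRST instead, would need a different query string.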



royratcliffe/canny.tudor documentation built on Oct. 17, 2022, 4:17 a.m.