In krlmlr/tibble: Simple Data Frames

To extend the tibble package for new types of columnar data, you need to understand how printing works. The presentation of a column in a tibble is powered by four S3 generics:

type_sum() determines what goes into the column header.
pillar_shaft() determines what goes into the body of the column.
is_vector_s3() and obj_sum() are used when rendering list columns.

If you have written an S3 or S4 class that can be used as a column, you can override these generics to make sure your data prints well in a tibble. To start, you must import the pillar package that powers the printing of tibbles. Either add pillar to the Imports: section of your DESCRIPTION, or call:

usethis::use_package("pillar")

This short vignette assumes a package that implements an S3 class "latlon" and uses roxygen2 to create documentation and the NAMESPACE file. For this vignette to work we need to attach pillar:

Prerequisites

We define a class "latlon" that encodes geographic coordinates in a complex number. For simplicity, the values are printed as degrees and minutes only.

#' @export
latlon <- function(lat, lon) {
  as_latlon(complex(real = lon, imaginary = lat))
}

#' @export
as_latlon <- function(x) {
  structure(x, class = "latlon")
}

#' @export
c.latlon <- function(x, ...) {
  as_latlon(NextMethod())
}

#' @export
`[.latlon` <- function(x, i) {
  as_latlon(NextMethod())
}

#' @export
format.latlon <- function(x, ..., formatter = deg_min) {
  x_valid <- which(!is.na(x))

  lat <- unclass(Im(x[x_valid]))
  lon <- unclass(Re(x[x_valid]))

  ret <- rep("<NA>", length(x))
  ret[x_valid] <- paste(
    formatter(lat, c("N", "S")),
    formatter(lon, c("E", "W"))
  )
  format(ret, justify = "right")
}

deg_min <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  min <- round(x * 60)

  ret <- sprintf("%d°%.2d'%s", deg, min, pm[ifelse(sign >= 0, 1, 2)])
  format(ret, justify = "right")
}

#' @export
print.latlon <- function(x, ...) {
  cat(format(x), sep = "\n")
  invisible(x)
}

latlon(32.7102978, -117.1704058)

More methods are needed to make this class fully compatible with data frames, see e.g. the hms package for a more complete example.

Using in a tibble

Columns on this class can be used in a tibble right away, but the output will be less than ideal:

library(tibble)
data <- tibble(
  venue = "rstudio::conf",
  year  = 2017:2019,
  loc   = latlon(
    c(28.3411783, 32.7102978, NA),
    c(-81.5480348, -117.1704058, NA)
  ),
  paths = list(
    loc[1],
    c(loc[1], loc[2]),
    loc[2]
  )
)

data

(The paths column is a list that contains arbitrary data, in our case latlon vectors. A list column is a powerful way to attach hierarchical or unstructured data to an observation in a data frame.)

The output has three main problems:

The column type of the loc column is displayed as <S3: latlon>. This default formatting works reasonably well for any kind of object, but the generated output may be too wide and waste precious space when displaying the tibble.
The values in the loc column are formatted as complex numbers (the underlying storage), without using the format() method we have defined. This is by design.
The cells in the paths column are also displayed as <S3: latlon>.

In the remainder I'll show how to fix these problems, and also how to implement rendering that adapts to the available width.

Fixing the data type

To display <geo> as data type, we need to override the type_sum() method. This method should return a string that can be used in a column header. For your own classes, strive for an evocative abbreviation that's under 6 characters.

import::from(pillar, type_sum)

#' @importFrom pillar type_sum
#' @export
type_sum.latlon <- function(x) {
  "geo"
}

Because the value shown there doesn't depend on the data, we just return a constant. (For date-times, the column info will eventually contain information about the timezone, see #53.)

data

Rendering the value

To use our format method for rendering, we implement the pillar_shaft() method for our class. (A pillar is mainly a shaft (decorated with an ornament), with a capital above and a base below. Multiple pillars form a colonnade, which can be stacked in multiple tiers. This is the motivation behind the names in our API.)

import::from(pillar, pillar_shaft)

#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "right")
}

The simplest variant calls our format() method, everything else is handled by pillar, in particular by the new_pillar_shaft_simple() helper. Note how the align argument affects the alignment of NA values and of the column name and type.

data

We could also use left alignment and indent only the NA values:

#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}

data

Adaptive rendering

If there is not enough space to render the values, the formatted values are truncated with an ellipsis. This doesn't currently apply to our class, because we haven't specified a minimum width for our values:

print(data, width = 35)

If we specify a minimum width when constructing the shaft, the loc column will be truncated:

#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "right", min_width = 10)
}

print(data, width = 35)

This may be useful for character data, but for lat-lon data we may prefer to show full degrees and remove the minutes if the available space is not enough to show accurate values. A more sophisticated implementation of the pillar_shaft() method is required to achieve this:

#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  deg <- format(x, formatter = deg)
  deg[is.na(x)] <- pillar::style_na("NA")
  deg_min <- format(x)
  deg_min[is.na(x)] <- pillar::style_na("NA")
  pillar::new_pillar_shaft(
    list(deg = deg, deg_min = deg_min),
    width = pillar::get_max_extent(deg_min),
    min_width = pillar::get_max_extent(deg),
    subclass = "pillar_shaft_latlon"
  )
}

Here, pillar_shaft() returns an object of the "pillar_shaft_latlon" class created by the generic new_pillar_shaft() constructor. This object contains the necessary information to render the values, and also minimum and maximum width values. For simplicity, both formattings are pre-rendered, and the minimum and maximum widths are computed from there. Note that we also need to take care of NA values explicitly. (get_max_extent() is a helper that computes the maximum display width occupied by the values in a character vector.)

For completeness, the code that implements the degree-only formatting looks like this:

deg <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- round(x)

  ret <- sprintf("%d°%s", deg, pm[ifelse(sign >= 0, 1, 2)])
  format(ret, justify = "right")
}

All that's left to do is to implement a format() method for our new "pillar_shaft_latlon" class. This method will be called with a width argument, which then determines which of the formattings to choose:

#' @export
format.pillar_shaft_latlon <- function(x, width, ...) {
  if (all(crayon::col_nchar(x$deg_min) <= width)) {
    ornament <- x$deg_min
  } else {
    ornament <- x$deg
  }

  pillar::new_ornament(ornament)
}

data
print(data, width = 35)

Adding color

Both new_pillar_shaft_simple() and new_ornament() accept ANSI escape codes for coloring, emphasis, or other ways of highlighting text on terminals that support it. Some formattings are predefined, e.g. style_subtle() displays text in a light gray. For default data types, this style is used for insignificant digits. We'll be formatting the degree and minute signs in a subtle style, because they serve only as separators. You can also use the crayon package to add custom formattings to your output.

#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x, formatter = deg_min_color)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}

deg_min_color <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- trunc(x)
  x <- x - deg
  rad <- round(x * 60)
  ret <- sprintf(
    "%d%s%.2d%s%s",
    deg,
    pillar::style_subtle("°"),
    rad,
    pillar::style_subtle("'"),
    pm[ifelse(sign >= 0, 1, 2)]
  )
  ret[is.na(x)] <- ""
  format(ret, justify = "right")
}

data

Currently, ANSI escapes are not rendered in vignettes, so the display here isn't much different from earlier examples. This may change in the future.

Fixing list columns

To tweak the output in the paths column, we need to indicate that our class is an S3 vector:

import::from(pillar, is_vector_s3)

#' @importFrom pillar is_vector_s3
#' @export
is_vector_s3.latlon <- function(x) TRUE

data

This is picked up by the default implementation of obj_sum(), which then shows the type and the length in brackets. If your object is built on top of an atomic vector the default will be adequate. You, will, however, need to provide an obj_sum() method for your class if your object is vectorised and built on top of a list.

An example of an object of this type in base R is POSIXlt: it is a list with 9 components.

time <- as.POSIXlt("2018-12-20 23:29:51", tz = "CET")
x <- as.POSIXlt(time + c(0, 60, 3600))
str(unclass(x))

But it pretends to be a vector with 3 elements:

x
length(x)
str(x)

So we need to define a method that returns a character vector the same length as x:

import::from(pillar, obj_sum)

#' @importFrom pillar obj_sum
#' @export
obj_sum.POSIXlt <- function(x) {
  rep("POSIXlt", length(x))
}

Testing

If you want to test the output of your code, you can compare it with a known state recorded in a text file. For this, pillar offers the expect_known_display() expectation which requires and works best with the testthat package. Make sure that the output is generated only by your package to avoid inconsistencies when external code is updated. Here, this means that you test only the shaft portion of the pillar, and not the entire pillar or even a tibble that contains a column with your data type!

The tests work best with the testthat package:

library(testthat)

unlink("latlon.txt")
unlink("latlon-bw.txt")

The code below will compare the output of pillar_shaft(data$loc) with known output stored in the latlon.txt file. The first run warns because the file doesn't exist yet.

test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt"
  )
})

From the second run on, the printing will be compared with the file:

test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt"
  )
})

However, if we look at the file we'll notice strange things: The output contains ANSI escapes!

readLines("latlon.txt")

We can turn them off by passing crayon = FALSE to the expectation, but we need to run twice again:

library(testthat)
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt",
    crayon = FALSE
  )
})

test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt",
    crayon = FALSE
  )
})

readLines("latlon.txt")

You may want to create a series of output files for different scenarios:

Colored vs. plain (to simplify viewing differences)
With or without special Unicode characters (if your output uses them)
Different widths

For this it is helpful to create your own expectation function. Use the tidy evaluation framework to make sure that construction and printing happens at the right time:

expect_known_latlon_display <- function(x, file_base) {
  quo <- rlang::quo(pillar::pillar_shaft(x))
  pillar::expect_known_display(
    !! quo,
    file = paste0(file_base, ".txt")
  )
  pillar::expect_known_display(
    !! quo,
    file = paste0(file_base, "-bw.txt"),
    crayon = FALSE
  )
}

test_that("latlon pillar matches known output", {
  expect_known_latlon_display(data$loc, file_base = "latlon")
})

readLines("latlon.txt")
readLines("latlon-bw.txt")

Learn more about the tidyeval framework in the dplyr vignette.

unlink("latlon.txt")
unlink("latlon-bw.txt")

krlmlr/tibble documentation built on Jan. 15, 2020, 7:56 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

krlmlr/tibble
Simple Data Frames

In krlmlr/tibble: Simple Data Frames

Prerequisites

Using in a tibble

Fixing the data type

Rendering the value

Adaptive rendering

Adding color

Fixing list columns

Testing

R Package Documentation

Browse R Packages

We want your feedback!

krlmlr/tibble Simple Data Frames

In krlmlr/tibble: Simple Data Frames

Prerequisites

Using in a tibble

Fixing the data type

Rendering the value

Adaptive rendering

Adding color

Fixing list columns

Testing

R Package Documentation

Browse R Packages

We want your feedback!

krlmlr/tibble
Simple Data Frames