Color data frame output in R terminal

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  R.options=list(width=100)
)
options(crayon.enabled = TRUE)
options(crayon.colors = 256)
knitr::knit_hooks$set(output = function(x, options){
  paste0(
    '<pre class="r-output"><code>',
    fansi::sgr_to_html(x = htmltools::htmlEscape(x), warn = FALSE),
    '</code></pre>'
  )
})

## this is an ugly, ugly hack, but otherwise crayon does not LISTEN TO REASON!!!
num_colors <- function(forget=TRUE) 256
library(crayon)
assignInNamespace("num_colors", num_colors, pos="package:crayon")
library(colorDF)
library(dplyr)
library(data.table)

Quick start

library(dplyr)
data(starwars)
sw <- starwars[, c(1:3, 7:8)]
sw %>% colorDF
colorDF(sw) %>% summary

Colorful data frames

Your average terminal in which you run R is capable of displaying colors, styles and Unicode characters. Wouldn't it be nice to add some color to the data frames you are displaying? For example, factors could be shown in a distinct color (no more confusing strings with factors!) and significant p-values could be colored red.

This was my motivation when writing this tiny package. Of course, changing the default method for printing data frames is nothing a package is allowed to do (but read on!). However, this package defines everything you need to get dynamic, colorful output when viewing data frames. There are two important things to know about colorDF:

  1. colorDF never modifies the behavior of the data frame-like object or its contents (i.e. it does not redefine methods like [<-, remove row names etc.^[Strictly speaking, that is not true, as there is a [.colorDF method which serves only as a wrapper around the respective [ method of the underlying data frame, tibble or data table. The only reason it exists is that otherwise the style and column type attributes would be lost.]). The only two things that change are (i) the default print method (visualization), and (ii) the ".style" and ".coltp" attributes of the object, and that only if you actually change the class of the object, which is often unnecessary.
  2. Any data frame-like object can be used with colorDF, and you don't need to modify these objects to use the colorful visualizations.

Yes, you can color any object that can be cast into a data frame with this or related functions! For example, you can apply it to both tibbles and data.table objects:

## works with standard data.frames
colorDF(mtcars)

## works with tidyverse tibbles
mtcars %>% as_tibble %>% colorDF

## works with data.table
colorDF(data.table(mtcars))

The output of these three commands is identical:

colorDF(mtcars)

Column types

Column types are mostly like classes, but colorDF introduces some additional distinctions, specifically "identifier" (such that character columns which contain identifiers can be shown with a particular, distinct style) and "pval", to show significant p-values in a different color (and use format.pval() for formatting). Column types are stored in the .coltp attribute of the colorDF object.

colorDF tries to guess how each column should be displayed. First, it checks whether any column types have been assigned explicitly using the col_type<- function and stored in the .coltp attribute of the object. Next, it tries to guess the contents of the column from the column name (ID, p-value). Finally, it falls back to the class of the column (character, integer, numeric, logical, factor).

To assign a particular column type, you first need to turn the data frame into a colorful one and then modify the column type:

sw <- sw %>% as.colorDF
col_type(sw, "name") <- "identifier"
col_type(sw, "gender") <- "factor"
sw$probability <- runif(nrow(sw), 0, 0.1)
col_type(sw, "probability") <- "pval"
sw

Note that changing the column type does not change the class of the column in the data frame! colorDF never touches the data frame contents; the only operations concern the "class", ".style" and ".coltp" attributes. So if you set a column type to "character" instead of "factor", the column will look like a character column in the terminal output, but its class will still be factor.
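A minimal sketch of this point, using the built-in iris data set instead of starwars (the choice of data set and the variable name df are only for illustration). It assigns the "character" column type to a factor column and then checks that the underlying class is untouched:

```r
library(colorDF)

# Turn iris into a colorful data frame; Species is a factor column
df <- as.colorDF(iris)

# Display Species as if it were a character column
col_type(df, "Species") <- "character"

# The data itself is unchanged: the column is still a factor
class(df$Species)
is.factor(df$Species)

# The assigned column types are stored in the ".coltp" attribute
attr(df, ".coltp")
```

Only the attributes used for printing differ from the original iris; any code that relies on Species being a factor keeps working.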

You can also hide a column:

sw <- colorDF(starwars)
col_type(sw, c("vehicles", "films", "starships")) <- "hidden"
sw

Styles and Themes

I am a bit confused when it comes to distinguishing the two. Themes are basically internally predefined styles. Styles are simply lists that hold information on how different columns, column and row headers, separators between the columns and highlighted rows are displayed.

Themes can be set using the options(colorDF_theme="<theme name>") command or by directly specifying the option in a call to colorDF:

colorDF(sw, theme="bw")

Here is an overview of the themes. Some of them are intended for dark background and will not look great on a light background, which is why we use force_bg=TRUE to force black on white background for these themes:

colorDF_themes_show(force_bg=TRUE)

You can add your own themes using add_colorDF_theme() (see the example section on the help page).

Column styles

Styles of a colorDF object can be directly manipulated using df_style:

mtcars.c <- colorDF(mtcars)
df_style(mtcars.c, "sep") <- "; "

If interested, read the help file for df_style().

Utilities

Summaries

colorDF comes with a couple of utility functions. Firstly, it defines a summary method for colorful data frames which can also be used for any other data frame like object and which I find much more useful than the regular summary:

starwars %>% as.colorDF %>% summary

There is a directly visible (exported) version of the colorful summary called summary_colorDF:

starwars %>% summary_colorDF

As you can see, the summary is much more informative than the default summary.data.frame function. Not only this, but the object does not need to be a data frame – any list can do!

mtcars_cyl <- split(mtcars$mpg, mtcars$cyl)
sapply(mtcars_cyl, length)

The list mtcars_cyl is the miles per gallon column split by number of cylinders. We can use summary_colorDF to create a (semi)graphical summary of this list:

summary_colorDF(mtcars_cyl, numformat="g", width=90)

In fact, this is so useful (especially if an interactive graphics device is not practical, e.g. when running R over ssh/screen) that I implemented a terminal boxplotting function:

term_boxplot(Sepal.Length ~ Species, data=iris, width=90)

Highlighting

The highlight() function allows you to mark selected rows in the table:

foo <- starwars %>% select(name, species, homeworld) %>% 
  highlight(.$homeworld == "Tatooine")

(Unfortunately, the HTML representation of the ANSI terminal doesn't show that one correctly).

Data frame search

The df_search() function looks through a data frame for occurrences of a pattern in all columns (or a subset, if the parameter cols is used); where the pattern matches, it colors the contents of the cell red:

starwars %>% df_search("blue")
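The cols parameter mentioned above can be sketched as follows. Passing the column names as a character vector is an assumption about what cols accepts, so check the help page of df_search() before relying on it:

```r
library(colorDF)
library(dplyr)   # provides the starwars data set and the %>% operator

# Restrict the search to two columns instead of the whole data frame.
# Assumption: `cols` takes a character vector of column names.
res <- starwars %>% df_search("blue", cols = c("eye_color", "skin_color"))
res
```

Limiting the search this way avoids coloring incidental matches in columns that are not of interest.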

Setting up colorDF as the default data frame print method

You can use colorDF as the default method for displaying data frames and similar objects. For this, you need to use the colorDF:::print_colorDF function:

## for regular data frames
print.data.frame <- colorDF:::print_colorDF

## for tidyverse tibbles
print.tbl        <- colorDF:::print_colorDF

## for data.tables
print.data.table <- colorDF:::print_colorDF

This will not replace or modify the original functions from the data.table or tibble packages, but merely mask them. From now on, every data frame-like object will be shown in color, but otherwise its behavior will not change.

Should you want to go back to the original print functions, just remove these new functions:

rm(print.data.frame, print.tbl, print.data.table)

This is a bit more complicated in the case of S4 objects. One such object type is the DataFrame class defined in the S4Vectors package. It is commonly used in many Bioconductor packages such as DESeq2. Unfortunately, the show method defined for DataFrames is not convenient; for example, it always displays a ridiculous number of significant digits, cluttering the output. print_colorDF can print these classes, as it works on anything that can be cast into a data frame using an as.data.frame method.

To take over the output of DataFrames and all other objects inheriting from it (such as DESeqResults), we need to use the S4 convention of defining the methods:

setMethod("show", "DataFrame", function(object) colorDF::print_colorDF(object))

Since methods can only be defined for existing classes, if you want to put this in your .Rprofile, you first need to load (but not necessarily attach) the S4Vectors package:

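## .Rprofile is sourced before the methods package is attached,
## so setMethod() is called with an explicit namespace prefix
loadNamespace("S4Vectors")
methods::setMethod("show", "DataFrame", function(object) colorDF::print_colorDF(object))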

Global options

There are a number of options that override whatever has been defined in a particular theme / style. You can view them with colorDF_options():

colorDF_options()

To change these options, use options() just like with any other global option. For example,

options(colorDF_tibble_style=TRUE)
options(colorDF_sep= " ")
options(colorDF_n=5)
colorDF(starwars)
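Because these are ordinary global options, the usual base R idiom for temporary settings applies. The following sketch (the variable name old is arbitrary) relies only on the fact that options() invisibly returns the previous values:

```r
library(colorDF)

# options() returns the previous values, so keep them for later
old <- options(colorDF_n = 5, colorDF_sep = " ")

# Printed with the temporary settings
colorDF(mtcars)

# Restore whatever was set (or unset) before
options(old)
```

This keeps a one-off change from leaking into the rest of the session.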

Rmarkdown

The package is intended to be used in a terminal. However, as you can see above, it is also possible to get the colored tables in an rmarkdown document. For this, include the following chunk at the beginning of your document:

```r
options(crayon.enabled = TRUE)
knitr::knit_hooks$set(output = function(x, options){
  paste0(
    '<pre class="r-output"><code>',
    fansi::sgr_to_html(x = htmltools::htmlEscape(x), warn = FALSE),
    '</code></pre>'
  )
})
```

Issues

Currently, colorDF relies on the crayon package to generate the ANSI escape codes. Unfortunately, crayon is peculiar about trying to guess the terminal type. Without going into details, there are situations in which it is not possible to force crayon into using 256 colors, even if you know that the terminal supports them. One such example is this vignette: if an rmarkdown file is built from the command line, crayon will ignore any setting of the crayon.colors option and use only the base colors.

The reason that the colors in this vignette appear correct is that I used an ugly hack to substitute the crayon::num_colors() function by a function that always returns 256:

```r
num_colors <- function(forget=TRUE) 256
library(crayon)
assignInNamespace("num_colors", num_colors, pos="package:crayon")
```


colorDF documentation built on Nov. 16, 2020, 9:15 a.m.