booktabs: Construct a nice table from a data frame
In grayclhn/dbframe-R-library: An R to SQL interface


Constructs an attractive LaTeX table (using booktabs) from an arbitrary data frame.

booktabs(dframe, align = "l", digits = 1, numberformat = FALSE,
         purgeduplicates = TRUE, tabular.environment = "tabularx",
         scientific = FALSE, include.rownames = FALSE,
         sanitize.text.function = function(x) x, drop = NULL,...)

`dframe`	A data frame or an object that can be coerced to a data.frame.
`align`	A character vector specifying each column's alignment. Each element should be “l”, “c”, or “r”.
`digits`	A vector of integers specifying the number of digits to display for each column.
`numberformat`	A logical vector indicating which columns should be formatted as numbers. This pads the left side of the numbers so that the column aligns on the decimal point, switches to math mode to turn off old-style numbering (if needed), and converts hyphens into minus signs.
`purgeduplicates`	A logical vector that labels which columns should have duplicate entries removed.
`tabular.environment`	The LaTeX tabular environment to use. Defaults to tabularx.
`scientific`	FALSE unless you want to use scientific notation in the tables.
`include.rownames`	Logical indicating whether the data frame's rownames should be included. Probably not.
`sanitize.text.function`	A function applied to the table's text. The default identity function disables escaping of backslashes, etc. This is the same as in xtable.
`drop`	A character vector indicating some columns to omit. This can be useful if those columns are used for sorting or to create derived columns.
`...`	Additional arguments to pass to `xtable`.

This function uses xtable to generate LaTeX code, then modifies the table. The containing LaTeX document should use the “booktabs” package and, with the default tabular.environment, the “tabularx” package as well.

A character object containing LaTeX code for a table.

The basic implementation is straightforward.

<<*>>=
    booktabs <- function(dframe, align = "l", digits = 1,
                         numberformat = FALSE, purgeduplicates = TRUE,
                         tabular.environment = "tabularx", 
                         include.rownames = FALSE,
                         sanitize.text.function = function(x) x, 
                         drop = NULL,...) {
      <<Define platform independent null file>>
      <<Format arguments>>
      return(<<Assemble Latex code for table>>)
    }

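To assemble the LaTeX code, we just call xtable on the data frame, then substitute out the first line to make the table span the entire page. Printing to devnull keeps the table off the console; the LaTeX code is taken from the value that print returns. Note that devnull is defined in a separate chunk.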

<<Assemble Latex code for table>>=
    gsub(sprintf("\\\\begin\\{%s\\}", tabular.environment),
         sprintf("\\\\begin\\{%s\\}\\{\\\\textwidth\\}", tabular.environment),
         print(xtable(dframe, align = align, digits = digits,...),
               file = devnull, floating = FALSE,
               add.to.row = list(pos=list(-1, 0, nrow(dframe)),
                 command = c("\\toprule ", "\\midrule ", "\\bottomrule ")),
               tabular.environment = tabular.environment,
               sanitize.text.function = sanitize.text.function,
               include.rownames = include.rownames, hline.after = NULL))
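
To see what the substitution does in isolation, here is a minimal sketch that applies the same gsub call to a hand-written opening line rather than to real xtable output. The `first_line` string and its `lrr` column specification are made up for illustration.

```r
# Hypothetical opening line, standing in for what xtable would emit.
tabular.environment <- "tabularx"
first_line <- "\\begin{tabularx}{lrr}"

# Same pattern and replacement as in the chunk above: insert {\textwidth}
# directly after \begin{tabularx}.
widened <- gsub(sprintf("\\\\begin\\{%s\\}", tabular.environment),
                sprintf("\\\\begin\\{%s\\}\\{\\\\textwidth\\}",
                        tabular.environment),
                first_line)

# cat() shows the string as LaTeX would see it, without R's escaping.
cat(widened, "\n")
```

The width argument has to come before the column specification because tabularx takes the target width as its first argument.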

The variable devnull is defined to be a platform independent /dev/null:

<<Define platform independent null file>>=
    devnull <- switch(Sys.info()["sysname"],
      Windows = "NUL", 
      Linux   = "/dev/null",
      Darwin  = "/dev/null",
      {warning("Your OS is not explicitly supported; we'll assume /dev/null exists.")
       "/dev/null"})
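
As a rough illustration of how the switch falls through to its unnamed default, the same logic can be wrapped in a small helper. The name `null_device` is hypothetical and not part of the package. Recent versions of R (3.6.0 and later) also ship a base function `nullfile()` that serves the same purpose, which may be preferable where it is available.

```r
# Hypothetical helper: map an OS name to its null device,
# mirroring the switch statement in the chunk above.
null_device <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Windows = "NUL",
         Linux   = "/dev/null",
         Darwin  = "/dev/null",
         {
           # Unnamed final argument: used when no name matches.
           warning("Unrecognized OS; assuming /dev/null exists.")
           "/dev/null"
         })
}

null_device("Windows")
null_device("Linux")
```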

A little bit of routine reformatting needs to happen before calling xtable.

<<Format arguments>>=
    dframe <- as.data.frame(dframe)
    <<Drop user-specified columns>>
    <<Correct dimensions of arguments>>
    <<Pad formatting columns to accomodate xtables handling of rownames>>
    <<Reformat numeric columns>>
    <<Remove duplicates from specified columns>>

The user can choose to leave out some of the columns. This can be useful if there are columns that are important for sorting the data frame, but are not of interest on their own.

<<Drop user-specified columns>>=
    if (!is.null(drop)) {
      columnnames <- names(dframe)
      if (!all(drop %in% columnnames)) {
        warning("'drop' contains some columns not in 'dframe'")
      }
      dframe <- dframe[, setdiff(names(dframe), drop), drop = FALSE]
    }
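
The sorting use case can be sketched with a toy data frame. The helper name `drop_columns` and the columns `id`, `label`, and `value` are assumptions made for this example only.

```r
# Hypothetical helper wrapping the column-dropping logic from the chunk above.
drop_columns <- function(dframe, drop = NULL) {
  if (!is.null(drop)) {
    if (!all(drop %in% names(dframe))) {
      warning("'drop' contains some columns not in 'dframe'")
    }
    # Here 'drop = FALSE' is the argument of `[`, not the function's
    # 'drop' parameter: it keeps the result a data frame even if only
    # one column remains.
    dframe <- dframe[, setdiff(names(dframe), drop), drop = FALSE]
  }
  dframe
}

d <- data.frame(id = 3:1, label = c("c", "b", "a"), value = c(0.3, 0.2, 0.1))
d <- d[order(d$id), ]              # sort on a column we do not want to show
out <- drop_columns(d, drop = "id")
names(out)
```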

For convenience, we let arguments that affect column-by-column formatting be written as a single value if the same value applies to each column. In that case, we repeat the value the correct number of times.

<<Correct dimensions of arguments>>=
    ncol <- ncol(dframe) + include.rownames
    if (length(align) == 1) align <- rep(align, ncol)
    if (length(digits) == 1) digits <- rep(digits, ncol)
    if (length(numberformat) == 1) numberformat <- rep(numberformat, ncol)
    if (length(purgeduplicates) == 1)
      purgeduplicates <- rep(purgeduplicates, ncol)

The way xtable handles the row names is kind of annoying: alignment and digits need to be specified for them, even if the row names will not be shown. To avoid doing that, we pad the necessary arguments if the row names aren't going to be shown.

<<Pad formatting columns to accomodate xtables handling of rownames>>=
    if (!include.rownames) {
      align <- c("l", align)
      digits <- c(0, digits)
    }
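
The recycling and padding steps can be sketched together. The helper `normalize_format_args` and its `n_data_cols` parameter are hypothetical; the point is only that `align` and `digits` end up with one more element than the data frame has columns, whether or not the row names are shown.

```r
# Hypothetical helper combining the recycling and padding steps above.
normalize_format_args <- function(n_data_cols, align = "l", digits = 1,
                                  include.rownames = FALSE) {
  # TRUE counts as 1, so the row names add a column only when shown.
  n <- n_data_cols + include.rownames
  if (length(align) == 1) align <- rep(align, n)
  if (length(digits) == 1) digits <- rep(digits, n)
  # xtable always expects an entry for the row names, so add a
  # placeholder when they are hidden.
  if (!include.rownames) {
    align <- c("l", align)
    digits <- c(0, digits)
  }
  list(align = align, digits = digits)
}

hidden <- normalize_format_args(3, align = "c", digits = 2)
shown  <- normalize_format_args(3, align = "c", digits = 2,
                                include.rownames = TRUE)
str(hidden)
str(shown)
```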

Columns flagged by numberformat are reformatted to align at the decimal point and use the correct minus sign. Missing values are left as empty strings.

<<Reformat numeric columns>>=
    dframe[,numberformat] <- lapply(which(numberformat), function (i) {
      emptyRows <- is.na(dframe[,i])
      rowTex <- rep("", length(emptyRows))
      rowTex[!emptyRows] <- 
        gsub("-", "\\\\!\\\\!-", sprintf("$%s$", gsub(" ", "\\\\enskip", 
           format(round(as.numeric(dframe[!emptyRows,i]), 
                        digits[i + !include.rownames])))))
      rowTex
    })
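
To make the nested calls easier to follow, here is the same pipeline unrolled step by step on a plain vector. The helper name `format_numbers` is hypothetical; the calls inside it are the ones used in the chunk above.

```r
# Hypothetical helper: the formatting pipeline above, one step per line.
format_numbers <- function(x, digits) {
  out <- rep("", length(x))
  ok <- !is.na(x)
  # format() pads to a common width, so shorter numbers get leading spaces.
  padded <- format(round(as.numeric(x[ok]), digits))
  # Replace each padding space with \enskip and wrap the result in math mode.
  math <- sprintf("$%s$", gsub(" ", "\\\\enskip", padded))
  # In math mode the hyphen is typeset as a minus sign; \!\! adds
  # negative thin space in front of it.
  out[ok] <- gsub("-", "\\\\!\\\\!-", math)
  out
}

formatted <- format_numbers(c(-1.324, 0.93, NA), digits = 2)
cat(formatted, sep = "\n")
```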

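Finally, we remove duplicates in a slightly clever way. An entry is set to NA only if it repeats the entry directly above it and the row also duplicates an earlier row on all of the purged columns up to and including that one, so a value is shown again whenever a column to its left changes. The loop runs from the last purged column to the first so that the columns being compared have not yet been blanked.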

<<Remove duplicates from specified columns>>=
    repeats <- function(x) c(FALSE, x[-1] == x[seq_len(length(x) - 1)])
    purgeindex <- which(purgeduplicates)
    for (i in rev(seq_along(purgeindex))) {
      dframe[repeats(dframe[[purgeindex[i]]]) &
             duplicated(dframe[, purgeindex[seq_len(i)], drop = FALSE]),
             purgeindex[i]] <- NA
    }
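
The effect is easiest to see on a small example. The sketch below wraps the loop in a hypothetical helper, `purge_duplicates`, and applies it to a made-up two-column data frame; it assumes the columns contain no missing values to begin with.

```r
# Hypothetical helper wrapping the duplicate-purging loop described above.
purge_duplicates <- function(dframe, purgeduplicates = TRUE) {
  if (length(purgeduplicates) == 1)
    purgeduplicates <- rep(purgeduplicates, ncol(dframe))
  # TRUE where an element equals the one directly before it.
  repeats <- function(x) c(FALSE, x[-1] == x[seq_len(length(x) - 1)])
  purgeindex <- which(purgeduplicates)
  # Work right to left so the columns to the left are still intact
  # when duplicated() inspects them.
  for (i in rev(seq_along(purgeindex))) {
    col <- purgeindex[i]
    blank <- repeats(dframe[[col]]) &
      duplicated(dframe[, purgeindex[seq_len(i)], drop = FALSE])
    dframe[blank, col] <- NA
  }
  dframe
}

d <- data.frame(group = c("a", "a", "b", "b"),
                item  = c("x", "x", "x", "y"),
                stringsAsFactors = FALSE)
purged <- purge_duplicates(d)
print(purged)
```

The third row keeps its `x` because `group` changes there, even though the same value appears directly above it.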

<<test-booktabs.R>>=
    library(testthat)
    library(xtable)
    filename <- tempfile(fileext = ".db")

    data(longley)
    test_that("booktabs executes at all", {
      expect_that(booktabs(longley), is_a("character"))
    })

    test_that("Columns that are labeled 'numberformat' are formatted", {
      d <- data.frame(x = c(-1.324, 0.93), y = c(10.443, 1.235))
      expect_that(booktabs(d, numberformat = TRUE, 
                           purgeduplicates = FALSE, digits = 2, align = "c"),
        prints_text("\\$\\\\\\\\\\!\\\\\\\\\\!-1.32\\$ & \\$10.44\\$"))
      expect_that(booktabs(d, numberformat = TRUE,
                           purgeduplicates = FALSE, digits = 2, align = "c"),
        prints_text("\\$\\\\\\\\enskip0.93\\$ & \\$\\\\\\\\enskip1.24\\$"))
    })

    test_that("Argument checking works as expected", {
      expect_that(booktabs(longley, drop = "WXYZ"),
        gives_warning("'drop' contains some columns not in 'dframe'"))
    })