In This Vignette

Introducing pivottabler

The pivottabler package enables pivot tables to be created and rendered/exported with just a few lines of R.

Pivot tables are constructed natively in R, either via a short one line command to build a basic pivot table or via series of R commands that gradually build a more bespoke pivot table to meet your needs.

The pivottabler package:

Since pivot tables are primarily visualisation tools, the pivottabler package offers several custom styling options as well as conditional/custom formatting capabilities so that the pivot tables can be themed/branded as needed.

Output can be rendered as:

The generated HTML, Latex and text can also be easily retrieved, e.g. to be used outside of R

The pivot tables can also be exported to Excel, including the styling/formatting.

pivottabler is a companion package to the basictabler package. pivottabler is focussed on generating pivot tables and can aggregate data. basictabler does not aggregate data but offers more control of table structure.

The latest version of the pivottabler package can be obtained directly from the package repository. Please log any questions not answered by the vignettes or any bug reports here.

Pivot Tables

Quick Example

An example of a pivot table showing the number of trains operated by different train companies is:

library(pivottabler)
# arguments:  qhpvt(dataFrame, rows, columns, calculations, ...)
qhpvt(bhmtrains, "TOC", "TrainCategory", "n()") # TOC = Train Operating Company 

This example pivot table is explained in more detail later in this vignette.

Examples of pivot tables containing other types of calculation can be found in the examples gallery later in the vignette.

Definition

Pivot tables are a common technique for summarising large tables of data into smaller and more easily understood summary tables to answer specific questions.

Starting from a specific question that requires answering, the variables relevant to the question are identified. The distinct values of the fixed variables^[The terms "fixed variables" and "measured variables" are used here as in Wickham 2014] are rendered as a mixture of row and column headings in the summary table. One or more aggregations of the (numerical) measured variables are added into the body of the table, where the row/column headings act as data groups. The summary table should then yield an answer to the original question.

In reality

The definition above is probably more difficult to understand than just looking at some examples - several are presented in this vignette. An extended definition is also provided by Wikipedia.

Pivot tables can be found in everyday use within many commercial and non-commercial organisations. Pivot tables feature prominently in applications such as Microsoft Excel, Open Office, etc. More advanced forms are found in Business Intelligence (BI) and Online Analytical Processing (OLAP) tools.

Sample Data: Trains in Birmingham

To build a series of example pivot tables, we will use the bhmtrains data frame. This contains all 83,710 trains that arrived into and/or departed from Birmingham New Street railway station between 1st December 2016 and 28th February 2017. As an example, the following are four trains that arrived into Birmingham New Street at the very start of this time period - note the data has been transposed (otherwise the table would be very wide).

library(pivottabler)
renderBasicTable(t(bhmtrains[1:4,]),rowNamesAsHeader=TRUE, 
                 columnAlignment=c("left", "left", "left", "left", "left"))

GbttArrival and GbttDeparture are the scheduled arrival and departure times of the trains at Birmingham New Street, as advertised in the Great Britain Train Timetable (GBTT). Also given are the actual arrival and departure times of the trains at Birmingham New Street. Note that all four of the trains above terminated at New Street, hence they have arrival times but no departure times. The origin and destination stations of each of the trains is also included, in the form of three letter station codes, e.g. BHM = Birmingham New Street. The trainstations data frame (used later in this vignette) includes a lookup from the code to the full station name for all stations.

The first train above:

Basic Pivot Table

Suppose we want to answer the question: How many ordinary/express passenger trains did each train operating company (TOC) operate in the three month period?

The following code will generate the relevant pivot table:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

The code above is the verbose version of the quick-pivot example near the start of this vignette (which used the qhpvt() function). Both produce the same pivot table and output, but the verbose version helps more clearly explain the steps involved in constructing the pivot table.

Each line above works as follows:

  1. Load the namespace of the pivottabler library.
  2. Create a new pivot table instance^[pivottabler is implemented in R6 Classes so pt here is an instance of the R6 PivotTable class.].
  3. Specify the data frame that contains the data for the pivot table.
  4. Add the distinct values from the TrainCategory column in the data frame as columns in the pivot table.
  5. Add the distinct values from the TOC column in the data frame as rows in the pivot table.
  6. Specify the calculation. The summarise expression must be an expression that can be used with the dplyr summarise() function. This expression is used internally by the pivottabler package with the dplyr summarise function. pivottabler also supports data.table - see the Performance vignette for more details.
  7. Generate the pivot table.

Constructing the Basic Pivot Table

The following examples show how each line in the above example constructs the pivot table. To improve readability, each code change is highlighted.

# produces no pivot table
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$renderPivot()
# specify the column headings
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")   #    << **** LINE ADDED **** <<
pt$renderPivot()
# specify the row headings
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")                #    << **** LINE ADDED **** <<
pt$renderPivot()
# specifying a calculation
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")                #     **** LINE BELOW ADDED ****
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Outputting the Pivot Table as Plain Text

The pivot table can be rendered as plain text to the console by using pt:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
pt

Extending the Basic Pivot Table

There follows below a progressive series of changes to the basic pivot table shown above. Each change is made by adding or changing one line of code. Again, to improve readability, each code change is highlighted.

First, adding an additional column data group to sub-divide each "TrainCategory" by "PowerType":

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")    #    << **** CODE CHANGE **** <<
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

By default, the new data group does not expand the existing "TrainCategory" total. However, an additional argument allows the total column to also be expanded:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType", expandExistingTotals=TRUE) # << ** CODE CHANGE ** <<
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Instead of adding "PowerType" as columns, it can also be added as rows:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$addRowDataGroups("PowerType")    #    << **** CODE CHANGE **** <<
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

It is possible to continue adding additional data groups. The pivottabler does not enforce a maximum depth of data groups. For example, adding the maximum scheduled speed to the rows:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$addRowDataGroups("PowerType")
pt$addRowDataGroups("SchedSpeedMPH")    #    << **** CODE CHANGE **** <<
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

As more data groups are added, the pivot table can seem overwhelmed with totals. It is possible to selectively show/hide totals using the addTotal argument. Totals can be renamed using the totalCaption argument. Both of these options are demonstrated below.

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC", totalCaption="Grand Total")    #    << **** CODE CHANGE **** <<
pt$addRowDataGroups("PowerType")
pt$addRowDataGroups("SchedSpeedMPH", addTotal=FALSE)      #    << **** CODE CHANGE **** <<
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

This can then be rendered in outline layout:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC", 
                    outlineBefore=list(isEmpty=FALSE, 
                                       groupStyleDeclarations=list(color="blue")), 
                    outlineTotal=list(groupStyleDeclarations=list(color="blue")))
pt$addRowDataGroups("PowerType", addTotal=FALSE)
pt$addRowDataGroups("SchedSpeedMPH", addTotal=FALSE)
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Outline Layout

Outline layout renders row data groups as headings:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC",
                    outlineBefore=list(groupStyleDeclarations=list(color="blue")),
                    outlineAfter=list(isEmpty=FALSE,
                                      mergeSpace="dataGroupsOnly",
                                      caption="Total ({value})",
                                      groupStyleDeclarations=list("font-style"="italic")),
                    outlineTotal=list(groupStyleDeclarations=list(color="blue"),
                                      cellStyleDeclarations=list("color"="blue")))
pt$addRowDataGroups("PowerType", addTotal=FALSE)
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

Quick-Pivot Functions

To construct basic pivot tables quickly, three functions are provided that can construct pivot tables with one line of R:

These functions do not offer all of the options that are available when constructing a pivot table using the more verbose syntax.

The arguments to all three functions are essentially the same:

Specifying "=" in either the rows or columns vectors sets the position of the calculations in the row/column headings.

A basic example of quickly printing a pivot table to the console:

library(pivottabler)
qpvt(bhmtrains, "TOC", "TrainCategory", "n()")

A slightly more complex pivot table being quickly rendered as a HTML widget, where the calculation headings are on the rows:

library(pivottabler)
qhpvt(bhmtrains, c("=", "TOC"), c("TrainCategory", "PowerType"),
    c("Number of Trains"="n()", "Maximum Speed"="max(SchedSpeedMPH, na.rm=TRUE)"))

A quick pivot table with a format specified:

library(pivottabler)
qhpvt(bhmtrains, "TOC", "TrainCategory", "mean(SchedSpeedMPH, na.rm=TRUE)", format="%.0f")

A quick pivot table with two calculations that are formatted differently:

library(pivottabler)
qhpvt(bhmtrains, "TOC", "TrainCategory", 
     c("Mean Speed"="mean(SchedSpeedMPH, na.rm=TRUE)", "Std Dev Speed"="sd(SchedSpeedMPH, na.rm=TRUE)"), 
     formats=list("%.0f", "%.1f"))

In the above pivot table, the "Total" would be better renamed to something like "All" or "Overall" since a total for a mean or standard deviation does not make complete sense.

Totals can be controlled using the totals argument. This works as follows:

Returning to the previous quick pivot example, the totals can now be renamed to "All ..." using:

library(pivottabler)
qhpvt(bhmtrains, "TOC", "TrainCategory", 
     c("Mean Speed"="mean(SchedSpeedMPH, na.rm=TRUE)", "Std Dev Speed"="sd(SchedSpeedMPH, na.rm=TRUE)"), 
     formats=list("%.0f", "%.1f"), totals=list("TOC"="All TOCs", "TrainCategory"="All Categories"))

Examples Gallery

This section shows some examples from the other vignettes as a quick overview of some of the other capabilities of the pivottabler package. The R scripts to create each example below can be found in the other vignettes.

Multiple Levels & Formatted Data Groups

# derive the date of each train (from the arrival/dep times),
# then the month of each train from the date of each train
library(dplyr)
library(lubridate)
trains <- mutate(bhmtrains, 
   GbttDate=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
   GbttMonth=make_date(year=year(GbttDate), month=month(GbttDate), day=1))

library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttMonth", dataFormat=list(format="%B %Y"))
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

See the Data Groups vignette for more details.

Multiple Calculations & Calculations on Rows

library(pivottabler)
library(dplyr)
library(lubridate)

# derive some additional data
trains <- mutate(bhmtrains,
   ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
   ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta))

# create the pivot table
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TOC", totalCaption="All TOCs")   #  << ***** CODE CHANGE ***** <<
pt$defineCalculation(calculationName="TotalTrains", caption="Total Trains", 
                     summariseExpression="n()")
pt$defineCalculation(calculationName="MinArrivalDelay", caption="Min Arr. Delay", 
                     summariseExpression="min(ArrivalDelay, na.rm=TRUE)")
pt$defineCalculation(calculationName="MaxArrivalDelay", caption="Max Arr. Delay", 
                     summariseExpression="max(ArrivalDelay, na.rm=TRUE)")
pt$defineCalculation(calculationName="MeanArrivalDelay", caption="Mean Arr. Delay", 
                     summariseExpression="mean(ArrivalDelay, na.rm=TRUE)", format="%.1f")
pt$defineCalculation(calculationName="MedianArrivalDelay", caption="Median Arr. Delay", 
                     summariseExpression="median(ArrivalDelay, na.rm=TRUE)")
pt$defineCalculation(calculationName="IQRArrivalDelay", caption="Delay IQR", 
                     summariseExpression="IQR(ArrivalDelay, na.rm=TRUE)")
pt$defineCalculation(calculationName="SDArrivalDelay", caption="Delay Std. Dev.", 
                     summariseExpression="sd(ArrivalDelay, na.rm=TRUE)", format="%.1f")

pt$addRowCalculationGroups()                             #  << ***** CODE CHANGE ***** <<
pt$renderPivot()

See the Calculations vignette for more details.

Custom Calculations

library(pivottabler)
library(dplyr)
library(lubridate)

# derive some additional data
trains <- mutate(bhmtrains,
   GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
   GbttDate=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=day(GbttDateTime)),
   GbttMonth=make_date(year=year(GbttDateTime), month=month(GbttDateTime), day=1),
   ArrivalDelta=difftime(ActualArrival, GbttArrival, units="mins"),
   ArrivalDelay=ifelse(ArrivalDelta<0, 0, ArrivalDelta),
   DelayedByMoreThan5Minutes=ifelse(ArrivalDelay>5,1,0))

# custom calculation function
getWorstSingleDayPerformance <- function(pivotCalculator, netFilters, format, baseValues, cell) {
  # get the data frame
  trains <- pivotCalculator$getDataFrame("trains")
  # apply the TOC and month filters coming from the headers in the pivot table
  filteredTrains <- pivotCalculator$getFilteredDataFrame(trains, netFilters)
  # calculate the percentage of trains more than five minutes late by date
  dateSummary <- filteredTrains %>%
    group_by(GbttDate) %>%
    summarise(DelayedPercent = sum(DelayedByMoreThan5Minutes, na.rm=TRUE) / n() * 100) %>%
    arrange(desc(DelayedPercent))
  # top value
  tv <- dateSummary$DelayedPercent[1]
  date <- dateSummary$GbttDate[1]             #     <<  CODE CHANGE  <<
  # build the return value
  value <- list()
  value$rawValue <- tv
  value$formattedValue <- paste0(format(      #     <<  CODE CHANGE (AND BELOW)  <<
    date, format="%a %d"), ":  ", pivotCalculator$formatValue(tv, format=format))
  return(value)
}

# create the pivot table
pt <- PivotTable$new()
pt$addData(trains, "trains")
pt$addColumnDataGroups("GbttMonth", dataFormat=list(format="%B %Y"))
pt$addRowDataGroups("TOC", totalCaption="All TOCs")
pt$defineCalculation(calculationName="WorstSingleDayDelay", format="%.1f %%",
                     type="function", calculationFunction=getWorstSingleDayPerformance)
pt$renderPivot()

See the Calculations vignette for more details.

Outline Layout

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC",
                    outlineBefore=list(isEmpty=FALSE, 
                                       groupStyleDeclarations=list(color="blue")),
                    outlineTotal=list(groupStyleDeclarations=list(color="blue"),
                                      cellStyleDeclarations=list("color"="blue")))
pt$addRowDataGroups("PowerType", addTotal=FALSE)
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$defineCalculation(calculationName="NumberOfTrains", caption="Number of Trains",
                     summariseExpression="n()")
pt$defineCalculation(calculationName="MaximumSpeedMPH", caption="Maximum Speed (MPH)",
                     summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
pt$addColumnDataGroups("PowerType")
pt$addRowCalculationGroups(outlineBefore=list(isEmpty=FALSE, mergeSpace="dataGroupsOnly", 
                                              groupStyleDeclarations=list(color="blue"), 
                                              cellStyleDeclarations=list(color="blue")))
pt$addRowDataGroups("TOC", addTotal=FALSE)
pt$renderPivot()
df <- data.frame(
  Level1 = rep("Net entrepreneurial income", times=12),
  Level2 = c(rep("Net operating surplus", 9), rep("Interests and rents", 3)),
  Level3 = c(rep("Factor income", 8),"Compensation of employees","Paid rent",
             "Paid interest","Received interest"),
  Level4 = c(rep("Net value added", 6), rep("Taxes and subsidies", 2), rep(NA, 4)),
  Level5 = c(rep("Gross value added", 5),"Depreciation","Other taxes on production",
             "Other subsidies (non-product specific)", rep(NA, 4)),
  Level6 = c(rep("Production of the agricultural industry", 4),
             "Intermediate services", rep(NA, 7)),
  Level7 = c("Crop production","Livestock production",
             "Production of agricultural services","Other production", rep(NA, 8)),
  MaxGroupLevel = c(7,7,7,7,6,5,5,5,3,3,3,3),
  Budget2019 = c(4150.39,4739.2,625.6,325.8,-6427,-2049.3,
                 -145.4,2847.3,-1149,-221.2,-307.6,12.8),
  Actual2019 = c(3978.8,4341.1,603.7,343,-6063.9,-2079.6,
                 -136.8,2578.6,-1092.9,-203.3,-327.6,14.1),
  Budget2020 = c(4210.9,4857.7,676.6,405.8,-6299,-2086.7,
                 -145.4,2920.6,-1245,-236.5,-244.7,10.1),
  Actual2020 = c(4373.7,5307.6,693.9,408.2,-7065.3,-1985,
                 -154.2,3063,-1229.3,-268.2,-250.3,11.1)
)

library(pivottabler)
pt <- PivotTable$new() 
pt$setDefault(addTotal=FALSE, outlineBefore=list(isEmpty=FALSE))
pt$addData(df)
pt$addRowDataGroups("Level1", outlineBefore=TRUE, 
        onlyAddOutlineChildGroupIf="MaxGroupLevel>1")
pt$addRowDataGroups("Level2", outlineBefore=TRUE, 
        onlyAddOutlineChildGroupIf="MaxGroupLevel>2", 
        dataSortOrder="custom", 
        customSortOrder=c("Net operating surplus", "Interests and rents"))
pt$addRowDataGroups("Level3", outlineBefore=TRUE, 
        onlyAddOutlineChildGroupIf="MaxGroupLevel>3", 
        dataSortOrder="custom", 
        customSortOrder=c("Factor income", "Compensation of employees",
                          "Paid rent", "Paid interest", "Received interest"))
pt$addRowDataGroups("Level4", outlineBefore=TRUE, 
        onlyAddOutlineChildGroupIf="MaxGroupLevel>4")
pt$addRowDataGroups("Level5", outlineBefore=TRUE, 
        onlyAddOutlineChildGroupIf="MaxGroupLevel>5",
        dataSortOrder="custom", 
        customSortOrder=c("Gross value added", "Depreciation",
                          "Other taxes on production", 
                          "Other subsidies (non-product specific)"))
pt$addRowDataGroups("Level6", outlineBefore=TRUE, 
        onlyAddOutlineChildGroupIf="MaxGroupLevel>6",
        dataSortOrder="custom", 
        customSortOrder=c("Production of the agricultural industry", 
                          "Intermediate Services"))
pt$addRowDataGroups("Level7", dataSortOrder="custom", 
                    customSortOrder=c("Crop production", "Livestock production",
                    "Production of agricultural services", "Other production"))
pt$defineCalculation(calculationName="Budget", 
                     summariseExpression="sum(Budget2020)")
pt$defineCalculation(calculationName="Actual", 
                     summariseExpression="sum(Actual2020)")
pt$defineCalculation(calculationName="Variance", 
                     summariseExpression="sum(Actual2020)-sum(Budget2020)", 
                     format="%.1f")
pt$evaluatePivot()

# get the row groups relating to outline groups 
# (above leaf level) and leaf level groups
grps <- pt$findRowDataGroups(outlineGroups="only", 
                             outlineLinkedGroupExists=FALSE)
grps <- c(grps, pt$findRowDataGroups(atLevel=7))
# set the styling of these groups so the text isn't bold
pt$setStyling(groups=grps, declarations =list("font-weight"="normal"))

# find the cells corresponding to these groups
rowNumbers <- sapply(grps, pt$findGroupRowNumbers)
cells <- pt$getCells(rowNumbers=rowNumbers)
# set the styling of these cells to be normal cells 
# instead of the darker outline styling
pt$setStyling(cells=cells, baseStyleName="Cell")

# apply the red style for negative variance
cells <- pt$findCells(calculationNames="Variance", 
                      minValue=-1000, maxValue=0, 
                      includeNull=FALSE, includeNA=FALSE)
pt$setStyling(cells=cells, declarations=list("color"="#9C0006"))
# apply the green style for positive variance
cells <- pt$findCells(calculationNames="Variance", 
                      minValue=0, maxValue=10000, 
                      includeNull=FALSE, includeNA=FALSE)
pt$setStyling(cells=cells, declarations=list("color"="#006100"))

# draw the pivot table
pt$renderPivot()

See the Regular Layout vignette for more details.

Results as a Matrix

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
pt$asDataMatrix()

See the Outputs vignette for more details, including other conversion options such as converting a pivot table to a data frame.

Latex Output

Latex Output

See the Latex Output vignette for more details.

Styling

# define the colours
blue1Colors <- list(
  headerBackgroundColor = "rgb(68, 114, 196)",
  headerColor = "rgb(255, 255, 255)",
  cellBackgroundColor = "rgb(255, 255, 255)",
  cellColor = "rgb(0, 0, 0)",
  totalBackgroundColor = "rgb(186, 202, 233)",
  totalColor = "rgb(0, 0, 0)",
  borderColor = "rgb(48, 84, 150)"
)
# define the theme
theme <- getSimpleColoredTheme(parentPivot=pt, colors=blue1Colors, fontName="Verdana, Arial")
# create the pivot table
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$theme <- theme
pt$renderPivot(styleNamePrefix="t3")

See the Styling vignette for more details.

Finding and Formatting Data Groups

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
highlight <- PivotStyle$new(pt, "cellHighlight", list("background-color"="#FFFF00"))
groups <- pt$findColumnDataGroups(
  variableValues=list("TrainCategory"="Ordinary Passenger"), 
  includeDescendantGroup=TRUE)
groupCount <- lapply(groups, function(grp) {grp$style <- highlight})
pt$renderPivot()

See the Finding and Formatting vignette for more details.

Finding and Formatting Cells

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
pt$addColumnDataGroups("PowerType")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$evaluatePivot()
highlight <- PivotStyle$new(pt, "cellHighlight", list("background-color"="#00FF00"))
cells <- pt$getCells(specifyCellsAsList=TRUE, rowNumbers=2, columnNumbers=4, cellCoordinates=list(c(5, 7)))
cellCount <- lapply(cells, function(cell) {cell$style <- highlight})
pt$renderPivot()

See the Finding and Formatting vignette for more details.

Conditional Formatting

# calculate arrival delay information
library(dplyr)
library(lubridate)
library(pivottabler)

stations <- mutate(trainstations, CrsCodeChr=as.character(CrsCode))

topOrigins <- bhmtrains %>%
  mutate(OriginChr=as.character(Origin)) %>%
  filter(Origin != "BHM") %>%
  group_by(OriginChr) %>%
  summarise(TotalTrains = n()) %>%
  ungroup() %>%
  top_n(20, TotalTrains)

trains <- bhmtrains %>%
  mutate(OriginChr=as.character(Origin), DestinationChr=as.character(Destination)) %>%
  inner_join(topOrigins, by=c("OriginChr"="OriginChr")) %>%
  inner_join(stations, by=c("OriginChr"="CrsCodeChr")) %>%
  inner_join(stations, by=c("DestinationChr"="CrsCodeChr")) %>%
  select(TOC, TrainCategory, PowerType, Origin=StationName.x, 
         GbttArrival, ActualArrival, GbttDeparture, ActualDeparture) %>%
  mutate(GbttDateTime=if_else(is.na(GbttArrival), GbttDeparture, GbttArrival),
         GbttHourOfDay=hour(GbttDateTime),
         ArrivalDeltaMins=difftime(ActualArrival, GbttArrival, units="mins"),
         ArrivalDelayMins=ifelse(ArrivalDeltaMins<0, 0, ArrivalDeltaMins)) %>%
  filter(GbttHourOfDay %in% c(5, 6, 7, 8, 9, 10)) %>%
  select(TOC, TrainCategory, PowerType, Origin, GbttHourOfDay, ArrivalDelayMins)

# create the pivot table
library(pivottabler)
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("GbttHourOfDay")
pt$addRowDataGroups("Origin")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()", visible=FALSE)
pt$defineCalculation(calculationName="TotalDelayMins", 
                     summariseExpression="sum(ArrivalDelayMins, na.rm=TRUE)", visible=FALSE)
pt$defineCalculation(calculationName="AvgDelayMins", type="calculation", 
                     basedOn=c("TotalDelayMins", "TotalTrains"),
                     calculationExpression="values$TotalDelayMins/values$TotalTrains",
                     format="%.1f")
pt$evaluatePivot()

# colour scale helper functions
scaleNumber <- function(n1, n2, vMin, vMax, value) {
  if(n1==n2) return(n1)
  v <- value
  if(v < vMin) v <- vMin
  if(v > vMax) v <- vMax
  if(n1<n2) {
    return(n1+((v-vMin)/(vMax-vMin)*(n2-n1)))
  }
  else {
    return(n1-((v-vMin)/(vMax-vMin)*(n1-n2)))
  }
}
scale2Colours <- function(clr1, clr2, vMin, vMax, value) {
  r <- round(scaleNumber(clr1$r, clr2$r, vMin, vMax, value))
  g <- round(scaleNumber(clr1$g, clr2$g, vMin, vMax, value))
  b <- round(scaleNumber(clr1$b, clr2$b, vMin, vMax, value))
  return(paste0("#",format(as.hexmode(r), width=2), format(as.hexmode(g), width=2), format(as.hexmode(b), width=2)))
}
scale3Colours <- function(clr1, clr2, clr3, vMin, vMid, vMax, value) {
  if(value <= vMid) return(scale2Colours(clr1, clr2, vMin, vMid, value))
  else return(scale2Colours(clr2, clr3, vMid, vMax, value))
}
hexToClr <- function(hexclr) {
  clr <- list()
  clr$r <- strtoi(paste0("0x", substr(hexclr, 2, 3)))
  clr$g <- strtoi(paste0("0x", substr(hexclr, 4, 5)))
  clr$b <- strtoi(paste0("0x", substr(hexclr, 6, 7)))
  return(clr)
}

# colour constants
textClrGreen <- hexToClr("#006100")
textClrYellow <- hexToClr("#9C5700")
textClrRed <- hexToClr("#9C0006")
backClrGreen <- hexToClr("#C6EFCE")
backClrYellow <- hexToClr("#FFEB9C")
backClrRed <- hexToClr("#FFC7CE")

# specify some conditional formatting, calculating the appropriate text colour and back colour for each cell.
cells <- pt$findCells(includeNull=FALSE, includeNA=FALSE)
for(i in 1:length(cells)) {
  cell <- cells[[i]]
  value <- cell$rawValue
  textClr <- scale3Colours(textClrGreen, textClrYellow, textClrRed, 0.5, 2, 4, value)
  backClr <- scale3Colours(backClrGreen, backClrYellow, backClrRed, 0.5, 2, 4, value)
  cell$style <- PivotStyle$new(pt, "", list("background-color"=backClr, "color"=textClr))
}
pt$renderPivot()

See the Finding and Formatting vignette for more details.

Mixing Data Groups and/or Calculations

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
cg1 <- pt$columnGroup$addChildGroup(variableName="TrainCategory", values="Express Passenger")
cg2 <- pt$columnGroup$addChildGroup(variableName="PowerType", values="DMU")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$defineCalculation(calculationGroupName="calcGrp2", calculationName="MaxSpeedMPH", 
                     summariseExpression="max(SchedSpeedMPH, na.rm=TRUE)")
cg3 <- cg1$addChildGroup(caption="Count")
cg4 <- cg2$addChildGroup(caption="Maximum Speed")
cg3$addCalculationGroups("default")
cg4$addCalculationGroups("calcGrp2")
pt$renderPivot()

See the Irregular Layout vignette for more details.

Combining Pivot Tables

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$addColumnDataGroups("TrainCategory")
# Rows: TOC breakdown
grp1a <- pt$addRowGroup(caption="By TOC", isOutline=TRUE, isEmpty=TRUE, 
                        sortAnchor="next", styleDeclarations=list(color="blue"))
grp1b <- pt$addRowGroup()
grp1b$addDataGroups("TOC", addTotal=FALSE)
# Rows: Power Type breakdown
grp2a <- pt$addRowGroup(caption="By Power Type", isOutline=TRUE, isEmpty=TRUE, 
                        sortAnchor="next", styleDeclarations=list(color="blue"))
grp2b <- pt$addRowGroup()
grp2b$addDataGroups("PowerType", addTotal=FALSE)
# Rows: Total
grp3 <- pt$addRowGroup(caption="Total", isOutline=TRUE, 
                       styleDeclarations=list(color="blue"))
# Finish...
pt$defineCalculation(calculationName="TotalTrains", summariseExpression="n()")
pt$renderPivot()

See the Irregular Layout vignette for more details.

Further Reading

The full set of vignettes is:

  1. Introduction
  2. Data Groups
  3. Calculations
  4. Regular Layout
  5. Outputs
  6. Latex Output
  7. Styling
  8. Finding and Formatting
  9. Cell Context
  10. Navigating a Pivot Table
  11. Irregular Layout
  12. Performance
  13. Excel Export
  14. Shiny
  15. Appendix: Details
  16. Appendix: Calculations
  17. Appendix: Class Overview


cbailiss/pivottabler documentation built on Oct. 14, 2023, 9:38 a.m.