ggslopegraph: Create Slopegraph from a data frame using ggplot2

View source: R/ggslopegraph.R

Description

Convert an R data frame (containing a panel dataset, where rows are observations and columns are time periods) into an Edward Tufte-inspired slopegraph using ggplot2.

Usage

ggslopegraph(data, main = NULL, xlab = "", ylab = "",
  xlabels = names(data), xlim = c(-1L, ncol(data) + 2L),
  ylim = range(data, na.rm = TRUE), labpos.left = 0.8,
  labpos.right = ncol(data) + 0.2, leftlabels = NULL, rightlabels = NULL,
  xbreaks = seq_along(xlabels), ybreaks = NULL, yrev = ylim[1] > ylim[2],
  decimals = 0L, col.lines = "black", col.lab = "black",
  col.num = "black", lwd = 0.5, offset.x = NULL, cex.lab = 3L,
  cex.num = 3L, na.span = FALSE)

Arguments

data

An observation-by-period data.frame, with at least two columns. Missing values are allowed.

main

A character string specifying a title. Passed to ggtitle.

xlab

A character string specifying an x-axis label. Passed to scale_x_continuous.

ylab

A character string specifying a y-axis label. Passed to scale_y_continuous, or scale_y_reverse if yrev = TRUE.

xlabels

The labels to use for the slopegraph periods. Default is names(data).

xlim

A two-element numeric vector specifying the x-axis limits.

ylim

A two-element numeric vector specifying the y-axis limits.

labpos.left

A numeric value specifying the x-axis position of the left-side observation labels. If NULL, labels are omitted.

labpos.right

A numeric value specifying the x-axis position of the right-side observation labels. If NULL, labels are omitted.

leftlabels

The labels for the left-side observations. Default is using row indexes.

rightlabels

The labels for the right-side observations. Default is using row indexes.

xbreaks

Passed to breaks in scale_x_continuous.

ybreaks

Passed to breaks in scale_y_continuous.

yrev

A logical indicating whether to use scale_y_reverse rather than the default scale_y_continuous.

decimals

The number of decimals to display for values in the plot. Default is 0 (none).

col.lines

A vector of colors for the slopegraph lines. Default is "black".

col.lab

A vector of colors for the observation labels. Default is "black".

col.num

A vector of colors for the number values. Default is "black". If NA, the number values are not drawn.

lwd

A vector of line width values for the slopegraph lines.

offset.x

A small offset for segments, to be used when positioning the numeric values. Default is NULL (set automatically based on the data).

cex.lab

A numeric value indicating the size of row labels. Default is 3. See geom_text.

cex.num

A numeric value indicating the size of numeric labels. Default is 3. See geom_text.

na.span

A logical indicating whether line segments should span periods with missing values. The default is FALSE, such that some segments are not drawn.

Details

A slopegraph represents a simple observation-by-period matrix of data values as a plot. Despite the visual simplicity of the graph, producing it entails a number of data transformations that are not immediately obvious.

Specifically, a slopegraph involves three distinct visual components, each of which must be drawn using a slightly different data structure. Those elements are: (1) the observation labels, (2) the numeric value labels of each observation-period data point, and (3) the line segments connecting the numeric labels. To draw these three elements requires transforming the input into three different data structures.

First, to draw the observation labels requires constructing a new data frame containing the observation labels (from the input data frame's rownames attribute), the constant x-left and x-right label positions, and the vertical positions of the left- and right-side labels.
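The label data frame described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the `toy` data and the column names `x_left`, `x_right`, `y_left`, and `y_right` are assumptions made for the example, and the vertical positions are simply taken from the first and last periods (which assumes those periods have no missing values).

```r
# Hypothetical toy panel: rows are observations, columns are periods
toy <- data.frame(y1970 = c(46, 38, 30),
                  y1979 = c(57, 42, 33),
                  row.names = c("Sweden", "Britain", "Japan"))

# Constant x positions, mirroring the documented defaults of
# labpos.left and labpos.right
labpos_left  <- 0.8
labpos_right <- ncol(toy) + 0.2

# One row per observation: label text, fixed x positions, and the
# y positions taken from the first and last period (assumed non-missing)
label_df <- data.frame(
  label   = rownames(toy),
  x_left  = labpos_left,
  x_right = labpos_right,
  y_left  = toy[[1]],
  y_right = toy[[ncol(toy)]],
  stringsAsFactors = FALSE
)

print(label_df)
```

Each row of `label_df` then carries everything needed to place one left-side and one right-side text label.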

Second, to draw the numeric value labels requires creating a “tidy” data frame based upon the positions of the values in the input data frame. Specifically, a tidy representation of the data is a two-column data frame containing: (1) the column value of each data point (identified by col) to specify horizontal position, and (2) the value of the data point itself which is also its vertical position. This consists of a basic wide-to-long reshape procedure (using reshape).
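As a rough sketch of that wide-to-long step, the base R `reshape` function can turn a small panel into one row per observation-period cell. The `toy` data and the `row` identifier column are assumptions for illustration; the function's actual internals may differ.

```r
# Hypothetical toy panel: rows are observations, columns are periods
toy <- data.frame(y1970 = c(46, 38, 30),
                  y1979 = c(57, 42, 33),
                  row.names = c("Sweden", "Britain", "Japan"))

# Wide-to-long reshape: every cell becomes one row with its column
# index ("col") and its value ("value")
long <- reshape(toy,
                direction = "long",
                varying   = names(toy),
                v.names   = "value",
                timevar   = "col",
                times     = seq_along(toy),
                idvar     = "row",
                ids       = seq_len(nrow(toy)))

# Keep the two columns needed to position the numeric labels:
# col gives the horizontal position, value the vertical position
tidy <- long[, c("col", "value")]
rownames(tidy) <- NULL

print(tidy)
```

The `value` column doubles as the text to print and as the y coordinate at which to print it.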

Third, to draw the line segments requires creating a “tidy” data frame that consists of one row for each segment, by pairing the values in adjacent periods within each row and recording the x1, x2, y1, and y2 end-points of each segment. Another “row” identifying variable is needed to relationally map this data frame back to the original observations (e.g., to color the segments). This step is performed by segmentize.
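The idea behind the segment data frame can be illustrated with a small base R helper. Note that `make_segments` is a hypothetical function written for this example, not the package's `segmentize`, whose signature and handling of missing values may differ. Here, segments with a missing end-point are simply dropped, which is comparable in spirit to `na.span = FALSE`.

```r
# Hypothetical toy panel with one missing value
toy <- data.frame(p1 = c(46, 38, 30),
                  p2 = c(57, NA, 33),
                  p3 = c(60, 45, 31),
                  row.names = c("A", "B", "C"))

# Hypothetical helper (not the package's segmentize()):
# builds one row per segment between adjacent periods
make_segments <- function(data) {
  m <- as.matrix(data)
  k <- ncol(m)
  left <- seq_len(k - 1L)  # column index where each segment starts
  segs <- data.frame(
    row = rep(seq_len(nrow(m)), times = k - 1L),  # maps back to the observation
    x1  = rep(left, each = nrow(m)),
    x2  = rep(left + 1L, each = nrow(m)),
    y1  = as.vector(m[, left, drop = FALSE]),
    y2  = as.vector(m[, left + 1L, drop = FALSE])
  )
  # Drop segments that have a missing end-point
  segs[!is.na(segs$y1) & !is.na(segs$y2), , drop = FALSE]
}

segs <- make_segments(toy)
print(segs)
```

The `row` column is what would allow per-observation aesthetics, such as a vector of line colors, to be matched to the right segments.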

Value

A ggplot object.

See Also

For a base graphics version, use slopegraph.

Examples

require("ggplot2")
## Tufte's Cancer Graph (to the correct scale)
data(cancer)
ggslopegraph(cancer, col.lines = 'gray', 
  xlabels = c('5 Year','10 Year','15 Year','20 Year'))

## Tufte's GDP Graph
data(gdp)
ggslopegraph(gdp, col.lines = 'gray', xlabels = c('1970','1979'), 
    main = 'Current Receipts of Government\nas a Percentage of Gross Domestic Product') + 
  theme_bw()

## Ranking of U.S. State populations
data(states)
ggslopegraph(states, 
  main = 'Relative Rank of U.S. State Populations, 1790-1870', 
  yrev = TRUE)

cls <- rep("black", nrow(states))
cls[rownames(states) == "South Carolina"] <- "red"
cls[rownames(states) == "Tennessee"] <- "blue"
ggslopegraph(states, main = 'Relative Rank of U.S. State Populations, 1790-1870', 
             yrev = TRUE, col.lines = cls, col.lab = cls)

## ranking of U.S. Bachelors Degrees fields
data(bachelors)
bachelors[] <- lapply(bachelors, function(x) rank(x))
names(bachelors) <- substring(names(bachelors), 3, 7)
ggslopegraph(bachelors, offset.x = 0, xlim = c(1, 25), col.num = NA, labpos.left = NULL)

leeper/slopegraph documentation built on May 21, 2019, 1:39 a.m.