GGally

# R output pre blocks are styled by default to indicate output
knitr::opts_chunk$set(
  comment = NA,
  cache = TRUE,
  fig.height = 8,
  fig.width = 10
)
suppressMessages(suppressWarnings(library(GGally)))

# shorthand for rd_link() - see ?packagedocs::rd_link for more information
rdl <- function(x) packagedocs::rd_link(deparse(substitute(x)))

GGally

Welcome to the GGally documentation page.

The following topic sections are alphabetically sorted.

GGally::ggcoef

The purpose of this function is to quickly plot the coefficients of a model.

Joseph Larmarange

May 16, 2016

Quick coefficients plot

To work automatically, this function requires the r rdl(broom) package. Simply call r rdl(ggcoef) with a model object. It could be the result of r rdl(lm), r rdl(glm) or any other model covered by r rdl(broom) and its r rdl(tidy) method^[See http://www.rdocumentation.org/packages/broom.].

reg <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
ggcoef(reg)

In the case of a logistic regression (or any other model for which coefficients are usually exponentiated), simply indicated exponentiate = TRUE. Note that a logarithmic scale will be used for the x-axis.

d <- as.data.frame(Titanic)
log.reg <- glm(Survived ~ Sex + Age + Class, family = binomial, data = d, weights = d$Freq)
ggcoef(log.reg, exponentiate = TRUE)

Customizing the plot

You can use conf.int, vline and exclude_intercept to display or not confidence intervals as error bars, a vertical line for x = 0 (or x = 1 if coeffcients are exponentiated) and the intercept.

ggcoef(reg, vline = FALSE, conf.int = FALSE, exclude_intercept = TRUE)

See the help page of r rdl(ggcoef) for the full list of arguments that could be used to personalize how error bars and the vertical line are plotted.

ggcoef(
  log.reg,
  exponentiate = TRUE,
  vline_color = "red",
  vline_linetype =  "solid",
  errorbar_color = "blue",
  errorbar_height = .25
)

Additional parameters will be passed to r rdl(geom_point).

ggcoef(log.reg, exponentiate = TRUE, color = "purple", size = 5, shape = 18)

Finally, you can also customize the aesthetic mapping of the points.

library(ggplot2)
ggcoef(log.reg, exponentiate = TRUE, mapping = aes(x = estimate, y = term, size = p.value)) +
  scale_size_continuous(trans = "reverse")

Custom data frame

You can also pass a custom data frame to r rdl(ggcoef). The following variables are expected:

cust <- data.frame(
  term = c("male vs. female", "30-49 vs. 18-29", "50+ vs. 18-29", "urban vs. rural"),
  estimate = c(.456, 1.234, 1.897, 1.003),
  conf.low = c(.411, 1.042, 1.765, 0.678),
  conf.high = c(.498, 1.564, 2.034, 1.476),
  variable = c("sex", "age", "age", "residence")
)
cust$term <- factor(cust$term, cust$term)
ggcoef(cust, exponentiate = TRUE)
ggcoef(
  cust,
  exponentiate = TRUE,
  mapping = aes(x = estimate, y = term, colour = variable),
  size = 5
)

GGally::ggduo

Barret Schloerke

July 4, 2016

The purpose of this function is to display two grouped data in a plot matrix. This is useful for canonical correlation analysis, multiple time series analysis, and regression analysis.

Canonical Correlation Analysis

This example is derived from

R Data Analysis Examples | Canonical Correlation Analysis. UCLA: Institute for Digital Research and Education. from http://www.stats.idre.ucla.edu/r/dae/canonical-correlation-analysis (accessed May 22, 2017).

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshman. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions (canonical variables) are necessary to understand the association between the two sets of variables."

data(psychademic)
str(psychademic)
(psych_variables <- attr(psychademic, "psychology"))
(academic_variables <- attr(psychademic, "academic"))

First, look at the within correlation using r rdl(ggpairs).

ggpairs(psychademic, psych_variables, title = "Within Psychological Variables")
ggpairs(psychademic, academic_variables, title = "Within Academic Variables")

Next, look at the between correlation using r rdl(ggduo).

ggduo(
  psychademic, psych_variables, academic_variables,
  types = list(continuous = "smooth_lm"),
  title = "Between Academic and Psychological Variable Correlation",
  xlab = "Psychological",
  ylab = "Academic"
)

Since ggduo does not have a upper section to display the correlation values, we may use a custom function to add the information in the continuous plots. The strips may be removed as each group name may be recovered in the outer axis labels.

lm_with_cor <- function(data, mapping, ..., method = "pearson") {
  x <- eval(mapping$x, data)
  y <- eval(mapping$y, data)
  cor <- cor(x, y, method = method)
  ggally_smooth_lm(data, mapping, ...) +
    ggplot2::geom_label(
      data = data.frame(
        x = min(x, na.rm = TRUE),
        y = max(y, na.rm = TRUE),
        lab = round(cor, digits = 3)
      ),
      mapping = ggplot2::aes(x = x, y = y, label = lab),
      hjust = 0, vjust = 1,
      size = 5, fontface = "bold",
      inherit.aes = FALSE # do not inherit anything from the ...
    )
}
ggduo(
  psychademic, rev(psych_variables), academic_variables,
  mapping = aes(color = sex),
  types = list(continuous = wrap(lm_with_cor, alpha = 0.25)),
  showStrips = FALSE,
  title = "Between Academic and Psychological Variable Correlation",
  xlab = "Psychological",
  ylab = "Academic",
  legend = c(5,2)
) +
  theme(legend.position = "bottom")

Multiple Time Series Analysis

While displaying multiple time series vertically over time, such as + facet_grid(time ~ .), r rdl(ggduo) can handle both continuous and discrete data. r rdl(ggplot2) does not mix discrete and continuous data on the same axis.

library(ggplot2)

data(pigs)
pigs_dt <- pigs[-(2:3)] # remove year and quarter
pigs_dt$profit_group <- as.numeric(pigs_dt$profit > mean(pigs_dt$profit))
qplot(
  time, value,
  data = reshape::melt.data.frame(pigs_dt, "time"),
  geom = c("smooth", "point")
) +
  facet_grid(variable ~ ., scales = "free_y")

Instead, we may use ggts to display the data. ggts changes the default behavior of ggduo of columnLabelsX to equal NULL and allows for mixed variable types.

# make the profit group as a factor value
profit_groups <- c(
  "1" = "high",
  "0" = "low"
)
pigs_dt$profit_group <- factor(
  profit_groups[as.character(pigs_dt$profit_group)],
  levels = unname(profit_groups),
  ordered = TRUE
)
ggts(pigs_dt, "time", 2:7)
# remove the binwidth warning
pigs_types <- list(
  comboHorizontal = wrap(ggally_facethist, binwidth = 1)
)
ggts(pigs_dt, "time", 2:7, types = pigs_types)
# add color and legend
pigs_mapping <- aes(color = profit_group)
ggts(pigs_dt, pigs_mapping, "time", 2:7, types = pigs_types, legend = c(6,1))

Produce more meaningful labels, add a legend, and remove profit group strips.

pm <- ggts(
  pigs_dt, pigs_mapping,
  1, 2:7,
  types = pigs_types,
  legend = c(6,1),
  columnLabelsY = c(
    "number of\nfirst birth sows",
    "sell price over\nfeed cost",
    "sell count over\nheard size",
    "meat head count",
    "breading\nheard size",
    "profit\ngroup"
  ),
  showStrips = FALSE
) +
  labs(fill = "profit group") +
  theme(
    legend.position = "bottom",
    strip.background = element_rect(
      fill = "transparent", color = "grey80"
    )
  )
pm

Regression Analysis

Since r rdl(ggduo) may take custom functions just like r rdl(ggpairs), we will make a custom function that displays the residuals with a red line at 0 and all other y variables will receive a simple linear regression plot.

Note: the marginal residuals are calculated before plotting and the y_range is found to display all residuals on the same scale.

swiss <- datasets::swiss

# add a 'fake' column
swiss$Residual <- seq_len(nrow(swiss))

# calculate all residuals prior to display
residuals <- lapply(swiss[2:6], function(x) {
  summary(lm(Fertility ~ x, data = swiss))$residuals
})
# calculate a consistent y range for all residuals
y_range <- range(unlist(residuals))

# custom function to display continuous data. If the y variable is "Residual", do custom work.
lm_or_resid <- function(data, mapping, ..., line_color = "red", line_size = 1) {
  if (as.character(mapping$y) != "Residual") {
    return(ggally_smooth_lm(data, mapping, ...))
  }

  # make residual data to display
  resid_data <- data.frame(
    x = data[[as.character(mapping$x)]],
    y = residuals[[as.character(mapping$x)]]
  )

  ggplot(data = data, mapping = mapping) +
    geom_hline(yintercept = 0, color = line_color, size = line_size) +
    ylim(y_range) +
    geom_point(data = resid_data, mapping = aes(x = x, y = y), ...)

}

# plot the data
ggduo(
  swiss,
  2:6, c(1,7),
  types = list(continuous = lm_or_resid)
)

# change line to be thicker and blue and the points to be slightly transparent
ggduo(
  swiss,
  2:6, c(1,7),
  types = list(
    continuous = wrap(lm_or_resid,
      alpha = 0.7,
      line_color = "blue",
      line_size = 3
    )
  )
)

GGally::glyphs

Hadley Wickham, Charlotte Wickham, Di Cook, Heike Hofmann

Nov 6, 2015

This function rearranges data to be able to construct a glyph plot

library(ggplot2)
data(nasa)
temp.gly <- glyphs(nasa, "long", "day", "lat", "surftemp", height=2.5)
ggplot(temp.gly, ggplot2::aes(gx, gy, group = gid)) +
  add_ref_lines(temp.gly, color = "grey90") +
  add_ref_boxes(temp.gly, color = "grey90") +
  geom_path() +
  theme_bw() +
  labs(x = "", y = "")

This shows a glyphplot of monthly surface temperature for 6 years over Central America. You can see differences from one location to another, that in large areas temperature doesn't change much. There are large seasonal trends in the top left over land.

Rescaling in different ways puts emphasis on different components, see the examples in the referenced paper. And with ggplot2 you can make a map of the geographic area underlying the glyphs.

References

Wickham, H., Hofmann, H., Wickham, C. and Cook, D. (2012) Glyph-maps for Visually Exploring Temporal Patterns in Climate Data and Models, Environmetrics, 23(5):151-182.

GGally::ggmatrix

Barret Schloerke

Oct 29, 2015

r rdl(ggmatrix) is a function for managing multiple plots in a matrix-like layout. It was designed to adapt to any number of columns and rows. This allows for very customized plot matrices.

Generic Example

The examples below use plots labeled 1 to 6 to distinguish where the plots are being placed.

plotList <- list()
for (i in 1:6) {
  plotList[[i]] <- ggally_text(paste("Plot #", i, sep = ""))
}

# bare minimum of plotList, nrow, and ncol
pm <- ggmatrix(plotList, 2, 3)
pm

# provide more information
pm <- ggmatrix(
  plotList,
  nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "Matrix Title"
)
pm

# display plots in column order
pm <- ggmatrix(
  plotList,
  nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "Matrix Title",
  byrow = FALSE
)
pm

Matrix Subsetting

Individual plots may be retrieved from the plot matrix and can be placed in the plot matrix.

pm <- ggmatrix(
  plotList,
  nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "Matrix Title"
)
pm
p2 <- pm[1,2]
p3 <- pm[1,3]
p2
p3
pm[1,2] <- p3
pm[1,3] <- p2
pm

Themes

library(ggplot2)
pm <- ggmatrix(
  plotList,
  nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "Matrix Title",
  byrow = FALSE
)
pm <- pm + theme_bw()
pm

Axis Control

The X and Y axis have booleans to turn on/off the individual plot's axes on the bottom and left sides of the plot matrix. To save time, showAxisPlotLabels can be set to override showXAxisPlotLabels and showYAxisPlotLabels.

pm <- ggmatrix(
  plotList, nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "No Left Plot Axis",
  showYAxisPlotLabels = FALSE
)
pm
pm <- ggmatrix(
  plotList, nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "No Bottom Plot Axis",
  showXAxisPlotLabels = FALSE
)
pm
pm <- ggmatrix(
  plotList, nrow = 2, ncol = 3,
  xAxisLabels = c("A", "B", "C"),
  yAxisLabels = c("D", "E"),
  title = "No Plot Axes",
  showAxisPlotLabels = FALSE
)
pm

Strips Control

By default, the plots in the top row and the right most column will display top-side and right-side strips respectively (showStrips = NULL). If all strips need to appear in each plot, showStrips may be set to TRUE. If all strips should not be displayed, showStrips may be set to FALSE.

data(tips, package = "reshape")
plotList <- list(
  qplot(total_bill, tip, data = subset(tips, smoker == "No" & sex == "Female")) +
    facet_grid(time ~ day),
  qplot(total_bill, tip, data = subset(tips, smoker == "Yes" & sex == "Female")) +
    facet_grid(time ~ day),
  qplot(total_bill, tip, data = subset(tips, smoker == "No" & sex == "Male")) +
    facet_grid(time ~ day),
  qplot(total_bill, tip, data = subset(tips, smoker == "Yes" & sex == "Male")) +
    facet_grid(time ~ day)
)


pm <- ggmatrix(
  plotList, nrow = 2, ncol = 2,
  yAxisLabels = c("Female", "Male"),
  xAxisLabels = c("Non Smoker", "Smoker"),
  title = "Total Bill vs Tip",
  showStrips = NULL # default
)
pm
pm <- ggmatrix(
  plotList, nrow = 2, ncol = 2,
  yAxisLabels = c("Female", "Male"),
  xAxisLabels = c("Non Smoker", "Smoker"),
  title = "Total Bill vs Tip",
  showStrips = TRUE
)
pm
pm <- ggmatrix(
  plotList, nrow = 2, ncol = 2,
  yAxisLabels = c("Female", "Male"),
  xAxisLabels = c("Non Smoker", "Smoker"),
  title = "Total Bill vs Tip",
  showStrips = FALSE
)
pm

GGally::ggnetworkmap

Amos Elberg

January 10, 2015

r rdl(ggnetworkmap) is a function for plotting elegant maps using r rdl(ggplot2). It builds on r rdl(ggnet) by allowing to draw a network over a map, and is particularly intended for use with r rdl(ggmap).

Example: US airports

This example is based on a tutorial by Nathan Yau at Flowing Data.

suppressMessages(library(network))
suppressMessages(library(sna))
suppressMessages(library(maps))
suppressMessages(library(ggplot2))
airports <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv", header = TRUE)
rownames(airports) <- airports$iata

# select some random flights
set.seed(1234)
flights <- data.frame(
  origin = sample(airports[200:400, ]$iata, 200, replace = TRUE),
  destination = sample(airports[200:400, ]$iata, 200, replace = TRUE)
)

# convert to network
flights <- network(flights, directed = TRUE)

# add geographic coordinates
flights %v% "lat" <- airports[ network.vertex.names(flights), "lat" ]
flights %v% "lon" <- airports[ network.vertex.names(flights), "long" ]

# drop isolated airports
delete.vertices(flights, which(degree(flights) < 2))

# compute degree centrality
flights %v% "degree" <- degree(flights, gmode = "digraph")

# add random groups
flights %v% "mygroup" <- sample(letters[1:4], network.size(flights), replace = TRUE)

# create a map of the USA
usa <- ggplot(map_data("usa"), aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), color = "grey65",
               fill = "#f9f9f9", size = 0.2)

delete.vertices(flights, which(flights %v% "lon" < min(usa$data$long)))
delete.vertices(flights, which(flights %v% "lon" > max(usa$data$long)))
delete.vertices(flights, which(flights %v% "lat" < min(usa$data$lat)))
delete.vertices(flights, which(flights %v% "lat" > max(usa$data$lat)))

# overlay network data to map
ggnetworkmap(usa, flights, size = 4, great.circles = TRUE,
             node.group = mygroup, segment.color = "steelblue",
             ring.group = degree, weight = degree)

Example: Twitter spambots

This next example uses data from a Twitter spam community identified while exploring and trying to clear-up a group of tweets. After coloring the nodes based on their centrality, the odd structure stood out clearly.

data(twitter_spambots)
# create a world map
world <- fortify(map("world", plot = FALSE, fill = TRUE))
world <- ggplot(world, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), color = "grey65",
               fill = "#f9f9f9", size = 0.2)

# view global structure
ggnetworkmap(world, twitter_spambots)

Is the network really concentrated in the U.S.? Probably not. One of the odd things about the network, is a much higher proportion of the users gave locations that could be geocoded, than Twitter users generally.

Let's see the network topology

ggnetworkmap(net = twitter_spambots, arrow.size = 0.5)

Coloring nodes according to degree centrality can highlight network structures.

# compute indegree and outdegree centrality
twitter_spambots %v% "indegree" <- degree(twitter_spambots, cmode = "indegree")
twitter_spambots %v% "outdegree" <- degree(twitter_spambots, cmode = "outdegree")

ggnetworkmap(net = twitter_spambots,
             arrow.size = 0.5,
             node.group = indegree,
             ring.group = outdegree, size = 4) +
  scale_fill_continuous("Indegree", high = "red", low = "yellow") +
  labs(color = "Outdegree")

Some Twitter attributes have been included as vertex attributes.

# show some vertex attributes associated with each account
ggnetworkmap(net = twitter_spambots,
             arrow.size = 0.5,
             node.group = followers,
             ring.group = friends,
             size = 4,
             weight = indegree,
             label.nodes = TRUE, vjust = -1.5) +
  scale_fill_continuous("Followers", high = "red", low = "yellow") +
  labs(color = "Friends") +
  scale_color_continuous(low = "lightgreen", high = "darkgreen")

GGally::ggnostic

Barret Schloerke

Oct 1, 2016

r rdl(ggnostic) is a display wrapper to r rdl(ggduo) that displays full model diagnostics for each given explanatory variable. By default, r rdl(ggduo) displays the residuals, leave-one-out model sigma value, leverage points, and Cook's distance against each explanatory variable. The rows of the plot matrix can be expanded to include fitted values, standard error of the fitted values, standardized residuals, and any of the response variables. If the model is a linear model, stars are added according to the anova significance of each explanatory variable.

Most diagnostic plots contain reference line(s) to help determine if the model is fitting properly

Life Expectancy Model Fitting

Looking at the dataset r rdl(datasets::state.x77), we will fit a multiple regression model for Life Expectancy.

# make a data.frame and fix column names
state <- as.data.frame(state.x77)
colnames(state)[c(4, 6)] <- c("Life.Exp", "HS.Grad")
str(state)

# fit full model
model <- lm(Life.Exp ~ ., data = state)
# reduce to "best fit" model with
model <- step(model, trace = FALSE)
summary(model)

Next, we look at the variables for any high (|value| > 0.8) correlation values and general interaction behavior.

# look at variables for high correlation (none)
ggscatmat(state, columns = c("Population", "Murder", "HS.Grad", "Frost"))

All variables appear to be ok. Next, we look at the model diagnostics.

# look at model diagnostics
ggnostic(model)

Let's remove the largest data point first to try and define a better model.

# very high life expectancy
state[11, ]
state_no_hawaii <- state[-11, ]
model_no_hawaii <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = state_no_hawaii)
ggnostic(model_no_hawaii)

There are no more outrageous Cook's distance values. The model without Hawaii appears to be a good fitting model.

summary(model)
summary(model_no_hawaii)

Since there is only a marginal improvement by removing Hawaii, the original model should be used to explain life expectancy.

Full diagnostic plot matrix example

The following lines of code will display different modle diagnostic plot matrices for the same statistical model. The first one is of the default settings. The second adds color according to the species. Finally, the third displays all possible columns and uses r rdl(ggally_smooth) to display the fitted points and response variables.

flea_model <- step(lm(head ~ ., data = flea), trace = FALSE)
summary(flea_model)
# default output
ggnostic(flea_model)
# color'ed output
ggnostic(flea_model, mapping = ggplot2::aes(color = species))
# full color'ed output
ggnostic(
  flea_model,
  mapping = ggplot2::aes(color = species),
  columnsY = c("head", ".fitted", ".se.fit", ".resid", ".std.resid", ".hat", ".sigma", ".cooksd"),
  continuous = list(default = ggally_smooth, .fitted = ggally_smooth)
)

GGally::ggpairs

Barret Schloerke

Oct 29, 2015

r rdl(ggpairs) is a special form of a r rdl(ggmatrix) that produces a pairwise comparison of multivariate data. By default, r rdl(ggpairs) provides two different comparisons of each pair of columns and displays either the density or count of the respective variable along the diagonal. With different parameter settings, the diagonal can be replaced with the axis values and variable labels.

There are many hidden features within ggpairs. Please take a look at the examples below to get the most out of ggpairs.

Columns and Mapping

The columns displayed default to all columns of the provided data. To subset to only a few columns, use the columns parameter.

data(tips, package = "reshape")
pm <- ggpairs(tips)
pm
## too many plots for this example.

## reduce the columns being displayed
## these two lines of code produce the same plot matrix
pm <- ggpairs(tips, columns = c(1, 6, 2))
pm <- ggpairs(tips, columns = c("total_bill", "time", "tip"), columnLabels = c("Total Bill", "Time of Day", "Tip"))
pm

Aesthetics can be applied to every subplot with the mapping parameter.

library(ggplot2)
pm <- ggpairs(tips, mapping = aes(color = sex), columns = c("total_bill", "time", "tip"))
pm

Since the plots are default plots (or are helper functions from GGally), the aesthetic color is altered to be appropriate. Looking at the example above, 'tip' vs 'total_bill' (pm[3,1]) needs the color aesthetic, while 'time' vs 'total_bill' needs the fill aesthetic. If custom functions are supplied, no aesthetic alterations will be done.

Matrix Sections

There are three major sections of the pairwise matrix: lower, upper, and diag. The lower and upper may contain three plot types: continuous, combo, and discrete. The 'diag' only contains either continuous or discrete.

To make adjustments to each section, a list of information may be supplied. The list can be comprised of the following elements:

library(ggplot2)
pm <- ggpairs(
  tips, columns = c("total_bill", "time", "tip"),
  lower = list(
    continuous = "smooth",
    combo = "facetdensity",
    mapping = aes(color = time)
  )
)
pm

A section list may be set to the character string "blank" or NULL if the section should be skipped when printed.

pm <- ggpairs(
  tips, columns = c("total_bill", "time", "tip"),
  upper = "blank",
  diag = NULL
)
pm

Custom Functions

The ggally_NAME functions do not provide all graphical options. Instead of supplying a character string to a continuous, combo, or discrete element within upper, lower, or diag, a custom function may be given.

The custom function should follow the api of

custom_function <- function(data, mapping, ...){
  # produce ggplot2 object here
}

There is no requirement to what happens within the function, as long as a ggplot2 object is returned.

my_bin <- function(data, mapping, ..., low = "#132B43", high = "#56B1F7") {
  ggplot(data = data, mapping = mapping) +
    geom_bin2d(...) +
    scale_fill_gradient(low = low, high = high)
}
pm <- ggpairs(
  tips, columns = c("total_bill", "time", "tip"),
  lower = list(
    continuous = my_bin
  )
)
pm

Function Wrapping

The examples above use default parameters to each of the subplots. One of the immediate parameters to be set it binwidth. This parameters is only needed in the lower, combination plots where one variable is continuous while the other variable is discrete.

To change the default parameter binwidth setting, we will r rdl(wrap) the function. r rdl(wrap) first parameter should be a character string or a custom function. The remaining parameters supplied to wrap will be supplied to the function at run time.

pm <- ggpairs(
  tips, columns = c("total_bill", "time", "tip"),
  lower = list(
    combo = wrap("facethist", binwidth = 1),
    continuous = wrap(my_bin, binwidth = c(5, 0.5), high = "red")
  )
)
pm

To get finer control over parameters, please look into custom functions.

Plot Matrix Subsetting

Please look at the vignette for ggmatrix on plot matrix manipulations.

Small ggpairs example:

pm <- ggpairs(tips, columns = c("total_bill", "time", "tip"))
# retrieve the third row, first column plot
p <- pm[3,1]
p <- p + aes(color = time)
p
pm[3,1] <- p
pm

Themes

Please look at the vignette for ggmatrix on plot matrix manipulations.

Small ggpairs example:

pmBW <- pm + theme_bw()
pmBW

References

John W Emerson, Walton A Green, Barret Schloerke, Jason Crowley, Dianne Cook, Heike Hofmann, Hadley Wickham. The Generalized Pairs Plot. Journal of Computational and Graphical Statistics, vol. 22, no. 1, pp. 79-91, 2012.

GGally::ggscatmat

Di Cook, Mengjia Ni

Nov 6, 2015

The primary function is r rdl(ggscatmat). It is similar to r rdl(ggpairs) but only works for purely numeric multivariate data. It is faster than ggpairs, because less choices need to be made. It creates a matrix with scatterplots in the lower diagonal, densities on the diagonal and correlations written in the upper diagonal. Syntax is to enter the dataset, the columns that you want to plot, a color column, and an alpha level.

data(flea)
ggscatmat(flea, columns = 2:4, color="species", alpha=0.8)

In this plot, you can see that the three different species vary a little from each other in these three variables. Heptapot (blue) has smaller values on the variable "tars1" than the other two. The correlation between the three variables is similar for all species.

References

John W Emerson, Walton A Green, Barret Schloerke, Jason Crowley, Dianne Cook, Heike Hofmann, Hadley Wickham. The Generalized Pairs Plot. Journal of Computational and Graphical Statistics, vol. 22, no. 1, pp. 79-91, 2012.

GGally::ggsurv

Edwin Thoen

April, 4, 2016

This function produces Kaplan-Meier plots using r rdl(ggplot2). As a first argument, r rdl(ggsurv) needs a r rdl(survfit) object, created by the r rdl(survival) package. Default settings differ for single stratum and multiple strata objects.

Single Stratum

require(ggplot2)
require(survival)
require(scales)

data(lung, package = "survival")
sf.lung <- survival::survfit(Surv(time, status) ~ 1, data = lung)
ggsurv(sf.lung)

Multiple Stratum

The legend color positions matches the survival order or each stratum, where the stratums that end at a lower value or time have a position that is lower in the legend.

sf.sex <- survival::survfit(Surv(time, status) ~ sex, data = lung)
pl.sex <- ggsurv(sf.sex)
pl.sex

Alterations

Since a ggplot2 object is returned, plot objects may be altered after the original creation.

Adjusting the legend

pl.sex +
  ggplot2::guides(linetype = FALSE) +
  ggplot2::scale_colour_discrete(
    name   = 'Sex',
    breaks = c(1, 2),
    labels = c('Male', 'Female')
  )

Adjust the limits

data(kidney, package = "survival")
sf.kid <- survival::survfit(Surv(time, status) ~ disease, data = kidney)
pl.kid <- ggsurv(sf.kid, plot.cens = FALSE)
pl.kid

# Zoom in to first 80 days
pl.kid + ggplot2::coord_cartesian(xlim = c(0, 80), ylim = c(0.45, 1))

Add text and remove the legend

pl.kid +
  ggplot2::annotate(
    "text",
    label  = c("PKD", "Other", "GN", "AN"),
    x      = c(90, 125, 5, 60),
    y      = c(0.8, 0.65, 0.55, 0.30),
    size   = 5,
    colour = scales::hue_pal(
      h         = c(0, 360) + 15,
      c         = 100,
      l         = 65,
      h.start   = 0,
      direction = 1
    )(4)
  ) +
  ggplot2::guides(color = FALSE, linetype = FALSE)


Try the GGally package in your browser

Any scripts or data that you put into this service are public.

GGally documentation built on Aug. 3, 2017, 1:02 a.m.