# R output pre blocks are styled by default to indicate output knitr::opts_chunk$set( comment = NA, cache = TRUE, fig.height = 8, fig.width = 10 ) suppressMessages(suppressWarnings(library(GGally))) # shorthand for rd_link() - see ?packagedocs::rd_link for more information rdl <- function(x) packagedocs::rd_link(deparse(substitute(x)))

Welcome to the GGally documentation page.

The following topic sections are alphabetically sorted.

The purpose of this function is to quickly plot the coefficients of a model.

To work automatically, this function requires the `r rdl(broom)`

package. Simply call `r rdl(ggcoef)`

with a model object. It could be the result of `r rdl(lm)`

, `r rdl(glm)`

or any other model covered by `r rdl(broom)`

and its `r rdl(tidy)`

method^[See http://www.rdocumentation.org/packages/broom.].

reg <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris) ggcoef(reg)

In the case of a logistic regression (or any other model for which coefficients are usually exponentiated), simply indicated `exponentiate = TRUE`

. Note that a logarithmic scale will be used for the x-axis.

d <- as.data.frame(Titanic) log.reg <- glm(Survived ~ Sex + Age + Class, family = binomial, data = d, weights = d$Freq) ggcoef(log.reg, exponentiate = TRUE)

You can use `conf.int`

, `vline`

and `exclude_intercept`

to display or not confidence intervals as error bars, a vertical line for `x = 0`

(or `x = 1`

if coeffcients are exponentiated) and the intercept.

ggcoef(reg, vline = FALSE, conf.int = FALSE, exclude_intercept = TRUE)

See the help page of `r rdl(ggcoef)`

for the full list of arguments that could be used to personalize how error bars and the vertical line are plotted.

ggcoef( log.reg, exponentiate = TRUE, vline_color = "red", vline_linetype = "solid", errorbar_color = "blue", errorbar_height = .25 )

Additional parameters will be passed to `r rdl(geom_point)`

.

ggcoef(log.reg, exponentiate = TRUE, color = "purple", size = 5, shape = 18)

Finally, you can also customize the aesthetic mapping of the points.

library(ggplot2) ggcoef(log.reg, exponentiate = TRUE, mapping = aes(x = estimate, y = term, size = p.value)) + scale_size_continuous(trans = "reverse")

You can also pass a custom data frame to `r rdl(ggcoef)`

. The following variables are expected:

`term`

(except if you customize the mapping)`estimate`

(except if you customize the mapping)`conf.low`

and`conf.high`

(only if you want to display error bars)

cust <- data.frame( term = c("male vs. female", "30-49 vs. 18-29", "50+ vs. 18-29", "urban vs. rural"), estimate = c(.456, 1.234, 1.897, 1.003), conf.low = c(.411, 1.042, 1.765, 0.678), conf.high = c(.498, 1.564, 2.034, 1.476), variable = c("sex", "age", "age", "residence") ) cust$term <- factor(cust$term, cust$term) ggcoef(cust, exponentiate = TRUE) ggcoef( cust, exponentiate = TRUE, mapping = aes(x = estimate, y = term, colour = variable), size = 5 )

The purpose of this function is to display two grouped data in a plot matrix. This is useful for canonical correlation analysis, multiple time series analysis, and regression analysis.

This example is derived from

`R Data Analysis Examples | Canonical Correlation Analysis. UCLA: Institute for Digital Research and Education. from http://www.stats.idre.ucla.edu/r/dae/canonical-correlation-analysis (accessed May 22, 2017).`

`Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshman. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions (canonical variables) are necessary to understand the association between the two sets of variables."`

data(psychademic) str(psychademic) (psych_variables <- attr(psychademic, "psychology")) (academic_variables <- attr(psychademic, "academic"))

First, look at the within correlation using `r rdl(ggpairs)`

.

ggpairs(psychademic, psych_variables, title = "Within Psychological Variables") ggpairs(psychademic, academic_variables, title = "Within Academic Variables")

Next, look at the between correlation using `r rdl(ggduo)`

.

ggduo( psychademic, psych_variables, academic_variables, types = list(continuous = "smooth_lm"), title = "Between Academic and Psychological Variable Correlation", xlab = "Psychological", ylab = "Academic" )

Since ggduo does not have a upper section to display the correlation values, we may use a custom function to add the information in the continuous plots. The strips may be removed as each group name may be recovered in the outer axis labels.

lm_with_cor <- function(data, mapping, ..., method = "pearson") { x <- eval(mapping$x, data) y <- eval(mapping$y, data) cor <- cor(x, y, method = method) ggally_smooth_lm(data, mapping, ...) + ggplot2::geom_label( data = data.frame( x = min(x, na.rm = TRUE), y = max(y, na.rm = TRUE), lab = round(cor, digits = 3) ), mapping = ggplot2::aes(x = x, y = y, label = lab), hjust = 0, vjust = 1, size = 5, fontface = "bold", inherit.aes = FALSE # do not inherit anything from the ... ) } ggduo( psychademic, rev(psych_variables), academic_variables, mapping = aes(color = sex), types = list(continuous = wrap(lm_with_cor, alpha = 0.25)), showStrips = FALSE, title = "Between Academic and Psychological Variable Correlation", xlab = "Psychological", ylab = "Academic", legend = c(5,2) ) + theme(legend.position = "bottom")

While displaying multiple time series vertically over time, such as `+ facet_grid(time ~ .)`

, `r rdl(ggduo)`

can handle both continuous and discrete data. `r rdl(ggplot2)`

does not mix discrete and continuous data on the same axis.

library(ggplot2) data(pigs) pigs_dt <- pigs[-(2:3)] # remove year and quarter pigs_dt$profit_group <- as.numeric(pigs_dt$profit > mean(pigs_dt$profit)) qplot( time, value, data = reshape::melt.data.frame(pigs_dt, "time"), geom = c("smooth", "point") ) + facet_grid(variable ~ ., scales = "free_y")

Instead, we may use `ggts`

to display the data. `ggts`

changes the default behavior of ggduo of `columnLabelsX`

to equal `NULL`

and allows for mixed variable types.

# make the profit group as a factor value profit_groups <- c( "1" = "high", "0" = "low" ) pigs_dt$profit_group <- factor( profit_groups[as.character(pigs_dt$profit_group)], levels = unname(profit_groups), ordered = TRUE ) ggts(pigs_dt, "time", 2:7) # remove the binwidth warning pigs_types <- list( comboHorizontal = wrap(ggally_facethist, binwidth = 1) ) ggts(pigs_dt, "time", 2:7, types = pigs_types) # add color and legend pigs_mapping <- aes(color = profit_group) ggts(pigs_dt, pigs_mapping, "time", 2:7, types = pigs_types, legend = c(6,1))

Produce more meaningful labels, add a legend, and remove profit group strips.

pm <- ggts( pigs_dt, pigs_mapping, 1, 2:7, types = pigs_types, legend = c(6,1), columnLabelsY = c( "number of\nfirst birth sows", "sell price over\nfeed cost", "sell count over\nheard size", "meat head count", "breading\nheard size", "profit\ngroup" ), showStrips = FALSE ) + labs(fill = "profit group") + theme( legend.position = "bottom", strip.background = element_rect( fill = "transparent", color = "grey80" ) ) pm

Since `r rdl(ggduo)`

may take custom functions just like `r rdl(ggpairs)`

, we will make a custom function that displays the residuals with a red line at 0 and all other y variables will receive a simple linear regression plot.

Note: the marginal residuals are calculated before plotting and the y_range is found to display all residuals on the same scale.

swiss <- datasets::swiss # add a 'fake' column swiss$Residual <- seq_len(nrow(swiss)) # calculate all residuals prior to display residuals <- lapply(swiss[2:6], function(x) { summary(lm(Fertility ~ x, data = swiss))$residuals }) # calculate a consistent y range for all residuals y_range <- range(unlist(residuals)) # custom function to display continuous data. If the y variable is "Residual", do custom work. lm_or_resid <- function(data, mapping, ..., line_color = "red", line_size = 1) { if (as.character(mapping$y) != "Residual") { return(ggally_smooth_lm(data, mapping, ...)) } # make residual data to display resid_data <- data.frame( x = data[[as.character(mapping$x)]], y = residuals[[as.character(mapping$x)]] ) ggplot(data = data, mapping = mapping) + geom_hline(yintercept = 0, color = line_color, size = line_size) + ylim(y_range) + geom_point(data = resid_data, mapping = aes(x = x, y = y), ...) } # plot the data ggduo( swiss, 2:6, c(1,7), types = list(continuous = lm_or_resid) ) # change line to be thicker and blue and the points to be slightly transparent ggduo( swiss, 2:6, c(1,7), types = list( continuous = wrap(lm_or_resid, alpha = 0.7, line_color = "blue", line_size = 3 ) ) )

This function rearranges data to be able to construct a glyph plot

library(ggplot2) data(nasa) temp.gly <- glyphs(nasa, "long", "day", "lat", "surftemp", height=2.5) ggplot(temp.gly, ggplot2::aes(gx, gy, group = gid)) + add_ref_lines(temp.gly, color = "grey90") + add_ref_boxes(temp.gly, color = "grey90") + geom_path() + theme_bw() + labs(x = "", y = "")

This shows a glyphplot of monthly surface temperature for 6 years over Central America. You can see differences from one location to another, that in large areas temperature doesn't change much. There are large seasonal trends in the top left over land.

Rescaling in different ways puts emphasis on different components, see the examples in the referenced paper. And with ggplot2 you can make a map of the geographic area underlying the glyphs.

Wickham, H., Hofmann, H., Wickham, C. and Cook, D. (2012)
Glyph-maps for Visually Exploring Temporal Patterns in Climate Data and Models, **Environmetrics**, *23*(5):151-182.

`r rdl(ggmatrix)`

is a function for managing multiple plots in a matrix-like layout. It was designed to adapt to any number of columns and rows. This allows for very customized plot matrices.

The examples below use plots labeled 1 to 6 to distinguish where the plots are being placed.

plotList <- list() for (i in 1:6) { plotList[[i]] <- ggally_text(paste("Plot #", i, sep = "")) } # bare minimum of plotList, nrow, and ncol pm <- ggmatrix(plotList, 2, 3) pm # provide more information pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "Matrix Title" ) pm # display plots in column order pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "Matrix Title", byrow = FALSE ) pm

Individual plots may be retrieved from the plot matrix and can be placed in the plot matrix.

pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "Matrix Title" ) pm p2 <- pm[1,2] p3 <- pm[1,3] p2 p3 pm[1,2] <- p3 pm[1,3] <- p2 pm

library(ggplot2) pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "Matrix Title", byrow = FALSE ) pm <- pm + theme_bw() pm

The X and Y axis have booleans to turn on/off the individual plot's axes on the bottom and left sides of the plot matrix. To save time, `showAxisPlotLabels`

can be set to override `showXAxisPlotLabels`

and `showYAxisPlotLabels`

.

pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "No Left Plot Axis", showYAxisPlotLabels = FALSE ) pm pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "No Bottom Plot Axis", showXAxisPlotLabels = FALSE ) pm pm <- ggmatrix( plotList, nrow = 2, ncol = 3, xAxisLabels = c("A", "B", "C"), yAxisLabels = c("D", "E"), title = "No Plot Axes", showAxisPlotLabels = FALSE ) pm

By default, the plots in the top row and the right most column will display top-side and right-side strips respectively (`showStrips = NULL`

). If all strips need to appear in each plot, `showStrips`

may be set to `TRUE`

. If all strips should not be displayed, `showStrips`

may be set to `FALSE`

.

data(tips, package = "reshape") plotList <- list( qplot(total_bill, tip, data = subset(tips, smoker == "No" & sex == "Female")) + facet_grid(time ~ day), qplot(total_bill, tip, data = subset(tips, smoker == "Yes" & sex == "Female")) + facet_grid(time ~ day), qplot(total_bill, tip, data = subset(tips, smoker == "No" & sex == "Male")) + facet_grid(time ~ day), qplot(total_bill, tip, data = subset(tips, smoker == "Yes" & sex == "Male")) + facet_grid(time ~ day) ) pm <- ggmatrix( plotList, nrow = 2, ncol = 2, yAxisLabels = c("Female", "Male"), xAxisLabels = c("Non Smoker", "Smoker"), title = "Total Bill vs Tip", showStrips = NULL # default ) pm pm <- ggmatrix( plotList, nrow = 2, ncol = 2, yAxisLabels = c("Female", "Male"), xAxisLabels = c("Non Smoker", "Smoker"), title = "Total Bill vs Tip", showStrips = TRUE ) pm pm <- ggmatrix( plotList, nrow = 2, ncol = 2, yAxisLabels = c("Female", "Male"), xAxisLabels = c("Non Smoker", "Smoker"), title = "Total Bill vs Tip", showStrips = FALSE ) pm

`r rdl(ggnetworkmap)`

is a function for plotting elegant maps using `r rdl(ggplot2)`

. It builds on `r rdl(ggnet)`

by allowing to draw a network over a map, and is particularly intended for use with `r rdl(ggmap)`

.

This example is based on a tutorial by Nathan Yau at Flowing Data.

suppressMessages(library(network)) suppressMessages(library(sna)) suppressMessages(library(maps)) suppressMessages(library(ggplot2)) airports <- read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv", header = TRUE) rownames(airports) <- airports$iata # select some random flights set.seed(1234) flights <- data.frame( origin = sample(airports[200:400, ]$iata, 200, replace = TRUE), destination = sample(airports[200:400, ]$iata, 200, replace = TRUE) ) # convert to network flights <- network(flights, directed = TRUE) # add geographic coordinates flights %v% "lat" <- airports[ network.vertex.names(flights), "lat" ] flights %v% "lon" <- airports[ network.vertex.names(flights), "long" ] # drop isolated airports delete.vertices(flights, which(degree(flights) < 2)) # compute degree centrality flights %v% "degree" <- degree(flights, gmode = "digraph") # add random groups flights %v% "mygroup" <- sample(letters[1:4], network.size(flights), replace = TRUE) # create a map of the USA usa <- ggplot(map_data("usa"), aes(x = long, y = lat)) + geom_polygon(aes(group = group), color = "grey65", fill = "#f9f9f9", size = 0.2) delete.vertices(flights, which(flights %v% "lon" < min(usa$data$long))) delete.vertices(flights, which(flights %v% "lon" > max(usa$data$long))) delete.vertices(flights, which(flights %v% "lat" < min(usa$data$lat))) delete.vertices(flights, which(flights %v% "lat" > max(usa$data$lat))) # overlay network data to map ggnetworkmap(usa, flights, size = 4, great.circles = TRUE, node.group = mygroup, segment.color = "steelblue", ring.group = degree, weight = degree)

This next example uses data from a Twitter spam community identified while exploring and trying to clear-up a group of tweets. After coloring the nodes based on their centrality, the odd structure stood out clearly.

data(twitter_spambots)

# create a world map world <- fortify(map("world", plot = FALSE, fill = TRUE)) world <- ggplot(world, aes(x = long, y = lat)) + geom_polygon(aes(group = group), color = "grey65", fill = "#f9f9f9", size = 0.2) # view global structure ggnetworkmap(world, twitter_spambots)

Is the network really concentrated in the U.S.? Probably not. One of the odd things about the network, is a much higher proportion of the users gave locations that could be geocoded, than Twitter users generally.

Let's see the network topology

ggnetworkmap(net = twitter_spambots, arrow.size = 0.5)

Coloring nodes according to degree centrality can highlight network structures.

# compute indegree and outdegree centrality twitter_spambots %v% "indegree" <- degree(twitter_spambots, cmode = "indegree") twitter_spambots %v% "outdegree" <- degree(twitter_spambots, cmode = "outdegree") ggnetworkmap(net = twitter_spambots, arrow.size = 0.5, node.group = indegree, ring.group = outdegree, size = 4) + scale_fill_continuous("Indegree", high = "red", low = "yellow") + labs(color = "Outdegree")

Some Twitter attributes have been included as vertex attributes.

# show some vertex attributes associated with each account ggnetworkmap(net = twitter_spambots, arrow.size = 0.5, node.group = followers, ring.group = friends, size = 4, weight = indegree, label.nodes = TRUE, vjust = -1.5) + scale_fill_continuous("Followers", high = "red", low = "yellow") + labs(color = "Friends") + scale_color_continuous(low = "lightgreen", high = "darkgreen")

`r rdl(ggnostic)`

is a display wrapper to `r rdl(ggduo)`

that displays full model diagnostics for each given explanatory variable. By default, `r rdl(ggduo)`

displays the residuals, leave-one-out model sigma value, leverage points, and Cook's distance against each explanatory variable. The rows of the plot matrix can be expanded to include fitted values, standard error of the fitted values, standardized residuals, and any of the response variables. If the model is a linear model, stars are added according to the anova significance of each explanatory variable.

Most diagnostic plots contain reference line(s) to help determine if the model is fitting properly

**residuals:**- Type key:
`".resid"`

- A solid line is located at the expected value of 0 with dashed lines at the 95% confidence interval. ($0 \pm 1.96 * \sigma$)
- Plot function:
`r rdl(ggally_nostic_resid)`

. - See also
`r rdl(stats::residuals)`

- Type key:
**standardized residuals:**- Type key:
`".std.resid"`

- Same as residuals, except the standardized residuals equal the regular residuals divided by sigma. The dashed lines are located at $0 \pm 1.96 * 1$.
- Plot function:
`r rdl(ggally_nostic_std_resid)`

. - See also
`r rdl(stats::rstandard)`

- Type key:
**leave-one-out model sigma:**- Type key:
`".sigma"`

- A solid line is located at the full model's sigma value.
- Plot function:
`r rdl(ggally_nostic_sigma)`

. - See also
`r rdl(stats::influence)`

's value on`sigma`

- Type key:
**leverage points:**- Type key:
`".hat"`

- The expected value for the diagonal of a hat matrix is $p / n$. Points are considered leverage points if they are large than $2 * p / n$, where the higher line is drawn.
- Plot function:
`r rdl(ggally_nostic_hat)`

. - See also
`r rdl(stats::influence)`

's value on`hat`

- Type key:
**Cook's distance:**- Type key:
`".cooksd"`

- Points that are larger than $4 / n$ line are considered highly influential points. Plot function:
`r rdl(ggally_nostic_cooksd)`

. See also`r rdl(stats::cooks.distance)`

- Type key:
**fitted points:**- Type key:
`".fitted"`

- No reference lines by default.
- Default plot function:
`r rdl(ggally_points)`

. - See also
`r rdl(stats::predict)`

- Type key:
**standard error of fitted points**:- Type key:
`".se.fit"`

- No reference lines by default.
- Plot function:
`r rdl(ggally_nostic_se_fit)`

. - See also
`r rdl(stats::fitted)`

- Type key:
**response variables:**- Type key: (response name in data.frame)
- No reference lines by default.
- Default plot function:
`r rdl(ggally_points)`

.

Looking at the dataset `r rdl(datasets::state.x77)`

, we will fit a multiple regression model for Life Expectancy.

# make a data.frame and fix column names state <- as.data.frame(state.x77) colnames(state)[c(4, 6)] <- c("Life.Exp", "HS.Grad") str(state) # fit full model model <- lm(Life.Exp ~ ., data = state) # reduce to "best fit" model with model <- step(model, trace = FALSE) summary(model)

Next, we look at the variables for any high (|value| > 0.8) correlation values and general interaction behavior.

# look at variables for high correlation (none) ggscatmat(state, columns = c("Population", "Murder", "HS.Grad", "Frost"))

All variables appear to be ok. Next, we look at the model diagnostics.

```
# look at model diagnostics
ggnostic(model)
```

- The residuals appear to be normally distributed. There are a couple residual outliers, but 2.5 outliers are expected.
- There are 5 leverage points according the diagonal of the hat matrix
- There are 2 leverage points according to Cook's distance. One is
**much**larger than the other.

Let's remove the largest data point first to try and define a better model.

# very high life expectancy state[11, ] state_no_hawaii <- state[-11, ] model_no_hawaii <- lm(Life.Exp ~ Population + Murder + HS.Grad + Frost, data = state_no_hawaii) ggnostic(model_no_hawaii)

There are no more outrageous Cook's distance values. The model without Hawaii appears to be a good fitting model.

summary(model) summary(model_no_hawaii)

Since there is only a marginal improvement by removing Hawaii, the original model should be used to explain life expectancy.

The following lines of code will display different modle diagnostic plot matrices for the same statistical model. The first one is of the default settings. The second adds color according to the `species`

. Finally, the third displays all possible columns and uses `r rdl(ggally_smooth)`

to display the fitted points and response variables.

flea_model <- step(lm(head ~ ., data = flea), trace = FALSE) summary(flea_model) # default output ggnostic(flea_model) # color'ed output ggnostic(flea_model, mapping = ggplot2::aes(color = species)) # full color'ed output ggnostic( flea_model, mapping = ggplot2::aes(color = species), columnsY = c("head", ".fitted", ".se.fit", ".resid", ".std.resid", ".hat", ".sigma", ".cooksd"), continuous = list(default = ggally_smooth, .fitted = ggally_smooth) )

`r rdl(ggpairs)`

is a special form of a `r rdl(ggmatrix)`

that produces a pairwise comparison of multivariate data. By default, `r rdl(ggpairs)`

provides two different comparisons of each pair of columns and displays either the density or count of the respective variable along the diagonal. With different parameter settings, the diagonal can be replaced with the axis values and variable labels.

There are many hidden features within ggpairs. Please take a look at the examples below to get the most out of ggpairs.

The `columns`

displayed default to all columns of the provided `data`

. To subset to only a few columns, use the `columns`

parameter.

data(tips, package = "reshape") pm <- ggpairs(tips) pm ## too many plots for this example. ## reduce the columns being displayed ## these two lines of code produce the same plot matrix pm <- ggpairs(tips, columns = c(1, 6, 2)) pm <- ggpairs(tips, columns = c("total_bill", "time", "tip"), columnLabels = c("Total Bill", "Time of Day", "Tip")) pm

Aesthetics can be applied to every subplot with the `mapping`

parameter.

library(ggplot2) pm <- ggpairs(tips, mapping = aes(color = sex), columns = c("total_bill", "time", "tip")) pm

Since the plots are default plots (or are helper functions from GGally), the aesthetic color is altered to be appropriate. Looking at the example above, 'tip' vs 'total_bill' (pm[3,1]) needs the `color`

aesthetic, while 'time' vs 'total_bill' needs the `fill`

aesthetic. If custom functions are supplied, no aesthetic alterations will be done.

There are three major sections of the pairwise matrix: `lower`

, `upper`

, and `diag`

. The `lower`

and `upper`

may contain three plot types: `continuous`

, `combo`

, and `discrete`

. The 'diag' only contains either `continuous`

or `discrete`

.

`continuous`

: both X and Y are continuous variables`combo`

: one X and Y variable is discrete while the other is continuous`discrete`

: both X and Y are discrete variables

To make adjustments to each section, a list of information may be supplied. The list can be comprised of the following elements:

`continuous`

:- a character string representing the tail end of a
`ggally_NAME`

function, or a custom function - current valid
`upper$continuous`

and`lower$continuous`

character strings:`'points'`

,`'smooth'`

,`'density'`

,`'cor'`

,`'blank'`

- current valid
`diag$continuous`

character strings:`'densityDiag'`

,`'barDiag'`

,`'blankDiag'`

- a character string representing the tail end of a
`combo`

:- a character string representing the tail end of a
`ggally_NAME`

function, or a custom function. (not applicable for a`diag`

list) - current valid
`upper$combo`

and`lower$combo`

character strings:`'box'`

,`'dot'`

,`'facethist'`

,`'facetdensity'`

,`'denstrip'`

,`'blank'`

- a character string representing the tail end of a
`discrete`

:- a character string representing the tail end of a
`ggally_NAME`

function, or a custom function - current valid
`upper$discrete`

and`lower$discrete`

character strings:`'ratio'`

,`'facetbar'`

,`'blank'`

- current valid
`diag$discrete`

character strings:`'barDiag'`

,`'blankDiag'`

- a character string representing the tail end of a
`mapping`

: if mapping is provided, only the section's mapping will be overwritten

library(ggplot2) pm <- ggpairs( tips, columns = c("total_bill", "time", "tip"), lower = list( continuous = "smooth", combo = "facetdensity", mapping = aes(color = time) ) ) pm

A section list may be set to the character string `"blank"`

or `NULL`

if the section should be skipped when printed.

pm <- ggpairs( tips, columns = c("total_bill", "time", "tip"), upper = "blank", diag = NULL ) pm

The `ggally_NAME`

functions do not provide all graphical options. Instead of supplying a character string to a `continuous`

, `combo`

, or `discrete`

element within `upper`

, `lower`

, or `diag`

, a custom function may be given.

The custom function should follow the api of

custom_function <- function(data, mapping, ...){ # produce ggplot2 object here }

There is no requirement to what happens within the function, as long as a ggplot2 object is returned.

my_bin <- function(data, mapping, ..., low = "#132B43", high = "#56B1F7") { ggplot(data = data, mapping = mapping) + geom_bin2d(...) + scale_fill_gradient(low = low, high = high) } pm <- ggpairs( tips, columns = c("total_bill", "time", "tip"), lower = list( continuous = my_bin ) ) pm

The examples above use default parameters to each of the subplots. One of the immediate parameters to be set it `binwidth`

. This parameters is only needed in the lower, combination plots where one variable is continuous while the other variable is discrete.

To change the default parameter `binwidth`

setting, we will `r rdl(wrap)`

the function. `r rdl(wrap)`

first parameter should be a character string or a custom function. The remaining parameters supplied to wrap will be supplied to the function at run time.

pm <- ggpairs( tips, columns = c("total_bill", "time", "tip"), lower = list( combo = wrap("facethist", binwidth = 1), continuous = wrap(my_bin, binwidth = c(5, 0.5), high = "red") ) ) pm

To get finer control over parameters, please look into custom functions.

Please look at the vignette for ggmatrix on plot matrix manipulations.

Small ggpairs example:

pm <- ggpairs(tips, columns = c("total_bill", "time", "tip")) # retrieve the third row, first column plot p <- pm[3,1] p <- p + aes(color = time) p pm[3,1] <- p pm

Please look at the vignette for ggmatrix on plot matrix manipulations.

Small ggpairs example:

pmBW <- pm + theme_bw() pmBW

John W Emerson, Walton A Green, Barret Schloerke, Jason Crowley, Dianne Cook, Heike Hofmann, Hadley Wickham.
**The Generalized Pairs Plot**.
*Journal of Computational and Graphical Statistics*, vol. 22, no. 1, pp. 79-91, 2012.

The primary function is `r rdl(ggscatmat)`

. It is similar to `r rdl(ggpairs)`

but only works for purely numeric multivariate data. It is faster than ggpairs, because less choices need to be made. It creates a matrix with scatterplots in the lower diagonal, densities on the diagonal and correlations written in the upper diagonal. Syntax is to enter the dataset, the columns that you want to plot, a color column, and an alpha level.

data(flea) ggscatmat(flea, columns = 2:4, color="species", alpha=0.8)

In this plot, you can see that the three different species vary a little from each other in these three variables. Heptapot (blue) has smaller values on the variable "tars1" than the other two. The correlation between the three variables is similar for all species.

John W Emerson, Walton A Green, Barret Schloerke, Jason Crowley, Dianne Cook, Heike Hofmann, Hadley Wickham.
**The Generalized Pairs Plot**.
*Journal of Computational and Graphical Statistics*, vol. 22, no. 1, pp. 79-91, 2012.

This function produces Kaplan-Meier plots using `r rdl(ggplot2)`

.
As a first argument, `r rdl(ggsurv)`

needs a `r rdl(survfit)`

object, created by the
`r rdl(survival)`

package. Default settings differ for single stratum and
multiple strata objects.

require(ggplot2) require(survival) require(scales) data(lung, package = "survival") sf.lung <- survival::survfit(Surv(time, status) ~ 1, data = lung) ggsurv(sf.lung)

The legend color positions matches the survival order or each stratum, where the stratums that end at a lower value or time have a position that is lower in the legend.

sf.sex <- survival::survfit(Surv(time, status) ~ sex, data = lung) pl.sex <- ggsurv(sf.sex) pl.sex

Since a ggplot2 object is returned, plot objects may be altered after the original creation.

pl.sex + ggplot2::guides(linetype = FALSE) + ggplot2::scale_colour_discrete( name = 'Sex', breaks = c(1, 2), labels = c('Male', 'Female') )

data(kidney, package = "survival") sf.kid <- survival::survfit(Surv(time, status) ~ disease, data = kidney) pl.kid <- ggsurv(sf.kid, plot.cens = FALSE) pl.kid # Zoom in to first 80 days pl.kid + ggplot2::coord_cartesian(xlim = c(0, 80), ylim = c(0.45, 1))

pl.kid + ggplot2::annotate( "text", label = c("PKD", "Other", "GN", "AN"), x = c(90, 125, 5, 60), y = c(0.8, 0.65, 0.55, 0.30), size = 5, colour = scales::hue_pal( h = c(0, 360) + 15, c = 100, l = 65, h.start = 0, direction = 1 )(4) ) + ggplot2::guides(color = FALSE, linetype = FALSE)

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.