ggduo: ggduo - A ggplot2 generalized pairs plot for two columns sets...

Description Usage Arguments Details Examples

View source: R/ggpairs.R

Description

Make a matrix of plots with a given data set with two different column sets

Usage

1
2
3
4
5
6
7
8
9
ggduo(data, mapping = NULL, columnsX = 1:ncol(data),
  columnsY = 1:ncol(data), title = NULL, types = list(continuous =
  "smooth_loess", comboVertical = "box_no_facet", comboHorizontal = "facethist",
  discrete = "ratio"), axisLabels = c("show", "none"),
  columnLabelsX = colnames(data[columnsX]),
  columnLabelsY = colnames(data[columnsY]), labeller = "label_value",
  switch = NULL, xlab = NULL, ylab = NULL, showStrips = NULL,
  legend = NULL, cardinality_threshold = 15, progress = NULL,
  legends = stop("deprecated"))

Arguments

data

data set using. Can have both numerical and categorical data.

mapping

aesthetic mapping (besides x and y). See aes(). If mapping is numeric, columns will be set to the mapping value and mapping will be set to NULL.

columnsX, columnsY

which columns are used to make plots. Defaults to all columns.

title, xlab, ylab

title, x label, and y label for the graph

types

see Details

axisLabels

either "show" to display axisLabels or "none" for no axis labels

columnLabelsX, columnLabelsY

label names to be displayed. Defaults to names of columns being used.

labeller

labeller for facets. See labellers. Common values are "label_value" (default) and "label_parsed".

switch

switch parameter for facet_grid. See ggplot2::facet_grid. By default, the labels are displayed on the top and right of the plot. If "x", the top labels will be displayed to the bottom. If "y", the right-hand side labels will be displayed to the left. Can also be set to "both"

showStrips

boolean to determine if each plot's strips should be displayed. NULL will default to the top and right side plots only. TRUE or FALSE will turn all strips on or off respectively.

legend

May be the two objects described below or the default NULL value. The legend position can be moved by using ggplot2's theme element pm + theme(legend.position = "bottom")

a numeric vector of length 2

provides the location of the plot to use the legend for the plot matrix's legend. Such as legend = c(3,5) which will use the legend from the plot in the third row and fifth column

a single numeric value

provides the location of a plot according to the display order. Such as legend = 3 in a plot matrix with 2 rows and 5 columns displayed by column will return the plot in position c(1,2)

a object from grab_legend()

a predetermined plot legend that will be displayed directly

cardinality_threshold

maximum number of levels allowed in a character / factor column. Set this value to NULL to not check factor columns. Defaults to 15

progress

NULL (default) for a progress bar in interactive sessions with more than 15 plots, TRUE for a progress bar, FALSE for no progress bar, or a function that accepts at least a plot matrix and returns a new progress::progress_bar. See ggmatrix_progress.

legends

deprecated

Details

types is a list that may contain the variables 'continuous', 'combo', 'discrete', and 'na'. Each element of the list may be a function or a string. If a string is supplied, it must implement one of the following options:

continuous

exactly one of ('points', 'smooth', 'smooth_loess', 'density', 'cor', 'blank'). This option is used for continuous X and Y data.

comboHorizontal

exactly one of ('box', 'box_no_facet', 'dot', 'dot_no_facet', 'facethist', 'facetdensity', 'denstrip', 'blank'). This option is used for either continuous X and categorical Y data or categorical X and continuous Y data.

comboVertical

exactly one of ('box', 'box_no_facet', 'dot', 'dot_no_facet', 'facethist', 'facetdensity', 'denstrip', 'blank'). This option is used for either continuous X and categorical Y data or categorical X and continuous Y data.

discrete

exactly one of ('facetbar', 'ratio', 'blank'). This option is used for categorical X and Y data.

na

exactly one of ('na', 'blank'). This option is used when all X data is NA, all Y data is NA, or either all X or Y data is NA.

If 'blank' is ever chosen as an option, then ggduo will produce an empty plot.

If a function is supplied as an option, it should implement the function api of function(data, mapping, ...){#make ggplot2 plot}. If a specific function needs its parameters set, wrap(fn, param1 = val1, param2 = val2) the function with its parameters.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
 # small function to display plots only if it's interactive
 p_ <- GGally::print_if_interactive

 data(baseball, package = "plyr")

 # Keep players from 1990-1995 with at least one at bat
 # Add how many singles a player hit
 # (must do in two steps as X1b is used in calculations)
 dt <- transform(
   subset(baseball, year >= 1990 & year <= 1995 & ab > 0),
   X1b = h - X2b - X3b - hr
 )
 # Add
 #  the player's batting average,
 #  the player's slugging percentage,
 #  and the player's on base percentage
 # Make factor a year, as each season is discrete
 dt <- transform(
   dt,
   batting_avg = h / ab,
   slug = (X1b + 2*X2b + 3*X3b + 4*hr) / ab,
   on_base = (h + bb + hbp) / (ab + bb + hbp),
   year = as.factor(year)
 )


 pm <- ggduo(
   dt,
   c("year", "g", "ab", "lg"),
   c("batting_avg", "slug", "on_base"),
   mapping = ggplot2::aes(color = lg)
 )
 # Prints, but
 #   there is severe over plotting in the continuous plots
 #   the labels could be better
 #   want to add more hitting information
 p_(pm)

 # address overplotting issues and add a title
 pm <- ggduo(
   dt,
   c("year", "g", "ab", "lg"),
   c("batting_avg", "slug", "on_base"),
   columnLabelsX = c("year", "player game count", "player at bat count", "league"),
   columnLabelsY = c("batting avg", "slug %", "on base %"),
   title = "Baseball Hitting Stats from 1990-1995",
   mapping = ggplot2::aes(color = lg),
   types = list(
     # change the shape and add some transparency to the points
     continuous = wrap("smooth_loess", alpha = 0.50, shape = "+")
   ),
   showStrips = FALSE
 );

 p_(pm)



# Example derived from:
## R Data Analysis Examples | Canonical Correlation Analysis.  UCLA: Institute for Digital
##   Research and Education.
##   from http://www.stats.idre.ucla.edu/r/dae/canonical-correlation-analysis
##   (accessed May 22, 2017).
# "Example 1. A researcher has collected data on three psychological variables, four
#  academic variables (standardized test scores) and gender for 600 college freshman.
#  She is interested in how the set of psychological variables relates to the academic
#  variables and gender. In particular, the researcher is interested in how many
#  dimensions (canonical variables) are necessary to understand the association between
#  the two sets of variables."
data(psychademic)
summary(psychademic)

(psych_variables <- attr(psychademic, "psychology"))
(academic_variables <- attr(psychademic, "academic"))

## Within correlation
p_(ggpairs(psychademic, columns = psych_variables))
p_(ggpairs(psychademic, columns = academic_variables))

## Between correlation
loess_with_cor <- function(data, mapping, ..., method = "pearson") {
  x <- eval(mapping$x, data)
  y <- eval(mapping$y, data)
  cor <- cor(x, y, method = method)
  ggally_smooth_loess(data, mapping, ...) +
    ggplot2::geom_label(
      data = data.frame(
        x = min(x, na.rm = TRUE),
        y = max(y, na.rm = TRUE),
        lab = round(cor, digits = 3)
      ),
      mapping = ggplot2::aes(x = x, y = y, label = lab),
      hjust = 0, vjust = 1,
      size = 5, fontface = "bold",
      inherit.aes = FALSE # do not inherit anything from the ...
    )
}
pm <- ggduo(
  psychademic,
  rev(psych_variables), academic_variables,
  types = list(continuous = loess_with_cor),
  showStrips = FALSE
)
suppressWarnings(p_(pm)) # ignore warnings from loess

# add color according to sex
pm <- ggduo(
  psychademic,
  mapping = ggplot2::aes(color = sex),
  rev(psych_variables), academic_variables,
  types = list(continuous = loess_with_cor),
  showStrips = FALSE,
  legend = c(5,2)
)
suppressWarnings(p_(pm))


# add color according to sex
pm <- ggduo(
  psychademic,
  mapping = ggplot2::aes(color = motivation),
  rev(psych_variables), academic_variables,
  types = list(continuous = loess_with_cor),
  showStrips = FALSE,
  legend = c(5,2)
) +
  ggplot2::theme(legend.position = "bottom")
suppressWarnings(p_(pm))

Example output

 locus_of_control    self_concept        motivation             read     
 Min.   :-2.23000   Min.   :-2.620000   Length:600         Min.   :28.3  
 1st Qu.:-0.37250   1st Qu.:-0.300000   Class :character   1st Qu.:44.2  
 Median : 0.21000   Median : 0.030000   Mode  :character   Median :52.1  
 Mean   : 0.09653   Mean   : 0.004917                      Mean   :51.9  
 3rd Qu.: 0.51000   3rd Qu.: 0.440000                      3rd Qu.:60.1  
 Max.   : 1.36000   Max.   : 1.190000                      Max.   :76.0  
     write            math          science          sex           
 Min.   :25.50   Min.   :31.80   Min.   :26.00   Length:600        
 1st Qu.:44.30   1st Qu.:44.50   1st Qu.:44.40   Class :character  
 Median :54.10   Median :51.30   Median :52.60   Mode  :character  
 Mean   :52.38   Mean   :51.85   Mean   :51.76                     
 3rd Qu.:59.90   3rd Qu.:58.38   3rd Qu.:58.65                     
 Max.   :67.10   Max.   :75.50   Max.   :74.20                     
[1] "locus_of_control" "self_concept"     "motivation"      
[1] "read"    "write"   "math"    "science" "sex"    

GGally documentation built on May 18, 2018, 1:08 a.m.