ggparcoord: A ggplot2 Parallel Coordinate Plot


Description

A function for plotting static parallel coordinate plots, utilizing the ggplot2 graphics package.

Usage

ggparcoord(data, columns = 1:ncol(data), groupColumn = NULL,
  scale = "std", scaleSummary = "mean", centerObsID = 1,
  missing = "exclude", order = columns, showPoints = FALSE,
  splineFactor = FALSE, alphaLines = 1, boxplot = FALSE,
  shadeBox = NULL, mapping = NULL, title = "")

Arguments

data

the dataset to plot

columns

a vector of variables (either names or indices) to be axes in the plot

groupColumn

a single variable to group (color) by

scale

method used to scale the variables (see Details)

scaleSummary

if scale == "center", the summary statistic by which to univariately center each variable

centerObsID

if scale == "centerObs", the row number of the case on which the plot should be univariately centered

missing

method used to handle missing values (see Details)

order

method used to order the axes (see Details)

showPoints

logical value indicating whether points should be plotted

splineFactor

logical or numeric value controlling spline interpolation. A numeric value is multiplied by the number of columns to give the knot count; TRUE defaults to cubic interpolation (a factor of 3); a value wrapped in I() (class AsIs) sets the knot count directly; 0, FALSE, or non-numeric values disable spline interpolation.

alphaLines

value of the alpha scalar for the lines of the parcoord plot, or the name of a column of the data

boxplot

logical value indicating whether boxplots should underlay the distribution of each variable

shadeBox

color of the underlying box, which extends from the minimum to the maximum of each variable (no box is plotted if shadeBox is NULL)

mapping

aesthetic mapping (aes) to pass to the ggplot object

title

character string denoting the title of the plot
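
The splineFactor rules described above can be sketched as a small helper that turns the argument into a knot count. This is only an illustration under stated assumptions: knot_count() is a hypothetical function, not part of the package, and the factor of 3 for TRUE is inferred from the Examples, where TRUE, 3, and I(length(columns) * 3) are said to produce the same result.

```r
# Hypothetical helper (not part of the package): derive the spline knot
# count from a splineFactor value and the number of plotted columns.
knot_count <- function(splineFactor, n_columns) {
  # An AsIs value, e.g. I(21), is taken as the knot count directly.
  # This check comes first because I(21) is also numeric.
  if (inherits(splineFactor, "AsIs")) {
    return(as.numeric(unclass(splineFactor)))
  }
  # TRUE is assumed to mean cubic interpolation, i.e. a factor of 3.
  if (isTRUE(splineFactor)) {
    return(3 * n_columns)
  }
  # A positive numeric factor is multiplied by the number of columns.
  if (is.numeric(splineFactor) && splineFactor > 0) {
    return(splineFactor * n_columns)
  }
  # 0, FALSE, or non-numeric values: no spline interpolation.
  0
}

n_columns <- length(c(1, 5:10))  # the 7 columns used in the Examples
knot_count(TRUE, n_columns)
knot_count(3, n_columns)
knot_count(I(21), n_columns)
knot_count(FALSE, n_columns)
```

Under these assumptions the first three calls agree, which matches the note in the Examples that all three forms give the same plot.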

Details

scale is a character string that denotes how to scale the variables in the parallel coordinate plot. Options:

missing is a character string that denotes how to handle missing values. Options:

order is either a vector of indices or a character string that denotes how to order the axes (variables) of the parallel coordinate plot. Options:
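
As a rough illustration of univariate scaling, the base R sketch below assumes that "std" subtracts each variable's mean and divides by its standard deviation, and that "uniminmax" rescales each variable so its minimum is 0 and its maximum is 1. Both definitions are assumptions based on the option names, and scale_std() and scale_uniminmax() are hypothetical helpers, not functions exported by the package.

```r
# Hypothetical helpers (not part of the package) showing the assumed
# meaning of two scale options, applied to a single variable.

# Assumed "std": subtract the mean, divide by the standard deviation.
scale_std <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Assumed "uniminmax": map the minimum to 0 and the maximum to 1.
scale_uniminmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up values standing in for one plotted column.
carat <- c(0.3, 0.5, 0.7, 1.1)
scaled <- data.frame(
  std = scale_std(carat),
  uniminmax = scale_uniminmax(carat)
)
print(scaled)
```

Either way, each variable is scaled on its own, so every axis of the plot covers a comparable range regardless of the variable's original units.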

Value

a ggplot object that prints the plot when called

Author(s)

Jason Crowley <crowley.jason.s@gmail.com>, Barret Schloerke <schloerke@gmail.com>, Di Cook <dicook@iastate.edu>, Heike Hofmann <hofmann@iastate.edu>, Hadley Wickham <h.wickham@gmail.com>

Examples

 # small function to display plots only if it's interactive
p_ <- function(pm) {
  if (interactive()) {
    print(pm)
  }
  invisible()
}

# use a random sample of 100 rows of the diamonds data for illustrative purposes
data(diamonds, package = "ggplot2")
diamonds.samp <- diamonds[sample(nrow(diamonds), 100), ]

# basic parallel coordinate plot, using default settings
p <- ggparcoord(data = diamonds.samp, columns = c(1, 5:10))
p_(p)

# this time, color by diamond cut
p <- ggparcoord(data = diamonds.samp, columns = c(1, 5:10), groupColumn = 2)
p_(p)

# underlay univariate boxplots, add title, use uniminmax scaling
p <- ggparcoord(data = diamonds.samp, columns = c(1, 5:10), groupColumn = 2,
  scale = "uniminmax", boxplot = TRUE, title = "Parallel Coord. Plot of Diamonds Data")
p_(p)

# utilize ggplot2 aes to switch to thicker lines
p <- ggparcoord(data = diamonds.samp, columns = c(1, 5:10), groupColumn = 2,
  title ="Parallel Coord. Plot of Diamonds Data", mapping = ggplot2::aes(size = 1)) +
  ggplot2::scale_size_identity()
p_(p)

# basic parallel coord plot of the msleep data, using 'random' imputation and
# coloring by diet (can also use variable names in the columns and groupColumn
# arguments)
data(msleep, package="ggplot2")
p <- ggparcoord(data = msleep, columns = 6:11, groupColumn = "vore", missing =
  "random", scale = "uniminmax")
p_(p)

# center each variable by its median, using the default missing value handler,
# 'exclude'
p <- ggparcoord(data = msleep, columns = 6:11, groupColumn = "vore", scale =
  "center", scaleSummary = "median")
p_(p)

# with the iris data, order the axes by overall class (Species) separation using
# the anyClass option
p <- ggparcoord(data = iris, columns = 1:4, groupColumn = 5, order = "anyClass")
p_(p)

# add points to the plot, add a title, and use an alpha scalar to make the lines
# transparent
p <- ggparcoord(data = iris, columns = 1:4, groupColumn = 5, order = "anyClass",
  showPoints = TRUE, title = "Parallel Coordinate Plot for the Iris Data",
  alphaLines = 0.3)
p_(p)

# set the line transparency (alpha) according to a column
iris2 <- iris
iris2$alphaLevel <- c("setosa" = 0.2, "versicolor" = 0.3, "virginica" = 0)[iris2$Species]
p <- ggparcoord(data = iris2, columns = 1:4, groupColumn = 5, order = "anyClass",
  showPoints = TRUE, title = "Parallel Coordinate Plot for the Iris Data",
  alphaLines = "alphaLevel")
p_(p)

## Use splines on values, rather than lines (all produce the same result)
columns <- c(1, 5:10)
p <- ggparcoord(diamonds.samp, columns, groupColumn = 2, splineFactor = TRUE)
p_(p)
p <- ggparcoord(diamonds.samp, columns, groupColumn = 2, splineFactor = 3)
p_(p)
splineFactor <- length(columns) * 3
p <- ggparcoord(diamonds.samp, columns, groupColumn = 2, splineFactor = I(splineFactor))
p_(p)

lrutter/RNASeqVisualization documentation built on May 21, 2019, 7:52 a.m.