scatterHex | R Documentation |
scatter plot where observations are grouped into hexagonal bins and then summarized
scatterHex(
data_frame,
x.by,
y.by,
color.by = NULL,
bins = 30,
color.method = NULL,
split.by = NULL,
rows.use = NULL,
color.panel = dittoColors(),
colors = seq_along(color.panel),
x.adjustment = NULL,
y.adjustment = NULL,
color.adjustment = NULL,
x.adj.fxn = NULL,
y.adj.fxn = NULL,
color.adj.fxn = NULL,
multivar.split.dir = c("col", "row"),
split.nrow = NULL,
split.ncol = NULL,
split.adjust = list(),
min.density = NA,
max.density = NA,
min.color = "#F0E442",
max.color = "#0072B2",
min.opacity = 0.2,
max.opacity = 1,
min = NA,
max = NA,
rename.color.groups = NULL,
xlab = x.by,
ylab = y.by,
main = "make",
sub = NULL,
theme = theme_bw(),
do.contour = FALSE,
contour.color = "black",
contour.linetype = 1,
do.ellipse = FALSE,
do.label = FALSE,
labels.size = 5,
labels.highlight = TRUE,
labels.repel = TRUE,
labels.split.by = split.by,
labels.repel.adjust = list(),
add.trajectory.by.groups = NULL,
add.trajectory.curves = NULL,
trajectory.group.by,
trajectory.arrow.size = 0.15,
add.xline = NULL,
xline.linetype = "dashed",
xline.color = "black",
add.yline = NULL,
yline.linetype = "dashed",
yline.color = "black",
legend.show = TRUE,
legend.color.title = "make",
legend.color.breaks = waiver(),
legend.color.breaks.labels = waiver(),
legend.density.title = "Observations",
legend.density.breaks = waiver(),
legend.density.breaks.labels = waiver(),
show.grid.lines = TRUE,
data.out = FALSE
)
data_frame |
A data_frame where columns are features and rows are observations you might wish to visualize. |
x.by , y.by |
Single strings denoting the name of a column of |
color.by |
Single string denoting the name of a column of |
bins |
Numeric or numeric vector giving the number of hexagonal bins in the x and y directions. Set to 30 by default. |
color.method |
Single string that specifies how Continuous: String naming a function for how target data should be summarized for each bin.
Can be any function that inputs (summarizes) a numeric vector and outputs a single numeric value.
Default is Discrete: A string signifying whether the color should (default) be simply based on the "max" grouping of the bin, or based on the "max.prop"ortion of observations belonging to any grouping. |
split.by |
1 or 2 strings denoting the name(s) of column(s) of When 2 columns are named, c(row,col), the first is used as rows and the second is used for columns of the resulting facet grid. When 1 column is named, shape control can be achieved with |
rows.use |
String vector of rownames of Alternatively, a Logical vector, the same length as the number of rows in |
color.panel |
String vector which sets the colors to draw from when A named vector can be used if names are matched to the distinct values of the |
colors |
Integer vector, the indexes / order, of colors from Useful for quickly swapping around colors of the default set (when not using names for color matching). |
x.adjustment , y.adjustment , color.adjustment |
A recognized string indicating whether numeric
Ignored if the target data is not numeric as these known adjustments target numeric data only. In order to leave the unedited data available for use in other features, the adjusted data are put in a new column and that new column is used for plotting. |
x.adj.fxn , y.adj.fxn , color.adj.fxn |
If you wish to apply a function to edit the For example, In order to leave the unedited data available for use in other features, the adjusted data are put in a new column and that new column is used for plotting. |
multivar.split.dir |
"row" or "col", sets the direction of faceting used for 'var' values when:
|
split.nrow , split.ncol |
Integers which set the dimensions of faceting/splitting when faceting by a single feature. |
split.adjust |
A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 column to |
min.density , max.density |
Number which sets the min/max values used for the density scale. Used no matter whether density is represented through opacity or color. |
min.color , max.color |
color for the min/max values of the color scale. |
min.opacity , max.opacity |
Scalar between [0,1] which sets the minimum or maximum opacity used for the density legend (when color is used for |
min , max |
Number which sets the values associated with the minimum or maximum color for |
rename.color.groups |
String vector which sets new names for the identities of |
xlab , ylab |
Strings which set the labels for the axes. To remove, set to |
main |
String, sets the plot title. The default title is either "Density", |
sub |
String, sets the plot subtitle. |
theme |
A ggplot theme which will be applied before internal adjustments.
Default = |
do.contour |
Logical. Whether density-based contours should be displayed. |
contour.color |
String that sets the color of the |
contour.linetype |
String or numeric which sets the type of line used for |
do.ellipse |
Logical. Whether |
do.label |
Logical. Whether to add text labels near the center (median) of |
labels.size |
Number which sets the size of labels text when |
labels.highlight |
Logical. Whether labels should have a box behind them when |
labels.repel |
Logical, that sets whether the labels' placements will be adjusted with ggrepel to avoid intersections between labels and plot bounds when |
labels.split.by |
String of one or two column names which controls the facet-split calculations for label placements.
Defaults to |
labels.repel.adjust |
A named list which allows extra parameters to be pushed through to ggrepel function calls.
List elements should be valid inputs to the |
add.trajectory.by.groups |
List of vectors representing trajectory paths, each from start-group to end-group, where vector contents are the group-names indicated by the |
add.trajectory.curves |
List of matrices, each representing coordinates for a trajectory path, from start to end, where matrix columns represent x and y coordinates of the paths. |
trajectory.group.by |
String denoting the name of a column of |
trajectory.arrow.size |
Number representing the size of trajectory arrows, in inches. Default = 0.15. |
add.xline |
numeric value(s) where one or multiple vertical line(s) should be added. |
xline.linetype |
String which sets the type of line for |
xline.color |
String that sets the color(s) of the |
add.yline |
numeric value(s) where one or multiple horizontal line(s) should be added. |
yline.linetype |
String which sets the type of line for |
yline.color |
String that sets the color(s) of the |
legend.show |
Logical. Whether any legend should be displayed. Default = |
legend.density.title , legend.color.title |
Strings which set the title for the legends. |
legend.density.breaks , legend.color.breaks |
Numeric vector which sets the discrete values to label in the density and color.by legends. |
legend.density.breaks.labels , legend.color.breaks.labels |
String vector, with same length as |
show.grid.lines |
Logical which sets whether grid lines should be shown within the plot space. |
data.out |
Logical. When set to |
This function first makes any requested adjustments to data in the given data_frame, internally only, such as scaling the color.by column if color.adjustment was given "z-score".
Next, data_frame is subset to only the target rows based on the rows.use input.
Finally, a hex plot is created using this data frame:
If color.by is not provided, coloring is based on the density of observations within each hex bin.
When color.by is provided, density is represented through opacity while coloring is based on a summarization, chosen with the color.method input, of the target color.by data.
If split.by was used, the plot will be split into a matrix of panels based on the associated groupings.
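The steps described above can be sketched in base R on a toy data frame. This is only an illustration of the ideas, not the function's internal code: the object names (toy_df, gene1.adj, and so on) are hypothetical, all five rows are treated as if they fell into a single hex bin, and "max.prop" is read here as the share of the bin held by its most common grouping.

```r
# Toy stand-in for 'data_frame' (hypothetical values, not the package's example data).
toy_df <- data.frame(
    gene1 = c(2, 4, 6, 8, 10),
    groups = c("A", "A", "B", "A", "C"),
    row.names = paste0("obs", 1:5))

# Step 1: a z-score adjustment, stored in a NEW column so the original
# values stay available for other uses.
toy_df$gene1.adj <- (toy_df$gene1 - mean(toy_df$gene1)) / sd(toy_df$gene1)

# Step 2: three ways of targeting the same rows, mirroring the forms that
# 'rows.use' accepts (rownames, indexes, or a logical vector).
by_name <- toy_df[c("obs1", "obs2", "obs4"), ]
by_index <- toy_df[c(1, 2, 4), ]
by_logical <- toy_df[toy_df$groups == "A", ]

# Step 3: per-bin summaries, pretending all five rows share one hex bin.
bin_counts <- table(toy_df$groups)
max_group <- names(bin_counts)[which.max(bin_counts)]  # most common grouping: "A"
max_prop <- max(bin_counts) / sum(bin_counts)          # its share of the bin: 0.6
bin_mean <- mean(toy_df$gene1)  # a continuous summary; any vector-to-number function works
```

For continuous data, any function that takes a numeric vector and returns a single number could play the role that mean plays in the last line.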
A ggplot object where colored hexagonal bins are used to summarize observations in a scatter plot.
Alternatively, if data.out=TRUE, a list containing three slots is output:
the plot (named 'plot'),
a data.table containing the updated underlying data for target rows (named 'data'),
and a list providing mappings of final column names in 'data' to given plot aesthetics (named 'cols_used'), because modification of newly made columns is required for many features.
Colors: min.color and max.color adjust the colors for continuous data.
For discrete color.by plotting with color.method = "max", colors are instead adjusted with color.panel and/or colors, and the labels of the groupings can be changed using rename.color.groups.
Titles and axes labels can be adjusted with the main, sub, xlab, ylab, legend.color.title, and legend.density.title arguments.
Legends can also be adjusted in other ways, using variables that all start with "legend." for easy tab completion lookup.
Other tweaks and features can be added as well.
Each is accessible through 'tab' autocompletion starting with "do."--- or "add."---, and if additional inputs are involved in implementing or tweaking these, the associated inputs will start with the "---.":
If do.contour is provided, density gradient contour lines will be overlaid with color and linetype adjustable via contour.color and contour.linetype.
If add.trajectory.by.groups is provided a list of vectors (each vector being group names from start-group-name to end-group-name), and a column name pointing to the relevant grouping information is provided to trajectory.group.by, then median centers of the groups will be calculated and arrows will be overlaid to show trajectory inference paths.
If add.trajectory.curves is provided a list of matrices (each matrix containing x, y coordinates from start to end), paths and arrows will be overlaid to show trajectory inference curves.
Arrow size is controlled with the trajectory.arrow.size input.
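As a hedged illustration of the contour and trajectory-curve features just described, the sketch below assumes scatterHex is provided by the dittoViz package (the package name is not stated on this page) and that example_df is created by the dittoExampleData example, as in the examples further down. The curve coordinates are made up from quantiles of the plotted columns purely so that the path lands inside the plot area; they do not represent a real trajectory.

```r
# Assumption: scatterHex comes from the 'dittoViz' package.
# The guard lets this block run harmlessly when the package is not installed.
if (requireNamespace("dittoViz", quietly = TRUE)) {
    library(dittoViz)
    example("dittoExampleData", echo = FALSE)  # creates 'example_df'

    # One hypothetical curve: a 3-row matrix whose columns hold the x and y
    # coordinates of the path, ordered from start to end.
    curve1 <- cbind(
        unname(quantile(example_df$PC1, c(0.1, 0.5, 0.9))),
        unname(quantile(example_df$PC2, c(0.1, 0.5, 0.9))))

    p <- scatterHex(
        example_df, x.by = "PC1", y.by = "PC2",
        do.contour = TRUE,                     # overlay density contours
        contour.color = "lightblue",
        contour.linetype = "dashed",
        add.trajectory.curves = list(curve1),  # a list, one matrix per curve
        trajectory.arrow.size = 0.25)          # arrow size, in inches
    p
}
```

Several curves can be drawn at once by adding more matrices to the list given to add.trajectory.curves.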
Daniel Bunis with some code adapted from Giuseppe D'Agostino
scatterPlot for making non-hex-binned scatter plots showing each individual data point.
It is often best to investigate your data with both the individual and hex-bin methods, then pick whichever is the best representation for your particular goal.
example("dittoExampleData", echo = FALSE)
# The minimal inputs for scatterHex are the 'data_frame', and 2 column names,
# given to 'x.by' and 'y.by', indicating which data to use for the x and y
# axes, respectively.
scatterHex(
example_df, x.by = "PC1", y.by = "PC2")
# 'color.by' can also be given a column name in order to represent that
# column's data in the color of the hexes.
# Note: This capability requires the suggested package 'ggplot.multistats'.
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(
example_df, x.by = "PC1", y.by = "PC2",
color.by = "groups")
}
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(
example_df, x.by = "PC1", y.by = "PC2",
color.by = "gene1")
}
# Data can be "split" or faceted by a discrete variable as well.
scatterHex(example_df, x.by = "PC1", y.by = "PC2",
split.by = "timepoint") # single split.by element
scatterHex(example_df, x.by = "PC1", y.by = "PC2",
split.by = c("groups","SNP")) # row and col split.by elements
# Modify the look with intuitive inputs
scatterHex(example_df, x.by = "PC1", y.by = "PC2",
show.grid.lines = FALSE,
ylab = NULL, xlab = "PC2 by PC1",
main = "Plot Title",
sub = "subtitle",
legend.density.title = "Items")
# 'max.density' is one of these intuitively named inputs that can be
# extremely useful for saying "I only care for opacity to be decreased
# in regions with exceptionally low observation numbers."
# (A good value for this in "real" data might be 10 or 50 or higher, but for
# our sparse example data, we need to do a lot to show this off at all!)
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(
example_df, x.by = "PC1", y.by = "PC2",
color.by = "gene1", bins = 10,
sub = "Default density scale")
}
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(
example_df, x.by = "PC1", y.by = "PC2",
color.by = "gene1", bins = 10,
sub = "Density capped low for ignoring sparse regions",
max.density = 2)
}
# You can restrict to only certain data points using the 'rows.use' input.
# The input can be given rownames, indexes, or a logical vector
scatterHex(example_df, x.by = "PC1", y.by = "PC2",
sub = "show only first 40 observations, by index",
rows.use = 1:40)
scatterHex(example_df, x.by = "PC1", y.by = "PC2",
sub = "show only 3 obs, by name (plotting gets a bit wonky for few points)",
rows.use = c("obs1", "obs2", "obs25"))
scatterHex(example_df, x.by = "PC1", y.by = "PC2",
sub = "show groups A,B,D only, by logical",
rows.use = example_df$groups!="C")
# Many extra features are easy to add as well:
# Each is started via an input starting with 'do.FEATURE*' or 'add.FEATURE*'
# And when tweaks for that feature are possible, those inputs will be
# named starting with 'FEATURE*'. For example, color.by groups can be labeled
# with 'do.label = TRUE' and the tweaks for this feature are given with inputs
# 'labels.size', 'labels.highlight', and 'labels.repel':
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
sub = "default labeling",
do.label = TRUE) # Turns on the labeling feature
}
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
sub = "tweaked labeling",
do.label = TRUE, # Turns on the labeling feature
labels.size = 8, # Adjust the text size of labels
labels.highlight = FALSE, # Removes white background behind labels
labels.repel = FALSE) # Turns off anti-overlap location adjustments
}
# Faceting can also be used to show multiple continuous variables side-by-side
# by giving a vector of column names to 'color.by'.
# This can also be combined with 1 'split.by' variable, with direction then
# controlled via 'multivar.split.dir':
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(example_df, x.by = "PC1", y.by = "PC2", bins = 10,
color.by = c("gene1", "gene2"))
}
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(example_df, x.by = "PC1", y.by = "PC2", bins = 10,
color.by = c("gene1", "gene2"),
split.by = "groups")
}
if (requireNamespace("ggplot.multistats", quietly = TRUE)) {
scatterHex(example_df, x.by = "PC1", y.by = "PC2", bins = 10,
color.by = c("gene1", "gene2"),
split.by = "groups",
multivar.split.dir = "row")
}
# Sometimes, it can be useful for external editing or troubleshooting purposes
# to see the underlying data that was directly used for plotting.
# 'data.out = TRUE' can be provided in order to obtain not just plot ("plot"),
# but also the "data" and "cols_used" returned as a list.
out <- scatterHex(example_df, x.by = "PC1", y.by = "PC2",
rows.use = 1:40,
data.out = TRUE)
out$plot
summary(out$data)
out$cols_used