magbin | R Documentation |
Allows for 2D binning (counts) and summary statistics on 2D bins (medians etc).
magbin(x, y, z = NULL, xlim = NULL, ylim = NULL, zlim = NULL, Nbin = 50, step = NULL,
log = '', unlog = log, clustering = 10, dustlim = 0.1, shape = "hex",
plot = TRUE, colramp = hcl.colors(21), colstretch = "lin", sizestretch = "lin",
colref = "count", sizeref = "none", funstat = function(x) median(x, na.rm=TRUE),
direction = 'h', offset = 0, jitterseed = 666, projden = FALSE, projsig = FALSE, ...)
x |
Numeric vector or matrix/data.frame; x values to 2D bin. If x is a two (or more) column matrix or data.frame and y/z is missing as an argument, then the first column is used for x and the second/third column for y/z. |
y |
Numeric vector; the y coordinates of points in the plot, optional if x is an appropriate structure. |
z |
Numeric vector; the z coordinates of points in the plot. Optional, and can instead be supplied through x if x is an appropriate structure. |
xlim |
Numeric vector; the x limits to use for the data. Default of NULL calculates the range based on the provided x data vector. If length equals 1 then the argument is taken to mean the sigma range to select for plotting and the clipping is done by |
ylim |
Numeric vector; the y limits to use for the data. Default of NULL calculates the range based on the provided y data vector. If length equals 1 then the argument is taken to mean the sigma range to select for plotting and the clipping is done by |
zlim |
Numeric vector; the z limits to use for the data. Default of NULL calculates the range based on the provided z data vector. If length equals 1 then the argument is taken to mean the sigma range to select for plotting and the clipping is done by |
Nbin |
Integer vector; the number of bins to (approximately) split the x/y axes into. If length 1 then this value is used for both axes (default is 50 bins in x/y), if length 2 then Nbin[1] is used for the x-axis and Nbin[2] is used for the y-axis. |
step |
Numeric vector; grid steps in x and y directions. If NULL then this is c(diff(xlim), diff(ylim))/Nbin. If length 1, then this value is repeated. Overrides Nbin if supplied. |
log |
Character scalar; log axis arguments to be used. E.g. use 'x', 'y', 'xy' or 'yx' as appropriate. Default '' assumes no logging of any axes. For convenience you can specify the 'z' axis too, which somewhat replaces the colstretch argument. Note that in all cases the x/y/z data is explicitly logged, which means the plotting window does not know it is in logged space (via the par()$xlog and par()$ylog structures). This means that if you want to add points etc. to the plot you will need to apply log10 yourself, so a point with coordinates [10^2,10^-3] should be plotted at [2,-3]. |
unlog |
Character scalar; determines if x/y axis labels should be unlogged (z is ignored here). By default inherits log, since that is usually what you would want. |
clustering |
Numeric scalar; excess counts in the densest bin relative to uniform data. This is used to optimise the binning, so can probably be ignored. |
dustlim |
Numeric scalar; if between 0 and 1 then the 2D bin count quantile to switch to showing the individual points (which visually look like 'dust'), if larger than 1 then the exact counts threshold. If this is NA or 0 then all cells are shown. |
shape |
Character scalar; type of binning, either hex/hexagon; sq/square; tri/triangle or trihex. 'trihex' is a triangle tessellation that is also arranged to have hexagonal packing (so 6 triangles can form a hexagon). |
plot |
Logical; create a plot? If FALSE then just the binning output list is created. |
colramp |
Vector; a colour scaling to use. Must be a vector and not a function. |
colstretch |
Character scalar; colour stretch, either linear (lin, default) or logarithmic (log, good for large dynamic ranges). |
sizestretch |
Character scalar; size stretch, either linear (lin, default) or logarithmic (log, good for large dynamic ranges). |
colref |
Character scalar; colour reference, which should be based either on the counts (count, default) or on the z-axis statistic (zstat). |
sizeref |
Character scalar; size reference, which should either be ignored (none, so all bins are the same size and closely packed), or be based on the counts (count) or on the z-axis statistic (zstat). |
funstat |
Function; function to use to compute a statistic over the z axis. The default is function(x) median(x, na.rm=TRUE), i.e. the median with NA values removed. |
direction |
Character scalar; should there be a shape side aligned horizontally ('h', the default) or vertically ('v')? This is only relevant for hexagon and triangle bin shapes, and has the effect of leading the eye differently with some scatter structures. |
offset |
Numeric/character scalar; only relevant for shape='sq' or shape='tri'. Either a numeric value specifying the offset (relative to step) to apply to alternating rows (direction='h') or columns (direction='v'); or 'jitter', which means the rows or columns are randomly jittered (only used for shape='sq' bins). This option is useful for visually breaking up strong patterns in certain types of data. |
jitterseed |
Integer scalar; the random seed to use for jittering (means you can recreate your plots exactly if the seed is the same). This argument is only used for shape='sq' bins. |
projden |
Logical; do you want projected density PDFs to be displayed above and to the side of the standard |
projsig |
Logical; if projden = TRUE then this will optionally add lines to show the pseudo 1-sigma range (15.9% to 84.1% quantiles). |
... |
Dots to be passed to |
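A few of the numeric conventions in the argument descriptions above can be checked with base R alone. The sketch below is illustrative only: the limits and the variable names (pt_logged, sig_levels) are made up for this example, and using pnorm to obtain the quantile levels is an assumption about where the quoted 15.9% and 84.1% come from, not a statement about how magbin computes them internally.

```r
# Default grid step, following the formula quoted for the 'step' argument:
# c(diff(xlim), diff(ylim)) / Nbin
xlim <- c(-5, 5)     # illustrative limits, not derived from real data
ylim <- c(-10, 10)
Nbin <- 50           # a single value is applied to both axes
step <- c(diff(xlim), diff(ylim)) / Nbin
print(step)          # 0.2 0.4

# With log = 'xy' the data are explicitly logged, so anything overlaid
# on the plot has to be given in log10 coordinates.
pt <- c(10^2, 10^-3)
pt_logged <- log10(pt)
print(pt_logged)     # 2 -3

# Quantile levels matching a pseudo 1-sigma range for a Normal distribution
# (assumption: this is the origin of the 15.9% and 84.1% quoted for projsig).
sig_levels <- pnorm(c(-1, 1))
print(round(100 * sig_levels, 1))  # 15.9 84.1
```

The step values here match the step=c(0.2,0.4) case, so a hexagon grid built over these limits would be twice as tall as it is wide in data units.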
Mostly run for the side effect of making a nice plot, but the output bin statistics might also be useful.
Regarding performance, magbin works well on a modern computer for up to ~1e6 points, usually taking only a few seconds to run. Beyond this you might need to carefully tune the performance with clustering, otherwise it might run very slowly and/or you might run out of memory.
List of class 'magbin' containing:
bins |
Bin x / y / count / and zstat info |
dust |
Dust x / y / z info |
groups |
Links input x and y data to the nearest grid cell by row number of bins |
xlim |
x limits |
ylim |
y limits |
step |
step size |
dustlim |
dustlim |
shape |
shape |
direction |
direction |
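Since the binning output may be useful in its own right, the following is a minimal sketch of inspecting the returned list without drawing anything. It assumes magbin is provided by the magicaxis package and that the package is installed; the variable name out is arbitrary, and the exact layout of each list element is not guaranteed beyond the element names listed above.

```r
# Assumption: magbin comes from the magicaxis package, which must be installed.
if (requireNamespace("magicaxis", quietly = TRUE)) {
  set.seed(666)
  xydata <- cbind(rnorm(1e4), rnorm(1e4))

  # plot = FALSE skips the plot, so only the binning output list is created
  out <- magicaxis::magbin(xydata, shape = "hexagon", plot = FALSE)

  print(class(out))      # expected to include 'magbin'
  print(names(out))      # element names as documented above
  print(head(out$bins))  # per-bin position, count and zstat info
  print(out$step)        # step size actually used
}
```

The result can later be drawn with plot.magbin, which keeps the (potentially slow) binning separate from the plotting.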
plot.magbin, maghist
set.seed(666)
xydata = cbind(rnorm(1e4), rnorm(1e4))
magbin(xydata, shape='hexagon') #default
magbin(xydata, shape='hexagon', Nbin=25) #A bit coarser
magbin(xydata, shape='square')
magbin(xydata, shape='triangle')
magbin(xydata, shape='trihex')
magbin(xydata, shape='hexagon', direction='v')
magbin(xydata, shape='triangle', direction='v')
magbin(xydata, shape='trihex', direction='v')
magbin(xydata, shape='hexagon', step=c(0.2,0.4)) #different aspect ratio hexagons
magbin(xydata, z=xydata[,1]^2-xydata[,2]^2, colref='zstat', sizeref='count')
magbin(xydata, z=xydata[,1]^2-xydata[,2]^2, colref='zstat', sizeref='count',
funstat=mad)
magbin(xydata, z=xydata[,1]^2-xydata[,2]^2, colref='zstat', sizeref='count',
funstat=function(x){quantile(x,0.9)})
xydata = cbind(10^rnorm(1e4), 10^rnorm(1e4))
magbin(xydata, log='xy')
magbin(xydata, z=xydata[,1]*xydata[,2], colref='zstat', sizeref='count',
log='xyz')
magbin(xydata, log='xy', unlog='xy', xlim=3, ylim=3)