hist2d: 2-Dimensional Histogram

Description

A 2D histogram is an alternative to the traditional scatter plot. Like a histogram, it constructs bins of regular size and counts the number of observations in each bin, but the binning is done in two dimensions. It uses a colour gradient instead of bar height to represent the count. This technique is most useful when many points overlap, or even stack on top of each other, in a scatter plot (overplotting).
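
The binning idea can be sketched in base R. This is only an illustration of counting observations in regular 2D bins, not the implementation of hist2d; the variable names and the choice of 10 bins per axis are assumptions made for the example.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)

# Assumed for illustration: 10 regular bins per axis.
nBins <- 10
breaksX <- seq(min(x), max(x), length.out = nBins + 1)
breaksY <- seq(min(y), max(y), length.out = nBins + 1)

# Index of the bin each observation falls into, per dimension.
# all.inside = TRUE keeps the extreme values inside the first/last bin.
binX <- findInterval(x, breaksX, all.inside = TRUE)
binY <- findInterval(y, breaksY, all.inside = TRUE)

# Cross-tabulate the two indices: each cell is the count in one 2D bin.
counts <- table(factor(binX, levels = seq_len(nBins)),
                factor(binY, levels = seq_len(nBins)))

dim(counts)   # one row per x bin, one column per y bin
sum(counts)   # every observation is counted exactly once
```

A 2D histogram then maps each cell of such a count table to a fill colour.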

Usage

hist2d(
  dt,
  x,
  y,
  title,
  labX,
  labY,
  limX,
  limY,
  facet = 0,
  z = NULL,
  widthBin,
  nBin,
  hasLine = F,
  statsLine = c("count", "rsquare", "signif"),
  palette = "Reds",
  theme = "minimal",
  hasGrid = F,
  trans = "identity"
)

Arguments

dt

Data of class data.table. Currently, has name .x or .y

x, y

The names of the columns in dt that will be used as the coordinates of the 2D histogram. The selected columns must be numeric. These are the (x, y) coordinates that would be used to construct a scatter plot. If x and/or y is not supplied, the first and/or second column of dt will be used.

labX, labY

Axis label(s) for the plot output. If not supplied, the value(s) of x or y will be used.

limX, limY

The limits of the x or y axis that will be shown in the plot. They do not change the regression line if hasLine is TRUE. If not supplied, the limits will be the minimum and maximum values of the corresponding dimension.

facet

One of {0, 1, 2}. It refers to ggplot2's faceting technique: multiple 2D histograms are displayed as a sequence of panels, which enables comparison between groups. The value is the number of additional categorical variables used for faceting.

z

A vector of column name(s) of dt that will be used for faceting. The length of the vector must match the number given in facet. Set it to NULL when facet is 0.

widthBin

The width of the bins. If not supplied, the Freedman-Diaconis rule is applied to each dimension.
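
For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), where n is the number of observations. A minimal sketch in base R follows; fdWidth is a hypothetical helper written for this example, and how hist2d computes the width internally is not shown on this page.

```r
# Hypothetical helper: Freedman-Diaconis bin width for one numeric vector.
fdWidth <- function(v) {
  2 * IQR(v) / length(v)^(1 / 3)
}

set.seed(1)
x <- rnorm(1000)
y <- runif(1000, -3, 3)

# The rule is applied to each dimension separately,
# so x and y generally get different widths.
c(x = fdWidth(x), y = fdWidth(y))
```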

nBin

The number of bins that span the range from the minimum to the maximum of each coordinate. If it is a single integer n, x and y will each be spanned by n bins. If it is an integer vector of length 2 (i.e. c(m,n)), m bins will span the x coordinates and n bins will span the y coordinates.
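
The two accepted forms of nBin can be illustrated with a short base R sketch. binEdges is a hypothetical helper for this example only; it is not part of the package and may differ from what hist2d does internally.

```r
# Hypothetical helper: expand nBin to one value per axis, then build
# regular bin edges from the minimum to the maximum of each coordinate.
binEdges <- function(x, y, nBin) {
  nBin <- rep_len(as.integer(nBin), 2)  # n -> c(n, n); c(m, n) is kept as-is
  list(
    x = seq(min(x), max(x), length.out = nBin[1] + 1),
    y = seq(min(y), max(y), length.out = nBin[2] + 1)
  )
}

set.seed(1)
x <- runif(100)
y <- runif(100)

# Number of edges per axis (always one more than the number of bins).
lengths(binEdges(x, y, 20))         # same number of bins on both axes
lengths(binEdges(x, y, c(10, 30)))  # 10 bins on x, 30 bins on y
```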

hasLine

Logical, with FALSE as the default. If TRUE, a simple linear regression line of y against x is drawn and annotated.
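
A simple linear regression of y against x can be fitted in base R with lm. The sketch below uses simulated data and extracts the kind of quantities a regression annotation might report; exactly which statistics hist2d prints is an assumption here and is not documented on this page.

```r
set.seed(1)
x <- seq(-3, 3, length.out = 200)
y <- 0.5 * x + rnorm(200, sd = 0.5)

# Simple linear regression of y against x.
fit <- lm(y ~ x)
s <- summary(fit)

coef(fit)                          # intercept and slope of the fitted line
s$r.squared                        # coefficient of determination
s$coefficients["x", "Pr(>|t|)"]    # p-value for the slope
```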

palette

Color scheme choices as specified in http://colorbrewer2.org

Examples

library(data.table)
library(ggplot2)
library(scales)

set.seed(240193)

N=10000

dtA=data.table(x=runif(N,-3,3),y=runif(N,-3,3),z="A")

dtB=data.table(x=rnorm(N),y=rnorm(N),z="B")

dtC=data.table(x=seq(-3,3,length.out = N))
dtC[,":="(y=0.5*x+rnorm(N,sd=0.5),z="C"),]

dtD=data.table(x=c(rnorm(N/2,-1,.75),rnorm(N/2,1,.75)),
               y=c(rnorm(N/2,1,.75),rnorm(N/2,-1,.75)),
               z="D")

dt1=rbindlist(list(dtA,dtB,dtC,dtD))

hist2d(dt1,facet=1)

dt2=data.table(height=c(1.47,1.50,1.52,1.55,1.57,
                        1.60,1.63,1.65,1.68,1.70,
                        1.73,1.75,1.78,1.80,1.83),
               mass=c(52.21,53.12,54.48,55.84,57.20,
                      58.57,59.93,61.29,63.11,64.47,
                      66.28,68.10,69.92,72.19,74.46))
slr(dt2,"x","y","z")

artidata/artidata.viz documentation built on May 4, 2020, 3:06 p.m.