offsetX: Offset data using quasirandom noise to avoid overplotting
In vipor: Plot Categorical Data Using Quasirandom Noise and Density Estimates

offsetX

R Documentation

Description

Arranges data points using quasirandom noise (van der Corput sequence), pseudorandom noise, or by positioning extreme values within a band to the left and right, to form beeswarm/one-dimensional scatter/strip chart style plots. The result resembles a cross between a violin plot (showing the density distribution) and a scatter plot (showing the individual points). This function returns a vector of the offsets to be used in plotting.
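To make the "quasirandom noise" above concrete, here is a minimal base R sketch of a base-2 van der Corput sequence. The helper `vdc_sketch` is a hypothetical name introduced for illustration and is not vipor's implementation. The final rescaling only shows how values in [0, 1) could be centered into a band of half-width `width`; the package additionally shapes offsets by a density estimate, which this sketch leaves out.

```r
# Hypothetical helper (not vipor's code): first n terms of the van der Corput
# sequence, built by reflecting the base-b digits of each index about the
# radix point.
vdc_sketch <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom
      i <- i %/% base
    }
    value
  }, numeric(1))
}

vdc <- vdc_sketch(8)
vdc
# 0.5 0.25 0.75 0.125 0.625 0.375 0.875 0.0625

# Assumed rescaling for illustration: center on 0 and stretch to +/- width
width <- 0.4
centered <- (vdc - 0.5) * 2 * width
```

Successive terms fill the interval evenly rather than clumping, which is why such a sequence tends to look less ragged than pseudorandom jitter.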

Usage

offsetX(y, x = rep(1, length(y)), width = 0.4, varwidth = FALSE, ...)

offsetSingleGroup(
  y,
  maxLength = NULL,
  method = c("quasirandom", "pseudorandom", "smiley", "maxout", "frowney", "minout",
    "tukey", "tukeyDense"),
  nbins = NULL,
  adjust = 1
)

Arguments

`y`	vector of data points
`x`	a grouping factor for y (optional)
`width`	the maximum spacing away from center for each group of points. Since points are spaced to left and right, the maximum width of the cluster will be approximately width*2 (0 = no offset, default = 0.4)
`varwidth`	adjust the width of each group based on the number of points in the group
`...`	additional arguments to offsetSingleGroup
`maxLength`	if not NULL, multiply the offset by sqrt(length(y)/maxLength). The square root is used to match boxplot; it allows comparison of groups whose n differ by orders of magnitude and scales with the standard error
`method`	method used to distribute the points:
	quasirandom: points are distributed within a kernel density estimate of the distribution with offset determined by quasirandom van der Corput noise
	pseudorandom: points are distributed within a kernel density estimate of the distribution with offset determined by pseudorandom noise a la jitter
	maxout: points are distributed within a kernel density with points in a band distributed with highest value points on the outside and lowest in the middle
	minout: points are distributed within a kernel density with points in a band distributed with highest value points in the middle and lowest on the outside
	tukey: points are distributed as described in Tukey and Tukey "Strips displaying empirical distributions: I. textured dot strips"
	tukeyDense: points are distributed as described in Tukey and Tukey but are constrained with the kernel density estimate
`nbins`	the number of points used to calculate density (defaults to 1000 for quasirandom and pseudorandom and 100 for others)
`adjust`	adjust the bandwidth used to calculate the kernel density (smaller values mean tighter fit, larger values looser fit, default is 1)
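The `width` and `maxLength` descriptions above reduce to simple arithmetic, worked through below in base R. The group sizes and the choice of `max_length` as the largest group are assumptions made for illustration; the sketch does not claim to show how `offsetX` chooses `maxLength` internally.

```r
# Illustrative group sizes (assumed, not taken from the package)
n <- c(small = 50, large = 500)
max_length <- max(n)          # assumption: scale relative to the largest group

# Scaling factor from the maxLength description: sqrt(length(y) / maxLength)
scale_factor <- sqrt(n / max_length)

# Maximum offset from center for each group at the default width
width <- 0.4
max_offset <- width * scale_factor

# Approximate full width of each cluster (points spread left and right)
cluster_width <- 2 * max_offset
round(cluster_width, 3)
```

Under these assumptions the 50-point group ends up roughly a third as wide as the 500-point group, rather than a tenth as wide as a linear scaling would give.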

Value

a vector of x-offsets of the same length as y
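Because the returned vector lines up with `y`, it can be added to a numeric group position to get plotting coordinates. The sketch below assumes the vipor package is installed and skips the package call otherwise; the data and the grouping vector `x` are made up for illustration.

```r
set.seed(1)
y <- c(rnorm(100), rnorm(100, 3))
x <- rep(c("a", "b"), each = 100)   # illustrative grouping factor

# Guarded so the sketch still runs where vipor is not installed
if (requireNamespace("vipor", quietly = TRUE)) {
  offsets <- vipor::offsetX(y, x)

  # One offset per element of y
  stopifnot(length(offsets) == length(y))

  # Put each group at an integer position and add its offsets
  plot(as.numeric(factor(x)) + offsets, y, xaxt = "n", xlab = "", las = 1)
  axis(1, 1:2, levels(factor(x)))
}
```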

Examples

library(vipor)

## Generate fake data
dat <- list(rnorm(50), rnorm(500), c(rnorm(100), rnorm(100,5)), rcauchy(100))
names(dat) <- c("Normal", "Dense Normal", "Bimodal", "Extremes")

## Plot each distribution with a variety of parameters
par(mfrow=c(4,1), mar=c(2,4, 0.5, 0.5))
sapply(names(dat),function(label) {
  y<-dat[[label]]
  
  offsets <- list(
    'Default'=offsetX(y),
    'Smoother'=offsetX(y, adjust=2),
    'Tighter'=offsetX(y, adjust=0.1),
    'Thinner'=offsetX(y, width=0.1)
  )
  ids <- rep(seq_along(offsets), sapply(offsets,length))
  
  plot(unlist(offsets) + ids, rep(y, length(offsets)), 
       ylab=label, xlab='', xaxt='n', pch=21, las=1)
  axis(1, 1:4, c("Default", "Adjust=2", "Adjust=0.1", "Width=10%"))
})

vipor documentation built on May 29, 2024, 7:09 a.m.