eda_qq: QQ and MD plots

View source: R/eda_qq.R

eda_qq    R Documentation

QQ and MD plots

Description

eda_qq generates an empirical or Normal QQ plot as well as a Tukey mean-difference plot.
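A Tukey mean-difference (M-D) plot takes the same QQ pairs and plots their average on the horizontal axis against their difference on the vertical axis, so a constant additive offset appears as a flat horizontal band. The minimal base-R sketch below assumes the pairs are already matched and uses y - x as the difference; md_pairs() is a hypothetical helper, and the package's own sign convention may differ.

```r
# Hypothetical helper (not part of tukeyedar): M-D transform of matched QQ pairs.
# Assumes x and y are paired quantiles of equal length.
md_pairs <- function(x, y) {
  stopifnot(length(x) == length(y))
  data.frame(
    mean = (x + y) / 2,  # horizontal axis: average of each pair
    diff = y - x         # vertical axis: difference of each pair (sign is an assumption)
  )
}

# Two batches that differ by a constant (additive) offset of 2
x <- c(1, 2, 3, 4, 5)
y <- x + 2
md <- md_pairs(x, y)
md
# A purely additive offset shows up as a horizontal band at diff = 2
```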

Usage

eda_qq(
  x,
  y = NULL,
  fac = NULL,
  norm = FALSE,
  p = 1L,
  tukey = FALSE,
  md = FALSE,
  q.type = 5,
  fx = NULL,
  fy = NULL,
  plot = TRUE,
  show.par = TRUE,
  grey = 0.6,
  pch = 21,
  p.col = "grey50",
  p.fill = "grey80",
  size = 0.8,
  alpha = 0.8,
  q = TRUE,
  b.val = c(0.25, 0.75),
  l.val = c(0.125, 0.875),
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  t.size = 1.2,
  ...
)

Arguments

x

Vector for first variable or a dataframe.

y

Vector for second variable or column defining the continuous variable if x is a dataframe.

fac

Column defining the grouping variable if x is a dataframe.

norm

Boolean determining if a Normal QQ plot is to be generated.

p

Power transformation to apply to both sets of values.

tukey

Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation).

md

Boolean determining if a Tukey mean-difference plot should be generated.

q.type

An integer between 1 and 9 selecting one of the nine quantile algorithms (see the quantile() function).

fx

Formula to apply to the x variable, passed as a character string (e.g. "x - 2"). It is computed after any power transformation is applied to the x variable.

fy

Formula to apply to the y variable, passed as a character string. It is computed after any power transformation is applied to the y variable.

plot

Boolean determining if plot should be generated.

show.par

Boolean determining if parameters such as power transformation or formula should be displayed.

grey

Grey level to apply to plot elements (0 to 1 with 1 = black).

pch

Point symbol type.

p.col

Color for point symbol.

p.fill

Point fill color passed to bg (only used for pch values 21 to 25).

size

Point size (0 to 1).

alpha

Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors.

q

Boolean determining if grey quantile boxes should be plotted.

b.val

Quantiles to define the quantile box parameters. Defaults to the IQR. Two values are needed.

l.val

Quantiles to define the quantile line parameters. Defaults to the mid 75% of values. Two values are needed.

xlab

X label for output plot. Ignored if x is a dataframe.

ylab

Y label for output plot. Ignored if x is a dataframe.

title

Title to add to plot.

t.size

Title size.

...

Not used
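The p, tukey, fx and fy arguments describe a two-step re-expression: a power transformation, then an optional formula applied to the transformed values. The sketch below uses the textbook forms of the Tukey ladder of powers and the Box-Cox transformation, with hypothetical helpers reexpress() and apply_formula(). It illustrates the idea under those assumptions and is not the package's internal code.

```r
# Hypothetical helper: re-express a positive vector with power p.
# tukey = TRUE  -> Tukey ladder: x^p for p > 0, log(x) at p = 0, -x^p for p < 0
# tukey = FALSE -> Box-Cox:      (x^p - 1) / p, log(x) at p = 0
reexpress <- function(x, p = 1, tukey = FALSE) {
  if (p == 0) return(log(x))
  if (tukey) {
    if (p > 0) x^p else -x^p
  } else {
    (x^p - 1) / p
  }
}

# Hypothetical helper: evaluate a formula string such as "x - 2",
# binding the values to the variable name used in the string (assumed "x").
apply_formula <- function(values, f, var = "x") {
  if (is.null(f)) return(values)
  env <- setNames(list(values), var)
  eval(parse(text = f), envir = env)
}

vals <- c(1, 4, 9)
sqrt_tukey  <- reexpress(vals, p = 0.5, tukey = TRUE)  # 1 2 3
sqrt_boxcox <- reexpress(vals, p = 0.5)                # 0 2 4
shifted     <- apply_formula(sqrt_tukey, "x - 2")      # formula applied after the power step
```

Applying the formula after the power step mirrors the order stated for fx and fy: an offset such as "x - 2" is then expressed in the re-expressed units, not the original ones.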

Details

When the function is used to generate an empirical QQ plot, the plot displays the IQR via grey boxes for both the x and y values. The box widths can be changed via the b.val argument. The plot also displays the mid 75% of values via light-colored dashed lines; the line positions can be changed via the l.val argument. The middle dashed line represents each batch's median value. Console output prints the suggested multiplicative and additive offsets. See the QQ plot vignette for an introduction to its use and interpretation.
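When the two batches differ in size, an empirical QQ plot needs one pair per value of the smaller batch, which is why the returned values "may be interpolated to the smaller batch." A minimal base-R sketch, assuming the smaller batch's f-values are (i - 0.5)/n and that both batches are evaluated with quantile(type = 5) (the q.type default). The hypothetical helper qq_guides() computes the positions of the quantile box (b.val) and the dashed lines (l.val plus the median).

```r
# Hypothetical helper: pair two batches of possibly unequal size.
# Assumption: quantiles are taken at the smaller batch's f-values (i - 0.5) / n,
# so the smaller batch maps onto its own sorted values and the larger one is interpolated.
qq_pairs <- function(x, y, q.type = 5) {
  n <- min(length(x), length(y))
  f <- (seq_len(n) - 0.5) / n
  data.frame(
    f = f,
    x = unname(quantile(x, probs = f, type = q.type)),
    y = unname(quantile(y, probs = f, type = q.type))
  )
}

# Hypothetical helper: box edges (b.val) and dashed-line positions (l.val and median)
qq_guides <- function(v, b.val = c(0.25, 0.75), l.val = c(0.125, 0.875), q.type = 5) {
  list(
    box   = unname(quantile(v, probs = b.val, type = q.type)),
    lines = unname(quantile(v, probs = c(l.val[1], 0.5, l.val[2]), type = q.type))
  )
}

set.seed(1)
a <- rnorm(20, mean = 10)  # smaller batch
b <- rnorm(35, mean = 12)  # larger batch, interpolated to 20 quantiles
pairs <- qq_pairs(a, b)
nrow(pairs)       # one pair per value in the smaller batch
qq_guides(b)$box  # edges of the interquartile box for b
```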

The function can also generate a Normal QQ plot when the norm argument is set to TRUE. In this mode, the l.val line parameters are overridden and set to +/- 1 standard deviation. The "suggested offsets" output is disabled, and an M-D version of the Normal QQ plot cannot be generated. The formula arguments (fx and fy) are also ignored in this mode.
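A Normal QQ plot pairs the sorted sample with standard Normal quantiles at the same f-values. A minimal sketch, assuming f-values of (i - 0.5)/n (the function's exact plotting positions may differ); normal_qq() is a hypothetical helper. One way to read the "+/- 1 standard deviation" lines is on the f-value scale, where they fall at pnorm(-1) and pnorm(1).

```r
# Hypothetical helper: sample quantiles against standard Normal quantiles.
# Assumption: plotting positions (i - 0.5) / n.
normal_qq <- function(v) {
  n <- length(v)
  f <- (seq_len(n) - 0.5) / n
  data.frame(theoretical = qnorm(f), sample = sort(v))
}

# f-values corresponding to -1 and +1 standard deviation on a Normal scale
sd_fvals <- pnorm(c(-1, 1))  # about 0.159 and 0.841

nq <- normal_qq(c(5.1, 4.8, 6.0, 5.5, 5.2))
nq  # the middle pair sits at theoretical = 0 (the median)
```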

Value

Returns a list with the following components:

  • x: X values. May be interpolated to match the quantiles of the smaller batch. Values reflect the power transformation defined in p.

  • y: Y values. May be interpolated to match the quantiles of the smaller batch. Values reflect the power transformation defined in p.

  • p: Power (re-expression) applied to the original values.

  • fx: Formula applied to x variable.

  • fy: Formula applied to y variable.

Examples


# Passing data as a dataframe
 singer <- lattice::singer
 dat <- singer[singer$voice.part  %in% c("Bass 2", "Tenor 1"), ]
 eda_qq(dat, height, voice.part)

# Passing data as two separate vector objects
 bass2 <- subset(singer, voice.part == "Bass 2", select = height, drop = TRUE)
 tenor1 <- subset(singer, voice.part == "Tenor 1", select = height, drop = TRUE)

 eda_qq(bass2, tenor1)

 # There seems to be an additive offset of about 2 inches
 eda_qq(bass2, tenor1, fx = "x - 2")

 # We can fine-tune by generating the Tukey mean-difference plot
 eda_qq(bass2, tenor1, fx = "x - 2", md = TRUE)

 # An offset of another 0.5 inches seems warranted
 # We can say that, overall, Bass 2 singers are 2.5 inches taller than Tenor 1 singers.
 # The offset is additive.
 eda_qq(bass2, tenor1, fx = "x - 2.5", md = TRUE)

 # Example 2: Petal width
 setosa <- subset(iris, Species == "setosa", select = Petal.Width, drop = TRUE)
 virginica <- subset(iris, Species == "virginica", select = Petal.Width, drop = TRUE)

 eda_qq(setosa, virginica)

 # The points are not completely parallel to the 1:1 line, suggesting a
 # multiplicative offset. The slope may be difficult to eyeball, so the function
 # outputs a suggested slope and intercept. We can start with the suggested slope:
 eda_qq(setosa, virginica, fx = "x * 1.7143")

 # Now let's add the suggested additive offset.
 eda_qq(setosa, virginica, fx = "x * 1.7143 + 1.6286")

 # We can confirm this value via the mean-difference plot.
 # Overall, there is both a multiplicative and an additive offset between the
 # species' petal widths.
 eda_qq(setosa, virginica, fx = "x * 1.7143 + 1.6286", md = TRUE)

 # Function can also generate a Normal QQ plot
 eda_qq(bass2, norm = TRUE)

mgimond/tukeyedar documentation built on March 19, 2024, 8:44 a.m.