niceBox: draw a box plot
In ZachHunter/NicePlots.R: Nice Plots for Data Exploration

niceBox

R Documentation

draw a box plot

Description

Draws a box plot with optional scatter plot overlays, subgrouping options, and log scale support.

Usage

niceBox(
  x,
  by = NULL,
  groupLabels = NULL,
  main = NULL,
  sub = NULL,
  ylab = NULL,
  theme = basicTheme,
  minorTick = FALSE,
  guides = TRUE,
  outliers = 1.5,
  pointSize = 1,
  width = 1,
  pointShape = 16,
  plotColors = list(bg = "open"),
  logScale = FALSE,
  trim = FALSE,
  pointMethod = "jitter",
  axisText = c(NULL, NULL),
  showCalc = FALSE,
  calcType = "wilcox",
  drawBox = TRUE,
  yLim = NULL,
  rotateLabels = FALSE,
  rotateY = FALSE,
  add = FALSE,
  minorGuides = NULL,
  extendTicks = TRUE,
  subgroup = FALSE,
  subgroupLabels = NULL,
  highlightLabels = NULL,
  expLabels = TRUE,
  sidePlot = FALSE,
  drawPoints = TRUE,
  pointHighlights = FALSE,
  pointLaneWidth = 0.7,
  flipFacts = FALSE,
  na.rm = FALSE,
  verbose = FALSE,
  legend = FALSE,
  logAdjustment = 1,
  ...
)

Arguments

`x`	numeric vector or data frame; the input to `prepCategoryWindow` can be a numeric vector or a data frame of numeric vectors.
`by`	factor or data frame of factors; used as the primary grouping factor and the factor levels will be used as group names if `groupLabels` is not specified. If `by` is a data frame and `subgroup=TRUE`, the second column is assumed to be a secondary grouping factor, breaking out the data into sub-categories within each major group determined by the levels of the first column.
`groupLabels`	character vector; overrides the factor levels of `by` to label the groups
`main`	character; title for the graph which is supplied to the `main` argument.
`sub`	character; subtitle for the graph which is supplied to the `sub` argument. If `NULL` and `showCalc=TRUE` it will be used to display the output from `calcStats`.
`ylab`	character; y-axis label.
`theme`	list object; Themes are an optional way of storing graphical preset options that are compatible with all nicePlot graphing functions.
`minorTick`	positive integer; number of minor tick-marks to draw between each pair of major tick-marks.
`guides`	logical; will draw guidelines at the major tick-marks if set to `TRUE`. Color of the guidelines is determined by `plotColors$guides`.
`outliers`	positive numeric; number of interquartile ranges (IQR) past the Q1 (25%) and Q3 (75%) cumulative distribution values beyond which a data point is treated as an outlier. Outliers are often defined as points more than `1.5 \times IQR`, and extreme outliers as points more than `3 \times IQR`, away from the inner 50% data range.
`pointSize`	positive integer; sets the cex multiplier for point size.
`width`	numeric; scaling factor controlling the width of the boxes.
`pointShape`	positive integer; sets `pch` for plotting data points. Can be a vector to support additional graphical customization.
`plotColors`	list; a named list of vectors of colors that set the color options for all NicePlot functions. Names left unspecified will be added and set to default values automatically.
`logScale`	positive numeric; the base for the log scale data transformation calculated as `log(x+1,logScale)`.
`trim`	positive numeric; passed to `threshold` argument of `quantileTrim` if any data points are so extreme that they should be removed before plotting and downstream analysis. Set to `FALSE` to disable.
`pointMethod`	character; method to be used for plotting dots. Can be set to "jitter", "linear", "beeswarm" or "distribution".
`axisText`	character; a length two character vector containing text to be prepended or appended to the major tick labels, respectively.
`showCalc`	logical; if a p-value can be easily calculated for your data, it will be displayed using the `sub` annotation setting.
`calcType`	character; should match one of 'none', 'wilcox', 'Tukey', 't.test', 'anova', which determines which statistical test, if any, should be performed on the data.
`drawBox`	logical; should the boxes be drawn. The median bar will be drawn regardless.
`yLim`	numeric vector; manually set the limits of the plotting area (e.g. `yLim=c(min,max)`). Used to format the y-axis by default but will modify the x-axis if `sidePlot=TRUE`.
`rotateLabels`	logical; sets `las=2` for the x-axis category labels. Will affect y-axis if `sidePlot=TRUE`. Note that this may not work well with long names or with subgrouped data.
`rotateY`	logical; sets `las=2` for the y-axis major tick-mark labels. Will affect x-axis if `sidePlot=TRUE`.
`add`	logical; causes plotting to be added to the existing plot rather than starting a new one.
`minorGuides`	logical; draws guidelines at minor tick-marks
`extendTicks`	logical; extends minor tick-marks past the first and last major tick to the edge of the graph provided there is enough room. Works for both log-scale and regular settings.
`subgroup`	logical; use additional column in `by` to group the data within each level of the major factor.
`subgroupLabels`	character vector; sets the labels used for the `subgroup` factor. Defaults to the levels of the factor.
`highlightLabels`	character; An optional character vector to override the factor labels associated with point highlights if active.
`expLabels`	logical; prints the major tick labels as `logScale^{x}` instead of the raw value
`sidePlot`	logical; switches the axis to plot horizontally instead of vertically.
`drawPoints`	logical; draws a dot plot overlay of the data for each box.
`pointHighlights`	logical; will use additional factors in `by` to highlight points in the dot plot
`pointLaneWidth`	numeric; This controls how far data point dots can move along the categorical axis when plotting. Used for `pointMethod` options 'jitter', 'beeswarm', and 'distribution'.
`flipFacts`	logical; When a dataframe of values is given, column names are used as a secondary grouping factor by default. Setting `flipFacts=TRUE` makes the column names the primary factor and `by` the secondary factor.
`na.rm`	logical; Should `NA`s be removed from the data set? Both data input and the factor input from `by` will be checked.
`verbose`	logical; Prints summary and p-value calculations to the screen. All data is silently returned by the function either way.
`legend`	logical/character; Draw a legend in the plot margins. If a character string is given it will override the factor name default for the legend title.
`logAdjustment`	numeric; This number is added to the input data prior to log transformation. Default value is 1.
`...`	additional options for S3 method variants
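
The fence rule described for `outliers` can be worked through by hand. The sketch below uses only base R and is not taken from the NicePlots source; the sample vector is made up, and the quantile method (R's default) is an assumption, so the package's own cut-offs may differ slightly.

```r
# Illustrative data: nine ordinary values and one extreme value.
x <- c(2, 4, 4, 5, 5, 6, 6, 7, 8, 30)
outliers <- 1.5  # same meaning as the niceBox argument: number of IQRs past Q1/Q3

# Q1 and Q3 using R's default quantile method (an assumption for this sketch).
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Points beyond these fences are flagged as outliers.
lowerFence <- q[1] - outliers * iqr
upperFence <- q[2] + outliers * iqr
isOutlier <- x < lowerFence | x > upperFence

flagged <- x[isOutlier]
print(flagged)
```

Raising `outliers` to 3 widens the fences so that only extreme outliers are flagged.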

Details

This box plot function offers extensive log scale support, outlier detection, data point overlay options, data subsetting with a secondary factor, and data point highlighting with a tertiary factor. The complicated part of using this function is handling its many options. A wrapper function to set up and run it with preset options may be a good idea if you are using it often. The function niceDots is an example of this.

Briefly put, the by argument can be a data frame of factors and the function will work through the columns in order as needed. If x is a numeric vector, then by should be a factor to group it into categories. If by is a data frame of factors and subgroup=TRUE, then the first column of by is used as the grouping factor and the second column is used as the sub-grouping factor. If pointHighlights=TRUE and subgroup=TRUE, then the third column of by is used to highlight points in the data point overlay (assuming drawPoints=TRUE). If subgroup=FALSE and pointHighlights=TRUE, then the second column of by is used to control the point highlighting. If x itself is a data frame of numeric vectors, subgroup is automatically set to FALSE and each column of x is plotted like a sub-group and grouped by the first column of by. Data point highlighting with pointHighlights=TRUE can still be used when x is a data frame, and the highlighting factor will be drawn from the second column of by.

Please note that the p-values cannot always be calculated and are for general exploratory use only. More careful analysis is necessary to determine statistical significance. This function is an S3 generic and can be extended to provide class specific functionality. To further facilitate data exploration, outputs from statistical testing and data set summaries are printed to the console when verbose=TRUE.
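
The column-order rule for `by` may be easier to see with a concrete factor frame. The following is a minimal sketch using the built-in `iris` data; the column names `PetalLength` and `WideSepal` and the cut points are hypothetical choices for illustration. The `niceBox` call is left commented out because it requires the NicePlots package.

```r
data(iris)

# Column order decides each factor's role when subgroup=TRUE and
# pointHighlights=TRUE, per the description above.
byFrame <- data.frame(
  # 1st column: primary grouping factor
  Species = iris$Species,
  # 2nd column: sub-grouping factor (hypothetical cut at 2.82)
  PetalLength = factor(iris$Petal.Length > 2.82, labels = c("short", "long")),
  # 3rd column: point highlighting factor (hypothetical cut at 3)
  WideSepal = factor(iris$Sepal.Width > 3, labels = c("narrow", "wide"))
)

# Every column must be a factor for it to be usable as a grouping variable.
allFactors <- all(vapply(byFrame, is.factor, logical(1)))
print(allFactors)

# With NicePlots loaded, the frame would be passed like this:
# niceBox(iris$Sepal.Length, by = byFrame, subgroup = TRUE, pointHighlights = TRUE)
```

With `subgroup=FALSE` and `pointHighlights=TRUE`, the second column of the same frame would instead be read as the highlighting factor.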

Examples

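# Basic box plot of sepal length grouped by species, with an ANOVA p-value
data(iris)
# Build a color matrix and take three fill colors from its third column
mCols<-makeColorMatrix()
myCols<-list(fill=c(mCols[1,3],mCols[2,3],mCols[3,3]),lines="darkblue")
Lab<-"Sepal Length"
niceBox(iris$Sepal.Length,iris$Species,minorTick=4,showCalc=TRUE,
    calcType="anova",ylab=Lab,main="Sepal Length by Species",plotColors=myCols)


# Inspect the petal length distribution, then split it into two classes at 2.82
plot(density(iris$Petal.Length))
lengthFact<-factor(iris$Petal.Length>2.82,labels=c("short","long"))


# Subgrouped box plot: species as the primary factor, petal length class as the subgroup
Title<-"Sepal Length by Species and Petal Length"
factorFrame<-data.frame(Species=iris$Species,PetalLength=lengthFact)
niceBox(iris$Sepal.Length, by=factorFrame, minorTick=4,subgroup=TRUE,
    ylab=Lab,main=Title,plotColors=myCols)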
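
The log-scale arithmetic described under `logScale` and `logAdjustment` can also be checked by hand. This is a base R sketch that follows the argument descriptions (`log(x + logAdjustment, logScale)`); it is not the package's internal code, and the input values are made up.

```r
# Illustrative values chosen so the base-10 result lands on whole numbers.
x <- c(0, 9, 99, 999)
logScale <- 10      # base of the logarithm, as in the niceBox argument
logAdjustment <- 1  # added before taking the log so that zero is defined

# Transformation as described in the argument documentation.
transformed <- log(x + logAdjustment, base = logScale)

# Undo the transformation to recover the raw values.
backTransformed <- logScale^transformed - logAdjustment

# With expLabels=TRUE the tick labels are written as logScale^x;
# a plain-text version of such labels:
tickLabels <- paste0(logScale, "^", round(transformed))
print(tickLabels)
```

Because of the adjustment, a tick at `logScale^x` corresponds to a raw value of `logScale^x - logAdjustment`, which is worth keeping in mind when reading values off the axis.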

ZachHunter/NicePlots.R documentation built on Sept. 23, 2023, 4:04 a.m.