niceDots: draw a dot plot

View source: R/np_niceDots.R

niceDots    R Documentation

draw a dot plot

Description

Draws a categorical dot plot with optional data highlighting, log-scale support, and optional mean/median/distribution overlays.

Usage

niceDots(
  x,
  by = NULL,
  groupLabels = NULL,
  drawPoints = TRUE,
  errorBars = TRUE,
  barWidth = 0.33,
  barType = c("bar", "dot"),
  barThickness = 2,
  aggFun = c("mean", "median", "none"),
  errFun = c("se", "sd", "range"),
  errorMultiple = 2,
  main = NULL,
  sub = NULL,
  ylab = NULL,
  minorTick = FALSE,
  theme = basicTheme,
  guides = TRUE,
  outliers = 1.5,
  pointSize = 1,
  width = NULL,
  pointShape = NULL,
  plotColors = NULL,
  logScale = FALSE,
  trim = FALSE,
  pointMethod = NULL,
  axisText = c(NULL, NULL),
  showCalc = FALSE,
  calcType = "wilcox",
  yLim = NULL,
  rotateLabels = FALSE,
  rotateY = FALSE,
  add = FALSE,
  minorGuides = NULL,
  extendTicks = TRUE,
  subgroup = FALSE,
  subgroupLabels = NULL,
  highlightLabels = NULL,
  expLabels = TRUE,
  sidePlot = FALSE,
  pointHighlights = FALSE,
  pointLaneWidth = NULL,
  na.rm = FALSE,
  flipFacts = FALSE,
  verbose = FALSE,
  legend = FALSE,
  logAdjustment = 1,
  errorCap = NULL,
  errorLineType = NULL,
  capWidth = NULL,
  lWidth = NULL,
  ...
)

Arguments

x

numeric vector or data frame; the input can be a numeric vector or a data frame of numeric vectors.

by

factor or data frame of factors; used as the primary grouping factor and the factor levels will be used as group names if groupLabels is not specified. If by is a data frame and subgroup=TRUE, the second column is assumed to be a secondary grouping factor, breaking out the data into sub-categories within each major group determined by the levels of the first column.

groupLabels

character vector; overrides the factor levels of by to label the groups

drawPoints

logical; draws a dot plot overlay of the data.

errorBars

logical; determines whether the aggregate values and error bars (if any) are displayed.

barWidth

numeric; cex-like scaling factor for the fraction of the column width that the mean/median bar spans, if drawn.

barType

character; Indicates the style of the mean/median bar. Should be 'dot', 'bar' or 'none'.

barThickness

numeric; a cex-like multiplier for the thickness (lwd) of the aggregate bar relative to the line width lWidth.

aggFun

character; determines how the data are summarized by factor level. Valid options are 'mean', 'median' and 'none'.

errFun

character; how the data spread is characterized by the error bars. Valid options are sd (standard deviation), se (standard error of the mean), t95ci (t-distribution 95% confidence interval), boot95ci (bootstrap 95% confidence interval) or range.

errorMultiple

numeric; how many standard errors/deviations the error bars represent. Set to zero to suppress error bars.
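As a rough illustration of how aggFun, errFun and errorMultiple combine, the base-R sketch below computes one group's center and error-bar limits. The helper errorBarLimits is hypothetical and not a NicePlots function; treating 'range' as the unscaled minimum and maximum is an assumption, and the t95ci and boot95ci options are not covered.

```r
# Hypothetical helper (not part of NicePlots): center and error-bar limits
# for a single group of values.
errorBarLimits <- function(x, aggFun = c("mean", "median"),
                           errFun = c("se", "sd", "range"),
                           errorMultiple = 2) {
  aggFun <- match.arg(aggFun)
  errFun <- match.arg(errFun)
  center <- if (aggFun == "mean") mean(x) else median(x)
  if (errFun == "range") {
    # Assumption: 'range' spans the observed minimum and maximum, unscaled
    return(c(center = center, lower = min(x), upper = max(x)))
  }
  # Standard deviation, or standard error of the mean = sd / sqrt(n)
  spread <- if (errFun == "sd") sd(x) else sd(x) / sqrt(length(x))
  # errorMultiple = 0 collapses both limits onto the center (no error bar)
  c(center = center,
    lower = center - errorMultiple * spread,
    upper = center + errorMultiple * spread)
}

setosa <- iris$Sepal.Length[iris$Species == "setosa"]
errorBarLimits(setosa, aggFun = "mean", errFun = "se", errorMultiple = 2)
```

With errorMultiple = 2 and errFun = "se", the limits sit two standard errors either side of the group mean.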

main

character; title for the graph which is supplied to the main argument.

sub

character; subtitle for the graph, which is supplied to the sub argument. If NULL and showCalc=TRUE, it will be used to display the output from calcStats.

ylab

character; y-axis label.

minorTick

positive integer; number of minor tick-marks to draw between each pair of major tick-marks.

theme

list object; themes are an optional way of storing graphical preset options that are compatible with all nicePlot graphing functions.

guides

logical; will draw guidelines at the major tick-marks if set to TRUE. Color of the guidelines is determined by plotColors$guides.

outliers

positive numeric; number of interquartile ranges (IQR) below Q1 (25th percentile) or above Q3 (75th percentile) beyond which a point counts as an outlier. Outliers are often defined as points more than 1.5 × IQR, and extreme outliers as points more than 3 × IQR, away from the inner 50% of the data.
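The threshold follows the familiar box-plot fence rule. The sketch below uses a hypothetical helper, flagOutliers (not part of NicePlots), to flag values more than outliers × IQR below Q1 or above Q3 with R's default quantile definition (type 7); whether niceDots uses the same quantile type is an assumption.

```r
# Hypothetical helper (not part of NicePlots): TRUE for values more than
# 'outliers' IQRs below Q1 or above Q3
flagOutliers <- function(x, outliers = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3 (type 7)
  iqr <- q[2] - q[1]
  x < q[1] - outliers * iqr | x > q[2] + outliers * iqr
}

x <- c(2, 3, 3, 4, 4, 5, 5, 6, 10, 30)
x[flagOutliers(x)]        # 10 30  (beyond 1.5 x IQR)
x[flagOutliers(x, 3)]     # 30     (extreme: beyond 3 x IQR)
```

Here Q1 = 3.25 and Q3 = 5.75, so the 1.5 × IQR fences are -0.5 and 9.5, while the 3 × IQR fences widen to -4.25 and 13.25.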

pointSize

positive integer; sets the cex multiplier for point size.

width

numeric; cex-like scaling factor controlling the width of each category lane.

pointShape

positive integer; sets pch for plotting data points. Can be a vector to support additional graphical customization.

plotColors

list; a named list of vectors of colors that set the color options for all NicePlot functions. Names left unspecified will be added and set to default values automatically.

logScale

positive numeric; the base for the log-scale data transformation, calculated as log(x+1, logScale) with the default logAdjustment of 1.

trim

positive numeric; passed to the threshold argument of quantileTrim if any data points are so extreme that they should be removed before plotting and downstream analysis. Set to FALSE to disable.

pointMethod

character; method used for plotting the dots. Can be set to "jitter", "linear", "beeswarm" or "distribution".
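To illustrate the idea behind "jitter", the base-R sketch below spreads each point uniformly around its category's lane center using jitter(). The variable pointLaneWidth borrows the argument's name, but treating it as the full lane width in axis units is an assumption, and this is not how niceDots positions points internally.

```r
# Sketch of jitter-style point placement along the categorical axis
set.seed(1)                                  # reproducible noise
lane <- as.integer(iris$Species)             # lane centers at 1, 2, 3
pointLaneWidth <- 0.5                        # assumed: full lane width
# jitter() adds uniform noise in [-amount, amount] to each lane center
xPos <- jitter(lane, amount = pointLaneWidth / 2)
plot(xPos, iris$Sepal.Length, xaxt = "n", xlim = c(0.5, 3.5),
     xlab = "", ylab = "Sepal Length")
axis(1, at = 1:3, labels = levels(iris$Species))
```

Base graphics offers a similar layout through stripchart(method = "jitter").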

axisText

character; a length two character vector containing text to be prepended or appended to the major tick labels, respectively.

showCalc

logical; if a p-value can be easily calculated for your data, it will be displayed using the sub annotation setting.

calcType

character; should be one of 'none', 'wilcox', 'Tukey', 't.test' or 'anova', which determines which statistical test, if any, is performed on the data.
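For exploration outside the plot, base R provides tests that match the 'anova' and 'Tukey' option names. The sketch below runs them on the iris data used in the Examples; that calcStats calls exactly aov() and TukeyHSD() is an assumption, so the values it reports may differ.

```r
# Base R tests with the same names as calcType = "anova" and "Tukey";
# not necessarily what calcStats runs internally
fit <- aov(Sepal.Length ~ Species, data = iris)
anovaP <- summary(fit)[[1]][["Pr(>F)"]][1]   # one-way ANOVA p-value
tukey <- TukeyHSD(fit)                        # all pairwise comparisons
anovaP
tukey$Species[, "p adj"]                      # adjusted p per species pair
```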

yLim

numeric vector; manually sets the limits of the plotting area (e.g. yLim=c(min,max)). Formats the y-axis by default but will modify the x-axis if sidePlot=TRUE.

rotateLabels

logical; sets las=2 for the x-axis category labels. Will affect the y-axis if sidePlot=TRUE. Note that this may not work well with long names or subgrouped data.

rotateY

logical; sets las=2 for the y-axis major tick-mark labels. Will affect the x-axis if sidePlot=TRUE.

add

logical; causes plotting to be added to the existing plot rather than starting a new one.

minorGuides

logical; draws guidelines at minor tick-marks

extendTicks

logical; extends minor tick-marks past the first and last major tick to the edge of the graph provided there is enough room. Works for both log-scale and regular settings.

subgroup

logical; use additional column in by to group the data within each level of the major factor.

subgroupLabels

character vector; sets the labels used for the subgroup factor. Defaults to the levels of the factor.

highlightLabels

character; An optional character vector to override the factor labels associated with point highlights if active.

expLabels

logical; prints the major tick labels as logScale^x instead of the raw value.

sidePlot

logical; switches the axis to plot horizontally instead of vertically.

pointHighlights

logical; will use additional factors in by to highlight points in the dot plot

pointLaneWidth

numeric; This controls how far data point dots can spread along the categorical axis when plotting. Used for pointMethod options 'jitter', 'beeswarm', and 'distribution'.

na.rm

logical; should NAs be removed from the data set? Both the data input and the factor input from by will be checked.

flipFacts

logical; When a dataframe of values is given, column names are used as a secondary grouping factor by default. Setting flipFacts=TRUE makes the column names the primary factor and by the secondary factor.

verbose

logical; prints summary and p-value calculations to the screen. All data is returned silently by the function either way.

legend

logical/character; if not equal to FALSE, causes a legend to be drawn in the margins. If set to a character string instead of a logical value, the string will be used as the legend title instead of the factor column name from by.

logAdjustment

numeric; this number is added to the input data prior to log transformation. Default value is 1.
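Taken together, logScale and logAdjustment describe the transform log(x + logAdjustment, logScale). A minimal sketch with a hypothetical helper, logTransform (not part of NicePlots), shows why the default offset of 1 is useful: zeros map to 0 instead of -Inf.

```r
# Hypothetical helper: the transform described for logScale and logAdjustment
logTransform <- function(x, logScale = 10, logAdjustment = 1) {
  log(x + logAdjustment, base = logScale)
}

logTransform(c(0, 9, 99))                    # 0 1 2
logTransform(0, logAdjustment = 0)           # -Inf: why an offset is used
```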

errorCap

character; Determines the style for the ends of the error bars. Valid options are 'ball', 'bar' or 'none'.

errorLineType

numeric; Sets lty line type for drawing the error bars.

capWidth

numeric; controls the cex-like scaling of the ball, or the width of the cap, if either is drawn at the ends of the error bars.

lWidth

numeric; line width (lwd) for drawing the mean/median bars and error bars.

...

additional options for S3 method variants.

Details

This is really two different plotting functions merged together. First, the data points can be plotted individually using a distribution waterfall, jitter, beeswarm, or a simple linear layout, or not plotted at all. A single data vector can be subset using up to two factors (e.g. using multiple factors with by and optionally subgroup=TRUE). If a multi-column tibble, matrix or data frame is used for data input, then it can be grouped by a single factor from by, with the column names used for factor subgroups. The option flipFacts can be used in this case to make the data columns the primary grouping factor and the first factor in by the subgroups. On top of this, the mean/median values can be overplotted using errorBars=TRUE, and error or distribution (e.g. sd, se, range, etc.) can also be shown as error bars. The error bars can be multiplied by errorMultiple and suppressed if errorMultiple=0.

The complicated part of using this function is handling its many options. A wrapper function that sets it up and runs it with preset options may be a good idea if you use it a lot. Briefly put, the by argument can be a data frame of factors and the function will work through the columns in order as needed. If x is a numeric vector, then by should be a factor to group it into categories. If by is a data frame of factors and subgroup=TRUE, then the first column of by is used as the grouping factor and the second column is used as the sub-grouping factor. If pointHighlights=TRUE and subgroup=TRUE, then the third column of by is used to highlight points in the data point overlay (assuming drawPoints=TRUE). If subgroup=FALSE and pointHighlights=TRUE, then the second column of by is used to control the point highlighting. If x itself is a data frame of numeric vectors, subgroup is automatically set to FALSE and each column of x is plotted like a subgroup and grouped by the first column of by. Data point highlighting with pointHighlights=TRUE can still be used when x is a data frame, and the highlighting factor will be drawn from the second column of by.

Please note that p-values cannot always be calculated and are for general exploratory use only; more careful analysis is necessary to determine statistical significance. This function is an S3 generic and can be extended to provide class-specific functionality. To further facilitate data exploration, outputs from statistical testing and data set summaries are printed to the console if verbose=TRUE.
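The nesting of a primary factor and a subgroup factor can be previewed in base R before plotting. In the sketch below, byFactors plays the role of a two-column by data frame used with subgroup=TRUE; the sizeClass factor is invented for the example, and table() and tapply() are only stand-ins for the grouping niceDots performs internally.

```r
# Two-column 'by'-style data frame: primary factor first, subgroup second.
# sizeClass is a made-up factor for illustration only.
byFactors <- data.frame(
  Species   = iris$Species,
  sizeClass = factor(ifelse(iris$Petal.Width > 1, "wide", "narrow"))
)
# Number of points in each primary group / subgroup lane
counts <- table(byFactors)
# Median per lane; empty combinations come back as NA
medians <- tapply(iris$Sepal.Length, byFactors, median)
counts
medians
```

Empty cells (here, setosa has no "wide" flowers) show up as zero counts and NA medians, which is worth checking before asking for subgroup lanes.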

See Also

stripchart, beeswarm, quantileTrim, prepCategoryWindow, jitter

Examples

data(iris)
mCols<-makeColorMatrix()
myCols<-list(fill=mCols[1:3,3],lines="darkblue")
niceDots(iris$Sepal.Length,iris$Species,minorTick=4,showCalc=TRUE,calcType="anova",
    ylab="Sepal Length",main="Sepal Length by Species",plotColors=myCols)


ZachHunter/NicePlots.R documentation built on Sept. 23, 2023, 4:04 a.m.