Nothing
#' @title \code{qqtest} A self-calibrated quantile-quantile plot for assessing distributional shape.
#'
#' @description Draws a quantile-quantile plot for visually assessing whether the data come
#' from a test distribution that has been defined in one of many ways.
#'
#' The vertical axis plots the data quantiles, the horizontal those of a test distribution.
#'
#' Interval estimates and exemplars provide different comparative information to assess the
#' evidence provided by the qqplot against the hypothesis that the data come from the test distribution
#' (default is normal or gaussian).
#' Interval estimates provide test information related to individual quantiles,
#' exemplars provide test information related to the shape of the quantile quantile curve.
#'
#' Optionally, a visual test of significance (a lineup plot) can be displayed to provide a
#' coarse level of significance for testing the null hypothesis that the data come from the
#' test distribution.
#'
#' The default behaviour generates 1000 samples from the test distribution and overlays the
#' plot with pointwise interval estimates for the ordered quantiles from the test distribution.
#'
#' Various option choices are available to effect different visualizations of the
#' uncertainty surrounding the quantile quantile plot.
#' These include overlaying independently generated exemplar test distribution sample
#' quantile traces so as to assess the joint (as opposed to pointwise) distribution of quantiles.
#'
#' See argument descriptions and examples for more details.
#'
#' @export qqtest
#'
#' @importFrom graphics Axis abline box lines par plot points polygon
#' @importFrom grDevices adjustcolor
#' @importFrom stats dchisq fivenum pchisq ppoints qchisq qexp qlnorm qnorm qt quantile qunif rbeta rchisq rexp rlnorm rnorm rt runif
#'
#' @param data A univariate dataset to be tested. If data has more than one column, the first is used.
#' @param dist The name of the distribution against which the comparison is made, the test distribution for a few built-in distributions.
#' One of \code{"gaussian"} (or \code{"normal"}), \code{"log normal"},\code{"half normal"},
#' \code{"uniform"},\code{"exponential"},\code{"student"}, \code{"chi-squared"}, or \code{"kay"}.
#' Only the first three characters of any of these is needed to specify the dist.
#'
#' If dist is \code{"student"}, \code{"chi-squared"}, or \code{"kay"}, then a value for the degrees of freedom
#' argument (\code{df} below) is also required.
#' @param df Degrees of freedom of \code{dist} to be used when \code{dist} is either \code{"student"} or \code{"chi-squared"}.
#' @param qfunction If non-\code{NULL}, this must be a function of a single argument (a proportion, p say) which will be used
#' to calculate the quantiles for the test distribution. If non-\code{NULL}, the \code{rfunction} should also be non-\code{NULL}.
#' The value of the \code{dist} argument will be ignored when this is the case.
#' @param rfunction If non-\code{NULL}, this must be a function of a single argument (a count, n say) which will be
#' used to randomly select a sample of size n from the test distribution.
#' If non-\code{NULL}, the \code{qfunction} must also be non-NULL.
#' If \code{qfunction} is non-\code{NULL} and \code{rfunction} is \code{NULL}, then \code{qfunction} will
#' be applied to the output of a call to \code{runif} in place of the \code{NULL} \code{rfunction}
#' (i.e. a probability integral transform is used to generate a random sample).
#'
#' The value of the \code{dist} argument will be ignored whenever \code{qfunction} is a function.
#'
#' @param dataTest If non-\code{NULL}, this must be a second data set.
#' The empirical distribution given by this data will be used as the test distribution against which the value of data will be tested.
#' If non-\code{NULL}, the values of the arguments \code{dist}, \code{qfunction}, and \code{rfunction} will all be ignored in
#' favour of using this empirical distribution as the test distribution.
#' @param p If non-\code{NULL}, this must be a vector containing the probability points at which the quantiles are to be calculated.
#' If the length of this vector is the same as that of the data, then \code{p} is taken to be the probabilities that
#' correspond to the data; otherwise the data are taken to provide and empirical quantile function values of which are
#' taken at \code{p} to produce the plot.
#'
#' If \code{p} is \code{NULL} (the default), then \code{p} is determined by the function \code{ppoints(n,a)}
#' where \code{n} is the number of data points in \code{data} and \code{a} is given by the argument \code{a} below.
#' @param a This is the second parameter given to the \code{ppoints} call to determine \code{p} as necessary.
#' If \code{a} is \code{NULL}, then default values are chosen for each specific distribution
#' (e.g. 3/8 for gaussian, 0 for uniform, etc.) and as 2/5 otherwise, following the recommendations of C. Cunnane (1978).
#' Users may supply their own value such as the original Hazen's 1/2 or Tukey's 1/3.
#' @param np This is required if the vector \code{p} is provided.
#' Because \code{p} could take any values, we need to know \code{np} the sample size that was used to construct the
#' quantiles at the provided vlaues \code{p}.
#' When \code{p} and \code{np} are provided all simulation is based on simulating values from the distribution of
#' the order statistics \code{p*np}.
#'
#' @param matchMethod It is necessary to match locations and scales of the two distributions. The method used to do this matching
#' is given by \code{matchMethod} which must be one of \code{c("hinges","quartiles", "middlehalf", "bottomhalf", "tophalf")}.
#' A line is fitted to the data and its coefficients used to adjust the location and scale of the test quantiles to match the data.
#'
#' If \code{matchMethod = "hinges"} (the default), then the line is fit to \code{(x, y)} pairs given by the three central values of
#' \code{fivenum()} on each of the coordinates; if \code{"quartiles"} the three quartiles of each coordinate are paired and a line fit.
#' The method \code{"middlehalf"}, as the name suggests, fits a line to the middle half of the sorted data. In this way, it is very similar
#' to the previous two methods. All three are fairly robust and use the central part of the data to estimate the location and scale based roughly
#' on matching medians and inter-quartile spreads.
#'
#' The remaining two methods, \code{"bottomhalf"} and \code{"tophalf"}, use a line fitted to the bottom and top half of the sorted data.
#' These are not generally recommended but might be used when the comparison \code{dist} is skewed. For example with a chi-squared distribution,
#' one might use \code{bottomhalf} to match on data from the shorter left tail of the distribution. Similarly, \code{"tophalf"} might be
#' preferred when \code{dist} is skewed in the opposite direction.
#'
#' Finally, note that the user may supply their own matching method by providing a function having vector arguments \code{x} and \code{y}
#' for the horizontal and vertical quantiles which returns a vector of coefficients (intercept, slope) for the line to be used to match
#' the quantile location and scales.
#' @param xAxisAsProbs If \code{TRUE} (the default is FALSE) the horizontal axis will be labelled as probabilities.
#' These are the cumulative probabilities according to the test distribution. They are located at the corresponding quantile values.
#' They are handy in comparing percentiles of the test and data distributions as well as giving some measure of the symmetry and
#' tail weights of the test distribution by their location.
#' If \code{FALSE} the axis is labelled according to the quantile values.
#' @param yAxisAsProbs If \code{TRUE} (the default is FALSE) the vertical axis will be labelled as probabilities.
#' These are the cumulative probabilities according to the empirical distribution of the \code{data}.
#' They are located at the corresponding quantile values.
#' They are handy in comparing percentiles of the test and data distributions as well as giving
#' some measure of the symmetry and tail weights of the \code{data} distribution by their location.
#' If \code{FALSE} the axis is labelled according to the quantile values.
#' @param xAxisProbs A vector of probabilities to be used to label the x axis ticks when \code{xAxisAsProbs} is \code{TRUE}.
#' Default is \code{c(0.05, 0.25, 0.50, 0.75, 0.95)}. Ignored if \code{xAxisAsProbs} is \code{FALSE}.
#' @param yAxisProbs A vector of probabilities to be used to label the y axis ticks when \code{yAxisAsProbs} is \code{TRUE}.
#' Default is \code{c(0.05, 0.25, 0.50, 0.75, 0.95)}. Ignored if \code{yAxisAsProbs} is \code{FALSE}.
#' @param nreps The number of replicate samples to be taken from the test distribution to construct the pointwise intervals
#' for each quantile. Default is 1000.
#' From these samples, an empirical distribution is generated from the test distribution for the ordered quantiles
#' corresponding to the values of\code{ppoints(length(data))}.
#' These are used to construct central intervals of whatever proportions are given by \code{centralPercents}.
#' @param centralPercents The vector of proportions determining the central intervals of the empirical distribution of
#' each ordered quantile from the test distribution.
#' Default is \code{c(0.90, 0.95, 0.99)} corresponding to central 90, 95, and 99\% simulated pointwise confidence
#' intervals for each quantile coming from the test distribution for a sample the same size as \code{data}.
#' The quality of these interval locations typically increases with \code{nreps} and decreases with the
#' probability used for each interval.
#' @param envelope If \code{TRUE} (the default), a grey envelope is plotted showing the central intervals for
#' each quantile as a shade of grey. The higher is the corresponding probability associated with the interval,
#' the lighter is the shade.
#' The outermost edges of the envelope are the range of the simulated data from the test distribution.
#' The envelope thus provides a (pointwise) density estimate of the quantiles drawn from the test distribution for this sample size.
#' If \code{FALSE} no envelope is drawn.
#' @param drawPercentiles If \code{TRUE}, a pair of curves is plotted to show each of the central intervals as a different line type.
#' These are plotted over the envelope if \code{envelope} is \code{TRUE}.
#' If \code{FALSE} (the default) no simulated percentile curves are drawn.
#' @param drawQuartiles If \code{TRUE}, a pair of curves is plotted to show the quartiles (central 50\% region)
#' of the ordered quantiles simulated from the test distribution. The median of these is also plotted as a solid line type.
#' These are plotted over the envelope if \code{envelope} is \code{TRUE}.
#' If \code{FALSE} (the default) none of these curves are drawn.
#' @param legend If \code{TRUE} (the default is \code{NULL}) with \code{nreps>0} a legend for the appearance
#' of the simulated ranges of the central intervals is added to the plot.
#' If \code{FALSE}, no legend appears. If \code{NULL}, legend always appears except when \code{lineup = TRUE}.
#' @param legend.xy Either a string being one of \code{c("bottomright", "bottom", "bottomleft", "left", "topleft","top", "topright", "right", "center")}
#' (default is \code{"topleft"}) or a list with named components \code{x} and \code{y}
#' to be interpreted exactly as in the function \code{legend()} from the \code{graphics} package.
#' @param legend.cex Character expansion size for legend (default 0.8).
#' @param nexemplars (default is 0) The number of replicate samples to be taken from the test distribution and
#' plotted as a coloured trail on the qqplot.
#' Each such trail is a sample of the same size as \code{data} but truly coming from the test distribution.
#' Each trail gives some idea of what the shape of a qqplot would be for a sample of that size from the test distribution.
#' Together, they give some sense of the variability in the plot's shape.
#' @param typex (default is "o", or match \code{type} if supplied) The \code{plot} type to be used in the
#' plotting of the exemplars, legal values are the same as for the \code{type} argument of \code{plot}.
#' @param plainTrails If \code{TRUE}, then a single grey colour is used for all exemplar trails.
#' If \code{FALSE} (the default), each exemplar trail is shown in a different colour.
#' @param colTrails Colours to be used for the trails (default will be multi-colours).
#' @param alphaTrails The alpha transparency to be used in plotting all exemplar trails. The default is 0.25.
#' Because the trails will overplot, a smaller \code{alphaTrails} value is recommended as \code{nexemplars} increases.
#' @param lwdTrails The graphical line width (\code{lwd}) to be used in plotting all exemplar trails. The default is 1.
#' Because the trails will overplot, combining a larger \code{lwdTrails} with \code{envelope = FALSE},
#' a lower \code{alphaTrails} value larger \code{nexemplars} can give a truer sense of the density of qqplot
#' configurations than with \code{envelope = TRUE}.
#' @param lineup If \code{TRUE} (default is \code{FALSE}) the qqplot of \code{data} is randomly located in a
#' grid of \code{nsuspects} plots. Identical arguments are given to construct all \code{qqtest} plots in the grid.
#' Assuming the viewer has not seen the qqplot of this \code{data} before, a successful selection of the true \code{data}
#' plot out of the grid of plots corresponds to evidence against the hypothesis that the \code{data} come from the test distribution.
#' Significance level is 1/\code{nsuspects}.
#' Similarly, if the viewer narrows the selection down to only k possible subjects AND the true location is among them, then
#' the corresponding observed level of significance (p-value) would be k//\code{nsuspects}.
#'
#' Each plot is given a suspect number from 1 to \code{nsuspects} (left to right, top to bottom).
#' The suspect number of the plot corresponding to the actual \code{data} is returned, slightly obfuscated to help keep the test honest.
#' @param nsuspects The total number of plots (default is 20) to be viewed in the lineup display when \code{lineup} is \code{lineup}.
#' @param col If non-\code{NULL}, \code{col} must be colour to be used for the points in the plot. If \code{NULL} (the default), an \code{hcl} colour will be used from the values of the arguments \code{h}, \code{c}, \code{l}, and \code{alpha}.
#' @param h The hue of the colour of the points. Specified as an angle in degrees from 0 to 360 around a colour wheel. E.g. 0 is red, 120 green, 240 blue, Default is 260 (a bluish).
#' @param c The chroma of the colour of the points. Takes values from 0 to an upper bound that is a function of hue, \code{h}, and luminance, \code{l}. Roughly, for fixed \code{h} and \code{l} the higher the value of \code{c} the greater the intensity of colour.
#' @param l The luminance of the colour of the points. Takes values from 0 to 100. For any given combination of hue, \code{h}, and chroma, \code{c}, only a subset of this range will be possible. Roughly, for fixed \code{h} and \code{c} the higher the value of \code{l} the lighter is the colour.
#' @param alpha The alpha transparency of the colour of the points. Takes values from 0 to 1. Values near 0 are more transparent, values near 1 (the default) are more opaque. Alpha values sum when points over plot, giving some indication of density.
#' @param cex The graphical parameter \code{cex} for the size of the points.
#' @param pch The graphical parameter \code{pch} for the point character to be used for the points. Default is 19, a filled circle.
#' @param type The graphical parameter \code{type} for the points of the qqplot. Default is "p".
#' @param main The graphical parameter \code{main} providing a title for the plot.
#' If \code{NULL} (the default), the title will be "qqtest" when \code{lineup = FALSE} and "Suspect: " followed by
#' the plot number when \code{lineup = TRUE}. An empty string will suppress the title.
#' @param xlab The graphical parameter \code{xlab} labelling the x axis of the plot.
#' If \code{NULL} (the default), an \code{xlab} is created based on the information available from the
#' other arguments to \code{qqtest} about the test distribution. An empty string will suppress the labelling.
#' @param ylab The graphical parameter \code{ylab} labelling the y axis of the plot.
#' If \code{NULL} (the default), a \code{ylab} is created based on the information available from the other
#' arguments to \code{qqtest}. An empty string will suppress the labelling.
#' @param xlim The graphical parameter \code{xlim} determining the display limits of the x axis.
#' @param ylim The graphical parameter \code{ylim} determining the display limits of the y axis.
#' @param axes The graphical parameter \code{axes} determining whether axes are to be displayed
#' (default is \code{NULL} which the same as \code{TRUE} except when \code{lineup=TRUE}, then \code{axes} is \code{FALSE}).
#' @param bty The graphical parameter \code{bty} determining the type of box to be used to
#' enclose the plot (default is \code{"o"}, set \code{bty="n"} for no box).
#' @param ... Any further graphical parameters to be passed to the \code{plot} function.
#'
#' @return Displays the qqplot.
#'
#' Invisibly returns a list with named components \code{x}, \code{y}, and \code{order} giving the horizontal and
#' vertical locations of the points in sorted order, as well as the order vector from the input data. The result of \code{qqtest} must be assigned to get these.
#' The values could then, for example, be used to identify or label points in the display.
#'
#' When \code{lineup} is \code{TRUE}, it returns a string encoding
#' the true location of the data as a calculation to be evaluated. This provides some simple obfuscation of the true
#' location so that the visual assessment can be honest. The true location is revealed by calling \code{revealLocation()}
#' on the returned value.
#'
#' @source
#' "Self calibrating quantile-quantile plots",
#' R. Wayne Oldford, The American Statistician, 70, (2016)
#' \url{https://doi.org/10.1080/00031305.2015.1090338}
#'
#' "Unbiased Plotting Positions -- A Review",
#' C. Cunnane, Journal of Hydrology, Vol. 37 (1978), pp. 205-222.
#'
#' @examples
#' #
#' # default qqtest plot
#' qqtest(precip, main = "Precipitation (inches/year) in 70 US cities")
#' #
#' # qqtest to compare to qqnorm
#' op <- par(mfrow=c(1,2))
#' qqnorm(precip, main="qqnorm")
#' qqtest(precip, main="qqtest",
#' xAxisAsProbs=FALSE, yAxisAsProbs=FALSE)
#' par(op)
#' #
#' # Use lines instead of envelope
#' qqtest(precip, envelope=FALSE, drawPercentiles=TRUE,
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' # Use quartiles instead of envelope
#' qqtest(precip, envelope=FALSE, drawQuartiles=TRUE,
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' # Use coloured exemplars (qqplot of data simulated from the test distribution)
#' # and suppress the envelope. Where the envelope, percentiles, and quartiles are
#' # simulated pointwise bands, exemplars give some sense of what the (joint) shape of the
#' # quantile-quantile plot should look like (for data from the test distribution).
#' # Each simulated sample is a different colour.
#' qqtest(precip, nexemplars=10, typex="o", envelope=FALSE, type="p",
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' # Alternatively, the trail of each exemplar could be plain (the identical grey).
#' # Making each trail wide and assigning it some transparency (alpha near 0)
#' # allows the trails to give a sense of the density through the darkness of the grey.
#' #
#' qqtest(precip, nexemplars=20, envelope=FALSE,
#' lwdTrails=3, plainTrails=TRUE, alphaTrail=0.4, typex="o", type="o",
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' # Wide coloured exemplars with some transparency provide an indication of
#' # density and allow some trails to be followed by colour.
#' #
#' qqtest(precip, nexemplars=20, envelope=FALSE,
#' lwdTrails=3, alphaTrail=0.4, typex="o", type="o", col="black",
#' main = "Precipitation (inches/year) in 70 US cities")
#'
#'
#' # Envelope and exemplars with coloured trails to be followed.
#' #
#' qqtest(precip, nexemplars=5,
#' lwdTrails=2, alphaTrail=0.6, alpha=0.8,
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' #
#' # gaussian - qqplot, but now showing in the line up
#' trueLoc <- qqtest(precip, lineup=TRUE, main="Suspect", legend=FALSE,
#' cex=0.75, col="grey20", ylab="", pch=21)
#' # the location of the real data in the line up can be found by evaluating
#' # the contents of the string
#' trueLoc
#' #
#' # Cut and paste the string contents into the R console, or simply
#' revealLocation(trueLoc)
#' #
#' #
#' # log-normal ... using the bacteria data from Whipple(1916)
#' data(bacteria, package="qqtest")
#' # Note that these are selected percentiles from a sample of 365 days in a year
#' with(bacteria,
#' qqtest(count, dist = "log-normal", p=percentTime/100, np=365, type="o",
#' yAxisAsProbs=FALSE, ylab="bacteria per cc",
#' xAxisProbs = c(0.01, 0.50,0.75, 0.90, 0.95, 0.99, 0.995),
#' xlab="Percentage of days in 1913",
#' main = "Number of bacteria from the Delaware river in 1913")
#' )
#' ptics <- c(0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99 )
#' axis(1,at=qnorm(ptics), labels=floor(ptics*100))
#' yvals <- c(100, 1000, 10000, 100000)
#' axis(2, at=log(yvals,10),
#' labels=c("100", "1,000", "10,000", "100,000"))
#' #
#' # compare this to the log-scaled normal qqplot
#' #
#' #
#' with(bacteria,
#' qqtest(log(count, 10), dist = "normal",
#' p=percentTime/100, np=365,
#' type="o", axes=FALSE,
#' ylab="bacteria per cc",
#' xlab="Proportion of days in 1913",
#' main = "Number of bacteria from the Delaware river in 1913")
#' )
#' #
#' #
#' # Half normal ... using the penicillin data from Daniel(1959)
#' data(penicillin)
#'
#' qqtest(penicillin, dist = "half-normal")
#'
#' # Or the same again but with significant contrast labelled
#'
#'
#' with (penicillin,
#' {qqtest(value, yAxisProbs=c(0.1, 0.75, 0.90, 0.95),
#' dist="half-normal",
#' ylab="Sample cumulative probability",
#' xlab="Half-normal cumulative probability")
#' ppAdj <- (1+ppoints(31))/2 # to get half-normals from normal
#' x <- qnorm(ppAdj)
#' valOrder <- order(value) # need data and rownames in increasing order
#' y <- value[valOrder]
#' tags <- rownames(penicillin)[valOrder]
#' selPoints <- 28:31 # going to label only the largest effects
#' xoffset <- c(0.01, 0.02, 0.03, 0.075) # text function is a bit off
#' text(x[selPoints]-xoffset, y[selPoints],
#' tags[selPoints],
#' pos=2, cex=0.75)
#' }
#' )
#'
#' # Alternatively, use the returned results for a lot less work
#' results <- qqtest(penicillin$value, yAxisProbs=c(0.1, 0.75, 0.90, 0.95),
#' dist="half-normal",
#' ylab="Sample cumulative probability",
#' xlab="Half-normal cumulative probability")
#' tags <- row.names(penicillin)[results$order]
#' selPoints <- 28:31 # going to label only the largest effects
#' xoffset <- c(0.01, 0.02, 0.03, 0.075) # text function is a bit off
#' text(results$x[selPoints]-xoffset, results$y[selPoints],
#' tags[selPoints],
#' pos=2, cex=0.75)
#'
#' # or the same points could have been identified interactively vusing
#' # identify(results$x, results$y, labels = row.names(penicillin)[results$order])
#' #
#' # K on 9 df ... see help(dkay)
#' # Use data on primer paint thickness (standard deviations on n=10)
#' data(primer, package="qqtest")
#' with (primer,
#' qqtest(s, dist="kay", df=9,
#' yAxisAsProbs=FALSE,
#' ylab="Standard deviation of primer thickness (in mils)")
#' )
#' #
#' # chi-squared on 3 df
#' # Use robust covariance matrix in calculation Mahalanobis distances
#' # for the classical Brownlee stackloss data.
#' data(stacklossDistances, package="qqtest")
#' with(stacklossDistances,
#' qqtest(robust, dist="chi", df=3, ylab="Robust Mahalanobis distances"))
#' #
#' #
#' # user supplied qfunction and rfunction -- compare to beta distribution
#' qqtest(precip,
#' qfunction=function(p){qbeta(p, 2, 2)},
#' rfunction=function(n){rbeta(n, 2, 2)},
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' #
#' # user supplied qfunction only -- compare to beta distribution
#' qqtest(precip,
#' qfunction=function(p){qbeta(p, 2, 2)},
#' main = "Precipitation (inches/year) in 70 US cities")
#' #
#' # comparing data samples
#' #
#' # Does the sample of beaver2's temperatures look like they
#' # could have come from a distribution shaped like beaver1's?
#' #
#' qqtest(beaver2[,"temp"],
#' dataTest=beaver1[,"temp"],
#' ylab="Beaver 2", xlab="Beaver 1",
#' main="Beaver body temperatures")
#'
qqtest <- function (data,
dist=c("gaussian", "normal",
"log-normal", "half-normal",
"uniform", "exponential",
"chi-squared", "kay",
"student","t"),
df=1,
qfunction=NULL,
rfunction=NULL,
dataTest=NULL,
p=NULL,
a=NULL,
np = NULL,
matchMethod = c("hinges","quartiles", "middlehalf", "bottomhalf", "tophalf"),
xAxisAsProbs = FALSE,
yAxisAsProbs = FALSE,
xAxisProbs = c(0.05, 0.25, 0.50, 0.75, 0.95),
yAxisProbs = c(0.05, 0.25, 0.50, 0.75, 0.95),
nreps=1000,
centralPercents=c(0.90, 0.95, 0.99),
envelope=TRUE,
drawPercentiles = FALSE,
drawQuartiles = FALSE,
legend=NULL,
legend.xy = "topleft",
legend.cex = 0.8,
nexemplars=0,
typex=NULL,
plainTrails=FALSE,
colTrails=NULL,
alphaTrails=0.25,
lwdTrails=1,
lineup=FALSE,
nsuspects=20,
col=NULL,
h=260,
c=90,
l=60,
alpha=1.0,
cex=NULL,
pch=19,
type=NULL,
main=NULL,
xlab=NULL,
ylab=NULL,
xlim=NULL,
ylim=NULL,
axes=NULL,
bty="o",
...
) {
if (is.matrix(data) || is.data.frame(data)) {
data <- data.matrix(data)[,1]
}
data_order <- order(data)
sorted_data <- data[data_order]
if (is.null(typex)){
typex <- if (is.null(type)) "o" else type
}
if (is.null(type)) type <- "p"
if (!is.numeric(alphaTrails)) {stop("alphaTrails must be a number from 0 to 1.")} else {
if ((alphaTrails < 0) | (alphaTrails > 1)) stop("alphaTrails must be a number from 0 to 1.")
}
if (nexemplars > 0) {
if (is.null(colTrails)) {
colTrails <- sapply(1:nexemplars, function(i) {grDevices::hcl(h=i*360/nexemplars, c=90, l=60)})
}
colTrails <- adjustcolor(colTrails, alpha.f = alphaTrails)
alphaTrails <- 1
colTrails <- rep(colTrails, length.out=nexemplars)
}
if(is.list(legend.xy)) {
if((!("x" %in% names(legend.xy))) |
(!("y" %in% names(legend.xy)))) {
stop("Need to specify which components of legend.xy are x and which are y")
}
}
generateUniformOrderStat <- NULL
# If probabilities p are provided, only these values will be used.
if (!is.null(p) ){
# first check whether these are the same as the number of data points
# if so, then assume that the data points *are* the values at these places.
# if not, then extract from the data the empirical quantiles at the given p values
if (length(p)!=length(data)) {data <- quantile(data,p)}
#
if (envelope||drawPercentiles||drawQuartiles||lineup||(nexemplars>0)) {
# This means we are going to have generate some values but cannot count
# on the length(data) or length(p) to determine the sample size on which
# to base the generation.
# In this case, the sample size must be provided as the value of np:
if (is.null(np) || !is.numeric(np)) {
stop("The sample size np, on which values of p are to be based, must be given whenever the values of p are specified.")}
# Because we have specified probabilities, we are going to have to generate
# the corresponding order statistics directly.
# Plan is to generate these by applying the appropriate quantile function to
# the corresponding order statistic from a uniform ... Q(U_(i))
# Need to determine an appropriate index.
# We will take the sample size N to be large enough to have each prob in p
# be i/N for large enough N (Note fractional indices are possible)
orderIndices <- np * p
# We generate the uniform order statistics
generateUniformOrderStat <- function(index, thismany) {
rbeta(thismany, index, np + 1 - index)}
}
}
n <- length(data)
# Get the distribution argument and fix synonyms
dist <- match.arg(dist)
# normal and gaussian are the same
if (dist=="normal") {dist <- "gaussian"}
# t and student are the same
if (dist=="t") {dist <- "student"}
# The following stay null if there are none to create
reps <- NULL
exemplars <- NULL
# Has the user supplied some axis labels?
makeXlabel <- is.null(xlab)
makeYlabel <- is.null(ylab)
#
# Has the user supplied a main title?
makeTitle <- is.null(main)
#
# If lineup, don't do all reps yet
if (lineup) {
if (!(nsuspects > 1)) {
warning("nsuspects must be > 1, set to 20 instead")
nsuspects <- 20
}
nrepsInput <- nreps
nreps <- nsuspects
nexemplarsInput <- nexemplars
nexemplars <- 0 ##### Should be zero? or 1?
if (is.null(axes)) axes <- FALSE
if (makeYlabel) ylab <- ""
if (is.null(legend)) legend <- FALSE
if (is.null(cex)) cex <- 2 / floor(sqrt(nsuspects))
} else {
# No lineup? Default axes = TRUE
if(is.null(axes)) axes <- TRUE
}
#
# Getting some reps from the appropriate distribution
if (!is.null(dataTest)) {
## plan is to test the data against the
## empirical distribution from dataTest
if (is.null(p)){
if (is.null(a)){
## use general purpose a=2/5 as per Cunnane (1978)
p <- ppoints(n,2/5)
}
else {p <- ppoints(n,a)}
}
qfunction <- function(p){quantile(dataTest,p)}
rfunction <- function(n){sample(dataTest,n,replace=TRUE)}
if (makeXlabel) {xlab <- "Empirical distribution"}
}
else {
if (is.null(qfunction)){
switch(dist,
# Gaussian or Normal
"gaussian"={if (is.null(p)){
if (is.null(a)){
## a=3/8 as per Cunnane (1978) for Normal
p <- ppoints(n,3/8)
}
else { p <- ppoints(n,a) }
}
qfunction <- qnorm
rfunction <- rnorm
if (makeXlabel) {xlab <- "Gaussian"}
},
"log-normal"={if (is.null(p)){
if (is.null(a)){
## a=3/8 as per Cunnane (1978) for Normal
p <- ppoints(n,3/8)
}
else { p <- ppoints(n,a) }
}
qfunction <- qlnorm
rfunction <- rlnorm
if (makeXlabel) {xlab <- "Lognormal"}
},
"half-normal"={if (is.null(p)){
if (is.null(a)){
## a=3/8 as per Cunnane (1978) for Normal
p <- ppoints(n,3/8)
}
else { p <- ppoints(n,a) }
}
qfunction <- function(p) {qnorm((1 + p)/2)}
rfunction <- function(n) {abs(rnorm(n))}
if (makeXlabel) {xlab <- "Half-normal"}
},
"uniform"={if (is.null(p)){
if (is.null(a)){
## a=0 as per Cunnane (1978) for uniform
p <- ppoints(n,0)
}
else { p <- ppoints(n,a) }
}
qfunction <- qunif
rfunction <- runif
if (makeXlabel) {xlab <- "Uniform"}
},
"exponential"={if (is.null(p)){
if (is.null(a)){
## a=0.44 as per Cunnane (1978) for exponential
p <- ppoints(n,0.44)
}
else { p <- ppoints(n,a) }
}
qfunction <- qexp
rfunction <- rexp
if (makeXlabel) {xlab <- "Exponential(1)"}
},
# K distribution (= sqrt(chisquared/df))
"kay"= {if (is.null(p)){
if (is.null(a)){
if (df <= 1){
## a=1/2 as per Cunnane (1978) for (Pearson III or chi)
p <- ppoints(n,1/2)
}
else {
## a=2/5 as per Cunnane (1978) for (Pearson III or chi)
p <- ppoints(n,2/5)
}
}
else { p <- ppoints(n,a) }
}
qfunction <- function(p) {qkay(p,df)}
rfunction <- function(n) {rkay(n,df)}
if (makeXlabel) {xlab <- paste("K(",df,")", sep="")}
},
"chi-squared"={if (is.null(p)){
if (is.null(a)){
if (df <= 1){
## a=1/2 as per Cunnane (1978) for (Pearson III or chi)
p <- ppoints(n,1/2)
}
else {
## a=2/5 as per Cunnane (1978) for (Pearson III or chi)
p <- ppoints(n,2/5)
}
}
else { p <- ppoints(n,a) }
}
qfunction <- function(p) {qchisq(p,df)}
rfunction <- function(n) {rchisq(n,df)}
if (makeXlabel) {xlab <- paste("Chi-squared(",df,")", sep="")}
},
# Student's t
"student"= {if (is.null(p)){
if (is.null(a)){
## use general purpose a=2/5 as per Cunnane (1978)
p <- ppoints(n,2/5)
}
else { p <- ppoints(n,a) }
}
qfunction <- function(p) {qt(p,df)}
rfunction <- function(n) {rt(n,df)}
if (makeXlabel) {xlab <- paste("Student t(",df,")", sep="")}
},
stop(paste("Unimplemented distribution:", dist,
"Could instead call qqtest by",
"supplying a function for qfunction",
"(and optionally another for rfunction)."))
) # end of switch
} else if(is.function(qfunction)) {
if (is.null(p)){
if (is.null(a)){
## use general purpose a=2/5 as per Cunnane (1978)
p <- ppoints(n,2/5)
} else
{
p <- ppoints(n,a)
}
}
if (!is.function(rfunction)) {
rfunction <- function(n) {
sapply(runif(n),qfunction)
}
}
if (makeXlabel) {xlab <- "Hypothetical distribution"}
} else {warning("qfunction must be a function that returns quantiles")}
}
q <- qfunction(p)
xAxisTicksAt <- qfunction(xAxisProbs)
# Get the location and scale changes to use for the simulations (and to return).
# Use a line fit to get the location scale correction for the
# data sampled from the test distribution.
# Method of determining line given by value of matchMethod
#
# Two variables are used
if (is.function(matchMethod)) {
# Some user supplied function that must take x and y as arguments,
# Fit a line and return a vector of coefficients as (Intercept, slope)
locScaleCoefs <- matchMethod(x = q, y = sorted_data)
loc <- locScaleCoefs[1]
scale <- locScaleCoefs[2]
} else {
matchMethod <- match.arg(matchMethod)
switch(matchMethod,
hinges = {
locScaleLine <- stats::lm(y ~ x, data = data.frame(x = fivenum(q)[2:4],
y = fivenum(data)[2:4]))
},
bottomhalf = {
n <- length(data)
start_index <- 1
end_index <- trunc(n/2)
indices_selected <- seq(start_index, end_index)
locScaleLine <- stats::lm(y ~ x, data = data.frame(x = q[indices_selected],
y = sorted_data[indices_selected]))
},
middlehalf = {
n <- length(data)
start_index <- trunc(n/4)
end_index <- trunc(3*n/4)
indices_selected <- seq(start_index, end_index)
locScaleLine <- stats::lm(y ~ x, data = data.frame(x = q[indices_selected],
y = sorted_data[indices_selected]))
},
tophalf = {
n <- length(data)
start_index <- trunc(3*n/4)
end_index <- n
indices_selected <- seq(start_index, end_index)
locScaleLine <- stats::lm(y ~ x, data = data.frame(x = q[indices_selected],
y = sorted_data[indices_selected]))
},
quartiles = {
locScaleLine <- stats::lm(y ~ x, data = data.frame(x = fivenum(q)[2:4],
y = fivenum(data)[2:4]))
},
stop("No such 'matchMethod'"))
loc <- locScaleLine$coefficients[1]
scale <- locScaleLine$coefficients[2]
}
if (nreps > 0) {
if (is.null(generateUniformOrderStat)) {
reps <- rfunction(n*nreps)
}
else
{
reps <- qfunction(sapply(orderIndices,
function(index){generateUniformOrderStat(index,nreps)}
)
)
}
reps <- loc + scale * array(reps,dim=c(nreps,n))
}
if (nexemplars > 0) {
if (is.null(generateUniformOrderStat)) {
exemplars <- rfunction(n*nexemplars)
}
else
{
exemplars <- qfunction(sapply(orderIndices,
function(index){
generateUniformOrderStat(index, nexemplars)}))
}
exemplars <- loc + scale * array(exemplars,dim=c(nexemplars,n))
}
# Do lineup plot if asked. ... recursive
if (lineup){
suspects <- reps[1:nsuspects,]
trueLoc <- sample(1:nsuspects,1)
suspects[trueLoc,] <- data
nrow <- floor(sqrt(nsuspects))
ncol <- ceiling(sqrt(nsuspects))
parOptions <- par(mfrow=c(nrow,ncol), mar=c(0,0,1.5,0))
if (makeTitle){main <- "Suspect"}
for (i in 1:nsuspects) {
qqtest(data = suspects[i,],
dist=dist,
df=df,
qfunction = qfunction,
rfunction = rfunction,
dataTest = dataTest,
matchMethod = matchMethod,
xAxisProbs = xAxisProbs,
yAxisProbs = yAxisProbs,
xAxisAsProbs = xAxisAsProbs,
yAxisAsProbs = yAxisAsProbs,
nreps= nrepsInput,
centralPercents = centralPercents,
envelope = envelope,
drawPercentiles = drawPercentiles,
drawQuartiles = drawQuartiles,
legend = legend,
legend.xy = legend.xy,
legend.cex = legend.cex,
nexemplars = nexemplarsInput,
typex = typex,
plainTrails = plainTrails,
colTrails = colTrails,
alphaTrails = alphaTrails, # Note this will be 1 here so that repeated application doesn't make the colours disappear in lineup
lwdTrails = lwdTrails,
lineup=FALSE,
nsuspects=1,
col=col,
h=h,
c=c,
l=l,
alpha=alpha,
cex=cex,
pch=pch,
type=type,
main = if(main=="") {paste0(i)} else {paste(main, i, sep=": ")},
xlab=xlab,
ylab = ylab,
xlim = xlim,
ylim = ylim,
axes=axes,
bty=bty,
...)
}
par(parOptions)
returnValue <- hideLocation(trueLoc, nsuspects)
returnValue # this must be returned
} else {
# Otherwise construct the plot.
if (!is.null(reps)) {
# Sort the reps into the right form and get summary statistics
reps <- apply(reps, 1, sort) # Note this flips the dimensions of reps
Nums <- apply(reps, 1, fivenum)
if (envelope||drawPercentiles){
nLevels <- length(centralPercents)
centralPercents <- sort(centralPercents)
SymmetricAdjust <-(1-centralPercents)/2
bottomPcts <- (1-centralPercents)-SymmetricAdjust
topPcts <- centralPercents + SymmetricAdjust
Pcts <- c(rev(bottomPcts),topPcts)
NumsPct <- apply(reps, 1, function(x){quantile(x,Pcts)})
# defining grey
greyhue <- 260
greyluminance <- 65
greychroma <- 0
greyalpha <- 0.6/(nLevels +1) # plus 1 for the range
grey <- grDevices::hcl(h= greyhue, c= greychroma, l= greyluminance,
alpha= greyalpha)
}
}
## Get the base plot
xlim <- if(!is.null(xlim)) {xlim} else {range(q)}
ylim <- if(!is.null(ylim)) {
ylim
} else {
yrange <- range(data)
if (!is.null(reps)) {
yrange <- range(c(yrange, range(reps)))
}
if (!is.null(exemplars)) {
yrange <- range(c(yrange, range(exemplars)))
}
yrange
}
plot(0,0,
main = if(makeTitle) {"qqtest"} else main,
xlab = if(makeXlabel) {
paste(xlab, if(xAxisAsProbs) {
"cumulative probability (on quantile scale)"
} else {"quantiles"})
} else xlab,
ylab = if(makeYlabel) {
paste(if(yAxisAsProbs) {
"Sample cumulative probability (on quantile scale)"
} else {"Sample quantiles"})
} else ylab,
col = "white",
xlim = xlim,
ylim = ylim,
axes = FALSE,
type="n",
bty=bty,
... # other plot parameters
)
# Draw the axes
if(axes) {
if (xAxisAsProbs) {
Axis(side=1,
labels = paste(xAxisProbs),
at = xAxisTicksAt)
} else {
Axis(side=1)
}
if (yAxisAsProbs) {
Axis(side=2,
labels = paste(yAxisProbs),
at = quantile(data,yAxisProbs))
} else {
Axis(side=2)
}
}
# put the box around to look standard
if (!bty=="n") {box(bty=bty)}
## Add the envelope for the replicates
if(envelope && !is.null(reps)) {
# first the range
polygon(c(q,rev(q)),
c(Nums[1,],rev(Nums[5,])),
border=grey,
col=grey)
# now the central pointwise "confidence" intervals
nLevels <- length(centralPercents)
for (i in 1:nLevels){
iLower <- i
iUpper <- nLevels*2 - iLower +1
polygon(c(q,rev(q)),
c(NumsPct[iLower,],rev(NumsPct[iUpper,])),
border=grey,
col=grey)
}
}
## Add exemplar trails
if (!is.null(exemplars)&&nexemplars>=1) {
if(!plainTrails){
for (i in 1:nexemplars){
points(q,sort(exemplars[i,]),
col=colTrails[i],
type=typex,
lwd=lwdTrails)
}
} else { # grey trails
plainCol <- grDevices::hcl(h=260, c=0, l= 90, alpha=alphaTrails)
for (i in 1: nexemplars){
points(q,sort(exemplars[i,]),
col=plainCol,
type=typex,
lwd=lwdTrails)
}
}
}
if(drawPercentiles && !is.null(reps)){
# and the various percentiles
if (drawQuartiles) {
lineTypes <- c(3:(nLevels +2))
} else{
lineTypes <- c(1:nLevels)
}
lineCols <- rep("darkgrey",nLevels)
for(i in 1:nLevels) {
# central intervals
lineCols[i] <- "darkgrey"
# draw the lower and upper lines
lines(q,NumsPct[i,],col= lineCols[i],lty= lineTypes[i], lwd=2)
lines(q,NumsPct[2*nLevels-i+1,],col= lineCols[i],lty= lineTypes[i], lwd=2)
}
}
# quartiles
if(drawQuartiles && !is.null(reps)){
# draw quartiles
lines(q,Nums[2,],col="black",lty=2)
lines(q,Nums[4,],col="black",lty=2)
# draw median
lines(q,Nums[3,],col="black",lty=1)
}
# and optional legend
if (is.null(legend)) legend <- TRUE
if (legend && (drawQuartiles||drawPercentiles) && !is.null(reps)){
if (drawPercentiles) {
if (drawQuartiles) {
legendString <- c(paste(signif(100*rev(centralPercents),3), "% central range", sep=""),
"quartiles", "median")
legendLineTypes <- c(lineTypes,2,1)
legendCols <- c(lineCols,"black","black")
} else {
# no quartiles
legendString <- paste(signif(100*rev(centralPercents),3), "% central range", sep="")
legendLineTypes <- lineTypes
legendCols <- lineCols
}
} else {
# only quartiles
legendString=c("quartiles", "median")
legendLineTypes <- c(2,1)
legendCols <- c("black","black")
}
# Draw it
if (envelope && !drawQuartiles) {
fillCols <- rep(grey, nLevels+1)
for (i in 2:(nLevels+1)) {
fillCols[i] <- grDevices::hcl(h= greyhue, c= greychroma, l= greyluminance, alpha=i * greyalpha)
}
if (is.list(legend.xy)) {
legend(legend.xy$x, legend.xy$y,
legend = legendString,
lwd=1,
cex=legend.cex,
bty="n",
lty= legendLineTypes,
col= legendCols,
fill=fillCols,
border=fillCols,
text.col="darkgrey",
title=paste("Simulated ranges", "n =", nreps))
} else {
legend(legend.xy,
legend = legendString,
lwd=1,
cex=legend.cex,
bty="n",
lty= legendLineTypes,
col= legendCols,
fill=fillCols,
border=fillCols,
text.col="darkgrey",
title=paste("Simulated ranges", "n =", nreps))
}
} else {
if (is.list(legend.xy)) {
legend(legend.xy$x, legend.xy$y,
legend = legendString,
lwd=1,
cex=legend.cex,
bty="n",
lty= legendLineTypes,
col= legendCols,
text.col="darkgrey",
title=paste("Simulated ranges", "n =", nreps))
} else {
legend(legend.xy,
legend = legendString,
lwd=1,
cex=legend.cex,
bty="n",
lty= legendLineTypes,
col= legendCols,
text.col="darkgrey",
title=paste("Simulated ranges", "n =", nreps))
}
}
}
if (legend && !drawQuartiles && !drawPercentiles && envelope && !is.null(reps)){
# Draw it
fillCols <- rep(grey, nLevels+1)
for (i in 2:(nLevels+1)) {
fillCols[i] <- grDevices::hcl(h= greyhue, c= greychroma, l= greyluminance, alpha=i * greyalpha)
}
if (is.list(legend.xy)) {
legend(legend.xy$x, legend.xy$y,
legend = c("Range",paste(signif(100*rev(centralPercents),3),
"% central range",
sep="")),
col=fillCols,
fill=fillCols,
border=fillCols,
cex=legend.cex,
bty="n",
text.col="darkgrey",
title=paste("Simulated ranges", "n =", nreps))
} else {
legend(legend.xy,
legend = c("Range", paste(signif(100*rev(centralPercents),3),
"% central range",
sep="")),
col=fillCols,
fill=fillCols,
border=fillCols,
cex=legend.cex,
bty="n",
text.col="darkgrey",
title=paste("Simulated ranges", "n =", nreps))
}
}
## And finally the points
if (is.null(col)){
plot_colour <- grDevices::hcl(h=h, c=c, l=l, alpha=alpha)
} else {plot_colour <- col}
points(q, sorted_data,
col=plot_colour,
type=type,
pch=pch,
cex=if(is.null(cex)) 1 else cex
)
returnValue <- list(x = q, y = sorted_data, order = data_order)
invisible(returnValue) # returned only if assigned.
} #end of else from #lineup
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.