# vis.test: Do a Visual test of a null hypothesis by choosing the graph... In TeachingDemos: Demonstrations for Teaching and Learning

## Description

These functions help in creating a set of plots based on the real data and some modification that makes the null hypothesis true. The user then tries to choose which graph represents the real data.

## Usage

```r
vis.test(..., FUN, nrow=3, ncol=3, npage=3, data.name = "", alternative)
vt.qqnorm(x, orig=TRUE)
vt.normhist(x, ..., orig=TRUE)
vt.scatterpermute(x, y, ..., orig=TRUE)
vt.tspermute(x, type='l', ..., orig=TRUE)
vt.residpermute(model, ..., orig=TRUE)
vt.residsim(model, ..., orig=TRUE)
```

## Arguments

| Argument | Description |
|---|---|
| `...` | Data and arguments to be passed on to `FUN` or to plotting functions; see Details below |
| `FUN` | The function to create the plots on the original or null hypothesis data |
| `nrow` | The number of rows of graphs per page |
| `ncol` | The number of columns of graphs per page |
| `npage` | The number of pages to use in the testing |
| `data.name` | Optional character string for the name of the data in the output |
| `alternative` | Optional character string for the alternative hypothesis in the output |
| `orig` | Logical: should the original data be plotted, or data based on the null hypothesis? |
| `x` | Data or x-coordinates of the data |
| `y` | y-coordinates of the data |
| `type` | Type of plot, passed on to the plot function (use `'p'` for points) |
| `model` | An `lm` object, or any model object for which `fitted` and `resid` return vectors |

## Details

The `vis.test` function creates an `nrow` by `ncol` grid of plots, one of which is based on the real (original) data while the others are based on data simulated under the null hypothesis (a statistical "lineup"). The real plot is placed at random within the set. The user then clicks on their best guess of which plot is the real one (the one most different from the others). If the null hypothesis is true for the real data, this guess succeeds with probability 1/(`nrow`*`ncol`). The process is repeated for a total of `npage` pages. A p-value is then constructed from the number of correct guesses, under the null hypothesis that each guess has a 1/(`nrow`*`ncol`) chance of being correct. This works best if the person doing the choosing has not already seen plots or summaries of the data.
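The arithmetic behind such a p-value can be sketched as an upper-tail binomial probability. This is a minimal illustration under the assumption that each page is an independent guess with success probability 1/(`nrow`*`ncol`); the helper name `lineup_p_value` is hypothetical, and whether `vis.test` computes its p-value with exactly this expression is an assumption, not something stated above.

```r
# Hypothetical helper: probability of at least n_correct correct picks
# in npage independent pages when every pick is a pure guess.
lineup_p_value <- function(n_correct, nrow = 3, ncol = 3, npage = 3) {
  p_guess <- 1 / (nrow * ncol)  # chance of picking the real plot on one page
  # P(X >= n_correct) for X ~ Binomial(npage, p_guess)
  pbinom(n_correct - 1, size = npage, prob = p_guess, lower.tail = FALSE)
}

# p-values for 0, 1, 2 and 3 correct picks with the default 3 x 3 grid, 3 pages
sapply(0:3, lineup_p_value)
```

With the default grid, picking the real plot on all three pages corresponds to (1/9)^3 = 1/729, so only a perfect record gives a small p-value; more pages or larger grids give finer resolution.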

If the plotting function (`FUN`) is not passed as a named argument, then the first argument in `...` that is a function will be used. If no function is passed, `vis.test` stops with an error.

The plotting function (`FUN`) can be an existing function or a user supplied function. The function must have an argument named "orig" which indicates whether to plot the original data or the null hypothesis data. A new seed will be set before each call to `FUN` except when `orig` is `TRUE`. Inside the function if `orig` is `TRUE` then the function should plot the original data. When `orig` is `FALSE` then the function should do some form of simulation based on the data with the null hypothesis true and plot the simulated data (making sure to give no signs that it is different from the original plot).
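As an illustration of a user-supplied plotting function, the sketch below permutes group labels to simulate "no difference between groups". The function name `my_boxpermute`, the simulated data, and the choice of a boxplot are all hypothetical; the only requirement taken from the text above is the `orig` argument. The final call assumes TeachingDemos is installed and that `vis.test` forwards the two data arguments to the function positionally.

```r
# Hypothetical user-supplied plotting function for use with vis.test().
# Null hypothesis: the distribution of y is the same in every group.
my_boxpermute <- function(y, g, ..., orig = TRUE) {
  if (!orig) {
    # Shuffling the labels breaks any link between group and response
    g <- sample(g)
  }
  # Identical plotting code in both cases, so nothing reveals the real plot
  boxplot(y ~ g, xlab = "Group", ylab = "Response", ...)
  invisible(NULL)
}

# Simulated example data (illustrative only): group "c" is shifted upward
set.seed(42)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = c(a = 0, b = 0, c = 1.5)[as.character(g)])

my_boxpermute(y, g)                # original data
my_boxpermute(y, g, orig = FALSE)  # one draw under the null hypothesis

if (interactive()) {
  # Assumes the TeachingDemos package is installed
  library(TeachingDemos)
  vis.test(y, g, my_boxpermute)
}
```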

The return object includes a list with the seeds set before each of the plots (`NA` for the original data plot) and a vector of the plots selected by the user. This information can be used to recreate the simulated plots by setting the seed then calling `FUN`.
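The idea of recreating a null plot from a stored seed can be sketched without an interactive session. Here `null_qq` is a hypothetical stand-in for a plotting function (it is not `vt.qqnorm` itself), and `seed` stands in for one non-`NA` value taken from the `seeds` component of a real result.

```r
# Hypothetical stand-in for a FUN: Q-Q plot of the data or of a normal null draw
null_qq <- function(x, orig = TRUE) {
  if (!orig) {
    x <- rnorm(length(x), mean = mean(x), sd = sd(x))
  }
  qqnorm(x)
  qqline(x)
  invisible(x)  # returned only so the two draws can be compared below
}

set.seed(1)
x <- rexp(25, 1/3)

seed <- 20170529  # placeholder for a stored seed value

set.seed(seed)
first <- null_qq(x, orig = FALSE)

set.seed(seed)
second <- null_qq(x, orig = FALSE)

# Resetting the same seed reproduces the same simulated data, hence the same plot
stopifnot(identical(first, second))
```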

The `vt.qqnorm` function tests the null hypothesis that a vector of data comes from a normal distribution (or at least pretty close) by creating a `qqnorm` plot of the original data, or the same plot of random data from a normal distribution with the same mean and standard deviation as the original data.

The `vt.normhist` function tests the null hypothesis that a vector of data comes from a normal distribution (or at least pretty close) by plotting a histogram with a reference line representing a normal distribution of either the original data or a set of random data from a normal distribution with the same mean and standard deviation as the original.

The `vt.scatterpermute` function tests the null hypothesis of "no relationship" between 2 vectors of data. When `orig` is `TRUE`, the function creates a scatterplot of the 2 variables. When `orig` is `FALSE`, the function first permutes the y variable randomly (removing any relationship) and then creates a scatterplot of the original x and permuted y variables.

The `vt.tspermute` function creates a time series type plot of a single vector against its index. When `orig` is `FALSE`, the vector is permuted before plotting.

The `vt.residpermute` function takes a regression object (class lm, or any model type object for which `fitted` and `resid` return vectors) and does a residual plot of the fitted values on the x axis and residuals on the y axis. The loess smooth curve (`scatter.smooth` is the plotting function) and a reference line at 0 are included. When `orig` is `FALSE` the residuals are randomly permuted before being plotted.

The `vt.residsim` function takes a regression object (class `lm`, or any model type object for which `fitted` and `resid` return vectors) and does a residual plot of the fitted values on the x axis and residuals on the y axis. The loess smooth curve (`scatter.smooth` is the plotting function) and a reference line at 0 are included. When `orig` is `FALSE`, the residuals are simulated from a normal distribution with mean 0 and the same standard deviation as the original residuals.
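The behaviour described for the residual plots can be sketched roughly as follows. This is a hypothetical re-implementation written only from the description above, not the TeachingDemos source; the name `resid_sim_sketch`, the axis labels, and the simulated data are assumptions.

```r
# Hypothetical sketch based on the description above (not the package source)
resid_sim_sketch <- function(model, ..., orig = TRUE) {
  f <- fitted(model)
  r <- resid(model)
  if (!orig) {
    # Null hypothesis: residuals are normal noise with the same spread
    r <- rnorm(length(r), mean = 0, sd = sd(r))
  }
  scatter.smooth(f, r, xlab = "Fitted values", ylab = "Residuals", ...)
  abline(h = 0, lty = 2)  # reference line at 0
  invisible(r)            # returned only to allow inspection
}

# Illustrative data: a straight line fitted to a curved relationship
set.seed(1)
d <- data.frame(x = runif(50, 0, 10))
d$y <- 2 + 0.5 * d$x^2 + rnorm(50)
fit <- lm(y ~ x, data = d)

resid_sim_sketch(fit)                # real residuals show the curvature
resid_sim_sketch(fit, orig = FALSE)  # simulated residuals show no pattern
```

For the permutation variant described for `vt.residpermute`, the simulation line would instead be `r <- sample(r)`.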

## Value

The `vis.test` function returns an object of class `htest` with the following components:

| Component | Description |
|---|---|
| `method` | The string "Visual Test" |
| `data.name` | The name of the data passed to the function |
| `statistic` | The number of correct "guesses" |
| `p.value` | The p-value based on the number of correct "guesses" |
| `nrow` | The number of rows per page |
| `ncol` | The number of columns per page |
| `npage` | The number of pages |
| `seeds` | A list with 3 vectors containing the seeds set before calling `FUN`; the entry for the correct plot is `NA` |
| `selected` | A vector of length `npage` indicating the number of the figure picked in each of the `npage` tries |

The other functions are run for their side effects and do not return anything meaningful.

## Warning

The p-value is based on the assumption that under the null hypothesis there is a 1/(`nrow`*`ncol`) chance of picking the correct plot and that the `npage` choices are independent of each other. This may not be true if the user is familiar with the data or remembers details of the plot between picks.

## Author(s)

Greg Snow

## References

Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D.F. and Wickham, H. (2009) Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A 367, 4361-4383. doi: 10.1098/rsta.2009.0120

## See Also

`set.seed`

## Examples

```r
if(interactive()) {
  x <- rexp(25, 1/3)
  vis.test(x, vt.qqnorm)

  x <- rnorm(100, 50, 3)
  vis.test(x, vt.normhist)
}
```

TeachingDemos documentation built on May 29, 2017, 11:33 a.m.