Do a visual test of a null hypothesis by choosing the graph that does not belong.

Description

These functions help create a set of plots: one based on the real data and the others based on modified data for which the null hypothesis is true. The user then tries to pick the graph that shows the real data.

Usage

vis.test(..., FUN, nrow=3, ncol=3, npage=3, data.name = "", alternative)
vt.qqnorm(x, orig=TRUE)
vt.normhist(x, ..., orig=TRUE)
vt.scatterpermute(x, y, ..., orig=TRUE)
vt.tspermute(x, type='l', ..., orig=TRUE)
vt.residpermute(model, ..., orig=TRUE)
vt.residsim(model, ..., orig=TRUE)

Arguments

...

Data and arguments to be passed on to FUN or to the plotting functions; see Details below.

FUN

The function that creates the plots from the original data or from the null hypothesis data.

nrow

The number of rows of graphs per page

ncol

The number of columns of graphs per page

npage

The number of pages (lineups) to show during the test.

data.name

Optional character string for the name of the data in the output

alternative

Optional character string for the alternative hypothesis in the output

orig

Logical: should the original data be plotted (TRUE), or data simulated under the null hypothesis (FALSE)?

x

The data, or the x-coordinates of the data.

y

The y-coordinates of the data.

type

The type of plot, passed on to the plot function (use 'p' for points).

model

An lm object, or any model object for which fitted and resid return vectors

Details

The vis.test function creates an nrow by ncol grid of plots (a statistical "lineup"): one plot is based on the real (original) data and the others are based on simulations under the null hypothesis. The real plot is placed at a random position within the grid. The user then clicks on their best guess of which plot is the real one (the one most different from the others). If the null hypothesis is true for the real data, this guess has a 1/(nrow*ncol) probability of success. The process is repeated for a total of npage pages. A p-value is then computed from the number of correct guesses under the null hypothesis that each guess has a 1/(nrow*ncol) chance of being correct. This works best if the person doing the choosing has not already seen plots or summaries of the data.
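A minimal sketch of the lineup and p-value logic described above, under two assumptions: each page places the real plot uniformly at random, and the p-value is an upper-tail binomial probability with npage trials and success probability 1/(nrow*ncol). The package may compute it differently. The names lineup_sketch and lineup_pvalue are hypothetical and are not part of TeachingDemos.

```r
# Hypothetical helpers (lineup_sketch, lineup_pvalue); NOT part of TeachingDemos.

# Draw one lineup page: the real-data plot goes into a randomly chosen panel.
lineup_sketch <- function(x, FUN, nrow = 3, ncol = 3) {
  n_panels <- nrow * ncol
  real_pos <- sample.int(n_panels, 1)       # random position of the real plot
  old_par <- par(mfrow = c(nrow, ncol))     # nrow by ncol grid of panels
  on.exit(par(old_par))                     # restore graphics settings afterwards
  for (i in seq_len(n_panels)) {
    FUN(x, orig = (i == real_pos))          # orig is TRUE only for the real panel
  }
  invisible(real_pos)
}

# Assumed upper-tail binomial p-value: probability of at least `correct` hits
# in `npage` independent guesses, each right with probability 1/(nrow*ncol).
lineup_pvalue <- function(correct, npage = 3, nrow = 3, ncol = 3) {
  pbinom(correct - 1, size = npage, prob = 1 / (nrow * ncol), lower.tail = FALSE)
}
```

For example, 2 correct picks out of 3 pages of 3 by 3 lineups gives lineup_pvalue(2) = 25/729, about 0.034.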

If the plotting function (FUN) is not passed as a named argument, the first argument in ... that is a function will be used. If no function is passed, vis.test stops with an error.
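The lookup rule for FUN can be sketched in a few lines. This is an illustration of the stated rule only, not the package source; the helper name pick_plot_fun and its error message are hypothetical.

```r
# Hypothetical helper illustrating how FUN is chosen; NOT the TeachingDemos source.
pick_plot_fun <- function(..., FUN) {
  if (!missing(FUN)) return(FUN)               # a named FUN always wins
  candidates <- Filter(is.function, list(...)) # functions among the ... arguments
  if (length(candidates) == 0) {
    stop("no plotting function supplied")      # mirrors the documented error
  }
  candidates[[1]]                              # the first function found is used
}
```

So pick_plot_fun(x, vt.qqnorm) would select vt.qqnorm, which is why vis.test(x, vt.qqnorm) works without naming FUN.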

The plotting function (FUN) can be an existing function or a user-supplied one. It must have an argument named "orig" that indicates whether to plot the original data or the null hypothesis data. A new seed is set before each call to FUN except when orig is TRUE. When orig is TRUE, the function should plot the original data. When orig is FALSE, it should simulate data based on the original data with the null hypothesis true and plot the simulated data, making sure to give no sign that the plot differs from the original one.
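As an illustration of the orig contract, here is a hypothetical user-supplied FUN that tests whether a numeric response differs between groups by shuffling the group labels. The name vt_boxpermute and the boxplot design are assumptions, and the usage call assumes vis.test forwards its unnamed data arguments to FUN in order.

```r
# Hypothetical user-supplied plotting function (not part of TeachingDemos).
# Null hypothesis: the response y does not depend on the group labels g.
vt_boxpermute <- function(y, g, orig = TRUE) {
  if (!orig) {
    g <- sample(g)                      # shuffle labels so the null is true
  }
  boxplot(y ~ g, xlab = "", ylab = "")  # same styling for real and null plots
  invisible(g)
}

# Usage sketch: only runs interactively, and only if TeachingDemos is installed.
if (interactive() && requireNamespace("TeachingDemos", quietly = TRUE)) {
  y <- c(rnorm(15, 10), rnorm(15, 11))
  g <- rep(c("A", "B"), each = 15)
  TeachingDemos::vis.test(y, g, FUN = vt_boxpermute)
}
```

Keeping titles and labels identical in both branches follows the requirement that the null plots give no sign they differ from the original.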

The returned object includes a list of the seeds set before each plot (NA for the original-data plot) and a vector of the plots selected by the user. This information can be used to recreate any simulated plot by setting the corresponding seed and then calling FUN with orig = FALSE.
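The reason the stored seeds are enough to recreate a null plot is that the same seed always yields the same random draws. The sketch below assumes each stored seed was applied with set.seed() before FUN was called; null_qq_data is a hypothetical stand-in for the simulation step inside a plotting function, and the fixed seed stands in for one non-NA entry of the returned seeds.

```r
# Sketch, assuming the stored seeds are applied with set.seed().
# null_qq_data is hypothetical: normal data with the sample's mean and sd.
null_qq_data <- function(x) rnorm(length(x), mean(x), sd(x))

x <- rexp(25, 1/3)
seed <- 12345                   # stand-in for one non-NA entry of the seeds list
set.seed(seed); first  <- null_qq_data(x)
set.seed(seed); second <- null_qq_data(x)
identical(first, second)        # TRUE: same seed, same simulated data
```

With a real result res from vis.test(x, vt.qqnorm), the analogous step would be set.seed(res$seeds[[1]][k]) followed by vt.qqnorm(x, orig = FALSE) for a panel k whose seed is not NA, again assuming set.seed() is how the seeds were applied.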

The vt.qqnorm function tests the null hypothesis that a vector of data comes from a normal distribution (or at least something close to one). It creates a normal QQ plot (qqnorm) of either the original data or random data drawn from a normal distribution with the same mean and standard deviation as the original data.
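A minimal re-creation of the idea behind vt.qqnorm, written from the description above rather than from the package source. The name qq_sketch is hypothetical, and the blank titles are a design assumption meant to keep all panels looking alike.

```r
# Hypothetical sketch of a vt.qqnorm-style FUN; NOT the package source.
qq_sketch <- function(x, orig = TRUE) {
  if (!orig) {
    # Null data: normal with the same n, mean and sd as the observed data
    x <- rnorm(length(x), mean = mean(x), sd = sd(x))
  }
  qqnorm(x, main = "", xlab = "", ylab = "")  # blank labels so panels look alike
  qqline(x)                                   # reference line through the quartiles
  invisible(x)
}
```

Skewed data such as rexp(25, 1/3) tend to bend away from the line, while the normal null panels stay close to it.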

The vt.normhist function tests the null hypothesis that a vector of data comes from a normal distribution (or at least something close to one). It plots a histogram, with a reference line representing a normal distribution, of either the original data or random data drawn from a normal distribution with the same mean and standard deviation as the original data.
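A sketch of a histogram-based FUN in the same spirit. The name normhist_sketch is hypothetical, and drawing the reference curve from the mean and standard deviation of whichever data are plotted is an assumption about how the reference line is chosen.

```r
# Hypothetical sketch of a vt.normhist-style FUN; NOT the package source.
normhist_sketch <- function(x, orig = TRUE) {
  if (!orig) {
    x <- rnorm(length(x), mean = mean(x), sd = sd(x))  # null data
  }
  hist(x, freq = FALSE, main = "", xlab = "")  # density scale so the curve fits
  grid_x <- seq(min(x), max(x), length.out = 200)
  # Assumed reference: normal density with the plotted data's mean and sd
  lines(grid_x, dnorm(grid_x, mean = mean(x), sd = sd(x)))
  invisible(x)
}
```

Plotting on the density scale (freq = FALSE) is what makes the histogram bars and the normal curve directly comparable.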

The vt.scatterpermute function tests the null hypothesis of "no relationship" between two vectors of data. When orig is TRUE, the function creates a scatterplot of the two variables. When orig is FALSE, it first randomly permutes the y variable (removing any relationship) and then creates a scatterplot of the original x against the permuted y.
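The permutation step can be sketched as follows. The name scatter_sketch is hypothetical; it follows the description above (permute y, keep x) rather than the package source.

```r
# Hypothetical sketch of a vt.scatterpermute-style FUN; NOT the package source.
scatter_sketch <- function(x, y, orig = TRUE) {
  if (!orig) {
    y <- sample(y)   # break any x-y link while keeping the same y values
  }
  plot(x, y, xlab = "", ylab = "")
  invisible(y)
}
```

Because only the pairing changes, each null panel has exactly the same marginal distributions as the real one; only the association between x and y is destroyed.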

The vt.tspermute function creates a time-series style plot of a single vector against its index. When orig is FALSE, the vector is randomly permuted before plotting, so any dependence on the order of the observations is removed.
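A sketch of the same idea for ordered data. The name ts_sketch is hypothetical; type is passed to plot as in the documented vt.tspermute signature.

```r
# Hypothetical sketch of a vt.tspermute-style FUN; NOT the package source.
ts_sketch <- function(x, type = "l", orig = TRUE) {
  if (!orig) {
    x <- sample(x)   # destroy any serial order
  }
  plot(seq_along(x), x, type = type, xlab = "", ylab = "")  # value vs. index
  invisible(x)
}
```

A random walk such as cumsum(rnorm(50)) has strong serial dependence, so its real plot is usually easy to spot among shuffled versions.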

The vt.residpermute function takes a regression object (of class lm, or any model object for which fitted and resid return vectors) and draws a residual plot with the fitted values on the x axis and the residuals on the y axis. A loess smooth curve (scatter.smooth is the plotting function) and a reference line at 0 are included. When orig is FALSE, the residuals are randomly permuted before being plotted.
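A sketch of the residual-permutation plot using only base R. The name residpermute_sketch, the axis labels, and the dashed zero line are assumptions; fitted, resid and scatter.smooth are the functions named in the description.

```r
# Hypothetical sketch of a vt.residpermute-style FUN; NOT the package source.
residpermute_sketch <- function(model, orig = TRUE) {
  fit <- fitted(model)
  res <- resid(model)
  if (!orig) {
    res <- sample(res)   # permute residuals against the fitted values
  }
  scatter.smooth(fit, res, xlab = "Fitted", ylab = "Residuals")  # points + loess curve
  abline(h = 0, lty = 2)                                          # reference line at 0
  invisible(res)
}

# Usage sketch with a built-in data set
# m <- lm(dist ~ speed, data = cars); residpermute_sketch(m, orig = FALSE)
```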

The vt.residsim function takes the same kind of regression object and draws the same residual plot, including the loess smooth curve (from scatter.smooth) and the reference line at 0. When orig is FALSE, the residuals are replaced by values simulated from a normal distribution with mean 0 and standard deviation equal to that of the residuals.
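The two residual functions differ only in how the null residuals are generated. The hypothetical helpers below isolate that step, following the descriptions of vt.residpermute and vt.residsim rather than the package source.

```r
# Hypothetical helpers contrasting the two null-data generators.
null_resid_permute <- function(res) sample(res)  # same values, new order
null_resid_sim <- function(res) {
  rnorm(length(res), mean = 0, sd = sd(res))     # fresh normal draws
}
```

Permuting keeps the exact set of residual values, so skewness or outliers in the real residuals also appear in every null panel. Simulating replaces them with normal draws, so a vt.residsim lineup can also reveal non-normal residuals, not only structure related to the fitted values.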

Value

The vis.test function returns an object of class htest with the following components:

method

The string "Visual Test"

data.name

The name of the data passed to the function

statistic

The number of correct "guesses"

p.value

The p-value based on the number of correct "guesses"

nrow

The number of rows per page

ncol

The number of columns per page

npage

The number of pages

seeds

A list of npage vectors, one per page, containing the seeds set before each call to FUN; the entry for the real-data plot is NA

selected

A vector of length npage indicating the number of the figure picked in each of the npage tries

The other functions are run for their side effects and do not return anything meaningful.

Warning

The p-value is based on the assumption that under the null hypothesis there is a 1/(nrow*ncol) chance of picking the correct plot and that the npage choices are independent of each other. This may not be true if the user is familiar with the data or remembers details of the plot between picks.

Author(s)

Greg Snow 538280@gmail.com

References

Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D. F. and Wickham, H. (2009) Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A 367, 4361-4383. doi: 10.1098/rsta.2009.0120

See Also

set.seed

Examples

if(interactive()) {
  x <- rexp(25, 1/3)
  vis.test(x, vt.qqnorm)

  x <- rnorm(100, 50, 3)
  vis.test(x, vt.normhist)
}
