dVis: Visual Inference for different Lineup Scenarios under a...


View source: R/pvalues.r

Description

Density, distribution function, and quantile function for visual inference scenarios. Visual inference is used to determine the significance of a visual finding. The lineup protocol (Buja et al., 2009) establishes a formal framework for testing graphical findings. The package nullabor helps with the creation of lineups using various null generation models. Here, we provide functions to evaluate the results.

Usage

dVis(x, K, m = 20, alpha, scenario = 3)

pVis(x, K, m = 20, alpha, scenario = 3, lower.tail = TRUE)

qVis(p, K, m = 20, alpha, scenario = 3)

Arguments

x

vector, number of data identifications,

K

positive value, number of evaluations of the lineup,

m

number of panels in the lineup,

alpha

positive value, rate parameter of the flat Dirichlet distribution,

scenario

integer value.

lower.tail

defaults to TRUE. If TRUE, probabilities are P(X ≤ x); otherwise, P(X ≥ x). Note that the second probability deviates from the R standard, where P(X > x) is usually returned. Here, returning P(X ≥ x) is more useful in an inference setting, as it corresponds to the p-value.

p

(vector of) probabilities,

Details

When administering visual tests, we distinguish between three different scenarios:

Under scenario 3, the number of data picks under the null hypothesis that the data plot is visually not more salient than one of the null plots is distributed according to a ratio of Beta functions:

P(X = x) = \binom{K}{x} \frac{B(x + \alpha, K - x + (m-1)\alpha)}{B(\alpha, (m-1)\alpha)}

where B(.,.) is the Beta function, α > 0 is the rate of a flat Dirichlet distribution, K is the number of times the lineup has been evaluated, x is the number of times the data plot has been picked as the visually most interesting panel, and m is the number of panels in a lineup (the lineup size).

For large values of alpha, scenario 3 converges to scenario 1.
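
As a rough cross-check of the scenario 3 formula, the density can be sketched in base R with lchoose() and lbeta(). The helper name dvis_sketch is hypothetical and not part of the package, whose own implementation may differ. The comparison against dbinom() with success probability 1/m reflects the mathematical large-alpha limit of the formula; reading that limit as scenario 1 is an assumption made here.

```r
# Hypothetical helper, not part of the package: scenario 3 density
# computed directly from the Beta-function formula in Details.
dvis_sketch <- function(x, K, m = 20, alpha) {
  # Work on the log scale so choose() and beta() cannot over- or underflow.
  log_p <- lchoose(K, x) +
    lbeta(x + alpha, K - x + (m - 1) * alpha) -
    lbeta(alpha, (m - 1) * alpha)
  exp(log_p)
}

# Probabilities over the full support x = 0, ..., K should sum to 1.
probs <- dvis_sketch(0:15, K = 15, m = 20, alpha = 0.1)
total <- sum(probs)

# For a very large alpha the density should be close to Binomial(K, 1/m)
# (assumed here to be what scenario 1 refers to).
max_diff <- max(abs(
  dvis_sketch(0:15, K = 15, m = 20, alpha = 1e4) -
    dbinom(0:15, size = 15, prob = 1 / 20)
))
```

Working with lbeta() rather than beta() keeps the ratio stable when alpha or K is large.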

Value

The functions return (a vector of) quantiles or probabilities for P(X = x), P(X ≤ x).
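
The tail convention described for lower.tail can be illustrated with base R alone by summing densities. This is only a sketch: pmf is a hypothetical stand-in for the scenario 3 density from Details, not the package's code, and the parameter values are arbitrary.

```r
# Hypothetical stand-in for the scenario 3 density (not the package's code).
pmf <- function(x, K, m, alpha) {
  exp(lchoose(K, x) +
        lbeta(x + alpha, K - x + (m - 1) * alpha) -
        lbeta(alpha, (m - 1) * alpha))
}

K <- 15; m <- 20; alpha <- 0.1; x <- 5

p_le <- sum(pmf(0:x, K, m, alpha))        # P(X <= x), lower tail
p_ge <- sum(pmf(x:K, K, m, alpha))        # P(X >= x), upper tail including x
p_gt <- sum(pmf((x + 1):K, K, m, alpha))  # P(X > x), R's usual upper tail

c(p_le = p_le, p_ge = p_ge, p_gt = p_gt)
```

The two upper tails differ by exactly P(X = x), which is why P(X ≥ x) rather than P(X > x) matches the p-value of observing x data picks.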

References

Andreas Buja, Dianne Cook, Heike Hofmann, Michael Lawrence, Eun-Kyung Lee, Deborah F. Swayne and Hadley Wickham, Statistical inference for exploratory data analysis and model diagnostics. Phil. Trans. R. Soc. A. 367: 4361-4383, 2009, https://doi.org/10.1098/rsta.2009.0120

Examples

# Probabilities of seeing between 5 and 10 data identifications
# in a lineup of size 20, with 15 evaluations and an estimated
# alpha of 0.1
dVis(x = 5:10, K = 15, m = 20, alpha = 0.1)

# Densities over x = 0, ..., 15 for three values of alpha
dframe <- data.frame(
  x = rep(0:15, times = 3),
  probabilities = c(dVis(0:15, K = 15, alpha = 0.1),
                    dVis(0:15, K = 15, alpha = 1),
                    dVis(0:15, K = 15, alpha = 5)),
  alpha = factor(rep(c(0.1, 1, 5), each = 16))
)
library(ggplot2)
ggplot(data = dframe, aes(x = x, y = probabilities, colour = alpha)) +
  geom_point()

# How many data picks do we need in a lineup of size m = 20
# with K = 30 evaluations and alpha = 0.1 to achieve
# significance at 10%, 5% or 1%?
qVis(p = c(0.9, 0.95, 0.99), K = 30, m = 20, alpha = 0.1)

heike/vinference documentation built on Oct. 17, 2020, 7:08 a.m.