rp.sample: Interactive exploration of sampling variation

View source: R/rp_sample.r

rp.sample R Documentation

Description

Graphical exploration of variation in samples and sample means. The primary use of the function is in interactive mode, using a variety of controls for the display. Normal or binomial distributions can be used.

Usage

rp.sample(n, mu, sigma, distribution = 'normal', shape = 0,
          panel = TRUE, nbins = 20, nbins.mean = 20,
          display, display.sample, display.mean, nsim = 50,
          show.out.of.range = TRUE, ggplot = TRUE,
          hscale = NA, vscale = hscale, pause = 0.01)

Arguments

n

the size of the sample. If this is missing, it is set to 25.

mu

the mean of the distribution from which samples are taken. If this is missing it is set to 5 for a normal distribution and 0.5 for a binomial distribution.

sigma

the standard deviation of the normal distribution. If this is missing it is set to 0.4.

distribution

a character value which determines whether a 'normal' or 'binomial' distribution is used.

shape

the shape parameter of the skew-normal distribution. When this is set to the default value of 0, samples are generated from a normal distribution. Setting non-zero values for this parameter gives some skewness to the distribution from which the data are sampled.

panel

a logical value which determines whether the function runs in interactive mode. See Details.

nbins

an integer value which sets the number of bins used in the data histograms.

nbins.mean

an integer value which sets the number of bins used in the histogram of the sample means.

display

a character value which determines the form of graphical display used initially or in non-interactive mode. Possible values are 'histogram' (the default), 'density' or 'violin'.

display.sample

a logical vector which controls options for graphical display of the data, used initially or in non-interactive mode. See Details.

display.mean

a logical vector which controls options for graphical display of the sample means, used initially or in non-interactive mode. See Details.

nsim

an integer value which sets the number of accumulated mean values plotted when the function runs in non-interactive mode. See Details.

show.out.of.range

a logical value which controls whether observations lying beyond 3 standard deviations (for samples) or 3 standard errors (for sample means) are indicated. The scales of the plots are fixed at 3 standard deviations above and below the mean so that the axes are fixed for all samples.

ggplot

a logical value which controls whether ggplot graphics are used. If this is set to FALSE or the ggplot2 package is not available, functionality reverts to the standard graphics version available in rpanel 1.1-5 which has simpler functionality. The ggplot version of the function is recommended.

hscale, vscale

scaling parameters for the size of the plot when panel.plot is set to TRUE. The default values are 1.

pause

a time delay, in seconds, for the insertion of components into the control panel. The speed of some computing systems can create a panel which does not expand in time to contain all its components. The pause argument adds a short delay to each component to avoid this.
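
The effect of the shape argument can be illustrated outside the package. The following is a minimal base R sketch, not the package's own code: the helper names r_skew_normal and skewness, and the location/scale parameterisation, are assumptions made for illustration, and rp.sample may centre and scale the distribution differently.

```r
# Hypothetical helper (not part of rpanel): draws from a skew-normal
# distribution using the standard representation built from two normals.
r_skew_normal <- function(n, location = 0, scale = 1, shape = 0) {
  delta <- shape / sqrt(1 + shape^2)
  u0 <- abs(rnorm(n))   # half-normal component, the source of the skewness
  u1 <- rnorm(n)        # symmetric component
  location + scale * (delta * u0 + sqrt(1 - delta^2) * u1)
}

set.seed(1)
x_sym  <- r_skew_normal(10000, location = 5, scale = 0.4, shape = 0)
x_skew <- r_skew_normal(10000, location = 5, scale = 0.4, shape = 5)

# Sample skewness: third central moment over the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(symmetric = skewness(x_sym), skewed = skewness(x_skew))
```

With shape = 0 the half-normal component drops out and the draws are plain normal values. A positive shape gives a longer right tail.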

Details

When display is set to 'density' or 'violin', density estimates are constructed using a bandwidth which is optimal for a normal distribution. For small samples this provides a stable and conservative estimate which is not unduly influenced by features which may well simply be due to sampling variation. As the sample size increases, the estimate will still converge to the true density function.
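
A bandwidth of this kind can be sketched in base R as follows. The formula used here is the usual normal-reference bandwidth for a Gaussian kernel; it is an assumption that this matches the function's internal choice exactly, and the variable names are illustrative.

```r
set.seed(1)
x <- rnorm(25, mean = 5, sd = 0.4)
n <- length(x)

# Normal-reference bandwidth: minimises the asymptotic mean integrated
# squared error when the true density is normal (assumed formula)
h <- (4 / (3 * n))^(1 / 5) * sd(x)

# Gaussian kernel density estimate using that bandwidth
est <- density(x, bw = h)
h
```

Calling plot(est) draws the resulting curve. The bandwidth shrinks at rate n^(-1/5), which is why the estimate still converges as the sample size grows.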

When the sample size is less than 10, a histogram or density estimate is not an effective display, and scaling the vertical axis becomes problematic. In this case individual points are displayed instead.

The visual effect of the animation is assisted by holding the axes constant. This means that there may occasionally be observations outside the displayed horizontal range, or a histogram height which exceeds the displayed vertical range. This is denoted by a + symbol at the top of the relevant histogram bars. This issue can often be tackled by reducing the number of histogram bins.
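
The fixed ranges can be reproduced by hand. This is a minimal sketch under the documented defaults (mu = 5, sigma = 0.4, n = 25, 20 bins); the variable names and the fixed grid of histogram breaks are assumptions for illustration, not the function's internal code.

```r
mu <- 5; sigma <- 0.4; n <- 25
se <- sigma / sqrt(n)                     # standard error of the sample mean

sample_limits <- mu + c(-3, 3) * sigma    # horizontal range for the data
mean_limits   <- mu + c(-3, 3) * se       # horizontal range for sample means

set.seed(1)
x <- rnorm(n, mu, sigma)

# Observations beyond the fixed range cannot be drawn on the fixed axis
n_outside <- sum(x < sample_limits[1] | x > sample_limits[2])

# Histogram counts on a fixed grid of 20 bins spanning the fixed range
breaks <- seq(sample_limits[1], sample_limits[2], length.out = 21)
inside <- x[x >= sample_limits[1] & x <= sample_limits[2]]
counts <- hist(inside, breaks = breaks, plot = FALSE)$counts
```

Fewer bins give taller bars. With a fixed vertical range, changing the number of bins therefore changes how often a bar overshoots the top of the plot.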

When display is set to 'density' or 'violin', individual points are plotted, with a random vertical position. This is suppressed when the number of points exceeds 5000.

The display.sample and display.mean arguments control the details of what is displayed initially and, more usefully, when the function operates in non-interactive mode. Each argument is a logical vector with named values. display.sample has the default setting c(data = TRUE, population = FALSE, mean = FALSE, 'st.dev. scale' = FALSE) while the default for display.mean is c('sample mean' = FALSE, 'accumulate' = FALSE, 'se scale' = FALSE, 't-statistic' = FALSE, 'zoom' = FALSE, 'distribution' = FALSE). Any elements of these arguments which are not explicitly identified are set to the default values. In the case of binomial data the 'st.dev. scale' setting is disabled as it is not a helpful addition to the plot.
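
The way partially specified settings fall back to the defaults can be sketched in base R. The helper merge_display below is hypothetical and not part of rpanel; in particular, stopping on an unrecognised name is an assumption, since the treatment of unknown names is not documented here.

```r
# Defaults for display.sample as listed above
default_sample <- c(data = TRUE, population = FALSE, mean = FALSE,
                    "st.dev. scale" = FALSE)

# Hypothetical helper (not part of rpanel): overlay user settings on defaults
merge_display <- function(user, defaults) {
  unknown <- setdiff(names(user), names(defaults))
  if (length(unknown) > 0)
    stop("unrecognised display option(s): ", paste(unknown, collapse = ", "))
  defaults[names(user)] <- user
  defaults
}

# Only two elements are named; the others keep their default values
merge_display(c(population = TRUE, mean = TRUE), default_sample)
```

Names containing spaces or dots, such as 'st.dev. scale' and 'sample mean', must be quoted when the vector is built with c().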

The principal use of the function is in interactive mode, when panel is set to TRUE. If panel is set to FALSE then interactive mode is switched off. In this case, if the ggplot2 package is available and the ggplot argument is set to TRUE, the function returns plots of a sample of data and of accumulated means as components plotdata and plotmean of the returned object. The number of accumulated means is set by the nsim argument.

If the ggplot2 package is not available, standard graphics are used with simpler display options, along the lines of the function provided in version 1.1-5 of the package.
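
A non-interactive call might look like the sketch below. It uses only arguments listed under Usage, but the particular values are illustrative. The guards are assumptions added so that the code does nothing when rpanel, ggplot2 or a sufficiently recent version of rpanel is not installed.

```r
result <- NULL
if (requireNamespace("rpanel", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("rpanel") > "1.1.5") {
  result <- rpanel::rp.sample(
    n = 50, mu = 5, sigma = 0.4,
    panel = FALSE, nsim = 100,
    display.sample = c(data = TRUE, population = TRUE),
    display.mean   = c("sample mean" = TRUE, accumulate = TRUE)
  )
  # Inspect the component names of the returned object before using them
  print(names(result))
}
```

Printing names(result) shows which plot components the installed version returns, so they can be extracted by name afterwards.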

Value

When the function operates in interactive mode, with panel set to TRUE, nothing is returned. When panel is set to FALSE, plots of a sample of data and of accumulated means are provided as components sample and mean of the returned object.

References

rpanel: Simple interactive controls for R functions using the tcltk package. Journal of Statistical Software, 17, issue 9.

Examples

## Not run: 
   rp.sample()

## End(Not run)

rpanel documentation built on March 12, 2026, 9:07 a.m.
