clt: Central Limit Theorem (CLT)

View source: R/central_limit_theorem.R

clt    R Documentation

Central Limit Theorem (CLT)

Description

A movie to illustrate the ideas of the sampling distribution of a mean and the central limit theorem.

Usage

clt(
  n = 20,
  distn,
  params = list(),
  panel_plot = TRUE,
  hscale = NA,
  vscale = hscale,
  n_add = 1,
  delta_n = 1,
  arrow = TRUE,
  leg_cex = 1.25,
  ...
)

Arguments

n

An integer scalar. The size of the samples drawn from the distribution chosen using distn.

distn

A character scalar specifying the distribution from which observations are sampled. Distributions "beta", "binomial", "chisq", "chi-squared", "exponential", "f", "gamma", "geometric", "gev", "gp", "hypergeometric", "lognormal", "log-normal", "negative binomial", "normal", "poisson", "t", "uniform" and "weibull" are recognised, case being ignored.

If distn is not supplied then distn = "exponential" is used.

The "gev" and "gp" cases use the gev and gp distributional functions in the revdbayes package.

The other cases use the distributional functions in the stats package. If distn = "gamma" then the (shape, rate) parameterisation is used. If scale is supplied via params then rate is inferred from this. If distn = "negative binomial" then the (size, prob) parameterisation is used. If mu is supplied via params then prob is inferred from this (and size). If distn = "beta" then ncp is forced to be zero.
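The two conversions mentioned above can be checked in base R. This is a minimal sketch that relies only on the parameterisations documented for dgamma and dnbinom in the stats package; the variable names are illustrative, and whether clt performs exactly these steps internally is an assumption.

```r
# Gamma: a user-supplied scale corresponds to rate = 1 / scale.
scale <- 2
rate <- 1 / scale
x <- c(0.5, 1, 3)
gamma_match <- all.equal(dgamma(x, shape = 2, scale = scale),
                         dgamma(x, shape = 2, rate = rate))

# Negative binomial: a mean mu, together with size, corresponds to
# prob = size / (size + mu).
size <- 3
mu <- 4
prob <- size / (size + mu)
k <- 0:10
nbinom_match <- all.equal(dnbinom(k, size = size, mu = mu),
                          dnbinom(k, size = size, prob = prob))
```

Both comparisons evaluate the same density under the two parameterisations, so gamma_match and nbinom_match should each be TRUE.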

params

A named list of additional arguments to be passed to the density function associated with distribution distn. The (shape, rate) parameterisation is used for the gamma distribution (see GammaDist) even if the value of the scale parameter is set using params.

If a parameter value is not supplied then the default values in the relevant distributional function set using distn are used, except for "beta" (shape1 = 2, shape2 = 2), "chisq" (df = 4), "f" (df1 = 4, df2 = 8), "gev" (shape = 0.2), "gamma" (shape = 2), "gp" (shape = 0.1), "poisson" (lambda = 5), "t" (df = 4) and "weibull" (shape = 2).
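As an illustration of what a named list of parameters looks like, the sketch below passes one to a stats density function using do.call. The use of do.call is an assumption made for illustration only and is not taken from the clt source. The guarded call at the end shows the same list being supplied to clt in an interactive session.

```r
# Hypothetical illustration: a named list of gamma parameters.
params <- list(shape = 2, rate = 0.5)

# Evaluate the gamma density at x = 1 using the list, and directly.
dens_via_list <- do.call(stats::dgamma, c(list(x = 1), params))
dens_direct <- stats::dgamma(1, shape = 2, rate = 0.5)

# The same list can be passed to clt() via its params argument.
# This opens the movie, so it is only run in an interactive session.
if (interactive() && requireNamespace("smovie", quietly = TRUE)) {
  smovie::clt(distn = "gamma", params = params)
}
```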

panel_plot

A logical scalar that determines whether the plot is placed inside the panel (TRUE) or in the standard graphics window (FALSE). If the plot is to be placed inside the panel then the tkrplot package is required.

hscale, vscale

Numeric scalars. Scaling parameters for the size of the plot when panel_plot = TRUE. The default values are 1.4 on Unix platforms and 2 on Windows platforms.

n_add

An integer scalar. The number of simulated datasets to add to each new frame of the movie.

delta_n

A numeric scalar. The amount by which n is increased (or decreased) after one click of the + (or -) button in the parameter window.

arrow

A logical scalar. Should an arrow be included to show the simulated sample mean from the top plot being placed into the bottom plot?

leg_cex

The argument cex to legend. Allows the size of the legend to be controlled manually.

...

Additional arguments to the rpanel functions rp.button and rp.doublebutton, not including panel, variable, title, step, action, initval, range.

Details

Loosely speaking, a consequence of the Central Limit Theorem is that the mean of a large number of independent and identically distributed random variables, each with mean \mu and finite standard deviation \sigma, has approximately a normal distribution, even if these original variables are not normally distributed.
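Stated a little more formally, and using notation introduced here for illustration (X_1, ..., X_n for the original variables and \bar{X}_n for their mean, with \sigma > 0), the classical result is:

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad
\frac{\sqrt{n} \, (\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)
\quad \text{as } n \to \infty .
```

For large n this suggests treating \bar{X}_n as approximately normal with mean \mu and standard deviation \sigma / \sqrt{n}.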

This movie considers examples where this limiting result holds and illustrates graphically the closeness of the limiting approximation provided by the relevant normal limit to the true finite-n distribution. Of course, when distn = "normal" this result is exact.

Samples of size n are repeatedly simulated from the distribution chosen using distn. These samples are summarized using a plot that appears at the top of the movie screen. For each sample the mean of these n values is calculated, stored and added to another plot, situated below the first plot. This plot is either a histogram or an empirical c.d.f., chosen using a radio button. A rug is added to a histogram provided that it contains no more than 1000 points.

The p.d.f. (for a continuous variable) or p.m.f. (for a discrete variable) of the original variables is added to the top plot.

Once it starts, four aspects of this movie are controlled by the user.

  • There are buttons to increase (+) or decrease (-) the sample size, that is, the number of values over which a mean is calculated.

  • Each time the button labelled "simulate another n_add samples of size n" is clicked, n_add new samples are simulated and their sample means are added to the bottom plot.

  • There is a button to switch the bottom plot from displaying a histogram of the simulated means and the limiting normal p.d.f. to the empirical c.d.f. of the simulated means and the limiting normal c.d.f.

  • There is a checkbox to add to the bottom plot the approximate (large n) normal p.d.f./c.d.f. (with mean \mu and standard deviation \sigma / \sqrt{n}), implied by the CLT.
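The idea behind the movie can also be sketched without the interactive panel. The following base R code is a rough, non-interactive analogue written for illustration, not the code used by clt. It assumes exponential data with rate 1, so that \mu = \sigma = 1, and the number of simulated samples (n_sim) is an arbitrary choice.

```r
set.seed(1)        # for reproducibility
n <- 20            # sample size, matching the default n = 20
n_sim <- 5000      # number of simulated samples (arbitrary choice)
rate <- 1          # exponential rate; mean and sd both equal 1 / rate

# Simulate n_sim samples of size n and store the mean of each sample.
sample_means <- replicate(n_sim, mean(rexp(n, rate = rate)))

# Mean and standard deviation of the normal approximation implied by the CLT.
mu <- 1 / rate
sigma <- 1 / rate
clt_sd <- sigma / sqrt(n)

# Compare the simulated means with the normal approximation.
simulated <- c(mean = mean(sample_means), sd = sd(sample_means))
approximation <- c(mean = mu, sd = clt_sd)

# Largest gap between the empirical c.d.f. of the simulated means
# and the limiting normal c.d.f., evaluated at the simulated means.
sorted_means <- sort(sample_means)
max_gap <- max(abs(ecdf(sample_means)(sorted_means) -
                   pnorm(sorted_means, mean = mu, sd = clt_sd)))
```

Calling hist(sample_means, probability = TRUE) followed by curve(dnorm(x, mean = mu, sd = clt_sd), add = TRUE) gives a static version of the bottom plot in histogram mode.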

Value

Nothing is returned; only the animation is produced.

See Also

movies: a user-friendly menu panel.

smovie: general information about smovie.

cltq: Central Limit Theorem for sample quantiles.

Examples

# Exponential data
clt()

# Uniform data
clt(distn = "uniform")

# Poisson data
clt(distn = "poisson")

smovie documentation built on May 29, 2024, 10:28 a.m.