knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
cannonball bundles a couple of functions that I use when teaching introductory courses in quantitative methodology and statistics.
You can install cannonball from Github with:
library(devtools) install_github("janhove/cannonball")
# Load the package library(cannonball)
plot_r()
: Draw different scatterplots with the same correlation coefficientA single correlation coefficient can correspond to any number of scatterplots.
plot_r()
produces 16 such scatterplots for any given Pearson correlation coefficient to drive this point home.
plot_r(r = 0.5, n = 35)
For more info, see ?plot_r
.
Accompanying blog post: What data patterns can lie behind a correlation coefficient?
To help students see the connection between an experiment's design and its analysis, I've written two functions.
walkthrough_p()
guides the user through a completely randomised
experiment: Data points are generated and randomly assigned
to either the control or intervention condition. Then, the
intervention effect is added to the data points in the intervention
condition. Finally, the data are analysed using a Student's t-test
or, if pedant = TRUE
, a permutation test.
# see ?walkthrough_p walkthrough_p(n = 18, diff = 0.3, sd = 1, pedant = TRUE)
walkthrough_blocking()
works similarly to walkthrough_p()
but describes a randomised block design: Prior information about
the data points is available in the form of a covariate (e.g.,
a pretest score). This information is used to group participants
into 'blocks', and the randomisation is restricted in that one
participant per block is assigned to the control and one to the
intervention condition. Crucially, the analysis needs to take this
restricted randomisation into account.
# ?walkthrough_blocking walkthrough_blocking(n = 12, diff = 0.4, sd = 1)
An alternative, more powerful, but less easily explained analysis would use the blocking covariate as a control variable in a general linear model rather than the blocking factor.
The data from experiments in which entire clusters
of participants (e.g., classes) are assigned to the
experimental conditions can't be analysed in the same
way as data from experiments in which the participants
are assigned to the conditions individually. clustered_data()
generates data for a cluster-randomised experiment and can
be used to demonstrate the increased Type-I error rate
if such data are analysed using t-tests on the individual outcomes.
# Simulate clustered data with an ICC of 0.3 d <- clustered_data(ICC = 0.3) # Plot library(ggplot2) ggplot(data = d, aes(x = reorder(class, outcome, FUN = median), y = outcome)) + geom_boxplot(outlier.shape = NA) + geom_point(shape = 1, position = position_jitter(width = 0.1, height = 0)) + facet_wrap(~ condition, scales = "free_x") + xlab("class")
A covariate can also be created. For more info, see ?clustered_data
.
Accompanying article: Vanhove, Jan. 2015. Analyzing randomized controlled interventions: Three notes for applied linguists. Studies in Second Language Learning and Teaching 5(1). 135–152. (See the correction note for this paper.)
Accompanying blog post: Experiments with intact groups: spurious significance with improperly weighted t-tests.
These functions may be helpful for helping you to judge whether your data conform to the assumptions of your statistical model. By embedding the model's diagnostic plot in a line-up of diagnostic plots of simulated data for which the model's assumptions are literally met, analysts can more easily determine whether any blips in these plots are indicative of assumption violations or whether they can plausibly be accounted for by sampling error/noise. See Buja et al. (2009).
# Fit model m <- lm(mpg ~ wt, data = mtcars) # Create parade/line-up my_parade <- parade(m) # Check for residual nonlinearities: # Which plot looks most different from the rest? lin_plot(my_parade) # Reveal position of actual diagnostic plot reveal(my_parade)
Related functions are:
var_plot()
: Check for non-constant variance in the residuals.norm_qq()
: Check for non-normality of the residuals (quantile--quantile plot).norm_hist()
: Check for non-normality of the residuals (histogram).parade_summary()
: Summarise residuals per cell (combination of unique predictor values).Accompanying article: Vanhove, Jan. 2018. Checking the assumptions of your statistical model without getting paranoid. Preprint on PsyArxiv.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.