$\$
# install.packages("latex2exp") library(latex2exp) options(scipen=999) knitr::opts_chunk$set(echo = TRUE) set.seed(123)
# get some images that are used in this document SDS230::download_image("which_are_prob_densities.png") SDS230::download_image("area_pdf.png") SDS230::download_image("probability_area.png") SDS230::download_image("Combined_Cumulative_Distribution_Graphs.png") # get some images and data that are used in this document SDS230::download_image("student_t.png") SDS230::download_data("amazon.rda") SDS230::download_data("gingko_RCT.rda") SDS230::download_data("alcohol.rda")
$\$
$\$
Let's us reexamine the randomized controlled trial experiment by Solomon et al (2002) to see if there is evidence that taking a gingko pills affects cognition.
$H_0: \mu_{gingko} - \mu_{placebo} = 0$
$H_A: \mu_{gingko} - \mu_{placebo} \ne 0$
$\alpha = 0.05$
$\$
load("gingko_RCT.rda") # plot the data boxplot(gingko, placebo, names = c("Gingko", "Placebo"), ylab = "Memory score") # create a stripchart data_list <- list(gingko, placebo) stripchart(data_list, group.names = c("Gingko", "Placebo"), method = "jitter", xlab = "Memory score", col = c("red", "blue"))
$\$
The formula for a t-statistic is:
$$t = \frac{\bar{x}_t - \bar{x}_c}{\sqrt{\frac{s^2_t}{n_t} + \frac{s^2_c}{n_c}}}$$
$\$
# combine the data from the treatment and control groups together # use a for loop to create shuffled treatment and control groups and shuffled statistics # plot the null distribution as a histogram
$\$
# plot the null distribution again with a red line a the value of the observed statistic # calculate the p-value
$\$
When we used a statistic of $\bar{x}_t - \bar{x}_c$ in our randomization test in class 7 we got a p-value of 0.127. How do these results compare?
$\$
Let's explore probability functions in R...
R has built in functions to generate data from different distributions. All these functions start with the letter r.
We can set the random number generator seed to always get the same sequence of random numbers.
Let's get a sample of n = 200 random points from the uniform distribution using runif()
# set the seed to a specific number to always get the same sequence of random numbers set.seed(530) # generate n = 100 points from U(0, 1) using runif() function # plot a histogram of these random numbers
There are many other distributions we can get random numbers from including:
rnorm()rexp()rbinom()And many more!
The first argument to all these functions is the number of random points you want to generate (n) and then there are additional arguments that can be used to control the shape of the distribution (i.e., that set the "parameters" of the distribution),
# generate n = 1000 points from standard normal distribution N(0, 1) # plot a histogram of these random numbers
$\$
$\$
Probability density functions can be used to model random events. All probability density functions, f(x), have these properties:
Which of the following are probability density functions?

$\$
For continuous (quantitative) data, we use density function f(x) to find the probability (e.g., the long run frequency) that a random number X is between two values a and b using:
$P(a < X < b) = \int_{a}^{b}f(x)dx$

$\$
$\$
If we want to plot the true probability density function for the standard uniform distribution U(0, 1) we can use the dunif() function. All density function in base R start with d.
# the x-value domain for the density function f(x) # plot the probability density function
Question: Can you create a density plot for the standard normal distribution?
$\$
Cumulative probability distribution functions give us the probability of getting a random number X that is less than (or equal to) a particular value x; i.e., they give us $P(X \le x)$. For example, they could be used to give us the probability that a random number will be less than 2: $P(X \le 2)$.
Cumulative probability distribution functions are obtained by integrating a probability density function:
$P(X \le x) = F_X(x) = \int_{-\infty}^x f(x)dx$
where f(x) is a probability density function and $F_X(x)$ is the cumulative distribution function.

$\$
To get the values that a random number X is less than a particular value x using R, we can use a series of functions that start with the letter d.
For example, to get the probability a random number X generated from the standard uniform distribution U(0, 1) will be less than .25; i.e., $P(X \le .25)$ we can use dunif().
$\$
Let's redo this analysis using a parametric probability distribution, which in this case is the t-distribution. The same 5 steps of hypothesis testing apply here as well!
$\$
Same as before...
$H_0: \mu_{gingko} - \mu_{placebo} = 0$
$H_A: \mu_{gingko} - \mu_{placebo} > 0$
$\alpha = 0.05$
$\$
Same as before:
$$t = \frac{\bar{x}_t - \bar{x}_c}{\sqrt{\frac{s^2_t}{n_t} + \frac{s^2_c}{n_c}}}$$
$\$
We will now use a parametric t-distribution (i.e., density function) as a null distribution. The t-distribution has one parameter called "degrees of freedom". We will set this parameter as the minimum of $n_t - 1$ or $n_c - 1$.
What are the degrees of freedom for this study?
Let's visualize the t-distribution
# get the degrees of freedom # visualize the t-distribution density curve using the dt() function # how does this compare to our t-distribution created by shuffling?
$\$
We can get $P(X < stat)$ for a t-distribution using the pt() function.
$\$
How does our p-value and decision compare to the p-value decision we got from the permutation test?
$\$
We can use the built in t.test() function to run a t-test as well.
Note: If you want to run one-tailed tests you can use the extra argument alternative argument.
Why is the p-value slightly different than what we got when we used the pt() function?
$\$

$\$
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.