$\$
```r
# install.packages("latex2exp")
library(latex2exp)

options(scipen = 999)
knitr::opts_chunk$set(echo = TRUE)
set.seed(123)
```
```r
# get some images and data that are used in this document
SDS230::download_image("which_are_prob_densities.png")
SDS230::download_image("area_pdf.png")
SDS230::download_image("probability_area.png")
SDS230::download_image("Combined_Cumulative_Distribution_Graphs.png")
SDS230::download_image("student_t.png")
SDS230::download_data("amazon.rda")
SDS230::download_data("gingko_RCT.rda")
SDS230::download_data("alcohol.rda")
```
$\$
$\$
Let's reexamine the randomized controlled trial by Solomon et al. (2002) to see if there is evidence that taking gingko pills affects cognition.
$H_0: \mu_{gingko} - \mu_{placebo} = 0$
$H_A: \mu_{gingko} - \mu_{placebo} \ne 0$
$\alpha = 0.05$
$\$
```r
load("gingko_RCT.rda")

# plot the data
boxplot(gingko, placebo,
        names = c("Gingko", "Placebo"),
        ylab = "Memory score")

# create a stripchart
data_list <- list(gingko, placebo)
stripchart(data_list,
           group.names = c("Gingko", "Placebo"),
           method = "jitter",
           xlab = "Memory score",
           col = c("red", "blue"))
```
$\$
The formula for a t-statistic is:
$$t = \frac{\bar{x}_t - \bar{x}_c}{\sqrt{\frac{s^2_t}{n_t} + \frac{s^2_c}{n_c}}}$$
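This formula can be computed directly in R. Below is a minimal sketch using a hypothetical helper function `t_stat()` (not part of the class code); with the gingko data loaded you would call it as `t_stat(gingko, placebo)`. The sanity check here uses small made-up vectors, and the result should match the statistic reported by `t.test()` (which uses the same formula).

```r
# hypothetical helper implementing the t-statistic formula above
t_stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# sanity check on made-up data: matches the statistic from t.test()
x <- c(5, 7, 8, 6, 9)
y <- c(4, 5, 6, 5, 4)
all.equal(t_stat(x, y), unname(t.test(x, y)$statistic))
```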
$\$
```r
# combine the data from the treatment and control groups together

# use a for loop to create shuffled treatment and control groups
# and shuffled statistics

# plot the null distribution as a histogram
```
$\$
```r
# plot the null distribution again with a red line at the value
# of the observed statistic

# calculate the p-value
```
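The steps above can be sketched end to end as follows. This is one possible implementation, not the official class solution, and it uses small made-up vectors in place of the real `gingko` and `placebo` data so that it runs on its own; with the real data, skip the two made-up assignments.

```r
set.seed(123)

# made-up stand-ins for the real gingko and placebo vectors
gingko  <- c(7, 5, 8, 6, 9, 7)
placebo <- c(5, 4, 6, 5, 4, 6)

# the t-statistic defined above
t_stat <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

obs_stat <- t_stat(gingko, placebo)

# combine the data from the treatment and control groups together
combined <- c(gingko, placebo)
n_t <- length(gingko)

# use a for loop to create shuffled groups and shuffled statistics
null_dist <- rep(NA, 10000)
for (i in 1:10000) {
  shuffled      <- sample(combined)
  shuff_treat   <- shuffled[1:n_t]
  shuff_control <- shuffled[(n_t + 1):length(shuffled)]
  null_dist[i]  <- t_stat(shuff_treat, shuff_control)
}

# plot the null distribution with a red line at the observed statistic
hist(null_dist, breaks = 50, xlab = "shuffled t-statistics")
abline(v = obs_stat, col = "red")

# calculate the (two-tailed) p-value
p_value <- mean(abs(null_dist) >= abs(obs_stat))
p_value
```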
$\$
When we used a statistic of $\bar{x}_t - \bar{x}_c$ in our randomization test in class 7 we got a p-value of 0.127. How do these results compare?
$\$
Let's explore probability functions in R...
R has built-in functions to generate data from different distributions. All of these functions start with the letter r.
We can set the random number generator seed to always get the same sequence of random numbers.
Let's get a sample of n = 200 random points from the uniform distribution using runif().
```r
# set the seed to a specific number to always get the same
# sequence of random numbers
set.seed(530)

# generate n = 200 points from U(0, 1) using the runif() function

# plot a histogram of these random numbers
```
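A possible completion of the chunk above (the object name `rand_unif` is an arbitrary choice):

```r
set.seed(530)

# generate n = 200 points from U(0, 1)
rand_unif <- runif(200, min = 0, max = 1)

# plot a histogram of these random numbers
hist(rand_unif, breaks = 20, xlab = "x", main = "200 draws from U(0, 1)")
```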
There are many other distributions we can get random numbers from including:
- rnorm()
- rexp()
- rbinom()
And many more!
The first argument to all of these functions is the number of random points you want to generate (n), and then there are additional arguments that can be used to control the shape of the distribution (i.e., that set the "parameters" of the distribution).
```r
# generate n = 1000 points from the standard normal distribution N(0, 1)

# plot a histogram of these random numbers
```
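A possible completion, passing the mean and sd parameters explicitly (they default to 0 and 1, so here they are shown only for clarity; the name `norm_draws` is an arbitrary choice):

```r
# draw n = 1000 points from N(0, 1) and plot a histogram
norm_draws <- rnorm(1000, mean = 0, sd = 1)
hist(norm_draws, breaks = 30, xlab = "x", main = "1000 draws from N(0, 1)")
```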
$\$
$\$
Probability density functions can be used to model random events. All probability density functions, f(x), have these properties:

1. $f(x) \ge 0$ for all values of x
2. The total area under the curve is 1; i.e., $\int_{-\infty}^{\infty} f(x)dx = 1$
Which of the following are probability density functions?
$\$
For continuous (quantitative) data, we use a density function f(x) to find the probability (i.e., the long-run frequency) that a random number X is between two values a and b using:
$P(a < X < b) = \int_{a}^{b}f(x)dx$
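As a quick numerical check of this idea (not part of the original notes), R's integrate() function can approximate such an area. For the standard uniform distribution the density is flat at 1, so the area between 0.25 and 0.75 should be 0.5:

```r
# P(0.25 < X < 0.75) for X ~ U(0, 1), computed as the area under dunif
area <- integrate(dunif, lower = 0.25, upper = 0.75)
area$value  # approximately 0.5
```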
$\$
$\$
If we want to plot the true probability density function for the standard uniform distribution U(0, 1) we can use the dunif() function. All density functions in base R start with d.
```r
# the x-value domain for the density function f(x)

# plot the probability density function
```
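A possible completion, using an x range slightly wider than [0, 1] (an arbitrary choice) so the edges of the distribution are visible:

```r
# the x-value domain for the density function f(x)
x <- seq(-0.5, 1.5, by = 0.01)

# plot the probability density function
plot(x, dunif(x), type = "l", ylab = "f(x)", main = "U(0, 1) density")
```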
Question: Can you create a density plot for the standard normal distribution?
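One possible answer, using the dnorm() function over a range of about four standard deviations on each side of the mean:

```r
# plot the N(0, 1) density curve
x <- seq(-4, 4, by = 0.01)
plot(x, dnorm(x), type = "l", ylab = "f(x)", main = "N(0, 1) density")
```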
$\$
Cumulative probability distribution functions give us the probability of getting a random number X that is less than (or equal to) a particular value x; i.e., they give us $P(X \le x)$. For example, they could be used to give us the probability that a random number will be less than 2: $P(X \le 2)$.
Cumulative probability distribution functions are obtained by integrating a probability density function:
$P(X \le x) = F_X(x) = \int_{-\infty}^x f(t)dt$

where f(t) is a probability density function and $F_X(x)$ is the cumulative distribution function.
$\$
To get the probability that a random number X is less than a particular value x using R, we can use a series of functions that start with the letter p.

For example, to get the probability that a random number X generated from the standard uniform distribution U(0, 1) will be less than .25, i.e., $P(X \le .25)$, we can use punif().
$\$
Let's redo this analysis using a parametric probability distribution, which in this case is the t-distribution. The same 5 steps of hypothesis testing apply here as well!
$\$
Almost the same as before, but now with a one-sided alternative:
$H_0: \mu_{gingko} - \mu_{placebo} = 0$
$H_A: \mu_{gingko} - \mu_{placebo} > 0$
$\alpha = 0.05$
$\$
Same as before:
$$t = \frac{\bar{x}_t - \bar{x}_c}{\sqrt{\frac{s^2_t}{n_t} + \frac{s^2_c}{n_c}}}$$
$\$
We will now use a parametric t-distribution (i.e., density function) as a null distribution. The t-distribution has one parameter called "degrees of freedom". We will set this parameter as the minimum of $n_t - 1$ or $n_c - 1$.
What are the degrees of freedom for this study?
Let's visualize the t-distribution
```r
# get the degrees of freedom

# visualize the t-distribution density curve using the dt() function

# how does this compare to our null distribution created by shuffling?
```
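A sketch of these steps, using the minimum rule from the text. The group sizes here are made up so the sketch runs on its own; with the real data, use `length(gingko)` and `length(placebo)` instead:

```r
# degrees of freedom: the minimum of n_t - 1 and n_c - 1
# (made-up group sizes; replace with length(gingko) and length(placebo))
n_t <- 20
n_c <- 20
df <- min(n_t - 1, n_c - 1)

# visualize the t-distribution density curve using dt()
x <- seq(-4, 4, by = 0.01)
plot(x, dt(x, df), type = "l", ylab = "f(x)",
     main = paste("t-distribution with", df, "degrees of freedom"))
```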
$\$
We can get $P(X < stat)$ for a t-distribution using the pt() function.
$\$
How do our p-value and decision compare to the p-value and decision we got from the permutation test?
$\$
We can use the built-in t.test() function to run a t-test as well.

Note: if you want to run a one-tailed test you can use the extra alternative argument.
Why is the p-value slightly different than what we got when we used the pt() function?
$\$
$\$