knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
$$\text{outcome}_i = (\text{model}) + \text{error}_i$$
The apply family

apply() applies a function over the rows or columns (the margins) of a matrix or data frame. For example:
options(scipen = 20)             # turn off scientific notation
round(apply(quakes, 2, mean), 2) # dataset, margin (2 = columns), function
lapply() applies a function to each element of a list (or each column of a data frame) and returns a list.
For Example:
lapply(quakes, mean)
sapply() works like lapply() but simplifies the result to a vector or matrix when possible. For example:
round(sapply(quakes, mean), 2)
tapply() applies a function to subsets of a vector, where the subsets are defined by the levels of one or more grouping factors.
For Example:
tapply(quakes$mag,            # dependent variable
       list(quakes$stations), # independent variable(s)
       mean)                  # function
We can start by thinking about raw deviations:
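Concretely, a raw deviation is the distance between an observed score and the value the model predicts for it (here, the mean):

$$\text{deviation}_i = x_i - \bar{x}$$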
$$SD^2 = \frac {\sum_{i=1}^{n}(x_{i} - \bar{x})^2} {n}$$
$$MSE = \frac{SS}{df} = \frac{\sum_{i=1}^{n}(\text{outcome}_i - \text{model}_i)^2}{n-1}$$
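To make the formula concrete, here is a minimal sketch that computes the sum of squares and mean squared error for the earthquake magnitudes by hand, using the mean as the model; the result matches R's built-in var():

x <- quakes$mag
SS <- sum((x - mean(x))^2)  # sum of squared deviations from the model (the mean)
MSE <- SS / (length(x) - 1) # divide by the degrees of freedom
MSE
var(x)                      # same value from the built-in variance function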
Interpreting SD as a measure of model fit:
knitr::include_graphics("pictures/introDA3/standard_error.png")
Statisticians have shown with Monte Carlo simulations how these sampling distributions behave, results formalized in the Central Limit Theorem and the Law of Large Numbers.
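As a minimal sketch (the sample size of 10 and the 10,000 replications are arbitrary choices, not from the original notes), we can run a small Monte Carlo of our own: repeatedly sample from a skewed population and watch the distribution of sample means become approximately normal:

set.seed(1234) # arbitrary seed for reproducibility
# draw 10,000 samples of n = 10 from a skewed (exponential) population
means <- replicate(10000, mean(rexp(10, rate = 1)))
mean(means)    # close to the population mean of 1
hist(means)    # roughly bell shaped, as the Central Limit Theorem predicts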
psych::describe(quakes$mag)
Definition - a confidence interval is a range of values believed to contain the population parameter, together with a confidence level: the long-run probability that intervals constructed this way capture the true (unknown) population parameter.
knitr::include_graphics("pictures/introDA3/confidence_intervals.png")
We've already talked about how +/- 1.96 and +/- 2.58 are the z-score cutoffs for 95% and 99% confidence.
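These cutoffs come straight from the normal quantile function, which you can verify yourself:

qnorm(.975) # ~1.96: a 95% interval leaves 2.5% in each tail
qnorm(.995) # ~2.58: a 99% interval leaves 0.5% in each tail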
M <- apply(quakes, 2, mean)
SE <- apply(quakes, 2, function(x){ sd(x)/sqrt(length(x)) })
M + 1.96*SE # upper bound of the 95% confidence interval
M           # sample means
M - 1.96*SE # lower bound of the 95% confidence interval
Null Hypothesis Significance Testing (NHST) involves drawing inferences about two contrasting propositions (each called a hypothesis) relating to the value of one or more population parameters.
Using sample data, we either reject the null hypothesis or fail to reject it.
Notice that we never accept the null hypothesis or accept the research hypothesis!
Does performing an NHST tell us:
The importance of an effect?
That the null hypothesis is false?
That the null hypothesis is true?
Another problem with NHST is that it encourages all-or-nothing thinking.
$$\text{test statistic} = \frac{\text{signal}}{\text{noise}} = \frac{\text{model variance}}{\text{model error}} = \frac{\text{effect}}{\text{error}}$$
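As a minimal sketch of this signal-to-noise idea (the null value of 4.5 is a hypothetical choice, not from the original notes), a one-sample t-statistic is just the effect (mean minus null value) divided by its error (the standard error):

M <- mean(quakes$mag)
SE <- sd(quakes$mag) / sqrt(length(quakes$mag))
(M - 4.5) / SE                         # hand-computed signal over noise
t.test(quakes$mag, mu = 4.5)$statistic # the same t from t.test()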
knitr::include_graphics("pictures/introDA3/one-tailed-vs-two-tailed-test.jpg")
Hypothesis testing can result in one of four different outcomes:
knitr::include_graphics("pictures/introDA3/hypo_error_chart.png")
Use corrections for multiple comparisons
Family-wise error rate is the probability of making one or more false discoveries (Type I errors) when performing a family of hypothesis tests.
Experiment-wise error rate is the proportion of experiments in which one or more Type I errors occur.
Example corrections
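For instance, R's built-in p.adjust() implements several common corrections; here is a minimal sketch using made-up p-values:

p <- c(.01, .02, .04, .10)         # hypothetical p-values from four tests
p.adjust(p, method = "bonferroni") # multiplies each p by the number of tests
p.adjust(p, method = "holm")       # a less conservative, sequential correction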
Power is influenced by (see the sketch after this list):
Effect size
Alpha
Type of test
Sample size
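As one example (assuming the pwr package is installed, which is not part of the original notes), pwr.t.test() solves for whichever of sample size, effect size, alpha, or power you leave unspecified:

library(pwr)
# sample size per group for a medium effect (d = .5),
# alpha = .05, and 80% power in a two-sample t-test
pwr.t.test(d = .5, sig.level = .05, power = .80,
           type = "two.sample", alternative = "two.sided")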
Definition - an effect size is a standardized measure of the size of an effect:
There are several effect size measures that can be used:
r = .1, d = .2 (small effect): the effect explains 1% of the total variance.
r = .3, d = .5 (medium effect): the effect explains 9% of the total variance.
r = .5, d = .8 (large effect): the effect explains 25% of the total variance.
Beware of these 'canned' effect sizes though: the size of an effect should be judged within the context of the particular research area, not just against fixed benchmarks.
library(MOTE)
M <- tapply(quakes$mag, quakes$stations, mean)
STDEV <- tapply(quakes$mag, quakes$stations, sd)
N <- tapply(quakes$mag, quakes$stations, length)
head(M)
# compare station 10 to station 11
effect <- d.ind.t(M[1], M[2], STDEV[1], STDEV[2], N[1], N[2], a = .05)
effect$d
In this section, you've learned about:
The apply family of functions
The standard deviation and standard error as measures of model fit
Confidence intervals
Null hypothesis significance testing and its limitations
Type I and Type II errors, error rates, and corrections
Power and effect sizes