knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
A way of predicting the value of one variable from other variables.
$$Y_i = b_0 + b_1X_i + \varepsilon_i$$
What is $b_1$?
What is $b_0$?
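To see what $b_0$ and $b_1$ represent, here is a minimal sketch on simulated data (not the course data): $b_0$ is the intercept, the predicted $Y$ when $X = 0$, and $b_1$ is the slope, the predicted change in $Y$ for a one-unit change in $X$.

```r
## Simulated example (not the course data): b0 is the intercept (predicted Y
## when X = 0); b1 is the slope (change in Y per one-unit change in X).
set.seed(42)
X <- rnorm(100, mean = 50, sd = 10)
Y <- 3 + 0.5 * X + rnorm(100, sd = 2)  # true b0 = 3, true b1 = 0.5
fit <- lm(Y ~ X)
coef(fit)  # estimates should land near 3 and 0.5
```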
knitr::include_graphics("pictures/regression/5.png")
Simple Linear Regression = SLR
Multiple Linear Regression = MLR
MLR types include:
Is my overall model (i.e., the regression equation) useful at predicting the outcome variable?
How useful are each of the individual predictors for my model?
However, we can think about the hypotheses for the overall test being:
Generally, this form does not include one-tailed tests: the test statistic is built from squared values (an F-statistic), so it is impossible to get negative values in the statistical test.
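To see why the overall test has only one form, note that $F$ can be computed directly from $R^2$ as a ratio of squared (non-negative) quantities. A sketch on simulated data, not the course data:

```r
## Simulated example: the overall F from summary() equals
## (R^2 / k) / ((1 - R^2) / (N - k - 1)), a ratio of non-negative terms.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 0.4 * d$x1 - 0.3 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = d)
s <- summary(fit)
k <- 3
F_by_hand <- (s$r.squared / k) / ((1 - s$r.squared) / (n - k - 1))
c(F_by_hand, unname(s$fstatistic[1]))  # the two values match
```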
We are trying to predict each person's score $Y_i$ by:
knitr::include_graphics("pictures/regression/7.png")
knitr::include_graphics("pictures/regression/6.png")
We test the individual predictors with a t-test:
Therefore, we might use the following hypotheses:
Or, we could use a directional test, since the test statistic t can be negative:
$df = N - k - 1$
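A sketch of the individual predictor test on simulated data (variable names are made up): $t = b / SE_b$, evaluated against a t distribution with $df = N - k - 1$.

```r
## Simulated example: each predictor's t is b / SE(b),
## compared to a t distribution with df = N - k - 1.
set.seed(2)
n <- 150
k <- 2
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 + 0.5 * d$x1 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)
co <- summary(fit)$coefficients
t_x1 <- co["x1", "Estimate"] / co["x1", "Std. Error"]
p_x1 <- 2 * pt(-abs(t_x1), df = n - k - 1)  # two-tailed p value
c(t_x1, co["x1", "t value"])    # matches the summary() t
c(p_x1, co["x1", "Pr(>|t|)"])   # matches the summary() p
```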
b = unstandardized regression coefficient
$\beta$ = standardized regression coefficient
b or $\beta$?:
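One way to see the relationship between the two: $\beta = b \times SD_X / SD_Y$, which is also the slope you get after z-scoring both variables. A sketch on simulated data:

```r
## Simulated example: beta (standardized slope) = b * SD(X) / SD(Y),
## which equals the slope from regressing z-scored Y on z-scored X.
set.seed(3)
x <- rnorm(80, mean = 10, sd = 3)
y <- 5 + 2 * x + rnorm(80, sd = 4)
b <- coef(lm(y ~ x))[["x"]]           # unstandardized slope
beta_by_hand <- b * sd(x) / sd(y)
beta_scaled <- coef(lm(scale(y) ~ scale(x)))[[2]]
c(beta_by_hand, beta_scaled)          # identical values
```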
Mental Health and Drug Use:
library(rio)
master <- import("data/regression_data.sav")
master <- master[ , c(8:11)]
str(master)
summary(master)
nomiss <- na.omit(master)
nrow(master)
nrow(nomiss)
In this section, we will add a few new outlier checks:
Because we are using regression as our model, we may consider using multiple checks before excluding outliers.
We start with the mahalanobis() function we have used previously.
mahal <- mahalanobis(nomiss, colMeans(nomiss), cov(nomiss))
cutmahal <- qchisq(1 - .001, ncol(nomiss))
badmahal <- as.numeric(mahal > cutmahal) ##note the direction of the >
table(badmahal)
We create the model with the lm() function and our regression formula. Y ~ X + X reads: Y is approximated by X plus X.
model1 <- lm(CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, data = nomiss)
How do we calculate how much change is bad?
k <- 3 ##number of IVs
leverage <- hatvalues(model1)
cutleverage <- (2*k+2) / nrow(nomiss)
badleverage <- as.numeric(leverage > cutleverage)
table(badleverage)
How do we calculate how much change is bad?
cooks <- cooks.distance(model1)
cutcooks <- 4 / (nrow(nomiss) - k - 1)
badcooks <- as.numeric(cooks > cutcooks)
table(badcooks)
##add them up!
totalout <- badmahal + badleverage + badcooks
table(totalout)
noout <- subset(nomiss, totalout < 2)
model2 <- lm(CESD_total ~ PIL_total + AUDIT_TOTAL_NEW + DAST_TOTAL_NEW, data = noout)
summary(model2, correlation = TRUE)
standardized <- rstudent(model2)
fitted <- scale(model2$fitted.values)
{qqnorm(standardized)
abline(0, 1)}
hist(standardized)
{plot(fitted, standardized)
abline(0, 0)
abline(v = 0)}
If your assumptions go wrong:
summary(model2)
library(papaja)
apa_style <- apa_print(model2)
apa_style$full_result$modelfit
r apa_style$full_result$modelfit
summary(model2)
apa_style$full_result$PIL_total
apa_style$full_result$AUDIT_TOTAL_NEW
apa_style$full_result$DAST_TOTAL_NEW
r apa_style$full_result$PIL_total
r apa_style$full_result$AUDIT_TOTAL_NEW
r apa_style$full_result$DAST_TOTAL_NEW
Two concerns:
We can use the QuantPsyc package for $\beta$ values.
library(QuantPsyc)
lm.beta(model2)
knitr::include_graphics("pictures/regression/19.png")
We would add these to our other reports:
r apa_style$full_result$PIL_total, $pr^2 = .30$
r apa_style$full_result$AUDIT_TOTAL_NEW, $pr^2 < .01$
r apa_style$full_result$DAST_TOTAL_NEW, $pr^2 < .01$
library(ppcor)
partials <- pcor(noout)
partials$estimate^2
Answers the following questions:
Uses:
We can use dummy coding to convert our variable into predictive comparisons.
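A minimal sketch of what dummy coding produces (the group labels here are made up, not from the study): a factor with $m$ levels becomes $m - 1$ indicator (0/1) columns, with the first level serving as the reference group.

```r
## Made-up three-level factor (labels are illustrative only):
group <- factor(c("control", "control", "drugA", "drugA", "drugB", "drugB"))
mm <- model.matrix(~ group)  # 3 levels -> intercept + 2 dummy columns
mm
## Each dummy column's b is that group's mean difference from the
## reference group (the first level, "control").
```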
knitr::include_graphics("pictures/regression/21.png")
IVs:
DV:
hdata <- import("data/dummy_code.sav")
str(hdata)
attributes(hdata$treat)
hdata$treat <- factor(hdata$treat, levels = 0:4, labels = c("No Treatment", "Placebo", "Paxil", "Effexor", "Cheerup"))
Model 1: Controls for family history before testing if treatment is significant
model1 <- lm(after ~ familyhistory, data = hdata)
summary(model1)
Remember, you have to leave the family history variable in the model, or you aren't actually controlling for it.
model2 <- lm(after ~ familyhistory + treat, data = hdata)
summary(model2)
We can compare the two models with the anova() function.
anova(model1, model2)
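A self-contained sketch of this hierarchical comparison on simulated data (not the depression data; the variable names are stand-ins):

```r
## Simulated stand-in for the hierarchical test (not the real data):
set.seed(4)
n <- 120
history <- rnorm(n)                                   # covariate (step 1)
treat <- factor(sample(c("none", "placebo", "drug"), n, replace = TRUE))
after <- 10 + 2 * history + ifelse(treat == "drug", -3, 0) + rnorm(n)
m1 <- lm(after ~ history)                             # control-only model
m2 <- lm(after ~ history + treat)                     # adds treatment
comparison <- anova(m1, m2)                           # F tests the R^2 change
comparison
```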
Remember dummy coding equals:
b values can be hard to interpret, so using emmeans can help us understand the results.
summary(model2)
library(emmeans)
emmeans(model2, "treat")
We cannot run the pcor code on our categorical variables. What can we do to calculate $pr^2$?
model_summary <- summary(model2)
t_values <- model_summary$coefficients[ , 3]
df_t <- model_summary$df[2]
t_values^2 / (t_values^2 + df_t)
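To see why $t^2 / (t^2 + df)$ works, here is a sketch on simulated data: the value from this formula matches the squared correlation between the residuals of outcome ~ covariate and predictor ~ covariate, which is the squared partial correlation.

```r
## Simulated example: squared partial correlation via t^2 / (t^2 + df)
## equals the squared correlation of the two sets of residuals.
set.seed(5)
n <- 100
z <- rnorm(n)                        # covariate
x <- 0.5 * z + rnorm(n)              # predictor of interest
y <- 0.4 * x + 0.3 * z + rnorm(n)    # outcome
fit <- lm(y ~ x + z)
t_x <- summary(fit)$coefficients["x", "t value"]
pr2_from_t <- t_x^2 / (t_x^2 + fit$df.residual)
pr2_from_resid <- cor(resid(lm(y ~ z)), resid(lm(x ~ z)))^2
c(pr2_from_t, pr2_from_resid)        # the two values match
```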
We can use the pwr library to calculate the required sample size for any particular effect size.
library(pwr)
R2 <- model_summary$r.squared
f2 <- R2 / (1-R2)
R2
f2
u is the degrees of freedom for the model, the first value in the F-statistic.
v is the degrees of freedom for error, but we are trying to figure out the sample size for each condition, so we leave this one blank.
f2 is the converted effect size.
sig.level is our $\alpha$ value.
power is our power level.
#f2 is cohen f squared
pwr.f2.test(u = model_summary$df[1], v = NULL, f2 = f2, sig.level = .05, power = .80)
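For reference, here is a base-R sketch of the kind of computation pwr.f2.test performs, assuming Cohen's noncentrality formulation $\lambda = f^2(u + v + 1)$ (that formulation is an assumption here; consult the pwr documentation for the exact details). The example values are made up, not the course results.

```r
## Base-R sketch of the power computation (assumes lambda = f2 * (u + v + 1);
## check the pwr documentation for its exact formulation):
power_f2 <- function(u, v, f2, sig.level = .05) {
  lambda <- f2 * (u + v + 1)                    # noncentrality parameter
  fcrit <- qf(1 - sig.level, df1 = u, df2 = v)  # critical F under the null
  1 - pf(fcrit, df1 = u, df2 = v, ncp = lambda)
}
## Example values: u = 3 predictors, f2 = .35.
u <- 3
v <- 3
while (power_f2(u, v, f2 = .35) < .80) v <- v + 1
v              # smallest error df giving 80% power
v + u + 1      # implied total N, since df error = N - k - 1
```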
In this lecture, we've outlined: