knitr::opts_chunk$set(echo = FALSE)
library(learnr)
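# Helper to wrap hint text in a colored <font> tag for display in the tutorial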
hint_text <- function(text, text_color = "#E69F00"){
  hint <- paste("<font color='", text_color, "'>", text, "</font>", sep = "")
  return(hint)
}

Logistic Regression

Overview

This tutorial focuses on learning logistic regression. The tutorial includes a combination of videos, text, knowledge check quizzes, and exercises.

The PowerPoint slides for the presentation in the videos are on Canvas if you want a copy. For those not enrolled in my class, these files can be found here: https://osf.io/9tgxm/

The videos (as well as others) can also be found on my YouTube channel https://www.youtube.com/channel/UC5kDZTyHZlgSgSEa3YQXOig

Packages

This tutorial uses the following packages: learnr and lmtest.

These packages should be automatically loaded within this tutorial (and you already installed learnr if you are here). If you are working outside of this tutorial (e.g., working with the data files and trying the analyses in R), then you need to make sure the necessary packages have been installed on your machine.
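If you are missing either package, a one-line sketch like this installs them (learnr powers the tutorial itself; lmtest is used for the likelihood ratio tests later on):

install.packages(c("learnr", "lmtest"))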

Data

All data files exist within this package, so we can simply call them without reference to a file location.

logistic2 These data represent 164 women and examine compliance with mammogram screening recommendations (i.e., did they get a mammogram?). The variables are comply, coded as 0 = No, 1 = Yes (this is the outcome variable); physrec, also coded as 0 = No, 1 = Yes, representing whether their doctor recommended them for screening (i.e., did their doctor tell them to schedule a screening); knowledg, a continuously scaled measure of mammogram knowledge; and two variables measuring mammogram perceptions, benefits and barriers.

admit These data examine Ph.D. program admissions. The variable admit is coded as 0 = No, 1 = Yes, representing whether or not the student was admitted (this is the outcome variable). The predictors are gre, gpa, and rank (ranking of the candidate from letter writers).

Both files are on the class's Google Shared drive if you want to work with them outside of the tutorial. They are already loaded in this tutorial, so you will not need to load the data as you work. (Note: If you are not in my class, these files can be found here: https://osf.io/9tgxm/).

Videos generally range from about 10 to 15 minutes long (there are five of them).

Video 1: Logistic Regression Basics and Terms

Quiz 1: Basics

learnr::quiz(
  learnr::question("When do we use Logistic Regression?",
    learnr::answer("When we have a dichotomous outcome variable", correct = TRUE),
    learnr::answer("When linear regression assumptions fail"),
    learnr::answer("When we have categorical predictors"),
    learnr::answer("When we have dichotomous predictors"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect. Try again.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Why do we use Logistic instead of regular linear regression?",
    learnr::answer("Because we are not examining linear relationships",correct=TRUE),
    learnr::answer("Because dichotomous predictors can't be normalized"),
    learnr::answer("To handle violations of linear regression assumptions"),
    correct = "Correct",
    random_answer_order = TRUE,
    incorrect = "Sorry, that's incorrect. Try again.",
    allow_retry = T
  ),
  learnr::question("For the *log odds ratio* a negative relationship is represented by ... ",
    learnr::answer("A negative coefficient", correct = TRUE),
    learnr::answer("A positive coefficient"),
    learnr::answer("Values below 1"),
    learnr::answer("Values above 1"),
    correct = "Correct!",
    incorrect = "Incorrect. Please try again.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("For the *odds ratio* a negative relationship is represented by ... ",
    learnr::answer("Values below 1", correct = TRUE),
    learnr::answer("A negative coefficient"),
    learnr::answer("A positive coefficient"),
    learnr::answer("Values above 1"),
    correct = "Correct!",
    incorrect = "Incorrect. Please try again.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Examining our data

logistic2<-MVstats::logistic2
admit<-MVstats::admit
str(logistic2)

Paste the code into the run window below. The str command tells us what kind of data we have. Our primary interest here is in variables that are categorical. These have to be explicitly recoded as factors, with each level given a name.


Of note here is that two of the variables are numeric but only have scores of 0 and 1. Those variables are physrec and comply. Assuming that 0 means "No" and 1 means "Yes," we will need to define those variables as factors and assign labels to the different factor levels. I added two variables, physrec_F and comply_F, that represent these as factors.

Coding the Factors

The code below demonstrates one way to recode a variable. The summary command provides a way to verify that the coding was done correctly.

logistic2$physrec_F<-factor(logistic2$physrec,
                          levels = c(0:1),
                          labels = c("No", "Yes"))
summary(logistic2$physrec_F)

Exercise 1: Variable Coding

Use the datafile admit (again note that these data are already loaded in your workspace). In this datafile, the outcome variable is called admit (yes, both the data file and the variable have the same name). admit will need to be recoded (0 = No, 1 = Yes). Run the summary on the new variable (call the new variable admit_F). If you get stuck, the solution button will provide the correct code.


admit$admit_F<-factor(admit$admit,
                          levels = c(0:1),
                          labels = c("No", "Yes"))
summary(admit$admit_F)

Video 2: Try it as Linear Regression

With Linear Regression Examples (a.k.a., why we can't use linear regression)

In this section, you will examine the code from the video and then do an exercise where you adapt that code to the admit data.

Note that this section is to demonstrate why we cannot use linear regression with dichotomous outcome (dependent) variables.

In the video, we started by running linear regression with this code. Note that I am using the numeric version of the variable (comply) here, as lm will not accept factors as dependent variables.

reg<-lm(comply~physrec,data=logistic2)
summary(reg)
plot(reg)

Bonus: Jittering the plot!

The code below pulls the predicted values from our analysis into the data set (line 2), calculates the residuals (line 3), plots them with a jitter (line 4), and then adds the regression line (line 5).

Jittering is useful when we have categorical values on a scatter plot, as these values "pile up" - that is, there might be 20 scores in the same place on the graph, but we can't really see them because they are all on top of each other. The jitter function adds random noise to the values so we can better see the patterns.

reg<-lm(comply~physrec,data=logistic2)
logistic2$predicted<-round((reg$fitted.values),2)
logistic2$residuals<-logistic2$comply-logistic2$predicted
plot(logistic2$residuals ~ jitter(logistic2$predicted, 1), pch = 1)
abline(lm(logistic2$residuals~logistic2$predicted))

Exercise 2: Running linear regression on the admit data

For the admit data, predict admit from GRE scores (name the model object ex2), summarize the model, and plot the results (using the regular plot command). How do the residuals look? (Note: We don't need to jitter here, as gre is a continuous predictor.) The variable names are admit and gre.


ex2<-lm(admit~gre, data = admit)
summary(ex2)
plot(ex2)

Video 3: Running Logistic Regression with 1 Predictor

Quiz 2: First Steps

learnr::quiz(
  learnr::question("The first step in running logistic regression is often...",
    learnr::answer("Coding categorical variables as factors", correct = TRUE),
    learnr::answer("Graphing the residuals"),
    learnr::answer("Running linear regression"),
    learnr::answer("Converting from log odds to odds"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect. zzzt - the experiment requires that you continue.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Examples from the video

In the video, we saw the following logistic regression code. Recall that we needed to code the variables to be factors before running logistic. I did this and called the new variables comply_F and physrec_F.

Model1<-glm(comply_F~physrec_F, data=logistic2, family = binomial)
summary(Model1)

Use the window below to run the code to reproduce results from the video.


Next, convert the log odds ratios given in the summary to odds ratios.

exp(Model1$coefficients)
exp(confint(Model1))

Use the window below to run the code to reproduce results from the video.


Exercise 3: Running Logistic Regression with One Predictor

Using the admit data from earlier, run a logistic regression predicting admission (admit_F) from GRE scores (gre). Name your model ex3. Be sure to convert to odds ratios and get confidence intervals as well. Remember, a higher score on the admissions variable is a yes response; lower scores are a no response.


ex3<-glm(admit_F~gre, data=admit, family = binomial)
summary(ex3)
exp(ex3$coefficients)
exp(confint(ex3))

Quiz 3: Interpret Results

learnr::quiz(
  learnr::question("Based on the results of your analysis...",
    learnr::answer("Higher GRE scores relate to a greater likelihood of admission", correct = TRUE),
    learnr::answer("GRE scores are not related to admission"),
    learnr::answer("Lower GRE scores relate to a greater likelihood of admission"),
    correct = "Yes!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Model Fit

Just like in linear regression, we get a statistic that addresses whether or not the model significantly predicts outcomes. Those statistics are not particularly useful with just one predictor, but we will examine them now so we can use those values later for models with multiple predictors. We will also examine how well our models predict outcomes in terms of the percent of predictions that are correct, something that is useful regardless of the number of predictors.

The basic process here is to take the difference between the null deviance and the residual deviance.

Model1<-glm(comply_F~physrec_F, data=logistic2, family = binomial)
summary(Model1)

The null model reflects a model with no predictors. This is sometimes called an "intercept-only" model. It reflects how well we can predict comply without any predictors. One way to think about this is by asking "how many people could we accurately classify?" based simply on descriptive statistics. Let's start by examining the summary below.

summary(logistic2$comply_F)

Based on these numbers, we could correctly classify 88 people out of the total of 164 (88+76) by simply predicting everyone was in the "No" category.
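If you would rather compute that baseline accuracy than tally it by hand, here is a minimal sketch using the factor we coded earlier:

# Baseline accuracy: predict everyone into the most common category
max(table(logistic2$comply_F)) / nrow(logistic2)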

Comparing the null model and our prediction model (the one where we used physrec as a predictor) is accomplished with the code below. For some reason, the logistic regression output does not provide that test, but the code below accomplishes it.

modelChi<-Model1$null.deviance-Model1$deviance
modelChi
chidf<-Model1$df.null-Model1$df.residual
chidf
chisq.prob<-1-pchisq(modelChi, chidf)
chisq.prob

Here we get a chi-square value of 34.6 based on 1 df (each predictor uses a df) with p < .001.
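As a cross-check, base R's anova function will produce the same likelihood ratio test without the manual arithmetic (a sketch, assuming Model1 from above):

# Chi-square test of the one-predictor model against the null
anova(Model1, test = "Chisq")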

Exercise 4: Model Fit with Chi-Square

Using the admit data from earlier, run a logistic regression predicting admission (admit_F) from GRE scores (gre). Name your model ex4 this time. Derive the chi-square, df, and probability.


ex4<-glm(admit_F~gre, data=admit, family = binomial)
modelChi<-ex4$null.deviance-ex4$deviance
modelChi
chidf<-ex4$df.null-ex4$df.residual
chidf
chisq.prob<-1-pchisq(modelChi, chidf)
chisq.prob

Digging deeper on model fit: Null model

We can examine the null model by running a logistic regression without any predictors. To do that, the code below uses ~1. You can use this in any prediction model to run a null (a.k.a., intercept-only) model. In many cases, R provides this as a null model in its output, so running the intercept-only model is unnecessary. However, in some applications (e.g., mixed linear models), you need to run the null model for model comparison tests.

null<-glm(comply_F~1, data=logistic2, family = binomial)
summary(null)

Note that the deviance statistic is the same as we saw in Model1 under Null Deviance.

summary(Model1)

More Model Fit: Predicted outcome vs. actual outcome

Another way to evaluate model fit is to compare actual outcomes to predicted outcomes. The code below accomplishes this. Technically, we are taking the fitted (predicted) probability of group membership and assigning people to groups based on those predicted probabilities.

This isn't a formal evaluation of fit (i.e., we don't get a significance test), but it is a good common-sense way to evaluate fit over and above significance values. For example, a model that does not improve prediction very much over the null model might yield a statistically significant model fit under some conditions (e.g., large sample size) but not appreciably improve classification.

table(null$y,fitted(null)>.5)

Across the top is the prediction. Here, with no predictors (only the intercept), we predict everyone to be "False" (a No response on the outcome variable). On the left-hand side, the 0 and 1 reflect the actual scores, again showing that we have 88 Nos and 76 Yeses. Even without predictors, we could still get about 54% correct.

Now, examining the model that includes predictors.

table(Model1$y,fitted(Model1)>.5)

Here, we see we are now predicting both True (Yes) and False (No) outcomes. We see that of the 88 people who did not get screened, we predicted 44 correctly as False but also 44 incorrectly as True, so we have a 50% accuracy rate. Among those who did get screened (the 1s), we got 69 correct and only 7 wrong (91% correct!). Overall, we have 113 correct responses (44+69) and 51 incorrect (7+44). In total, we have 69% right compared to 54% without any predictors.
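If you prefer not to tally the cells by hand, a short sketch like this computes the overall percent correct from the same table:

# Overall classification accuracy for Model1
tab <- table(Model1$y, fitted(Model1) > .5)
sum(diag(tab)) / sum(tab)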

It is always important to interpret this statistic in comparison to the null model. You may be able to correctly classify 70% in a model with predictors but that means very different things if the null model correctly classifies 68% vs. 50%. If the null classifies 68% correctly, our new model (with predictors) isn't much of an improvement whereas if the null classifies 50% correctly, our new model substantially improves prediction.

Exercise 5: Classification Tables

Build a classification table for the admit data. Use the model built in the previous exercise called ex4 to derive the table.

ex4<-glm(admit_F~gre, data=admit, family = binomial)

table(ex4$y,fitted(ex4)>.5)

Quiz 4: Interpretation

learnr::quiz(
  learnr::question("Based on the classification table, which statement is true",
    learnr::answer("Including GRE as a predictor did not increase the number of correctly  classified cases", correct = TRUE),
    learnr::answer("GRE scores improved prediction"),
    learnr::answer("GRE scores made prediction worse"),
    correct = "Correct! Note that everyone was predicted as FALSE. Since the majority of cases were No, we could simply predict everyone as being not admitted and get about 68% right (273/400)",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Note that in the prediction model we did see that GRE scores were significantly related to admission but the classification statistics tell us that we still didn't improve prediction accuracy over the null.

summary(glm(admit_F~gre, data=admit, family = binomial))

Video 4: Logistic Regression with Multiple Predictors

Quiz 5: Interactions?

learnr::quiz(
  learnr::question("Logistic regression cannot handle interactions: ",
    learnr::answer("False", correct = TRUE),
    learnr::answer("True"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Centering is not necessary in logistic regression: ",
    learnr::answer("False", correct = TRUE),
    learnr::answer("True"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Logistic Regression with Multiple Predictors

This analysis appeared in the video.

Model4<-glm(comply~physrec+knowledg+benefits+barriers, data=logistic2, family = binomial())
summary(Model4)
exp(Model4$coefficients)
exp(confint(Model4))
modelChi<-Model4$null.deviance-Model4$deviance
modelChi
chidf<-Model4$df.null-Model4$df.residual
chidf
chisq.prob<-1-pchisq(modelChi, chidf)
chisq.prob

To interpret this model, start with the model chi-square values. We have $\chi^2$(4) = 58.8, p < .001. This tells us the four variables together significantly predict compliance.

From there, move to the specific predictors. My preference is to use exp(b), the odds ratio. Below I interpret each using its 95% CI, but you can also do this with the p values. Recall that if the CI does not include 1, we reject the null (a value of 1 means the predictor is completely unrelated to the outcome).
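One convenient way to see them all is to bind the odds ratios and their confidence intervals into a single table. A sketch (note that confint on a glm profiles the likelihood, so it can take a moment to run):

# Odds ratios with 95% CIs in one rounded table
round(cbind(OR = exp(coef(Model4)), exp(confint(Model4))), 2)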

Physician's recommendation related to greater compliance with screening, exp(b) = 6.31, 95% CI [2.53,17.57]. Similarly, those who perceived more benefits from mammography were more likely to comply, exp(b) = 1.72, 95% CI [1.08,2.82]. Perceived barriers related to less compliance, exp(b) = 0.56, 95% CI [0.40,0.77]. Knowledge was unrelated to compliance, exp(b) = 0.92, 95% CI [0.11,7.69].

Turning to classification, the table below shows us jumping to 69% correctly classified (113 correct out of 164 total).

table(Model4$y,fitted(Model4)>.5)

Exercise 6: Running logistic with multiple predictors

Run logistic regression using the admit data. Predict admit_F from gre, gpa, and rank. Call your model ex6. Just run the model and summarize it for this part.


ex6<-glm(admit_F~gre+gpa+rank, data=admit, family = binomial)
summary(ex6)

Now get the odds ratios and their CIs


exp(ex6$coefficients)
exp(confint(ex6))

Finally, calculate the model $\chi^2$, df, and probability.


modelChi<-ex6$null.deviance-ex6$deviance
modelChi
chidf<-ex6$df.null-ex6$df.residual
chidf
chisq.prob<-1-pchisq(modelChi, chidf)
chisq.prob

Quiz 6: Interpreting Multiple Predictor Models

learnr::quiz(
  learnr::question("What statistic(s) address whether the model predicted admission?",
    learnr::answer("Chi-square", correct = TRUE),
    learnr::answer("Log odds ratios (coefficients)"),
    learnr::answer("Odds ratios (exponentiated coefficients)"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("How many degrees of freedom does the model have?",
    learnr::answer("3", correct = TRUE),
    learnr::answer("1"),
    learnr::answer("2"),
    learnr::answer("4"),
    correct = "Correct! One df per predictor",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Was the model statistically significant?",
    learnr::answer("Yes", correct = TRUE),
    learnr::answer("No"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Which variables uniquely predicted admission?",
    learnr::answer("All of them", correct = TRUE),
    learnr::answer("Just GRE"),
    learnr::answer("Just GPA"),
    learnr::answer("Just Rank"),
    learnr::answer("Just GRE and GPA"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Which is the best conclusion regarding GRE scores",
    learnr::answer("Higher GRE scores related to a greater chance of admission", correct = TRUE),
    learnr::answer("Lower GRE scores related to a greater chance of admission"),
    learnr::answer("GRE scores were unrelated to admission decisions"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Which is the best conclusion regarding GPA",
    learnr::answer("Higher GPA related to a greater chance of admission", correct = TRUE),
    learnr::answer("Lower GPA related to a greater chance of admission"),
    learnr::answer("GPA was unrelated to admission decisions"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
  learnr::question("Which is the best conclusion regarding rank (recall that rank is sort of reversed, a rank of 1 is highest and a rank of 4 is lowest",
    learnr::answer("More favorable rankings (e.g., a 1 or 2) related to a greater chance of admission", correct = TRUE),
    learnr::answer("Less favorable rankings (e.g., a 3 or 4) related to a greater chance of admission"),
    learnr::answer("Rankings were unrelated to admission decisions"),
    correct = "Correct!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  )

)

Exercise 7: Classification for multiple predictors

Recall the code for classification from earlier.

table(Model4$y,fitted(Model4)>.5)

Adapt this code to the model in the previous exercise to build a classification table for the admit data. You can use the analysis you ran called ex6.

ex6<-glm(admit_F~gre+gpa+rank, data=admit, family = binomial)

table(ex6$y,fitted(ex6)>.5)

Quiz 7: Classification

learnr::quiz(
  learnr::question("What percent of the total number of no outcomes (coded as 0) were correctly classified (as False)? ",
    learnr::answer("Over 90%", correct = TRUE),
    learnr::answer("Under 10%"),
    learnr::answer("Around 70"),
    correct = "Correct! There are 273 total 0 responses (253+20). 253 of those were correctly classified",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
learnr::question("What percent of the total number of yes outcomes (coded as 1) were correctly classified (as True)? ",
    learnr::answer("Around 20%", correct = TRUE),
    learnr::answer("Over 75%"),
    learnr::answer("Around 70"),
    correct = "Correct! There are 127 total 1 responses (98+29). Only 29 of those were correctly classified",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
learnr::question("What percent of the total number  outcomes were correctly classified? ",
    learnr::answer("Around 70%", correct = TRUE),
    learnr::answer("Under 30%"),
    learnr::answer("Nearly 100%"),
    correct = "Correct! There are 282 correct classifications (253+29) out of a total of 400 cases.",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Video 5: Logistic Regression Other Useful Statistics

Likelihood ratio $\chi^2$

The LR $\chi^2$ is a more stable estimate of statistical significance than the z-tests provided in R (these are often termed Wald tests and are sometimes presented as a $\chi^2$ value). Although not the case in our examples, those values tend to become unstable with smaller samples and complex models.

Running the LR $\chi^2$ requires that we first build a model with all of our predictors. Then we build models with all but one of the predictors included. In the example from the video with four predictors, this meant that we created four additional models; in each one, we excluded one (and only one) of our variables. Model4 is the full model (with all four predictors). Model.phys has everything but physician's recommendation, Model.know has everything but knowledge, etc.

First for physrec

Model4<-glm(comply~physrec+knowledg+benefits+barriers, data=logistic2, family = binomial())
Model.phys<-glm(comply~knowledg+benefits+barriers, data=logistic2, family = binomial())

Now for knowledg. Note that I don't need the full model here because we already ran it.

Model.know<-glm(comply~physrec+benefits+barriers, data=logistic2, family = binomial())

For benefits

Model.benef<-glm(comply~physrec+knowledg+barriers, data=logistic2, family = binomial())

For barriers

Model.barr<-glm(comply~physrec+knowledg+benefits, data=logistic2, family = binomial())

Next we use the lmtest package to build model comparisons. Note that the command is lrtest (rather than lmtest).

lmtest::lrtest(Model4,Model.phys)
lmtest::lrtest(Model4,Model.know)
lmtest::lrtest(Model4,Model.benef)
lmtest::lrtest(Model4,Model.barr)

The result for the first comparison (Model4 vs. Model.phys) is the LR $\chi^2$ for physrec, and so on down the line.
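As an aside, base R's drop1 function runs all of these single-term LR tests in one call, which is handy with many predictors (a sketch that should be equivalent to the four comparisons above):

# LR chi-square for dropping each predictor from the full model
drop1(Model4, test = "LRT")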

Exercise 8: Likelihood Ratio $\chi^2$

Using the admit data, build your models here. Recall, we predicted admit_F from gre, gpa, and rank. Remember, you will need one full model and then several (depending on the number of predictors) that each exclude a single variable. Note: We are just building the models here. When you run it, nothing will appear yet, as we are just creating the objects that hold the model information.


ModelFull<-glm(admit~gre+gpa+rank, data=admit, family = binomial)
Model.gre<-glm(admit~gpa+rank, data=admit, family = binomial)
Model.gpa<-glm(admit~gre+rank, data=admit, family = binomial)
Model.rank<-glm(admit~gre+gpa, data=admit, family = binomial)

Once you've built the models, compare them using lrtest from the lmtest package. This should yield three tests.

ModelFull<-glm(admit~gre+gpa+rank, data=admit, family = binomial)
Model.gre<-glm(admit~gpa+rank, data=admit, family = binomial)
Model.gpa<-glm(admit~gre+rank, data=admit, family = binomial)
Model.rank<-glm(admit~gre+gpa, data=admit, family = binomial)

lmtest::lrtest(ModelFull,Model.gre)
lmtest::lrtest(ModelFull,Model.gpa)
lmtest::lrtest(ModelFull,Model.rank)

For comparison, I've added a summary of the full model below.

ModelFull<-glm(admit~gre+gpa+rank, data=admit, family = binomial)
summary(ModelFull)

$R^2$ Analogues ($R^2_L$)

The final value we examined in the video was $R^2_L$. The code below calculates the statistic from our final model.

R2L<-(Model4$null.deviance-Model4$deviance) / Model4$null.deviance
R2L
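If you find yourself computing this repeatedly, a small helper keeps it tidy (a sketch; the function name r2_L is my own):

# Likelihood-ratio R-squared for a fitted glm
r2_L <- function(model) {
  (model$null.deviance - model$deviance) / model$null.deviance
}
r2_L(Model4)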

Quiz 8: Likelihood Ratio $\chi^2$

learnr::quiz(
  learnr::question("Compare the results we get using the z values from the full model to those produced by `lrtest`. Which conclusion is most correct? (Note, the p values are a good way to draw comparisons",
    learnr::answer("Findings are consistent", correct = TRUE),
    learnr::answer("Very different findings"),
    learnr::answer("Consistent for some variables, inconsistent for others"),
    correct = "Correct! In this case both tests provided results that were very similar.",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
learnr::question("According to the lecture, under what conditions will you more likely see differences between the two approaches?",
    learnr::answer("Small sample sizes", correct = TRUE),
    learnr::answer("Large sample sizes"),
    learnr::answer("Non-normal predictors"),
    correct = "Correct! The LR chi-square is preferred with small samples. No harm in using for everything though.",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  )
)

Exercise 9: Computing $R^2_L$

Adapt the code above to the admit data. Be sure to use your final model.

ModelFull<-glm(admit~gre+gpa+rank, data=admit, family = binomial)

R2L<-(ModelFull$null.deviance-ModelFull$deviance) / ModelFull$null.deviance
R2L

The more common $R^2$ analogues produce somewhat different findings: Nagelkerke gives .14 and Cox-Snell gives .10.
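These were not computed in the video, but both analogues can be derived from the same deviances (a sketch assuming the ModelFull object from the exercise; the formulas are the standard Cox-Snell and Nagelkerke definitions):

n <- nrow(admit)
# Cox-Snell: 1 - exp(-(null deviance - residual deviance) / n)
R2CS <- 1 - exp(-(ModelFull$null.deviance - ModelFull$deviance) / n)
# Nagelkerke: Cox-Snell rescaled so its maximum is 1
R2N <- R2CS / (1 - exp(-ModelFull$null.deviance / n))
c(CoxSnell = R2CS, Nagelkerke = R2N)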

Quiz 9: $R^2_L$ vs. Other Estimates

learnr::quiz(
  learnr::question("Compared to other approaches, *$R^2_L$* tends to provide ... ",
    learnr::answer("Smaller values", correct = TRUE),
    learnr::answer("Bigger values"),
    learnr::answer("Consistent estimates, all of approaches are the same"),
    correct = "Correct! ",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
  ),
learnr::question("*$R^2_L$* is considered the 'best' $R^2$ analogue, however use of other approaches persists. Which of the following reasons were discussed in the video?",
    learnr::answer("Both are correct", correct = TRUE),
    learnr::answer("Researchers tend to prefer larger $R^2$ values"),
    learnr::answer("SPSS defaults to Cox Snell and Nagelkerke estimates"),
    correct = "Correct! Although the difference may seem small, they actually represent large proportional differences. The Cox Snell value .10 of is actually 25% bigger than *$R^2_L$* of .08. The Nagelkerke value of .14 is 75% bigger than *$R^2_L$* of .08!",
    incorrect = "Sorry, that's incorrect.",
    random_answer_order = TRUE,
    allow_retry = T
)
)

Congratulations! You've reached the end of the tutorial.


