IntroAnalysis:::ma223_setup()
library(learnr)

knitr::opts_chunk$set(echo = FALSE)
learnr::tutorial_options(exercise.cap = "Exercise")

## Shared Data ----
acadsleep2 = acadsleep |>
  drop_na(sleepHours) |>
  group_by(participantID) |>
  summarise(avgsleep = mean(sleepHours))

```{css, echo=FALSE} / Solution Functionality / .accordion > input[type="checkbox"] { position: absolute; left: -100vw; }

.accordion .content { overflow-y: hidden; height: 0; transition: height 0.3s ease; }

.accordion > input[type="checkbox"]:checked ~ .content { height: auto; overflow: visible; }

.accordion label { display: block; }

/ Styling /

.accordion { margin-bottom: 1em; }

.accordion > input[type="checkbox"]:checked ~ .content { padding: 15px; border: 1px solid #e8e8e8; border-top: 0; }

.accordion .handle { margin: 0; }

.accordion label { cursor: pointer; font-weight: normal; padding: 15px; }

.accordion label:hover, .accordion label:focus { background: #d8d8d8; }

/ Turning arrow / .accordion .handle label:before { font-family: 'fontawesome'; content: "\f054"; display: inline-block; margin-right: 10px; font-size: .58em; line-height: 1.556em; vertical-align: middle; }

.accordion > input[type="checkbox"]:checked ~ .handle label:before { content: "\f078"; }

.tipbox { padding: 1em 1em 1em 1em; border: 1px solid black; border-radius: 10px; box-shadow: 2px 2px #888888; }

.keyidea { border: 1px solid #54585A; background: #4F758B; color: white; }

.keyidea:before { content: "Key Idea: "; font-weight: bold; }

.tip { background: lightyellow; }

.tip:before { content: "Tip: "; font-weight: bold; }

.title{ color: #800000; background: none; }

.solution { background: #ebebeb; }

h2{ color: white; background-color: #800000; }

h3{ color: #800000; border-bottom: thick solid #696969; }

h4{ color: #800000; }

blockquote { font-size: inherit; padding: 0px 20px; }

## Background
It is no surprise that research continues to establish the importance of sleep in a healthy lifestyle.  It is also no surprise that college students routinely claim that getting the recommended amount of sleep each night is not possible.  The StudentLife dataset, collected by Dartmouth College, recorded several features on undergraduate students.  Among the measurements recorded were the amount and the quality of sleep of participants over several weeks (approximately 30 days).  A subset of this data, investigated by [Benard and Darod (2018)](https://osf.io/ew9k7), is available (`acadsleep`).  

  > According to the National Sleep Foundation, most college-aged students need 79 hours of sleep per night to avoid daytime drowsiness, mood changes, weight gain, poor health, and low energy.  If you are struggling to maintain this level of sleep (as you will see is common among college students in this sample), speak with the [Office of Student Academic Success](https://rosehulman.sharepoint.com/sites/osas/SitePages/Home.aspx) to learn time-management skills that can prioritize sleep or the [Student Counseling Center](https://rosehulman.sharepoint.com/sites/counseling/SitePages/Home.aspx) to learn skills to calm your mind and help you sleep; this is also a valuable resource if a lack of sleep is leading to increased feelings of depression.


## Sampling Distributions
### Exercise: Interpreting Sampling Distributions
> The Patient Health Questionnaire-9 is used to screen individuals for signs of depression.  Higher scores indicate increases signs of depression.  A score of 10 or higher indicates moderate (to more severe) depressive symptoms.  
> 
> Suppose we are interested in the proportion of students who exhibit signs of moderate-to-severe depression.  Using the data, we constructed the following model of the sampling distribution of the proportion of students exhibiting signs of moderate-to-severe depression.  The model is based on 5000 bootstrap replications.  Based on the model, is it reasonable that 20% of students exhibit signs of moderate-to-severe depression?  Explain.
>
> __Note:__ The PHQ-9 is a self-screening tool, and it is not meant to be used alone for diagnosing individuals.  If you believe that you, or someone you know, is experiencing depression, seek the guidance of a medical professional, such as the [Student Counseling Center](https://rosehulman.sharepoint.com/sites/counseling/SitePages/Home.aspx).

```r
set.seed(123)
acadsleep.red <- acadsleep |>
  group_by(participantID) |>
  slice(1) |>
  ungroup()

prop.distn <- tibble(
  props = rbinom(5000, size = nrow(acadsleep.red),
                 prob = mean(acadsleep.red$PHQ9 >= 10)) / nrow(acadsleep.red)
)

ggplot(data = prop.distn,
       mapping = aes(x = props)) +
  geom_bar(fill = 'grey75', color = 'black') +
  labs(y = "",
       x = "Sample Proportion of Students\nwith Moderate-to-Severe Depression") +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank())

Yes, it is reasonable that 20% of students exhibit signs of moderate-to-severe depression based on this study. When looking at the model for the sampling distribution of the statistic, we identify the "center" of the distribution to range between approximately 0.08 and 0.22; any values falling in this center region are reasonable values of the underlying parameter. That is, the data is consistent with being generated from a population where the true parameter was any value in this range.
Sampling distributions are centered on the true value of the parameter while _models_ for sampling distributions are centered on the statistic from the sample; values toward the center of the _model_ indicate reasonable values of the parameter.

Constructing a Confidence Interval

Exercise: Mathematical Model

Suppose we are interested in estimating the average number of hours a students gets of sleep. Using mathematical notation, write the model for the data generating process which corresponds to this research objective.

Note: this is an interesting research question in that the response variable is an average of repeated measurements taken over the course of the study. We will see better ways of addressing this question in future modules.

Let $\mu$ represent the mean average number of hours a student gets of sleep; the "mean" is not redundant because the response variable is an average. Then, our model for the data generating process is $$(\text{Average Sleep})_i = \mu + \varepsilon_i$$

Exercise: Computing a CI

The following code computes the average number of hours of sleep (avgsleep) each participant received over the course of the study. The resulting data is stored in acadsleep2; use this dataset moving forward.

acadsleep2 = acadsleep |>
  drop_na(sleepHours) |>
  group_by(participantID) |>
  summarise(avgsleep = mean(sleepHours))

Using the available data compute a 98% confidence interval for the mean average hours of sleep a student receives. You may assume the sample is representative of all college students; you may also assume the amount of sleep one student receives is independent of the amount of sleep any other student receives.

sleep.model = specify_mean_model()

estimate_parameters()
sleep.model = specify_mean_model(avgsleep ~ 1, data = acadsleep2)

estimate_parameters(sleep.model,
                    confidence.level = 0.98)
Note that the computer _always_ assumes observations are independent of one another (and therefore, we do not need to specify this). The computer also _always_ assumes the errors are identically distributed.

Interpreting a Confidence Interval

Exercise: Conclusions

Suppose the Student Affairs Office is made aware of the above results and concludes that nearly every student is receiving at least 5.5 hours of sleep, on average, each night. Is this an appropriate conclusion? Explain.

No, this is not an appropriate conclusion. A confidence interval is an estimate of the _parameter_; it does not make a statement about the value of the variable at the level of individuals in the population. Therefore, while we can make a statement about the mean average amount of sleep a student is receiving, we cannot make a statement about the average for any individual.

Exercise: Hypotheses

Does the study provide evidence that, on average, students sleep an average of more than 6 hours each night? Explain.

No, we do not have evidence to support this statement. Notice that 6 is within our confidence interval; therefore, 6 is a reasonable value for the mean average amount of sleep students receive. While it is _reasonable_ that students sleep an average of more than 6 hours each night, on average, we do not have _evidence_ for this statement. We have "evidence" for a statement when the entire confidence interval agrees with the statement. In this case, the data is consistent with mean average sleep values of more than 6 as well as less than 6.

Exercise: Comparing Distributions

In the context of estimating the number of hours students sleep, compare and contrast the distribution of the sample and the sampling distribution of the sample mean.

The sample shows the variability in the average number of hours (the variable) from one student to another. It has quite a lot of variability. wzxhzdk:5 In contrast, the model for the sampling distribution is much narrower, and more bell-shaped. However, because it is a _model_ for the sampling distribution, it is centered in the same place as the distribution of the sample. wzxhzdk:6


reyesem/IntroAnalysis documentation built on March 29, 2025, 3:29 p.m.