knitr::opts_chunk$set(echo = TRUE)
library(SDS100)

download_data("popularkids.txt")
download_data("analgesics.txt")

$\\$

Differences in categorical distributions: chi-square test for association

The data is from:

Chase, M. A., and Dummer, G. M. (1992), "The Role of Sports as a Social Determinant for Children," Research Quarterly for Exercise and Sport, 63, 418-424.

The subjects, students in grades 4-6 in selected schools in Michigan, were asked the following question: What would you most like to do at school?

A. Make good grades. B. Be good at sports. C. Be popular.

Demographic information was also collected for each student.

# chi-square test for association
# goals (good grades, popularity, sports) compared across grade levels


# data code book
# https://math.tntech.edu/machida/ISR/3070/DASL/a/popularkids.html?dataset=3070/DASL/a/popularkids.txt


library(SDS100)

#download_data("popularkids.txt")


kids <- read.table("popularkids.txt", header = TRUE)

grade <- kids$Grade
goals <- kids$Goals


# Step 1: 
# H0: 
# HA: 





# Step 2:

# observed table



# visualize the data
par(mfrow = c(3, 1))
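One way to fill the three stacked panels is a bar plot of goal counts for each grade. The helper below is only a sketch: `plot_counts_by_row` is a made-up name (not an SDS100 function), and it assumes the two-way table has one row per grade and one column per goal.

```r
# Hypothetical helper (not from SDS100): draw one bar plot per row of a two-way table.
plot_counts_by_row <- function(counts, ylab = "Count") {
  y_max <- max(counts)  # shared y-axis so the panels are directly comparable
  for (i in seq_len(nrow(counts))) {
    barplot(counts[i, ],
            main = rownames(counts)[i],
            ylab = ylab,
            ylim = c(0, y_max))
  }
  invisible(counts)
}
```

For example, after `observed <- table(grade, goals)`, calling `plot_counts_by_row(observed)` would draw one panel per grade in the layout set by `par(mfrow = c(3, 1))`.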







# expected counts
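Under the null hypothesis of no association, the expected count in each cell is (row total x column total) / grand total. A minimal base R sketch follows; the function name `expected_counts` is an illustrative choice, not something provided by SDS100.

```r
# Hypothetical helper: expected cell counts under independence of rows and columns.
expected_counts <- function(observed) {
  # outer() multiplies every row total by every column total
  outer(rowSums(observed), colSums(observed)) / sum(observed)
}
```

With `observed <- table(grade, goals)`, `expected_counts(observed)` returns a matrix with the same shape as the observed table.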






# can visualize the expected counts
par(mfrow = c(3, 1))








# calculate chi-square statistic
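The chi-square statistic sums (observed - expected)^2 / expected over all cells of the table. The sketch below computes it from scratch in base R; `chisq_stat` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: chi-square statistic for a two-way table of counts.
chisq_stat <- function(observed) {
  # expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2 / expected)
}
```

Applied to `table(grade, goals)`, this should agree with the statistic reported by `chisq.test()` on the same table.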





# step 3: null distribution
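For a two-way table, the null distribution of the statistic is approximately chi-square with (rows - 1) x (columns - 1) degrees of freedom. The sketch below plots that density and optionally marks an observed statistic; `plot_chisq_null` and its arguments are hypothetical names, not SDS100 functions.

```r
# Hypothetical helper: plot a chi-square density and mark the observed statistic.
plot_chisq_null <- function(df, obs_stat = NULL) {
  # extend the x-axis far enough to show the observed statistic, if given
  x_max <- max(qchisq(0.999, df), obs_stat)
  x <- seq(0, x_max, length.out = 500)
  plot(x, dchisq(x, df), type = "l",
       xlab = "Chi-square statistic", ylab = "Density",
       main = paste("Chi-square distribution, df =", df))
  if (!is.null(obs_stat)) abline(v = obs_stat, col = "red")
  invisible(df)
}
```

With three grades and three goals, the degrees of freedom would be (3 - 1) x (3 - 1) = 4, so a call might look like `plot_chisq_null(4, obs_stat = my_stat)`, where `my_stat` stands for whatever statistic was computed in step 2.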



par(mfrow = c(1, 1))






# step 4: p-value
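The p-value is the area under the chi-square density to the right of the observed statistic. A minimal sketch using `pchisq()`; the wrapper name `chisq_p_value` and its arguments are illustrative assumptions.

```r
# Hypothetical helper: upper-tail p-value for a chi-square test of association.
chisq_p_value <- function(obs_stat, n_rows, n_cols) {
  df <- (n_rows - 1) * (n_cols - 1)
  pchisq(obs_stat, df = df, lower.tail = FALSE)
}
```

For a 3 x 3 table this would be called as `chisq_p_value(my_stat, 3, 3)`, with `my_stat` standing for the observed statistic.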





# step 5: decision



# built in R function to do it!
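Base R's `chisq.test()` runs the whole test on a two-way table. The wrapper below is a sketch that collects the main pieces of its output; `summarize_chisq` is a hypothetical name.

```r
# Hypothetical helper: run chisq.test() on two categorical vectors and
# return the statistic, degrees of freedom, p-value, and expected counts.
summarize_chisq <- function(x, y) {
  fit <- chisq.test(table(x, y))
  list(statistic = unname(fit$statistic),
       df        = unname(fit$parameter),
       p_value   = fit$p.value,
       expected  = fit$expected)
}
```

Here the call would be `summarize_chisq(grade, goals)`; running `chisq.test(table(grade, goals))` directly prints the same test.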






# Extra: can also look at rural, suburban, and urban
# Is there a difference between these? 


living_location <- kids$Urban.Rural


# observed table



# visualize the data

$\\$

One-way ANOVA

A pharmaceutical company tested three formulations of a pain relief medicine for migraine headache sufferers. For the experiment, 27 volunteers were selected and 9 were randomly assigned to each of the three drug formulations. The subjects were instructed to take the drug during their next migraine headache episode and to report their pain 30 minutes after taking the drug, on a scale from 1 (no pain) to 10 (extreme pain).

Data from: https://dasl.datadescription.com/datafile/analgesics/?_sfm_methods=Analysis+of+Variance&_sfm_cases=4+59943

$\\$

Step 1

# download_data("analgesics.txt")

drugs <- read.table("analgesics.txt", header = TRUE)




# step 2

drug_type <- drugs$Drug
pain_rating <- drugs$Pain

# get the means by group to explore the data
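Group means can be computed with base R's `tapply()`. The wrapper below is a small sketch; `group_means` is an illustrative name.

```r
# Hypothetical helper: mean of `values` within each level of `groups`.
group_means <- function(values, groups) {
  tapply(values, groups, mean)
}
```

With the vectors defined above, the call would be `group_means(pain_rating, drug_type)`.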



# visualize the data
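Side-by-side boxplots are a common way to compare a quantitative variable across groups. A sketch using base R's `boxplot()` with a formula; the function name and default axis labels are assumptions made for illustration.

```r
# Hypothetical helper: side-by-side boxplots of `values` for each level of `groups`.
plot_by_group <- function(values, groups, xlab = "Group", ylab = "Value") {
  boxplot(values ~ factor(groups), xlab = xlab, ylab = ylab)
}
```

For this data set, a call might be `plot_by_group(pain_rating, drug_type, xlab = "Drug", ylab = "Pain rating")`.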




# see if there is the same variability between groups
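The ANOVA F-test assumes roughly equal variability in each group. One common rule of thumb (an assumption here, not something stated in this worksheet) is that the largest group standard deviation should be less than about twice the smallest. A sketch, with `sd_check` as a hypothetical name:

```r
# Hypothetical helper: per-group standard deviations and the max/min ratio.
sd_check <- function(values, groups) {
  sds <- tapply(values, groups, sd)
  list(sds = sds, ratio = max(sds) / min(sds))
}
```

Calling `sd_check(pain_rating, drug_type)$ratio` gives a single number to compare against the rule of thumb.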



# get the observed F-statistic using the get_F_stat() function 
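The signature of `get_F_stat()` is not shown in this worksheet, so the sketch below does not call it. Instead it computes the one-way ANOVA F-statistic from scratch in base R, as the between-group mean square divided by the within-group mean square. The name `f_stat_by_hand` is hypothetical.

```r
# Hypothetical helper (not SDS100's get_F_stat): one-way ANOVA F-statistic.
f_stat_by_hand <- function(values, groups) {
  groups <- factor(groups)
  n <- length(values)    # total sample size
  k <- nlevels(groups)   # number of groups
  group_means <- tapply(values, groups, mean)
  group_sizes <- tapply(values, groups, length)
  # between-group variability: group means around the grand mean
  ss_between <- sum(group_sizes * (group_means - mean(values))^2)
  # within-group variability: observations around their own group mean
  ss_within <- sum((values - ave(values, groups))^2)
  (ss_between / (k - 1)) / (ss_within / (n - k))
}
```

`f_stat_by_hand(pain_rating, drug_type)` should match the F value reported by `aov()` for the same data.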






# step 3: visualize null distribution









# step 4
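For a one-way ANOVA, the p-value is the upper-tail area of an F distribution with (number of groups - 1) and (total sample size - number of groups) degrees of freedom. A minimal sketch with `pf()`; the wrapper name and argument names are illustrative.

```r
# Hypothetical helper: upper-tail p-value for a one-way ANOVA F-statistic.
anova_p_value <- function(f_stat, n_total, n_groups) {
  pf(f_stat, df1 = n_groups - 1, df2 = n_total - n_groups, lower.tail = FALSE)
}
```

With 27 subjects in 3 groups, the degrees of freedom are 2 and 24, so the call would be `anova_p_value(obs_stat, 27, 3)`, where `obs_stat` stands for the observed F-statistic.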






# step 5:






# built in R functions for running an ANOVA using the aov() function
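A sketch of fitting the model with `aov()` and pulling out the ANOVA table; the wrapper name `anova_table` is hypothetical.

```r
# Hypothetical helper: fit a one-way ANOVA and return its summary table.
anova_table <- function(values, groups) {
  fit <- aov(values ~ factor(groups))
  summary(fit)[[1]]  # data frame with Df, Sum Sq, Mean Sq, F value, Pr(>F)
}
```

For this data set, `anova_table(pain_rating, drug_type)` gives the table; `summary(aov(pain_rating ~ drug_type))` prints the same information directly.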

$\\$

Inference for regression - Bechdel data

Step 1

# load the library with the data
library(fivethirtyeight)

# remove missing values
bechdel <- na.omit(bechdel)

# extract variables of interest
budget <- bechdel$budget/10^6
revenue <- bechdel$domgross/10^6


# step 2: fit a linear model


# get the slope coefficient from the model
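A sketch of fitting a simple linear regression with `lm()` and extracting the slope with `coef()`. The function name `get_slope` is an illustrative assumption.

```r
# Hypothetical helper: least-squares slope from regressing y on x.
get_slope <- function(x, y) {
  fit <- lm(y ~ x)
  coef(fit)[[2]]  # first coefficient is the intercept, second is the slope
}
```

Here the observed statistic would be `get_slope(budget, revenue)`, the estimated change in domestic gross (in millions) per additional million dollars of budget.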




# step 3: using randomization methods
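One way to build a null distribution for the slope is to shuffle the response, which breaks any relationship between the two variables, and recompute the slope each time. The sketch below uses the identity slope = cov(x, y) / var(x) rather than refitting `lm()` on every shuffle; `shuffle_slopes` and the default of 10,000 simulations are assumptions.

```r
# Hypothetical helper: slopes computed after randomly shuffling y.
shuffle_slopes <- function(x, y, n_sims = 10000) {
  # cov(x, y) / var(x) is the least-squares slope of y on x
  replicate(n_sims, cov(x, sample(y)) / var(x))
}
```

A call such as `null_dist <- shuffle_slopes(budget, revenue)` would return a vector of slopes consistent with a true slope of zero, which can then be shown with `hist(null_dist)`.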





# visualize the null distribution: where is the observed stat




# step 4 
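Given a vector of null statistics, the p-value is the proportion that are at least as extreme as the observed statistic. The sketch below assumes a two-sided alternative, which this worksheet does not state; for a one-sided test the comparison would change. The name `randomization_p_value` is hypothetical.

```r
# Hypothetical helper: two-sided randomization p-value (two-sidedness is an assumption).
randomization_p_value <- function(null_stats, obs_stat) {
  mean(abs(null_stats) >= abs(obs_stat))
}
```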





# step 5





# using parametric methods with R's built-in functions
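The parametric version of the test comes from `summary()` of an `lm()` fit, which reports a t-statistic and p-value for each coefficient, and `confint()` gives a confidence interval. A sketch, with `slope_inference` as a hypothetical name:

```r
# Hypothetical helper: coefficient table and confidence interval for the slope.
slope_inference <- function(x, y, level = 0.95) {
  fit <- lm(y ~ x)
  list(coefficients = summary(fit)$coefficients,
       conf_int     = confint(fit, "x", level = level))
}
```

For the Bechdel data the call would be `slope_inference(budget, revenue)`; the row named `x` of the coefficient table holds the slope's estimate, standard error, t value, and p-value.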







# could do the bootstrap to get a CI for beta0 using SDS100::resample_pairs()
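The signature of `SDS100::resample_pairs()` is not shown in this worksheet, so the sketch below resamples (x, y) pairs with base R instead. It returns bootstrap estimates of both coefficients, from which a percentile interval can be taken; the name `bootstrap_coefs` and the default of 1,000 resamples are assumptions.

```r
# Hypothetical helper (not SDS100's resample_pairs): bootstrap regression coefficients.
bootstrap_coefs <- function(x, y, n_boot = 1000) {
  n <- length(x)
  boot <- t(replicate(n_boot, {
    idx <- sample(n, replace = TRUE)  # resample rows, keeping each (x, y) pair together
    coef(lm(y[idx] ~ x[idx]))
  }))
  colnames(boot) <- c("intercept", "slope")
  boot
}
```

A 95% percentile interval for the intercept could then be `quantile(bootstrap_coefs(budget, revenue)[, "intercept"], c(0.025, 0.975))`; swapping in the `"slope"` column gives the interval for the slope.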








# we can also use R's built in functions to test if rho = 0...
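Base R's `cor.test()` tests whether the population correlation is zero. In simple linear regression its t-statistic equals the t-statistic for the slope. A sketch, with `correlation_test` as a hypothetical wrapper name:

```r
# Hypothetical helper: Pearson correlation test summarized as a named vector.
correlation_test <- function(x, y) {
  res <- cor.test(x, y)
  c(r       = unname(res$estimate),
    t       = unname(res$statistic),
    df      = unname(res$parameter),
    p_value = res$p.value)
}
```

Here the call would be `correlation_test(budget, revenue)`; `cor.test(budget, revenue)` prints the full test output.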


emeyers/SDS100 documentation built on April 28, 2024, 5:07 p.m.