In emeyers/SDS100: Tools for the class Introductory Statistics

knitr::opts_chunk$set(echo = TRUE)

library(SDS100)

# download the data

SDS100::download_data("zodiac.csv")
SDS100::download_data("popularkids.txt")
SDS100::download_data("analgesics.txt")

$\\$

Inference on more than 2 proportions

Let's run a chi-square goodness-of-fit test to see whether the same number of CEOs are born under each astrological sign.

Step 1:
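
One possible way to state the hypotheses, assuming the null claim is that a CEO is equally likely to be born under each of the 12 signs. The symbols $p_i$ (proportion of CEOs born under sign $i$), $O_i$ (observed count), and $E_i$ (expected count) are notation introduced here for illustration:

$$H_0: p_1 = p_2 = \cdots = p_{12} = \frac{1}{12} \qquad H_A: \text{at least one } p_i \neq \frac{1}{12}$$

$$\chi^2 = \sum_{i=1}^{12} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = \frac{n}{12}$$

Under $H_0$ the statistic is approximately chi-square distributed with $12 - 1 = 11$ degrees of freedom.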

#SDS100::download_data("zodiac.csv")


zodiac <- read.csv("zodiac.csv", header = TRUE)


# step 2a: visualize the data

births <- zodiac$Births





# step 2b. calculate the observed statistic






# step 3: visualize the null





# step 4: p-value




# step 5: make a decision


# sanity check using built-in R functions
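
The steps above could be sketched in base R as follows. This is a minimal illustration, not the class solution: the counts are made up (they are not the values in zodiac.csv), and the variable names `observed`, `expected`, `obs_stat`, and `builtin` are chosen here for the example.

```r
# Hypothetical counts for 12 categories (NOT the values in zodiac.csv)
observed <- c(12, 9, 11, 8, 10, 13, 7, 10, 9, 11, 12, 8)
n <- sum(observed)      # total number of observations
k <- length(observed)   # number of categories

# step 2a: visualize the observed counts
barplot(observed, names.arg = seq_len(k), xlab = "Category", ylab = "Count")

# Under H0 every category is equally likely, so each expected count is n / k
expected <- rep(n / k, k)

# step 2b: observed chi-square statistic, sum of (O - E)^2 / E
obs_stat <- sum((observed - expected)^2 / expected)

# step 3: visualize the null distribution, chi-square with k - 1 degrees of freedom
curve(dchisq(x, df = k - 1), from = 0, to = 40, ylab = "Density")
abline(v = obs_stat, col = "red")

# step 4: p-value is the upper tail area beyond the observed statistic
p_value <- pchisq(obs_stat, df = k - 1, lower.tail = FALSE)

# sanity check: chisq.test() assumes equal probabilities by default
builtin <- chisq.test(observed)

obs_stat
p_value
builtin
```

With the real data, `observed` would be replaced by the `births` vector created above.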

$\\$

Inference on more than 2 proportions using resampling methods

Can we run the test using resampling methods?

Step 1:

# step 2:  calculate the observed statistic





# step 3: create the null distribution 






# view the null distribution






# how does this compare with the parametric p-value?
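
One way the resampling version might look, as a sketch under stated assumptions: the counts are again hypothetical, the null distribution is simulated with `rmultinom()` from base R, and the number of simulations (`n_sims`) and the seed are arbitrary choices made for this example.

```r
set.seed(100)  # arbitrary seed so the sketch is reproducible

# Hypothetical counts for 12 categories (NOT the values in zodiac.csv)
observed <- c(12, 9, 11, 8, 10, 13, 7, 10, 9, 11, 12, 8)
n <- sum(observed)
k <- length(observed)
expected <- rep(n / k, k)

# step 2: observed statistic
obs_stat <- sum((observed - expected)^2 / expected)

# step 3: simulate the null by drawing n observations that are equally
# likely to fall in any of the k categories; result is a k x n_sims matrix
n_sims <- 10000
null_counts <- rmultinom(n_sims, size = n, prob = rep(1 / k, k))

# chi-square statistic for each simulated set of counts (one per column)
null_dist <- colSums((null_counts - expected)^2 / expected)

# view the null distribution with the observed statistic marked
hist(null_dist, breaks = 50, main = "Simulated null distribution",
     xlab = "Chi-square statistic")
abline(v = obs_stat, col = "red")

# step 4: proportion of simulated statistics at least as large as observed
p_value_resample <- mean(null_dist >= obs_stat)

# parametric p-value for comparison
p_value_parametric <- pchisq(obs_stat, df = k - 1, lower.tail = FALSE)

c(resampling = p_value_resample, parametric = p_value_parametric)
```

When the expected counts are reasonably large, the two p-values should be close; base R's `chisq.test(observed, simulate.p.value = TRUE)` offers a similar Monte Carlo p-value.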

$\\$

Differences in categorical distributions: chi-square test for association

The data is from:

Chase, M. A., and Dummer, G. M. (1992), "The Role of Sports as a Social Determinant for Children," Research Quarterly for Exercise and Sport, 63, 418-424

The subjects, students in grades 4-6 in selected schools in Michigan, were asked the following question: What would you most like to do at school?

A. Make good grades. B. Be good at sports. C. Be popular.

Demographic information was also collected for each student.

# chi-square test for association
# goals (grades, popularity, sports) compared across grade levels


# data code book
# https://math.tntech.edu/machida/ISR/3070/DASL/a/popularkids.html?dataset=3070/DASL/a/popularkids.txt


library(SDS100)

#download_data("popularkids.txt")


kids <- read.table("popularkids.txt", header = TRUE)

grade <- kids$Grade
goals <- kids$Goals


# Step 1: 
# H0: 
# HA: 





# Step 2:

# observed table



# visualize the data
par(mfrow = c(3, 1))







# expected counts






# can visualize the expected counts
par(mfrow = c(3, 1))








# calculate chi-square statistic





# step 3: null dist



par(mfrow = c(1, 1))






# step 4: p-value





# step 5: decision



# built-in R function to do it!






# Extra: can also look at rural, suburban, and urban
# Is there a difference in goals between these locations? 


living_location <- kids$Urban.Rural


# observed table



# visualize the data
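
The chi-square test for association outlined in this section could be sketched as below. The two-way table is hypothetical (it is not computed from popularkids.txt), and the names `observed`, `expected`, `obs_stat`, and `builtin` are introduced for this example only. With the raw vectors, the observed table would typically come from `table(goals, grade)`.

```r
# Hypothetical 3 x 3 table of counts (NOT the popularkids data)
observed <- matrix(c(30, 20, 10,
                     25, 25, 10,
                     15, 25, 20),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(goal = c("Grades", "Popular", "Sports"),
                                   grade = c("4", "5", "6")))
n <- sum(observed)

# visualize the data: side-by-side bars of goal counts within each grade
barplot(observed, beside = TRUE, legend.text = TRUE, xlab = "Grade")

# expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(observed), colSums(observed)) / n

# chi-square statistic summed over every cell of the table
obs_stat <- sum((observed - expected)^2 / expected)

# degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value from the upper tail of the chi-square distribution
p_value <- pchisq(obs_stat, df = df, lower.tail = FALSE)

# built-in function; it also returns the expected counts
builtin <- chisq.test(observed)

obs_stat
p_value
builtin$expected
```

The same sketch applies to the rural, suburban, and urban comparison by tabulating `goals` against `living_location` instead of `grade`.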

$\\$

One-way ANOVA

A pharmaceutical company tested three formulations of a pain relief medicine for migraine headache sufferers. For the experiment, 27 volunteers were selected and 9 were randomly assigned to each of the three drug formulations. The subjects were instructed to take the drug during their next migraine headache episode and to report their pain on a scale from 1 = no pain to 10 = extreme pain, 30 minutes after taking the drug.

data from: https://dasl.datadescription.com/datafile/analgesics/?_sfm_methods=Analysis+of+Variance&_sfm_cases=4+59943

# download_data("analgesics.txt")

drugs <- read.table("analgesics.txt", header = TRUE)


# step 1: 

# H0: 
# HA: 




# step 2






# variances barely meet criteria




# get the observed statistic using the get_F_stat function






# step 3: visualize null








# step 4:






# step 5:






# built-in R functions
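
A minimal base-R sketch of the one-way ANOVA steps. The pain scores are made up (they are not the values in analgesics.txt), and the F statistic is computed by hand rather than with `get_F_stat`, since that helper's arguments are not shown here. The variable names, and the rule of thumb that the largest group SD should be no more than about twice the smallest, are assumptions of this example.

```r
# Hypothetical pain scores for three groups of 9 (NOT the analgesics data)
pain <- c(3, 4, 2, 5, 3, 4, 3, 2, 4,
          5, 6, 4, 7, 5, 6, 5, 4, 6,
          6, 5, 7, 6, 8, 5, 6, 7, 6)
drug <- factor(rep(c("A", "B", "C"), each = 9))

# step 2: visualize the data and check the equal-variance condition
boxplot(pain ~ drug, ylab = "Pain rating")
group_sds <- tapply(pain, drug, sd)
sd_ratio <- max(group_sds) / min(group_sds)  # rule of thumb: below about 2

# pieces of the F statistic
group_means <- tapply(pain, drug, mean)
group_ns <- tapply(pain, drug, length)
grand_mean <- mean(pain)
K <- nlevels(drug)   # number of groups
N <- length(pain)    # total sample size

# between-group and within-group sums of squares
ss_between <- sum(group_ns * (group_means - grand_mean)^2)
ss_within <- sum((pain - ave(pain, drug))^2)  # ave() gives each point's group mean

# observed F statistic: ratio of the two mean squares
f_stat <- (ss_between / (K - 1)) / (ss_within / (N - K))

# step 4: p-value from the F distribution with K - 1 and N - K degrees of freedom
p_value <- pf(f_stat, df1 = K - 1, df2 = N - K, lower.tail = FALSE)

# built-in R functions for comparison
builtin <- anova(lm(pain ~ drug))

f_stat
p_value
builtin
```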