library('dplyr')
library('ISDSWorkshop')
workshop(launch_index=FALSE)

Introduction to R - Activity Solutions

Introduction

x = 1:10
y = rep(c(1,2), each = 5)
m = lm(y ~ x)
s = summary(m)

Now, look at the result of each line

x
y
m
s
s$r.squared

For those who are familiar with linear regression, this may look familiar.

Calculator

Calculate the probability the individual has the disease if the test is positive when

specificity = 0.95
sensitivity = 0.99
prevalence = 0.001
probability = (sensitivity*prevalence) / (sensitivity*prevalence + (1-specificity)*(1-prevalence))
probability

Yes, it is only about 2%!

Read csv file

Read in the fluTrends.csv file.

# Read in the csv file
fluTrends = read.csv('fluTrends.csv')
names(fluTrends)

# To maintain pretty column names, use 
fluTrends = read.csv('fluTrends.csv', check.names = FALSE)
names(fluTrends)
# unfortunately these names won't work with the 
# fluTrends$colname syntax, but you can use back-ticks
summary(fluTrends$`United States`)

Descriptive statistics

GI = read.csv("GI.csv")
# Min, max, mean, and median age for zipcode 20032.
GI_20032 <- GI %>%
  filter(zipcode == 20032)

min(   GI_20032$age)
max(   GI_20032$age)
mean(  GI_20032$age)
median(GI_20032$age)

Alternatively

summary(GI_20032$age)

Graphical statistics

Construct a histogram and boxplot for age at facility 37.

# Construct a histogram and boxplot for age at facility 37.
GI_37 <- GI %>%
  filter(facility == 37) 

hist(GI_37$age)

# Construct a boxplot for age at facility 37.
boxplot(GI_37$age)

Construct a bar chart for the zipcode at facility 37.

# Construct a bar chart for the zipcode at facility 37.
barplot(table(GI_37$zipcode))

Perhaps this plot isn't so useful. Maybe it would be better to just use the first 3 zipcode digits

# Construct a bar chart for the first three digits of zipcode at facility 37.
barplot(table(trunc(GI_37$zipcode/100)))


jarad/ISDSWorkshop documentation built on May 18, 2019, 2:39 p.m.