library('dplyr')
library('ISDSWorkshop')
workshop(launch_index=FALSE)
x = 1:10
y = rep(c(1,2), each = 5)
m = lm(y ~ x)
s = summary(m)
Now, look at the result of each line:
x
y
m
s
s$r.squared
If you have worked with linear regression before, this output should look familiar.
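As a small sanity check on the regression output, the sketch below re-creates the same toy data and compares `s$r.squared` with the squared correlation of `x` and `y`; the two agree for any regression with a single predictor. The name `r2_from_cor` is introduced here for illustration only and is not part of the workshop code.

```r
# Same toy data as in the solution above
x <- 1:10
y <- rep(c(1, 2), each = 5)

m <- lm(y ~ x)
s <- summary(m)

# With one predictor, R-squared equals the squared correlation of x and y
r2_from_cor <- cor(x, y)^2
all.equal(s$r.squared, r2_from_cor)

# Estimated intercept and slope
coef(m)
```

`coef(m)` is a convenient way to pull out just the fitted intercept and slope when the full summary is more than you need.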
Calculate the probability that the individual has the disease given a positive test when:
specificity = 0.95
sensitivity = 0.99
prevalence = 0.001
probability = (sensitivity*prevalence) / (sensitivity*prevalence + (1-specificity)*(1-prevalence))
probability
Yes, it is only about 2%!
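The calculation is an application of Bayes' rule. Written out with the numbers used here, where $D$ denotes having the disease and $+$ a positive test (notation introduced only for this illustration), it looks roughly like this:

```latex
\begin{aligned}
P(D \mid +)
  &= \frac{\text{sensitivity} \times \text{prevalence}}
          {\text{sensitivity} \times \text{prevalence}
           + (1 - \text{specificity}) \times (1 - \text{prevalence})} \\[4pt]
  &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999}
   = \frac{0.00099}{0.05094}
   \approx 0.0194
\end{aligned}
```

The false positives from the large healthy group (0.04995) far outnumber the true positives (0.00099), which is why the probability stays so low even with an accurate test.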
Read in the fluTrends.csv file.
# Read in the csv file
fluTrends = read.csv('fluTrends.csv')
names(fluTrends)
# To maintain pretty column names, use
fluTrends = read.csv('fluTrends.csv', check.names = FALSE)
names(fluTrends)
# unfortunately these names won't work with the
# fluTrends$colname syntax, but you can use back-ticks
summary(fluTrends$`United States`)
GI = read.csv("GI.csv")
# Min, max, mean, and median age for zipcode 20032.
GI_20032 <- GI %>% filter(zipcode == 20032)
min(   GI_20032$age)
max(   GI_20032$age)
mean(  GI_20032$age)
median(GI_20032$age)
Alternatively:
summary(GI_20032$age)
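Another option, since dplyr is already loaded, is to compute all four statistics in a single pipeline with `summarise()`. The sketch below uses a small made-up data frame, `GI_demo`, as a stand-in for the real `GI` data so that it runs on its own; the ages in it are invented for illustration and the column names are assumed to match those used above.

```r
library(dplyr)

# Hypothetical stand-in for the GI data (invented values, two columns only)
GI_demo <- data.frame(
  zipcode = c(20032, 20032, 20032, 20001),
  age     = c(4, 31, 67, 52)
)

# Filter to one zipcode, then compute the four statistics in one step
age_summary <- GI_demo %>%
  filter(zipcode == 20032) %>%
  summarise(
    min_age    = min(age),
    max_age    = max(age),
    mean_age   = mean(age),
    median_age = median(age)
  )

age_summary
```

The result is a one-row data frame, which is handy if you later want the same statistics per group by swapping `filter()` for `group_by(zipcode)`.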
Construct a histogram and boxplot for age at facility 37.
# Construct a histogram and boxplot for age at facility 37.
GI_37 <- GI %>% filter(facility == 37)
hist(GI_37$age)
# Construct a boxplot for age at facility 37.
boxplot(GI_37$age)
Construct a bar chart for the zipcode at facility 37.
# Construct a bar chart for the zipcode at facility 37.
barplot(table(GI_37$zipcode))
Perhaps this plot isn't so useful. It may be better to use just the first three digits of the zipcode.
# Construct a bar chart for the first three digits of zipcode at facility 37.
barplot(table(trunc(GI_37$zipcode/100)))
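Dividing by 100 and truncating drops the last two digits of a five-digit zipcode. R's integer-division operator `%/%` does the same thing in one step for positive numbers. The sketch below checks that the two agree on a few made-up zipcodes; the vector `zip` is a hypothetical stand-in for `GI_37$zipcode`, assumed to be numeric.

```r
# Hypothetical zipcodes standing in for GI_37$zipcode (assumed numeric)
zip <- c(20032, 20001, 20745, 22201, 20032)

zip3_trunc  <- trunc(zip / 100)  # approach used in the solution
zip3_intdiv <- zip %/% 100       # integer division, same result for positive values

all(zip3_trunc == zip3_intdiv)

# Counts per three-digit prefix, ready to pass to barplot()
zip3_counts <- table(zip3_intdiv)
zip3_counts
```

Either form works for the bar chart; `%/%` simply avoids the intermediate decimal values.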