Advanced R

If you shut down R, you will need to change your working directory and run the following commands to load libraries,

library('ggplot2')
library('dplyr')
library('tidyr')
library('RColorBrewer') 
library('ISDSWorkshop')
workshop(launch_index=FALSE)

Read in the data files and create additional variables

# Read csv files and create additional variables
icd9df = read.csv("icd9.csv")
GI     = read.csv("GI.csv") %>%
  mutate(
    date      = as.Date(date),
    weekC     = cut(date, breaks="weeks"),
    week      = as.numeric(weekC),
    weekD     = as.Date(weekC),
    facility  = as.factor(facility),
    icd9class = factor(cut(icd9, 
                           breaks = icd9df$code_cutpoint, 
                           labels = icd9df$classification[-nrow(icd9df)], 
                           right  = TRUE)),
    ageC      = cut(age, 
                    breaks = c(-Inf, 5, 18, 45 ,60, Inf)),
    zip3      = trunc(zipcode/100))

Exporting tables

There are many ways to export tables (for use in Word). Here I will cover two basic approachess that are simple and work well.

There are more sophisticated approaches that I will not cover:

Cut-and-paste

You can cut-and-paste directly from R.

ga_l <- GI %>%
  group_by(gender, ageC) %>%
  summarize(count = n())

ga_w <- ga_l %>%
  spread(ageC, count) %>%
  print(row.names = FALSE)

Cut-and-pasting this table is done in ASCII format. This looks good in a plain text document, e.g. Notepad, TextEdit, and some email, but will not look good in other formats, e.g. docx.

Create HTML table

library('xtable')
tab = xtable(ga_w,
             caption = "Total GI cases by Sex and Age Category",
             label   = "myHTMLanchor",
             align   = "ll|rrrrr") # rownames gets a column

Save the table to a file

print(tab, file="table.html", type="html", include.rownames=FALSE)

Output for this HTML table

The HTML code looks like

print(tab, type="html", include.rownames=FALSE)

and the resulting table looks like

print(tab, type="html", include.rownames=FALSE)

Copy-and-paste table to Word

Now you can

  1. Open the file (table.html)
  2. Copy-and-paste this table to Word.

Activity - exporting tables

Create a Word table for the number of cases by facility and age category.

# Summarize data by facility and age category

# Reshape data from long to wide format

# Create HTML table

# Save HTML to file

# Copy-and-paste table into Word

When you have completed the activity, compare your results to the solutions.

Maps

The packages ggplot2 and maps can be used together to make maps.

Map of the continental US

library('maps')
states = map_data("state")
ggplot(states, aes(x = long, y = lat, group = group)) + 
  geom_polygon(color="white")

Map of the counties in the continental US

counties = map_data("county")
ggplot(counties, aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "white")

Map of the counties in Iowa

ggplot(counties %>% filter(region=="iowa"), 
       aes(x = long, y = lat, group = group)) + 
  geom_polygon(fill = "gray", color = "black")

Construct an appropriate data set

To make an informative map, we need to add data.

fluTrends = read.csv("fluTrends.csv", check.names=FALSE)

For simplicity, only keep the most recent 12 weeks on states.

nr = nrow(fluTrends)
flu_w = fluTrends[(nr-11):nr, c(1,3:53)]
dim(flu_w)

Reshape to long format

flu_l <- flu_w %>%
  gather(region, index, -Date)

Merge fluTrends data with map_data

The region names in map_data are lower case, so use tolower() to convert all the region names in flu_l to lowercase. Then the map_data and fluTrends data are merged using merge().

head(unique(states$region))
flu_l$region = tolower(flu_l$region)
states_merged = merge(states, flu_l, sort=FALSE, by='region')

Make the plots

Some Google Flu Trend data.

states_merged$Date = as.Date(states_merged$Date)
mx_date = max(states_merged$Date)
ggplot(states_merged %>% filter(Date == mx_date), 
       aes(x = long, y = lat, group = group, fill = index)) + 
  geom_polygon() + 
  labs(title=paste('Google Flu Trends on', mx_date), x='', y='') +
  theme_minimal() + 
  theme(legend.title = element_blank()) +
  coord_map("cylindrical") + 
  scale_fill_gradientn(colours=brewer.pal(9,"Reds")) # Thanks Annie Chen

Last 12 weeks.

ggplot(states_merged, 
       aes(x=long, y=lat, group=group, fill=index)) + 
  geom_polygon() + 
  labs(title='Google Flu Trends', x='', y='') +
  theme_minimal() + 
  theme(legend.title = element_blank()) +
  facet_wrap(~Date) + 
  coord_map("cylindrical") + 
  scale_fill_gradientn(colours=brewer.pal(9,"Reds")) 

Activity

Modify the code to determine what elements of the map are affected.

Advanced: Download the data directly from http://www.google.org/flutrends/about/data/flu/historic/us-historic-v3.txt using `read.csv' and then construct a map of the most recent Google Flu Trends data.

# Construct Google Flu Trends map

When you have completed the activity, compare your results to the solutions.

Packages

A package provides additional functionality besides what base installation of R provides. You have already used a number of additional packages:

The Comprehensive R Archive Network (CRAN) has over 9,594 packages. These packages can be installed using the install.packages() function, e.g.

install.packages("ggplot2")

For many reasons, packages may not be on CRAN but may be available from other sources:

Github

To install a package from github, you can use the install_github() function from the devtools package. For example,

# install.packages("devtools")
library('devtools')
install_github('nandadorea/vetsyn')

Bioconductor

Bioconductor provides tools for high-throughput genomic data. There are over 1,296 packages available from bioconductor. To install bioconductor, use

source("http://bioconductor.org/biocLite.R")
biocLite()

Then to install a package from bioconductor use

biocLite("edgeR")

The bioconductor repository may be useful in the future for public health biosurveillance.

Source

All packages can be installed from their source. If a package is not available in a repository, then this is the only way to install the package. To install from source download the source file (.tar.gz) and then use the install.packages() function with arguments repos=NULL and type="source". For example, from a source version of this workshop, you would use the following command:

install.packages("ISDSWorkshop_0.3.tar.gz", 
                 repos = NULL, 
                 type  = "source")

Packages for surveillance

Some packages for performing surveillance are

SpatialEpi - Kulldorff example

The SpatialEpi package implements the Kulldorff cluster detection method in the kulldorff() function.

Setting up the analysis (taken directly from the example)

library('SpatialEpi')

## Load Pennsylvania Lung Cancer Data
data(pennLC)
data <- pennLC$data

## Process geographical information and convert to grid
geo <- pennLC$geo[,2:3]
geo <- latlong2grid(geo)

## Get aggregated counts of population and cases for each county
population <- tapply(data$population, data$county, sum)
cases      <- tapply(data$cases,      data$county, sum)

## Based on the 16 strata levels, compute expected numbers of disease
n.strata       <- 16
expected.cases <- expected(data$population, data$cases, n.strata)

## Set Parameters
pop.upper.bound <- 0.5
n.simulations   <- 999
alpha.level     <- 0.05
plot            <- TRUE

SpatialEpi - Kulldorff example

Plot the data

clr = cases/population
clr = 1-clr/max(clr) # make sure range is (0,1)
plot(pennLC$spatial.polygon, axes=TRUE, col=rgb(1, clr, clr))

SpatialEpi - Kulldorff example

Run the analysis and plot the results (directly from the example)

## Kulldorff using Binomial likelihoods
binomial <- kulldorff(geo, cases, population, NULL, pop.upper.bound, n.simulations, 
  alpha.level, plot)
cluster <- binomial$most.likely.cluster$location.IDs.included

## plot
plot(pennLC$spatial.polygon,axes=TRUE)
plot(pennLC$spatial.polygon[cluster],add=TRUE,col="red")
title("Most Likely Cluster")

Activity - Install the surveillance package

If you are connected to the internet, try installing the surveillance package. If you are successful, look at its help file.

# Install the surveillance package

When you have completed the activity, compare your results to the solutions.

Functions

Packages are typically a collection of functions. But you can write your own. For example,

add = function(a,b) {
  return(a+b)
}
add(1,2)

or

add_vector = function(v) {
  sum = 0
  for (i in 1:length(v)) sum = sum + v[i]
  return(sum)
}

A simple outbreak detection function

Here is a simple outbreak detection function

alert = function(y,threshold=100) {
  # y is the time series 
  # an alert is issue for any y above the threshold (default is 100)
  factor(ifelse(y>threshold, "Yes", "No"))
}

Run this on our weekly GI data.

GI_w <- GI %>%
  group_by(weekD) %>%
  summarize(count = n())

GI_w$alert = alert(y = GI_w$count, threshold = 150)

ggplot(GI_w, aes(x = weekD, y = count, color = alert)) + 
  geom_point() + 
  scale_color_manual(values = c("black","red"))

R in batch

For routine analysis, it can be helpful to run R in batch mode rather than interactively. To do this, create a script and save the script with a .R extension, e.g. script.R:

# Read in the data perhaps as a csv file

# Create some table and save them to html files

# Create some figures and save them to jpegs

# Run some outbreak detection algorithms and produce figures

Now, from the command line run

Rscript script.R

Or, from within R run

source("script.R")

Now each week you can update the csv file and then just run the script which will create all new tables and figures.

Shiny apps

R can be used to create HTML applications through the shiny package. The code is written entirely in R, although you can use cascading style sheets (CSS) if you want to update how it looks. Here is an example that allows visualization of CDC data.

These apps can also be run locally, e.g.

install.packages('shiny')
library('shiny')
runGitHub('NLMichaud/WeeklyCDCPlot')


jarad/ISDSWorkshop documentation built on May 18, 2019, 2:39 p.m.