README.md

wunderscraper

A package for sampling weather stations via Wunderground

overview

Wunderscraper helps tap and organize a wealth of real-time weather data from Wunderground. The real-time nature of Wunderground's vast network of weather stations must be sampled; it is impossible to collect data from all the stations all the time. Wunderscraper provides flexible spatial and temporal sampling to efficiently build a representation of weather at hyper local scales.

installation

devtools::install_github('cmarmstrong/wunderscraper')

sampling

Sampling is a method for constructing a representation of a population. At the heart of sampling theory is independence; sampling one unit shouldn't change the probability of sampling another. Spatial sampling is especially challenging because units are not independent. Measurements at one weather station will be correlated with those at nearby stations. One way to preserve spatial independence is to partition space into units that are independent, and draw a representation from each partition.

Sampling methods offer a couple of basic tools for preserving independence and focusing on a population of interest. Multistage sampling is the primary tool for partitioning a population into independent units. The initial stages draw samples from a large unit, like regions or states, and later stages draw samples from smaller units nested within the larger ones, eg counties or zip codes. Stratified sampling is a tool for ensuring sub-populations recieve adequate coverage. Stratified sampling repeats a sample stage for each sub-population. Stratified sampling is useful for evenly covering sub-populations, or for oversampling a particularly small sub-population. See the examples in the next section for more details.

features

library(wunderscraper)
schedulerMMDD <- scheduler()
setApiKey(f="apikey.txt")
## sample 1 county and collect all weather stations.  Will keep only stations
## within the county administrative boundary, as determined from tigris
scrape(schedulerMMDD, c("GEOID", "ZCTA5", "id"), size=1)
data(zctaRel)
## monitor a tri-state area
triState <- zctaRel[zctaRel $STATEFP %in% c("09", "34", "36"), ]
repeat scrape(schedulerMMDD, c("STATEFP", "GEOID", "ZCTA5"), size=c(1, 10, 1, 10),
              sampleFrame=triState)
## monitor a tri-state, stratified by state to ensure complete coverage each sample
repeat scrape(schedulerMMDD, c("GEOID", "ZCTA5"), size=c(10, 1, 10), strata=rep("STATEFP", 3),
              sampleFrame=triState)
## monitor a tri-state area with two hour period
plan(schedulerMMDD, '2 hours')
repeat {
    scrape(schedulerMMDD, c("GEOID", "ZCTA5"), size=c(10, 1, 10), strata=rep("STATEFP", 3),
           sampleFrame=triState)
    sync(schedulerMMDD)
}
## sample stations at a resolution of 0.01 degrees, one station per grid of resolution
scrape(schedulerMMDD, c("GEOID", "ZCTA5"), size=c(10, 1, 1), strata=c(NA, NA, "GRID"),
       cellsize=c(NA, 0.01))
?scrape


Try the wunderscraper package in your browser

Any scripts or data that you put into this service are public.

wunderscraper documentation built on May 2, 2019, 12:40 p.m.