options(htmltools.dir.version = FALSE)


# Copyright 2018 Province of British Columbia
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
# http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

knitr::opts_chunk$set(
  collapse = TRUE,
  echo = FALSE,
  comment = "#>",
  fig.path = "graphics/prod/figs"
)

options(scipen = 10)
library(tidyhydat)
library(knitr)
library(tidyverse)
library(lubridate)
library(corrr)
library(leaflet)
library(sf)
library(mapview)

class: inverse
background-image: url(https://upload.wikimedia.org/wikipedia/commons/3/3e/Clearwater_River_Wells_Gray_Park.jpg)
background-size: cover

Outline

.VeryLarge[
- Common Analysis Problems
- What is R and why use it?
- What is tidyhydat?
- Some R basics
- An example of how R can help
- Leveraging R and what I'm not showing you
- Where to get help
- Questions
]


class: inverse, center, middle

Common Analysis Problems


class: center, basic

Accessing Hydrometric Data

include_graphics("graphics/ec_data_explorer2.gif")

11 clicks!


class: basic, center

Stakeholder/Manager: "Hey, this is a really cool analysis but we need to add five stations. Can you run it again?"

--

Make it reproducible!


class: basic, center

Get off the factory line

How much time do you spend copying and pasting?

--

Automate!

--

But how...


class: inverse, left, middle

...Use R!

.pull-left[ (or, more generally, any programmatic, code-based analysis approach...) ]



.pull-left[

What is R?

.large[
- Free and open source
- Statistical programming language
- Publication quality graphics
- But definitely not intimidating...
]
]

--

.pull-right[

Why use R?

.large[
- Efficient
- Reproducible
- Scalable
]
]

--

Not guaranteed to help with this...


Questions worth asking...

.large[
- Are your methods reproducible?
- What is your analysis recipe?
- Can you share it?
]


Excuse me, do you have a moment to talk about Excel?


class:basic

| R | Excel |
|---|---|
| Data and analysis are separate | Data and analysis are usually stored in the same place |



.footnote[ From: http://blog.yhat.com/posts/R-for-excel-users.html. ]


class:basic

| R | Excel |
|---|---|
| Data and analysis are separate | Data and analysis are usually stored in the same place |
| Data structure is strict | Data structure is flexible |



.footnote[ From: http://blog.yhat.com/posts/R-for-excel-users.html. ]


class:basic

| R | Excel |
|---|---|
| Data and analysis are separate | Data and analysis are usually stored in the same place |
| Data structure is strict | Data structure is flexible |
| Operations are achieved through scripting | Operations are achieved through pointing and clicking |



.footnote[ From: http://blog.yhat.com/posts/R-for-excel-users.html. ]


class:basic

| R | Excel |
|---|---|
| Data and analysis are separate | Data and analysis are usually stored in the same place |
| Data structure is strict | Data structure is flexible |
| Operations are achieved through scripting | Operations are achieved through pointing and clicking |
| Iteration is automated | Iteration is usually done by hand |

R provides a clear pathway for efficiency and reproducibility through automation and code

.footnote[ From: http://blog.yhat.com/posts/R-for-excel-users.html. ]
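As a rough illustration of the "Iteration is automated" row, the sketch below applies one summary to several stations in a single call. The station names and flow values are made up for the example and do not come from HYDAT.

```r
# Hypothetical daily flow values for three made-up stations
flows_by_station <- list(
  STN_A = c(10.2, 11.0, 9.8),
  STN_B = c(3.1, 2.9, 3.4),
  STN_C = c(55.0, 60.5, 58.2)
)

# One call iterates over every station; vapply() checks that each
# result is a single number
mean_flows <- vapply(flows_by_station, mean, numeric(1))
mean_flows
```

Adding a station means adding one element to the list; the summary line does not change.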


class:basic

The objective of tidyhydat is to provide a standard method of accessing ECCC hydrometric data sources (historical and real-time) using a consistent and easy-to-use interface that employs tidy data principles within the R project.


--

tidy|hydat


hydat::Water Survey of Canada Network

stns <- hy_stations() %>% 
  filter(HYD_STATUS == "ACTIVE")

st_as_sf(stns, coords = c("LONGITUDE","LATITUDE"),
             crs = 4326,
             agr= "constant") %>%
mapview(zcol = "STATION_NAME", legend = FALSE, map.types = "Esri.WorldImagery", cex = 4,
        popup = popupTable(., zcol = c("STATION_NUMBER", "STATION_NAME", "PROV_TERR_STATE_LOC")))

#leaflet(data = stns) %>% 
#  addTiles() %>% 
#  addMarkers(~LONGITUDE, ~LATITUDE, label=~as.character(STATION_NAME), clusterOptions = markerClusterOptions()) %>% #
#  setView(-96, 63, zoom = 3)

tidy::tidy data

Tidy datasets are all alike but every messy dataset is messy in its own way [1]

--

Each variable forms a column

Each observation forms a row

.footnote[ [1] Wickham, Hadley. 2014. Tidy Data. Journal of Statistical Software 59 (10). Foundation for Open Access Statistics: 1–23. ]
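The two rules can be seen on a small, made-up table. The sketch below uses only base R; the station names, column names and values are hypothetical and are not taken from HYDAT.

```r
# Untidy ("wide") layout: one row per station, one column per day.
untidy <- data.frame(
  STATION_NUMBER = c("STN_A", "STN_B"),
  day_1 = c(10.2, 3.1),
  day_2 = c(11.0, 2.9),
  day_3 = c(9.8, 3.4)
)

# Tidy ("long") layout: each variable (station, day, value) is a column
# and each observation (one station on one day) is a row.
day_cols <- c("day_1", "day_2", "day_3")
tidy <- data.frame(
  STATION_NUMBER = rep(untidy$STATION_NUMBER, times = length(day_cols)),
  Day = rep(seq_along(day_cols), each = nrow(untidy)),
  Value = unlist(untidy[day_cols], use.names = FALSE)
)
tidy
```

In the tidy layout, filtering, grouping and plotting all work on the same three columns regardless of how many days are recorded.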


tidy::untidy data

src <- hy_src()
tbl(src, "DLY_FLOWS") %>%
  filter(STATION_NUMBER == "08MF005") %>%
  select(-contains("_SYMBOL"))

tidy::tidy data

hy_daily_flows(station_number = "08MF005")

tidy::tidyhydat


--


class: inverse, center, middle

An Example


class: basiclh

tidyhydat & some basic R

=SUM(A1:A23)
=AVERAGE(A1:A23)
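For comparison, a minimal sketch of the same two calculations in R. The vector `a` is a hypothetical stand-in for the 23 cells in A1:A23.

```r
# Hypothetical stand-in for the values in cells A1:A23
a <- 1:23

sum(a)   # counterpart of =SUM(A1:A23)
mean(a)  # counterpart of =AVERAGE(A1:A23)
```

The data (`a`) and the operations (`sum()`, `mean()`) are separate objects, so the same calls work unchanged on any other numeric vector.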

class: basiclh

tidyhydat & some basic R

flows_data <- hy_daily_flows(station_number = c("08MF005","09CD001","05KJ001","02KF005"))
flows_data

class: basiclh, center

Analyze the correlation between:

stns_tbl <- hy_stations(c("08MF005","09CD001","05KJ001","02KF005"))[,c("STATION_NUMBER", "STATION_NAME")]

x <- stns_tbl %>%
  rename(`Station Name`=STATION_NAME, `Station Number`=STATION_NUMBER) %>%
  knitr::kable(format = 'html') %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"))
gsub("<thead>.*</thead>", "", x)
stns <- hy_stations(stns_tbl$STATION_NUMBER) 


st_as_sf(stns, coords = c("LONGITUDE","LATITUDE"),
             crs = 4326,
             agr= "constant") %>%
mapview(zcol = "STATION_NAME", legend = FALSE, cex = 6,
        popup = popupTable(., zcol = c("STATION_NUMBER", "STATION_NAME", "PROV_TERR_STATE_LOC")))

Build the analysis

flows_data

flows_data: object


Build the analysis

flows_data %>%
  spread(key = STATION_NUMBER, value = Value) #<<

%>%: "then"

spread: function


Build the analysis

flows_data %>%
  spread(key = STATION_NUMBER, value = Value) %>%
  select(-Date, -Symbol, -Parameter) #<<

select: function


Build the analysis

flows_data %>%
  spread(key = STATION_NUMBER, value = Value) %>%
  select(-Date, -Symbol, -Parameter) %>%
  correlate() #<<

correlate: function


Build the analysis

flows_data %>%
  spread(key = STATION_NUMBER, value = Value) %>%
  select(-Date, -Symbol, -Parameter) %>%
  correlate() %>%
  stretch() #<<

stretch: function
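To make the four steps concrete without a HYDAT download, here is a rough base R sketch of what the pipeline computes, run on made-up data. It is not how tidyr or corrr are implemented; the station names and values are hypothetical, and unlike `stretch()` this version drops the self-pairs.

```r
# Toy long-format data: three hypothetical stations, five days each
flows <- data.frame(
  STATION_NUMBER = rep(c("A", "B", "C"), each = 5),
  Date = rep(as.Date("2020-01-01") + 0:4, times = 3),
  Value = c(1, 2, 3, 4, 5,
            2, 4, 6, 8, 10,
            5, 4, 3, 2, 1)
)

# Step 1 (spread): one column per station, one row per date
wide <- reshape(flows, idvar = "Date", timevar = "STATION_NUMBER",
                direction = "wide")

# Step 2 (select): keep only the numeric station columns
station_cols <- wide[, setdiff(names(wide), "Date")]
names(station_cols) <- sub("^Value\\.", "", names(station_cols))

# Step 3 (correlate): pairwise Pearson correlations between stations
r <- cor(station_cols, use = "pairwise.complete.obs")

# Step 4 (stretch): back to a long table with one row per station pair
pairs <- as.data.frame(as.table(r), stringsAsFactors = FALSE)
names(pairs) <- c("x", "y", "r")
pairs <- pairs[pairs$x != pairs$y, ]
pairs
```

In this toy data, station B is an exact multiple of A and station C runs in reverse, so the correlations come out at the extremes of the scale.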


Scalable

stns <- hy_stations(prov_terr_state_loc = "NU") %>%
  filter(HYD_STATUS == "ACTIVE")

st_as_sf(stns, coords = c("LONGITUDE","LATITUDE"),
             crs = 4326,
             agr= "constant") %>%
mapview(zcol = "STATION_NAME", legend = FALSE, 
        popup = popupTable(., zcol = c("STATION_NUMBER", "STATION_NAME", "PROV_TERR_STATE_LOC", "HYD_STATUS")))

Scalable

stns <- hy_stations(prov_terr_state_loc = "NU") %>%
  filter(HYD_STATUS == "ACTIVE")

nu_flows <- hy_daily_flows(station_number = stns$STATION_NUMBER)
nu_flows

Scalable

nu_flows %>% #<<
  spread(STATION_NUMBER, Value) %>%
  select(-Date, -Symbol, -Parameter) %>%
  correlate() %>% 
  stretch() 

Efficient, Reproducible and Scalable


What else is available in tidyhydat?

All tables in HYDAT

.Large[
- Instantaneous peaks
- Daily, monthly and yearly temporal summaries
- Discharge, level, sediment, particle size
- Data ranges
- Station metadata
]


What else is available in tidyhydat?

search_stn_name("fraser")

Pointing and clicking

include_graphics("graphics/wateroffice.gif")

What else is available in tidyhydat?

realtime_plot("08MF005", Parameter = "Flow")

What else is available in R?

raw_stns <- hy_stations() %>%
  select(STATION_NUMBER:PROV_TERR_STATE_LOC, DRAINAGE_AREA_GROSS)

mad_long_avg <- hy_annual_stats(raw_stns$STATION_NUMBER) %>%
  filter(Sum_stat == "MEAN", Parameter == "Flow") %>%
  group_by(STATION_NUMBER) %>%
  summarise(Value = mean(Value, na.rm = TRUE)) %>%
  right_join(raw_stns, by = "STATION_NUMBER")
mad_long_avg #<<

What else is available in R?

library(ggplot2)
ggplot(mad_long_avg,aes(x = Value, y = DRAINAGE_AREA_GROSS, colour = PROV_TERR_STATE_LOC)) +
  geom_point() +
  scale_y_continuous(trans = "log10") +
  scale_x_continuous(trans = "log10") +
  scale_colour_viridis_d(name = "Jurisdiction") +
  labs(x = "Mean long term annual discharge (m^3/s)", y = "Gross drainage area (km^2)") +
  theme_minimal()

It can be daunting!


Resources for R



Contribute to tidyhydat

Openly developed on GitHub

https://github.com/ropensci/tidyhydat

Any contribution helps. You don't have to be an R programmer!

.pull-left[
- Questions
- Ideas / Feature-requests
- Bugs
- Bug-fixes
- Development
]

.pull-right[

]

--

For example...

Authors@R: c(person("Sam", "Albers",email = "sam.albers@gov.bc.ca", role = c("aut", "cre")),
    person("David", "Hutchinson", email = "david.hutchinson@canada.ca", role = "ctb"), #<<
    person("Dewey", "Dunnington", email = "dewey@fishandwhistle.net", role = "ctb"), #<<
    person("Province of British Columbia", role = "cph"))

class: inverse, center

Some Helpful Links

Installing R & RStudio with local package libraries

- https://github.com/bcgov/bcgov-data-science-resources/wiki/Installing-R-&-RStudio

Installing tidyhydat

- https://cran.rstudio.com/web/packages/tidyhydat/README.html

Getting started with tidyhydat

- https://cran.rstudio.com/web/packages/tidyhydat/vignettes/tidyhydat_an_introduction.html
- https://cran.rstudio.com/web/packages/tidyhydat/vignettes/tidyhydat_example_analysis.html

BC Gov data science resource wiki

- https://github.com/bcgov/bcgov-data-science-resources/wiki

class: basic
background-image: url(https://media.giphy.com/media/TnDoEoXfT7YoE/giphy.gif)
background-size: cover

Questions?

.content-box-blue[ Slides available from

- https://github.com/ropensci/tidyhydat/blob/master/presentations/tidyhydat_intro.pdf
- https://github.com/ropensci/tidyhydat/blob/master/presentations/tidyhydat_intro.Rmd

Contact sam.albers@gov.bc.ca ]


