knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE, out.width = "100%" ) library(cubble) library(dplyr) library(ggplot2) library(patchwork)
One common task when working with spatio-temporal data is to match nearby sites. For example, we may want to verify the location of an old list of stations with current stations, or we may want to match the data from different data sources. In this vignette, we will introduce the spatial and temporal matching in cubble using an example on matching river level data with precipitation in Victoria, Australia.
In cubble, spatial and temporal matching are performed using the functions match_spatial()
and match_temporal()
. The match_spatial()
function calculates the spatial distance between observations in two cubble objects. Various distance measures are available (check sf::st_distance
). Analysts can specify the number of matched groups to output using the spatial_n_group
argument (default to 4 groups) and the number of matches per group using the spatial_n_group
argument (default to 1, one-to-one matching). The syntax to use match_spatial()
is:
match_spatial(<cubble_obj1>, <cubble_obj2>, ...)
The function match_temporal()
calculates the similarity between time series within spatially matched groups. Two identifiers are required: one for separating each matched group (match_id
) and one for separating the two data sources (data_id
). The argument temporal_by
uses the by
syntax from dplyr's *_join
to specify the temporal variables to match.
The similarity score between two time series is calculated using a matching function, which can be customised by the analysts. The matching function takes two time series as a list and returns a single numerical score. This allows for flexibility in using existing time series distance calculation implementation. By default, cubble implements a simple peak matching algorithm (match_peak
) that counts the number of peaks in two time series that fall within a specified temporal window. The syntax to use match_temporal()
is
match_temporal( <cubble_obj_from_match_spatial>, data_id = , match_id = , temporal_by = c("..." = "...") )
Now let's consider an example of matching water data from river gauges with precipitation. The [water level data]((http://www.bom.gov.au/metadata/catalogue/19115/ANZCW0503900528?template=full), collected by the Bureau of Meteorology, can be compared with the precipitation since rainfall can directly impact water level in river. Here is the location of available weather stations and water gauges in Victoria, Australia:
river <- cubble::river %>% mutate(type = "river") %>% rename(id = station) climate_vic <- climate_aus %>% # subset for Victoria stations filter(between(as.numeric(stringr::str_sub(id, 7, 8)), 76, 90)) %>% mutate(type = "climate") vic_map <- ozmaps::abs_ste %>% filter(NAME == "Victoria") ggplot() + geom_sf(data = vic_map) + geom_point(data = dplyr::bind_rows(river, climate_vic), aes(x = long, y = lat, color = type)) + theme_bw() + theme(legend.position = "bottom") + labs(x = "Longitude", y = "Latitude") + scale_color_brewer(palette = "Dark2")
Both climate_vic
and river
are cubble objects, and we can obtain a summary of the 10 closest pairs between them:
(res_sp <- match_spatial(climate_vic, river, spatial_n_group = 10))
The result can also be returned as cubble objects by setting the argument return_cubble = TRUE
. The output is be a list where each element is a paired cubble object. To combine all the results into a single cubble, you can use bind_rows()
. In the case when a site in the second cubble (the river
data here) is matched to two stations in the first cubble (climate_vic
here), the binding may not be successful since cubble requires unique rows in the nested form. In the summary table above, the river station 226027
is matched to more than one weather station: ASN00085072
(group 3) and ASN00085298
(group 5). Similarly, the river station 230200
is matched in group 7 and 8). In such cases, you can either deselect one pair before combining, or work with the list output with the purrr::map
syntax:
res_sp <- match_spatial(climate_vic, river, spatial_n_group = 10, return_cubble = TRUE) str(res_sp, max.level = 0) res_sp[[1]] (res_sp <- res_sp[-c(5, 8)] %>% bind_rows())
For temporal matching, we match the variable Water_course_level
from the river data to prcp
in the weather station data. The variable group
and types
identify the matching group and the two datasets:
(res_tm <- res_sp %>% match_temporal( data_id = type, match_id = group, temporal_by = c("prcp" = "Water_course_level")))
Similarly, the cubble output can be returned using the argument return_cubble = TRUE
. Here we select the four pairs with the highest number of matching peaks:
res_tm <- res_sp %>% match_temporal( data_id = type, match_id = group, temporal_by = c("prcp" = "Water_course_level"), return_cubble = TRUE) (res_tm <- res_tm %>% bind_rows() %>% filter(group %in% c(1, 7, 6, 9)))
And then we can visualise them in space or across time:
res_tm_long <- res_tm %>% face_temporal() %>% unfold(group, type) %>% group_by(group, type) %>% mutate(matched = (matched - min(matched, na.rm = TRUE))/ (max(matched, na.rm = TRUE) - min(matched, na.rm = TRUE))) vic_map <- ozmaps::abs_ste |> filter(NAME == "Victoria") p1 <-ggplot() + geom_sf(data = vic_map, fill = "grey95", color = "white") + geom_point(data = dplyr::bind_rows(river, climate_vic), aes(x = long, y = lat, color = type), alpha = 0.2, fill = 0.2) + geom_point(data = res_tm %>% as_tibble(), aes(x = long, y = lat, color = type)) + ggrepel::geom_label_repel( data = res_tm %>% filter(type == "climate") %>% as_tibble(), aes(x = long, y = lat, label = group)) + scale_color_brewer(palette = "Dark2") + theme_void() + ggplot2::theme(legend.position = "bottom") + ggplot2::labs(x = "Longitude", y = "Latitude") p2 <- res_tm_long %>% ggplot(aes(x = date, y = matched, group = type,color = type)) + geom_line() + facet_wrap(vars(group)) + scale_color_brewer(palette = "Dark2", guide = "none") + theme_bw() + labs(x= "date") + scale_x_date(date_labels = "%b") + labs(x = "Week", y = "Precipitation/ water level") (p1 | p2) + patchwork::plot_layout(guides = "collect") + plot_annotation(tag_levels = 'a')& theme(legend.position = "bottom")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.