knitr::opts_chunk$set(echo = TRUE) library(dplyr) library(raster)
We built a database of summer chlorophyll-a and sea surface temperature for the Southern Ocean summarized by month.
This consists of a table of values of SST and chlor-a in a database on ACE-ecostats.
## connect to the db dp <- "/home/acebulk/data" db <- dplyr::src_sqlite(file.path(dp, "habitat_assessment.sqlite3")) ## create a query for the table library(dplyr) (tab <- tbl(db, "chl_sst_25k_monthly"))
The tab
object is now a live connection to the summary table, and can be used to issue queries of the data.
We can inquire directly for various metrics.
## the number of total rows tab %>% tally() ## the start and end date (in days since 1970-01-01) ## convert to date format (in R after collect) epoch <- as.Date("1970-01-01") tab %>% summarize(start = min(date, na.rm = TRUE), end = max(date, na.rm = TRUE)) %>% collect(n = Inf) %>% mutate(start = start + epoch, end = end + epoch) ## calculate number of non missing values tab %>% dplyr::filter(!is.na(chla) | !is.na(sst)) %>% tally()
We can issue group by operations and let the database perform tasks, or at any time use the collect(n = Inf)
function to pull the results of a query into memory. (There are 3 columns of 64-bit doubles, and 1 32-bit integer so r 9587843 * 7 * 4/1e9
Gb of useable data. )
It is possible to re-map the data back onto its 25km grid, but we need a little bit of extra material. The grid is a Lambert Azimuthal Equal Area grid centred on the south pole. We can insert this function any that the original grid is required.
default_grid <- function() { prjj <- "+proj=laea +lat_0=-90 +datum=WGS84" raster(spex::buffer_extent(projectExtent(raster(extent(-180, 180, -90, -30), crs = "+init=epsg:4326"), prjj), 25000), res = 25000, crs = prjj) }
Now issue a query to summarize the entire data set into a grand summer mean and populate the grid.
(Be sure to collect the result of the query into memory).
gmean <- tab %>% group_by(cell) %>% summarize(sst = mean(sst, na.rm = TRUE), chla = mean(chla, na.rm = TRUE)) %>% dplyr::collect(n = Inf) library(raster) chla <- setValues(default_grid(), NA) sst <- setValues(default_grid(), NA) chla[gmean$cell] <- gmean$chla sst[gmean$cell] <- gmean$sst
Plot these to show the overall summary.
library(palr) plot(sst, col = sstPal(24)) chlpal <- chlPal(palette = TRUE) plot(chla, col = chlpal$cols, breaks = chlpal$breaks, legend = FALSE) contour(sst, add = T, levels = seq(-1, 20, by = 1.5), col = rgb(0, 0, 0, 0.6)) library(aceecostats) plot(spTransform(aes_region_simple, projection(sst)), add = T)
The script to recreate the database is https://github.com/AustralianAntarcticDivision/aceecostats/blob/master/inst/workflow/05_chl_sst_25k.R
The data files used
Missing values in SST only exist where sea ice is masked by the OISST product.
Missing values in CHLA also exist where cloud cover obscured the instrument during the entire month, on the order of 2500-7000 25km cells per month. For these we found the nearest valid value and filled it in. We used nearest-neighbour lookup with the nabor
package.
We maintain a collection of daily L3 bin 4km chlorophyll-a data files, with Johnson 2013 chlorophyll pre-calculated. The L3 bins are summarized into 25km cells by bin-centre lookup, transforming the bin centre to Lambert Azimuthal Equal area.
The SST pixels are transformed directly to the 25km cells by raster reprojection on each daily grid.
Data references, timings, size of cache, etc.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.