knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>"
)

oceancolouR contains a collection of functions used in various projects in the BIO Remote Sensing Group, including functions to work with the level-3 binned ocean colour data downloaded from NASA, calculate chlorophyll-a using different algorithms, and some extra functions to simplify formatting and statistics. For example (click to expand):

Chlorophyll-a algorithms gsm(): R implementation of the GSM algorithm, with traditional coefficients or coefficients calculated with get_gs() ocx(): OCX algorithm for MODIS-Aqua, VIIRS-SNPP and SeaWiFS. OCX coefficients can be optimized with optimize_ocx_coefs(). This can also be used to calculate POLY4 chla (see examples below) ci(), hu(), oci(): Color Index, Hu, and OCI (OCx-Hu combination) algorithms qaa(): QAA algorithm (v6) to calculate chlorophyll-a as well as phytoplankton absorption coefficients * eof_chl(): EOF (empirical orthogonal function) method to calculate chlorophyll-a

Working with L3b files get_bins(): retrieve the bin numbers with corresponding longitude, latitude and bathymetry for the Pan-Canadian grid and sub-regions gen_nrows(): given a spatial resolution, get the number of rows on the global binned grid (see ?gen_nrows for a list of spatial resolution codes)
gen_start_bin(): given a spatial resolution, generate bin numbers at the start of the row on the global grid
binlatlon(): given a spatial resolution and lat/lon boundaries, generate a dataframe containing bin number and corresponding latitude and longitude
gen_bin_grid(): given a spatial resolution and lat/lon boundaries, generate a 2d regular map grid of bin numbers
var_to_rast(): given a dataframe with column1=bin number and column2=variable, plot the variable on a raster map
plot_pancan(): given a vector of data from a panCanadian L3b data file, plot the variable on a raster map with coastlines
read_h5_L3b(): read the contents of a NASA Level-3 binned file in h5 format
* get_closest_bins(): given a dataframe of latitudes and longitudes, find the closest bin numbers (used in L3b/insitu matchups)

Math/stats functions geoMean() and geoSD(): calculate geometric mean and geometric standard deviation avg_columns(): condense a matrix by averaging selected columns
shifted_gaussian(): given input parameters and an independent vector, calculate the corresponding points along a shifted gaussian curve (used in phytoplankton bloom fitting)
find_line(): given the (x,y) coordinates of 2 points, calculate slope and intercept of the line through them
shift_line(): shift a slanted line "up" or "down" a specified distance in the direction perpendicular to the line
add_matrices(): add matrices together, using na.rm argument on each individual cell
plus_minus(): get a vector of numbers from (x-n) to (x+n)
pos_angle(): convert an angle to its equivalent angle between 0 and 2pi radians
filtered_mean(): given a vector, calculate the filtered mean, standard deviation, and coefficient of variation
set_limits(): given a vector and a lower and upper limit, convert values beyond those limits to the nearest limit

Other all_lambda: returns wavebands for commonly used ocean colour sensors sparkle_fill(): fills small holes in a raster or matrix
week8(), week8_date(): convert to/from standard 8-day week number and date days_vector(): list julian days for a given year/month
read_pixEx(): load output file of SNAP pixEx pixel extraction tool as data.frame
haversine(): calculate haversine distance between two points
* plot_swath(): quickly plot the boundary of a satellite swath (e.g. a level-2 satellite file containing the 2D matrix of a single scan from a satellite sensor) to see where it's located on the global map

Disclaimer

The scripts for ci(), hu(), ocx(), and oci() were written to attempt to copy the products in NASA OBPG processing, i.e. the chlor_a and chl_ocx variables in their satellite files. Though the code has been reviewed multiple times to ensure that the calculations, boundaries, and checks are the same, the output has only been compared to the variables in one L2 image for SeaWiFS (S2006116164706.L2_GAC_OC.nc) and one for MODIS-Aqua (A2020103163000.L2_LAC_OC.nc). For these images, passing the Rrs through the ocx() and oci() functions returns results nearly identical to chl_ocx and chlor_a respectively, aside from negligible differences due to precision in R vs C++. The scripts for NASA's processing can be found here.

ocx() and gsm() functions were used to produce the results of the algorithms (POLY 1-4, GSM_GC, GSM_GCGS, and GSM_GS) in Clay et al 2019.

eof_chl() is the code written by Julien Laliberte, taken from here, which was used to produce the results of Laliberte et al 2018.

qaa() has not been validated in the NWA region yet.

How to install

First you need to install the package dependencies:

#install.packages(c("lubridate", "raster", "hdf5r", "latticeExtra", "sp",
#                   "scales", "lmodel2", "stringr", "boot",
#                   "dplyr", "tidyr", "ggplot2", "patchwork", "geodist"))

Then install oceancolouR using one of the commands below, and restart R:

#remotes::install_github("BIO-RSG/oceancolouR", build_vignettes = TRUE)
#devtools::install_github("BIO-RSG/oceancolouR", build_vignettes = TRUE)

Now load the package:

library(oceancolouR)

# and some extras for plotting
library(ggplot2)
library(patchwork)

L3BIN Data

At level 3, pixels are assigned to a static location on the earth surface ("bin"). Bins are approximately equal area and are based on a sinusoidal projection. More info here.

Functions described here are as follows:

get_bins()

Included are numeric vectors which give bin numbers and their corresponding latitudes, longitudes, and bathymetry values for NASA level-3 daily binned images at 4km and 9km resolution. These cover the Pan-Canadian grid and selected subregions (Northwest Atlantic, Northeast Pacific, and the Gulf of Saint Lawrence) with approximate latitude and longitude boundaries:

* Pan-Canadian grid ("pancan"):        -147 to -41, 39 to 86  
* Northwest Atlantic ("nwa"):       -95 to -42, 39 to 82  
* Northeast Pacific ("nep"):        -140 to -122, 46 to 60  
* Gulf of Saint Lawrence ("gosl"):   -75 to -49, 41 to 53

These datasets can be loaded using the get_bins() function where:

For example:

bindata <- get_bins() returns bin numbers with corresponding latitude, longitude and bathymetry for the Pan-Canadian grid at 4km resolution.

bindata <- get_bins(region = "nwa", resolution = "9km", variables = c("bin","bathymetry")) returns the bin numbers and corresponding bathymetry for the Northwest Atlantic at 9km resolution.

read_h5_l3b()

For reading NASA L3b files, you can use the read_h5_l3b() function. L3b files are in hierarchical netCDF format, which can't be read into R using ncdf4 and similar packages, but can be read with the hdf5r package. read_h5_l3b() is simply a wrapper function to return a list containing file contents, variable data, and bin data.

binlatlon()

As mentioned above, the bin numbers, coordinates, and bathymetry for Canada and surrounding regions are provided in premade vectors that can be accessed by get_bins(). Due to the size of the global grid, only Canadian water bins are provided. For regions of interest in other areas, you can use binlatlon() to generate a dataframe of bins and corresponding coordinates for a given spatial resolution within your region boundaries. These can be joined to the data values for each bin in your level 3 data file, and used for statistics, modelling, etc.

A couple of things to note:

var_to_rast() and gen_bin_grid()

As binned data are not on a regular grid, the functions var_to_rast() and gen_bin_grid() are provided to assign binned data to a regular grid FOR VISUALIZATION ONLY.

gen_bin_grid() generates a regular grid of bin numbers for a region by repeating each bin in a row approximately the same number of times in order to fill all the pixels of the global grid in that row (e.g. if a row has 3000 bins and the global grid has 9000 columns, then each bin will be repeated 3 times). A vector of longitudes across the whole grid (e.g. 9000 pixels in this example) is used to subset the grid to columns within the selected longitude bounds.

var_to_rast() takes a dataframe containing a column of bin numbers and data values, and assigns the values to the grid from gen_bin_grid() based on bin number.

As with binlatlon(), gen_bin_grid() assign coordinates to the center of the bin.

WARNING:

binlatlon() and gen_bin_grid() do NOT return an identical set of unique bins if they're subsetted by longitude (subsetting by latitude is not affected). This is due to the irregular horizontal spacing of bins. After subsetting by longitude, the "edges" of the binlatlon() grid are irregular (see figure below, left panel). When gen_bin_grid() repeats the bins across a row and crops the image at a column of pixels, the irregular bin spacing leads to extra bin numbers at the edge in some rows, or losing bins at the edges of other rows.

knitr::include_graphics("binlatlon_vs_genbingrid_boundaries.png")

This figure shows the same set of bins at 1-degree resolution. On the left, a set of bins (outlined in blue) from binlatlon() are plotted on a map with a sinusoidal projection, and the dashed orange lines show the boundaries of the region. On the right, the same bins are plotted as they're generated by gen_bin_grid(), with each pixel outlined in blue. The dashed orange vertical lines in the right panel show where gen_bin_grid() crops the image. Transparent pixels represent the area containing bins that are generated by both binlatlon() and gen_bin_grid(). Filled pink pixels represent the outermost bin in each row for the selected region, which is retrieved by gen_bin_grid() but NOT by binlatlon() since the center of the bin is outside the longitude bounds. Note that as you go farther north, there are fewer bins per row and so each bin is repeated across more pixels in order to fill the grid horizontally. Black pixels in the lower right corner represent bins retrieved by binlatlon() but NOT by gen_bin_grid().

Other Info

num_pix quickly returns the number of bins (/pixels) in each subregion, and lon_bounds and lat_bounds return the corresponding longitude and latitude bounds.

Or, using do.call and cbind to simplify the output:

do.call(cbind, num_pix)
do.call(cbind, lon_bounds)
do.call(cbind, lat_bounds)

Other reference info includes all_lambda, a list of wavebands (in nanometres) for commonly-used ocean colour sensors:

str(all_lambda)

Examples

Chlorophyll algorithms

Below are some examples using Rrs (remote sensing reflectances) at specific wavebands to calculate chlorophyll-a using the different functions available in the package. First, we'll define example in situ chlorophyll and satellite Rrs data to use:

# Some in situ chl / SeaWiFS Rrs data used in Clay et al (2019)
input <- matrix(c(0.166,0.00199,0.00332,0.00513,0.00397,0.00229,0.00083,0.102,
                  0.00265,0.00396,0.00468,0.00372,0.00233,0.00023,6.229,0.01333,
                  0.01128,0.00843,0.00824,0.00724,0.0023,5.391,0.00738,0.00696,
                  0.00623,0.00635,0.00726,0.002,0.592,0.00574,0.00636,0.00633,
                  0.00489,0.00307,0.00012,0.097,0.00545,0.0064,0.0064,0.00487,
                  0.00297,7e-05,0.197,0.00874,0.00969,0.0093,0.00666,0.00386,
                  0.00051,0.303,0.00722,0.00857,0.00881,0.00671,0.004,0.00055,
                  2.041,0.00207,0.00166,0.00434,0.00408,0.00485,0.00206,0.331,
                  0.00504,0.00647,0.00691,0.00602,0.00394,1e-04,0.508,0.00788,
                  0.00803,0.00777,0.00688,0.00505,0.00091,0.128,0.00574,0.0069,
                  0.00676,0.00507,0.00299,0.00034,13.208,0.00087,0.00151,0.00192,
                  0.00231,0.00293,0.00086,3.035,0.00021,0.00154,0.00271,0.00266,
                  0.00257,0.00043,0.542,0.00255,0.00366,0.00436,0.00356,0.00236,
                  2e-04,3.309,0.00632,0.00658,0.00626,0.00569,0.00489,0.00137,
                  11.693,0.01883,0.01728,0.01259,0.01236,0.00972,0.00425,5.901,
                  0.00207,0.00247,0.00306,0.00292,0.00306,0.00066,0.198,0.00077,
                  0.00329,0.0049,0.00458,0.00352,0.00154,0.149,0.0063,0.00719,
                  0.00706,0.00573,0.00355,0.00038,0.262,0.00697,0.00754,0.00707,
                  0.006,0.00386,0.00067,3.239,0.0082,0.00761,0.00681,0.00616,
                  0.00503,0.0015,7.049,0.00861,0.00772,0.00669,0.00673,0.00621,
                  0.00212,6.35,0.00123,0.00162,0.00226,0.0022,0.00234,0.00048,
                  5.881,0.00075,0.00154,0.00224,0.00218,0.00245,0.00053,0.824,
                  0.00967,0.01074,0.00733,0.00861,0.00624,0.00312,0.538,0.01484,
                  0.01257,0.01033,0.00939,0.0068,0.00198,0.237,0.00318,0.00469,
                  0.00528,0.00387,0.00228,0.00119), ncol=7, byrow=TRUE)
colnames(input) <- c("in_situ_chl","Rrs_412","Rrs_443","Rrs_490","Rrs_510","Rrs_555","Rrs_670")
chl_insitu <- input[,1]
rrs <- input[,2:ncol(input)]
head(input)

OCx

This is the standard band ratio chlorophyll algorithm, which requires tuned coefficients and blue/green wavebands based on sensor (modisaqua, seawifs, or viirssnpp). The argument use_443nm indicates whether the 443nm band should be used in the band ratio (in Clay et al 2019, it was found to increase algorithm error). For the standard OCx algorithm, this band is still in use, so we'll set use_443nm=TRUE.

use_443nm <- TRUE
coefficients <- get_ocx_coefs("seawifs", region="global", alg="ocx")
wavebands <- get_ocx_bands("seawifs", use_443nm = use_443nm)
# calculate the band ratio
br <- get_br(rrs=rrs, blues=wavebands$blues, green=wavebands$green, use_443nm=use_443nm)$rrs_ocx
# calculate ocx chl for each data point
chl_ocx <- ocx(rrs=rrs, blues=wavebands$blues, green=wavebands$green, coefs=coefficients)
str(chl_ocx)

get_ocx_coefs() contains coefficients that have been optimized for the global ocean, and also the coefficients that were optimized for the NWA (Northwest Atlantic) and NEP (Northeast Pacific). You can also optimize the coefficients for your own region by using the optimize_ocx_coefs() function if you have a dataset of in situ chlorophyll and corresponding satellite Rrs values. This function will attempt to force the calculated satellite chl-a onto a 1:1 line with the in situ chla and return the optimal coefficients.
WARNING: This does NOT separate the input data into training and test sets - that must be done manually by the user.

# what degree polynomial should be used for the algorithm? (for example, OC3M,
# used for MODIS-Aqua, is degree 4)
alg_degree <- 4
# add in situ chlorophyll and the band ratio calculated above to a dataframe
df <- data.frame(x=log10(chl_insitu), y=log10(br), stringsAsFactors = FALSE)
# get the optimal coefficients for this dataset only (they will not necessarily
# be optimal for any other dataset since this dataset is so small and the
# coefficients are not tuned using training and test sets)
best_alg_coefs <- optimize_ocx_coefs(data = df, alg_degree = alg_degree)
# calculate ocx with the optimal coefficients
chl_ocx_new <- ocx(rrs=rrs, blues=wavebands$blues, green=wavebands$green, coefs=best_alg_coefs)
# plot the results
plotfun <- function(df, y) {
  tmp_lm <- lmodel2::lmodel2(log10(y) ~ log10(x), data=df)$regression.results[3,2:3]
  ggplot(df, aes(x=x,y=y)) +
    geom_point(alpha=0.6, color="red", size=3) +
    geom_abline(slope=1,intercept=0) +
    scale_x_log10(limits=c(0.05,30)) +
    scale_y_log10(limits=c(0.05,30)) +
    theme_bw() +
    labs(x = "In situ chl", y = y) +
    geom_abline(intercept=tmp_lm$Intercept, slope=tmp_lm$Slope, linetype="dashed", color="red")
}
plotfun(data.frame(x=chl_insitu, y=chl_ocx), y="Old OCx chl") +
  plotfun(data.frame(x=chl_insitu, y=chl_ocx_new), y="New OCx chl")

Here's an example using a RasterStack instead of a matrix, where each layer of the stack contains the pixels from a different waveband. This can also be done with hu(), oci(), and eof_chl().

library(raster)
rrs_stack <- stack(lapply(1:ncol(rrs), function(i) raster(matrix(rrs[,i], nrow=7))))
names(rrs_stack) <- colnames(rrs)
chl_raster <- ocx(rrs=rrs_stack, blues=wavebands$blues, green=wavebands$green, coefs=coefficients)
spplot(chl_raster, par.settings=list(layout.heights=list(top.padding=0,bottom.padding=0),
                                     layout.widths=list(left.padding=0,right.padding=0)))

Hu

The Hu algorithm is simple to use, but only works for chlorophyll < ~0.3 mg m-3 (see plot below), and is usually blended with OCx for values 0.15-0.2 (see OCI algorithm below). The wavebands that you must use depend only on the sensor (modisaqua, seawifs, or viirssnpp), and the coefficients only depend on the version you want to use (version 1 or 2, see ?hu for references). Here we'll use another piece of a matchup dataset for MODIS-Aqua made up of smaller concentrations:

chl_insitu_hu <- c(0.085,0.09,0.1,0.106,0.107,0.122,0.122,0.126,0.128,0.133,0.134,0.143,0.153,0.166,0.166,0.173,0.175,0.182,0.189,0.194,0.197,0.198,0.199,0.208,0.225,0.236,0.249,0.254,0.254,0.258,0.259,0.262,0.266,0.274,0.276,0.28,0.283,0.305,0.311,0.314,0.314,0.318,0.326,0.327,0.34,0.355,0.404,0.405,0.413,0.423,0.435,0.46,0.473,0.493,0.53,0.66,0.66,0.672,0.691,0.726,0.752,0.803,0.803,0.806,0.817,0.829,0.889,0.895,0.897,0.976,1.004,1.006,1.036,1.161,1.219,1.252,1.253,1.835,2.075,2.551,4.117,5.297)
rrs_hu <- matrix(c(0.01042,0.01143,0.00645,0.01166,0.00751,0.00482,0.01019,0.01377,0.00759,0.00724,0.00627,0.00987,0.0101,0.0076,0.00818,0.00637,0.0116,0.00718,0.0091,0.00814,0.0158,0.00721,0.0137,0.00832,0.00442,0.00858,0.01847,0.01217,0.01029,0.00905,0.006,0.01071,0.01411,0.00882,0.00682,0.0096,0.0103,0.00565,0.01648,0.01592,0.01125,0.00791,0.01568,0.01148,0.00789,0.01236,0.00632,0.01144,0.01533,0.0119,0.01201,0.02022,0.00814,0.00768,0.00603,0.00911,0.01096,0.01402,0.00946,0.01063,0.01438,0.01168,0.0117,0.01114,0.01317,0.0099,0.01077,0.00826,0.01352,0.01026,0.01035,0.01166,0.01033,0.01148,0.00933,0.01217,0.01078,0.01174,0.01237,0.00587,0.01591,0.01053,0.00343,0.00407,0.00237,0.00291,0.00323,0.00166,0.00346,0.00377,0.00321,0.00352,0.00281,0.00339,0.0039,0.00357,0.00262,0.00297,0.00358,0.00341,0.00365,0.00331,0.00498,0.00351,0.00499,0.00346,0.00195,0.00361,0.00623,0.00458,0.00348,0.00388,0.00272,0.00458,0.00485,0.00317,0.00313,0.00457,0.004,0.00244,0.00487,0.00597,0.00487,0.00375,0.00578,0.0047,0.00358,0.0035,0.00305,0.00475,0.00558,0.00373,0.00454,0.00624,0.00366,0.00374,0.00268,0.00441,0.00417,0.00431,0.00339,0.00388,0.00476,0.00494,0.00434,0.00476,0.0052,0.00352,0.00461,0.00329,0.00507,0.00431,0.00384,0.00494,0.00441,0.0034,0.0041,0.00477,0.0034,0.00524,0.00518,0.00242,0.00628,0.00384,0.00038,0.00051,0.00023,0.00012,0.00039,2e-05,0.00028,0.00027,0.00028,0.00048,0.00033,0.00029,0.00014,0.00039,9e-05,0.00035,0.00024,0.00031,0.00039,2e-04,0.00054,0.00037,0.00044,0.00033,1e-04,0.00048,0.00065,0.00033,0.00029,0.00053,2e-04,0.00065,0.00051,0.00033,0.00025,0.00067,0.00055,0.00015,0.00034,0.00066,0.00047,0.00038,6e-04,0.00029,0.00044,0.00019,0.00038,5e-04,0.00082,0.00019,0.00058,0.00042,0.00041,4e-04,0.00022,0.00068,0.00032,0.00045,0.00025,0.00015,0.00031,0.00034,0.00065,5e-04,0.00074,0.00048,0.00061,0.00051,0.00037,0.00048,0.00023,0.00032,0.00049,0.00049,0.00052,0.00034,0.00051,0.00031,0.00119,0.00039,0.00061,0.00022), ncol=3)
colnames(rrs_hu) <- c("Rrs_443", "Rrs_547", "Rrs_667")
wavebands <- get_ci_bands("modisaqua")
coefficients <- get_ci_coefs(version=2)
chl_hu <- hu(rrs_hu, wavebands, coefficients)
ggplot(data.frame(x=chl_insitu_hu, y=chl_hu)) +
  geom_point(aes(x=x,y=y), alpha=0.6, color="darkgreen", size=3) +
  geom_abline(slope=1,intercept=0) +
  scale_x_log10(limits=c(0.05,30)) +
  scale_y_log10(limits=c(0.05,30)) +
  theme_bw() +
  labs(x = "In situ chl", y = "Hu chl") +
  geom_vline(xintercept=0.3, color="red", linetype="dashed")

OCI

OCI is a blended algorithm using Hu for the lower concentrations (<0.15), OCx for the higher concentrations (>0.2) and a blend of the two algorithms in between. This is the algorithm currently used by NASA OBPG for the chlor_a variable. Note that we have the region in get_ocx_coefs() set to global to use the same coefficients as NASA OBPG, but this could also be set to nwa or nep to use the coefficients optimized to those regions for the OCx portion of the algorithm.

chl_oci <- oci(rrs, sensor="seawifs")$oci_chl
str(chl_oci)

POLY4

POLY4 is simply the OCx algorithm with coefficients optimized for the Northwest Atlantic or Northeast Pacific. The 443nm waveband is not used in this algorithm (see Clay et al 2019).

POLY4 can be implemented like this, using the Northwest Atlantic (NWA) as an example:

use_443nm <- FALSE
wavebands <- get_ocx_bands("seawifs", use_443nm = use_443nm)
coefficients <- get_ocx_coefs(sensor = "seawifs", region = "nwa", alg= "poly4")
chl_poly4 <- ocx(rrs, wavebands$blues, wavebands$green, coefficients, use_443nm=use_443nm)
str(chl_poly4)

GSM

This is the GSM (Garver-Siegel-Maritorena) semi-analytical algorithm, which requires Rrs just below the sea surface. Rrs can be converted from above to below-surface using rrs_above_to_below_sea_level(). This algorithm uses every available waveband for the selected sensor, and requires the proper coefficients and exponents, which depend on sensor, region, and the user's choice of a quadratic model with constant coefficients, or a model with spectrally-dependent coefficients and exponent on the second polynomial term. You must use get_gsm_IOPexps() to retrieve the tuned exponents (exponents on the IOPs chl, adg, and bbp).

GSM_GC

This is the original version of the GSM algorithm, which is in the form of a quadratic and uses the same coefficients for each waveband.

# convert rrs from above to below sea level
tmp_rrs <- rrs_above_to_below_sea_level(rrs)
# get wavebands for seawifs
lambda <- all_lambda[["SeaWiFS"]]
# run gsm to process multiple records stored in an rrs matrix, where
# rows=records and columns=wavelengths
# this gives a matrix of iops (chl, adg, bbp, and a column indicating if the result
# should be considered valid)
# NOTE: we're using the globally-tuned exponents on adg, bbp, and chl (the defaults
# in the gsm() function)
iops <- apply(X=tmp_rrs, MARGIN=1, FUN=gsm, lambda=lambda, gtype="gc")
chl_gsmgc_original <- iops[1,]
str(chl_gsmgc_original)

To use this version of the algorithm with IOP exponents tuned to the NWA, set the region to "nwa" in the gsm exps function:

tuned_exps <- get_gsm_IOPexps("seawifs", "nwa", "gc")
chl_exp <- tuned_exps[1]
adg_exp <- tuned_exps[2]
bbp_exp <- tuned_exps[3]
iops <- apply(X=tmp_rrs, MARGIN=1, FUN=gsm, lambda=lambda, adg_exp=adg_exp, bbp_exp=bbp_exp,
              chl_exp=chl_exp, gtype="gc")
chl_gsmgc_nwa <- iops[1,]
str(chl_gsmgc_nwa)
GSM_GS

In this version of the model, the coefficients are spectrally-dependent, as well as the exponent on the second term (i.e. it is not forced into a quadratic shape). Set the gtype argument in get_gsm_IOPexps() and gsm() to gs to retrieve the optimal IOP exponents that were tuned using this model (note in this example, we're also specifying that we want the exponents tuned using NWA data):

tuned_exps <- get_gsm_IOPexps("seawifs", "nwa", "gs")
chl_exp <- tuned_exps[1]
adg_exp <- tuned_exps[2]
bbp_exp <- tuned_exps[3]
iops <- apply(X=tmp_rrs, MARGIN=1, FUN=gsm, lambda=lambda, adg_exp=adg_exp, bbp_exp=bbp_exp,
              chl_exp=chl_exp, gtype="gs")
chl_gsmgs_nwa <- iops[1,]
str(chl_gsmgs_nwa)

EOF

To use this algorithm, you need a training set containing in situ chlorophyll data and corresponding satellite Rrs matchups at every waveband you want to use in the model. Ideally, the training set and the chlorophyll images that you want to create should be restricted to the same general area / time span so that the training set can fully capture the variability in that area / time.

NOTE: the training set in the example below is a subset of the training set used in the Gulf of Saint Lawrence, but the example Rrs are from outside that area in the Northwest Atlantic, so they might not work very well with this algorithm. Also, a training set should be much larger than this, but for the purpose of this example we'll just use a tiny subset.

# create training set dataframe
training <- data.frame(chla = c(0.26, 1.94, 4.33, 0.43, 1.8975, 0.635,
                                2.3025, 0.51, 0.6025, 0.245),
                       Rrs_412 = c(0.00186, 0.00182, 0.00045, 0.00165, 0.00351,
                                   0.00242, 0.00187, 0.00063, 0.00014, 0.00433),
                       Rrs_443 = c(0.00233, 0.00199, 0.00099, 0.00166, 0.00382,
                                   0.00274, 0.00201, 0.00144, 0.00072, 0.00409),
                       Rrs_490 = c(0.00283, 0.00197, 0.00162, 0.00161, 0.00352,
                                   0.00312, 0.0022, 0.00204, 0.00125, 0.0038),
                       Rrs_510 = c(0.00251, 0.00205, 0.0017, 0.00153, 0.00316,
                                   0.00316, 0.00225, 0.00206, 0.0014, 0.00334),
                       Rrs_555 = c(0.00186, 0.00189, 0.00182, 0.00126, 0.00214,
                                   0.00282, 0.00236, 0.00177, 0.00134, 0.0028),
                       Rrs_670 = c(0.00025, 0.00022, 0.00029, 6e-05, 0.00032,
                                   0.00043, 0.00063, 0.00017, 0.00014, 3e-04),
                       stringsAsFactors = FALSE)
chl_eof <- eof_chl(rrs=as.data.frame(rrs), training_set=training)
str(chl_eof)

QAA

This algorithm has not yet been thoroughly tested and tuned for use in the NWA and NEP, unlike the other algorithms.

# select wavelengths
wavebands <- all_lambda[["SeaWiFS"]]
# run qaa to process multiple records stored in an rrs matrix, where rows=records and columns=wavelengths
chl_qaa <- apply(X=rrs, MARGIN=1, FUN=qaa, lambda=wavebands)
# extract chl from the results
chl_qaa <- as.numeric(sapply(chl_qaa, "[[", 3))
str(chl_qaa)

Now a quick graph showing the differences between the algorithms for this short example dataset:

library(ggplot2)
library(patchwork)
library(dplyr)
chl_df <- data.frame(In_situ_chl = chl_insitu,
                     OCx = chl_ocx,
                     OCI = chl_oci,
                     POLY4 = chl_poly4,
                     GSMGC_global = chl_gsmgc_original,
                     GSMGC_nwa = chl_gsmgc_nwa,
                     GSMGS_nwa = chl_gsmgs_nwa,
                     EOF = chl_eof,
                     QAA = chl_qaa,
                     stringsAsFactors = FALSE) %>%
  tidyr::pivot_longer(cols = OCx:QAA, names_to = "Algorithm", values_to = "Satellite_chl")
ggplot(chl_df, aes(x=In_situ_chl, y=Satellite_chl, color=Algorithm)) +
  geom_point() +
  geom_abline(slope=1, intercept=0) +
  geom_smooth(method=lm, se=FALSE, linetype="dashed", size=0.7) +
  scale_x_log10(limits=c(0.09, 15)) +
  scale_y_log10(limits=c(0.09, 15)) +
  theme_bw()

References

OCx:
O'Reilly, John & Maritorena, S. & Mitchell, B.G. & Siegel, David & Carder, Kendall & Garver, S.A. & Kahru, Mati & Mcclain, Charles. (1998). Ocean color chlorophyll algorithms for SeaWiFS. Journal of Geophysical Research. 103. 937-953.

GSM:
Maritorena, Stephane & Siegel, David & Peterson, Alan. (2002). Optimization of a semianalytical ocean color model for global-scale application. Applied optics. 41. 2705-14. 10.1364/AO.41.002705.

QAA:
Lee, Zhongping & Carder, Kendall & Arnone, Robert. (2002). Deriving Inherent Optical Properties from Water Color: a Multiband Quasi-Analytical Algorithm for Optically Deep Waters. Applied optics. 41. 5755-72. 10.1364/AO.41.005755.

Hu, OCI:
Hu, Chuanmin & Lee, Zhongping & Franz, Bryan. (2012). Chlorophyll a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference. Journal of Geophysical Research. 117. C01011. 10.1029/2011JC007395.

EOF:
Laliberté, Julien & Larouche, Pierre & Devred, Emmanuel & Craig, Susanne. (2018). Chlorophyll-a Concentration Retrieval in the Optically Complex Waters of the St. Lawrence Estuary and Gulf Using Principal Component Analysis. Remote Sensing. 10. 10.3390/rs10020265.

Regional tuning for OCx (POLY4), GSM (GSM_GC, GSM_GS):
Clay, S.; Pena, A.; DeTracey, B.; Devred, E. Evaluation of Satellite-Based Algorithms to Retrieve Chlorophyll-a Concentration in the Canadian Atlantic and Pacific Oceans. Remote Sens. 2019, 11, 2609.

Integerized Sinusoidal Binning Scheme:
https://oceancolor.gsfc.nasa.gov/docs/format/l3bins/



Examples using other functions

[Coming soon]

# Testing `sparkle_fill()`:

# library(oceancolouR)
# library(raster)
# library(sp)
# data("nwa_lats_4km")
# data("nwa_lons_4km")
# data("nwa_bath_4km")
# df = data.frame(lon=nwa_lons_4km,
#                 lat=nwa_lats_4km,
#                 bathy=nwa_bath_4km, stringsAsFactors = FALSE)
# coordinates(df) = ~lon+lat
# 
# xres2 <- 0.04
# yres2 <- (2/3) * xres2
# rast2 <- raster(ext=extent(df), resolution = c(xres2,yres2))
# rast2 <- rasterize(df, rast2, df$bathy, fun = mean, na.rm = T)
# test2_bilinear = sparkle_fill(x=rast2, min_sides=8, fun="bilinear")
# test2_mean = sparkle_fill(x=rast2, min_sides=8, fun="mean")
# test2_median = sparkle_fill(x=rast2, min_sides=8, fun="median")
# rast2_stack = stack(rast2, test2_bilinear, test2_mean, test2_median)
# names(rast2_stack) <- c("rast2", "bilinear", "mean", "median")
# spplot(rast2_stack)


BIO-RSG/oceancolouR documentation built on April 30, 2024, 7:54 a.m.