In retostauffer/Rfoehnix: Objective Foehn Diagnosis

suppressPackageStartupMessages(library("foehnix"))

Data Set Description

The "Californian" data set consists of hourly meteorological observations from station "Viejas Casino and Ranch" and station "Lucky Five Ranch" located in South California.

Viejas is located at the foot of the westerly slope of the Sierra Nevada mountain range and exhibits strong easterly winds during downslope wind situations. The Lucky Five Ranch is located northeast of and provides information about the upstream air mass for the classification algorithm.

library("leaflet")
library("sp")

stations <- data.frame(lon  = c(-116.70437, -116.528),
                       lat  = c(32.84559, 32.9331),
                       alt  = c(715, 1445),
                       name = c("Viejas", "Lucky Five Ranch"),
                       stringsAsFactors = FALSE)

b <- list(x0 = min(stations$lon) - .2, x1 = max(stations$lon) + .2,
          y0 = max(stations$lat) + .2, y1 = max(stations$lat) + .2)

m <- leaflet(width = "100%") %>% addTiles() %>% fitBounds(b$x0, b$y0, b$x1, b$x2)
m <- setView(m, mean(stations$lon), mean(stations$lat), zoom = 10.5)
for (i in 1:nrow(stations)) {
    m <- addPopups(m, stations$lon[i], stations$lat[i],
                   sprintf("Station %s, %d m a.m.s.l.", stations$name[i],
                           stations$alt[i]))
}
m <- addProviderTiles(m, "OpenTopoMap")
m

Loading the Data Set

demodata("california") returns a data set which combines hourly meteorological observations of both sites (Viejas; Lucky Five). In addition, the potential temperature difference between the two stations is calculated by reducing the dry air temperature from "Lucky Five Ranch" to the height of "Viejas" (dry adiabatic lapse rate of 1K per 100m; stored on diff_temp). For details see demodata.

data <- demodata("california")
head(data, n = 3)
# Check if our object is a numeric zoo object:
c("is.zoo" = is.zoo(data), "is.numeric" = is.numeric(data))

Missing values in the data set (NA) are allowed and will be properly handled by all functions. One restriction is that the time series object has to be regular (but not strictly regular). "Regular" means that the time steps have to be divisible by the smallest time step, "strictly regular" means that we have no missing observations (if our smallest time interval is 1 hour observations have to be available every hour have to be available to be strictly regular). The foehnix will inflate the data set and make it strictly regular, if needed.

c("is regular" = is.regular(data),
  "is strictly regular" = is.regular(data, strict = TRUE))

After preparing the data set (regular or strictly regular zoo object withnumeric` values) we can investigate the observed wind information.

par(mfrow = c(1,2))
# Observed wind speed/wind direction "Viejas"
windrose(data, ddvar = "wind_direction", ffvar = "wind_speed",
         main = "Windrose\nViejas Casino and Resort",
         breaks = seq(0, 22, by = 2))
# Observed wind speed/wind direction "Lucky Five"
windrose(data, ddvar = "crest_wind_direction", ffvar = "crest_wind_speed",
         main = "Windrose\nLucky Five Ranch", hue = 270,
         breaks = seq(0, 22, by = 2))

Given the plots above we define the foehn wind direction at Viejas between 305 and 160 degrees (a r 360 - (305 - 160) degree wind sector centered northeast). This wind sector can be chosen rather wide, but should leave out non-foehn wind directions to exclude upslope winds. The wind sector(s) can be added to the windrose as follows:

# Windrose plot with custom variable names (ddvar, ffvar),
# title, breaks, polygon borders, and the wind sector from above
# with a custom color.
windrose(data, ddvar = "wind_direction", ffvar = "wind_speed",
         main = "Windrose Viejas with Custom foehn Wind Sector",
         breaks = seq(0, 16, by = 2),
         windsector = list(c(305, 160)),
         windsector.col = "#DFEFF6",
         border = "gray50", lwd = .5)

The windsector is solely used for visual justification, the same restriction will be used in the following step when estimating the foehnix classification model.

Estimate Classification Model

The next step (the core feature of this package) is to estimate the two-component mixture model for foehn classification. The following model assumptions are used here:

Main variable: diff_temp (potential temperature difference) is used as the main covariate to separate 'foehn' from 'no foehn' events.
Concomitant variable: wind_speed (wind speed at target station Viejas).
Wind filter: the wind_direction at station Viejas has to lie within 305 and 160 degrees (northeasterly wind direction; see above).
Option switch: switch = TRUE as high diff_temp indicate stable stratification (no foehn).

# Estimate the foehnix classification model
mod <- foehnix(diff_temp ~ wind_speed,
               data   = data,
               switch = TRUE,
               filter = list(wind_direction = c(305, 160)))

Model Summary

# Model summary
summary(mod, detailed = TRUE)

The full data set contains $N = r nrow(mod$data)$ rows, $r nrow(data)$ from the data set itself (data) and $r mod$inflated$ due to inflation used to make the time series object strictly regular.

Thereof, $r length(mod$filter_obj$ugly)$ are not considered due to missing data, $r length(mod$filter_obj$bad)$ as they do not fulfil the filter constraint (wind_direction outside defined wind sector), wherefore the final model is based on $r length(mod$filter_obj$good)$ observations (or rows).

One good indication whether the model well separates the two clusters is the "Cluster separation" output (summary) or the posterior probability plot:

# Cluster separation (summary)
summary(mod)$separation

The separation matrix shows the prior probabilities, the size (number of observations assigned to each component; posterior probability), number of probabilities exceeding a threshold (default eps = 1e-4), and the ratio between the latter two. Ratios close to $1.0$ indicate that the two clusters are well separated ($ratio > 0.5$ are already good for this application).

Another indication is the which = "posterior" plot which shows the empirical histogram of estimated probabilities (for within-windsector observations). Point masses around $0.0$ and $1.0$ indicate that we have two well separated clusters (the probability to fall in one of the clusters is always close to either $0$ or $1$).

plot(mod, which = "posterior", breaks = seq(0, 1, by = 0.05))

Model coefficients

The following parameters are estimated for the two r mod$control$family$name clusters:

No-foehn cluster: $\mu_1 = r round(coef(mod)["mu1"], 2)$, $\sigma_1 = r round(coef(mod)["sigma1"], 2)$ (parameter scale)
Foehn cluster: $\mu_2 = r round(coef(mod)["mu2"], 2)$, $\sigma_2 = r round(coef(mod)["sigma2"], 2)$ (parameter scale)
Concomitant model: r ifelse(sign(coef(mod)["wind_speed"]), "positive", "negative") wind_speed effect, r sprintf("%+.1f", 100 * exp(coef(mod)["wind_speed"]) - 100) percent per $m~s^{-1}$

coef(mod)

wzxhzdk:11

wzxhzdk:12

Graphical Model Assessment

A foehnix object comes with generic plots for graphical model assessment.

The following figure shows the 'log-likelihood contribution' of

the main component (left hand side of formula),
the concomitant model (right hand side of formula),
and the full log-likelihood sum which is maximised by the optimization algorithm.

The abscissa shows (by default) the logarithm of the iterations during optimization.

# Log-likelihood contribution
plot(mod, which = "loglikcontribution")

In addition, the coefficient paths during optimization can be visualized:

# Coefficient path
plot(mod, which = 3L)

The left plot shows the parameters of the two components ($\mu_1$, $\log(\sigma_1)$, $\mu_2$, $\log(\sigma_2)$), the right one the standardized coefficients of the concomitant model.

Last but not least a histogram with the two clusters is plotted. which = "hist"creates an empirical density histogram separating "no foehn" and "foehn" events adding the estimated distribution for these two clusters.

devtools::load_all("..")
plot(mod, which = "hist")

Time Series Plot

The Californian demo data set has non-standard variable names (by purpose). Thus, when calling tsplot (time series plot) we do have to manually specify these names.

# Some smaller quality issues in the data (should not be a big deal)
start <- as.POSIXct("2012-03-01")
end   <- as.POSIXct("2012-03-12")

# As we dont have the standard names: re-specify variable names.
# In addition, use 'style = "advanced"' to show more details.
tsplot(mod, style = "advanced", 
       diff_t = "diff_temp",     rh = "relative_humidity",
       t   = "air_temp",         crest_t   = "crest_air_temp",
       dd  = "wind_direction",   crest_dd  = "crest_wind_direction",
       ff  = "wind_speed",       crest_ff  = "crest_wind_speed",
       ffx = "wind_gust",        crest_ffx = "crest_wind_gust",
       windsector = list(wind_direction = c(305, 160)),
       start = start, end = end)

Wind Rose Plot

devtools::load_all("..")
windrose(mod, dd = "wind_direction", ff = "wind_speed",
         type = "hist", which = c("foehn", "nofoehn"),
         windsector = list(c(305, 160)),
         breaks = seq(0, 16, by = 2))

Hovmöler Diagram

# Default image plot
image(mod)

Customized plot which shows the "foehn frequency" for the interesting time period from August to April with custom colors and additional contour lines and custom aggregation period (two-weeks, 3-hourly).

# Customizing image plot
image(mod, deltad = 14L, deltat = 3*3600, contours = TRUE,
      contour.col = "white", lwd = 2, labcex = 1.5,
      col = colorspace::sequential_hcl(51, "Purple-Yellow", rev = TRUE),
      xlim = c(212, 119), zlim = c(0, 0.4))