suppressPackageStartupMessages(library("foehnix"))
The "Californian" data set consists of hourly meteorological observations from station "Viejas Casino and Ranch" and station "Lucky Five Ranch" located in South California.
Viejas is located at the foot of the westerly slope of the Sierra Nevada mountain range and exhibits strong easterly winds during downslope wind situations. The Lucky Five Ranch is located northeast of and provides information about the upstream air mass for the classification algorithm.
library("leaflet") library("sp") stations <- data.frame(lon = c(-116.70437, -116.528), lat = c(32.84559, 32.9331), alt = c(715, 1445), name = c("Viejas", "Lucky Five Ranch"), stringsAsFactors = FALSE) b <- list(x0 = min(stations$lon) - .2, x1 = max(stations$lon) + .2, y0 = max(stations$lat) + .2, y1 = max(stations$lat) + .2) m <- leaflet(width = "100%") %>% addTiles() %>% fitBounds(b$x0, b$y0, b$x1, b$x2) m <- setView(m, mean(stations$lon), mean(stations$lat), zoom = 10.5) for (i in 1:nrow(stations)) { m <- addPopups(m, stations$lon[i], stations$lat[i], sprintf("Station %s, %d m a.m.s.l.", stations$name[i], stations$alt[i])) } m <- addProviderTiles(m, "OpenTopoMap") m
demodata("california")
returns a data set which
combines hourly meteorological observations of both sites (Viejas; Lucky Five).
In addition, the potential temperature difference between the two stations
is calculated by reducing the dry air temperature from "Lucky Five Ranch"
to the height of "Viejas" (dry adiabatic lapse rate of 1K per 100m;
stored on diff_temp
).
For details see demodata
.
data <- demodata("california") head(data, n = 3) # Check if our object is a numeric zoo object: c("is.zoo" = is.zoo(data), "is.numeric" = is.numeric(data))
Missing values in the
data set (NA
) are allowed and will be properly handled by all functions.
One restriction is that the time series object has to be regular (but not
strictly regular). "Regular" means that the time steps have to be divisible
by the smallest time step, "strictly regular" means that we have no missing
observations (if our smallest time interval is 1 hour observations have
to be available every hour have to be available to be strictly regular).
The foehnix
will inflate the data set and make
it strictly regular, if needed.
c("is regular" = is.regular(data), "is strictly regular" = is.regular(data, strict = TRUE))
After preparing the data set (regular or strictly regular zoo object with
numeric` values) we can investigate the observed wind information.
par(mfrow = c(1,2)) # Observed wind speed/wind direction "Viejas" windrose(data, ddvar = "wind_direction", ffvar = "wind_speed", main = "Windrose\nViejas Casino and Resort", breaks = seq(0, 22, by = 2)) # Observed wind speed/wind direction "Lucky Five" windrose(data, ddvar = "crest_wind_direction", ffvar = "crest_wind_speed", main = "Windrose\nLucky Five Ranch", hue = 270, breaks = seq(0, 22, by = 2))
Given the plots above we define the foehn
wind direction
at Viejas between 305 and 160 degrees (a r 360 - (305 - 160)
degree
wind sector centered northeast). This wind sector can be chosen
rather wide, but should leave out non-foehn wind directions to exclude
upslope winds. The wind sector(s) can be added to the
windrose
as follows:
# Windrose plot with custom variable names (ddvar, ffvar), # title, breaks, polygon borders, and the wind sector from above # with a custom color. windrose(data, ddvar = "wind_direction", ffvar = "wind_speed", main = "Windrose Viejas with Custom foehn Wind Sector", breaks = seq(0, 16, by = 2), windsector = list(c(305, 160)), windsector.col = "#DFEFF6", border = "gray50", lwd = .5)
The windsector
is solely used for visual justification, the same restriction will
be used in the following step when estimating the foehnix
classification model.
The next step (the core feature of this package) is to estimate the two-component mixture model for foehn classification. The following model assumptions are used here:
diff_temp
(potential temperature difference) is used as
the main covariate to separate 'foehn' from 'no foehn' events.wind_speed
(wind speed at target station Viejas).wind_direction
at station Viejas has to lie within
305 and 160 degrees (northeasterly wind direction; see above).switch = TRUE
as high diff_temp
indicate stable stratification (no foehn).# Estimate the foehnix classification model mod <- foehnix(diff_temp ~ wind_speed, data = data, switch = TRUE, filter = list(wind_direction = c(305, 160)))
# Model summary summary(mod, detailed = TRUE)
The full data set contains $N = r nrow(mod$data)
$ rows, $r nrow(data)
$ from
the data set itself (data
) and $r mod$inflated
$ due to inflation used to make the
time series object strictly regular.
Thereof, $r length(mod$filter_obj$ugly)
$ are not considered
due to missing data, $r length(mod$filter_obj$bad)
$ as they do not fulfil the filter constraint
(wind_direction
outside defined wind sector), wherefore the final
model is based on $r length(mod$filter_obj$good)
$ observations (or rows).
One good indication whether the model well separates the two clusters
is the "Cluster separation" output (summary
)
or the posterior probability plot:
# Cluster separation (summary) summary(mod)$separation
The separation
matrix shows the prior probabilities,
the size (number of observations assigned to each component; posterior probability),
number of probabilities exceeding a threshold (default eps = 1e-4
), and the
ratio between the latter two. Ratios close to $1.0$ indicate that the two clusters
are well separated ($ratio > 0.5$ are already good for this application).
Another indication is the which = "posterior"
plot which shows the empirical
histogram of estimated probabilities (for within-windsector observations).
Point masses around $0.0$ and $1.0$ indicate that we have two well separated
clusters (the probability to fall in one of the clusters is always close to
either $0$ or $1$).
plot(mod, which = "posterior", breaks = seq(0, 1, by = 0.05))
The following parameters are estimated for the two r mod$control$family$name
clusters:
r round(coef(mod)["mu1"], 2)
$,
$\sigma_1 = r round(coef(mod)["sigma1"], 2)
$ (parameter scale)r round(coef(mod)["mu2"], 2)
$,
$\sigma_2 = r round(coef(mod)["sigma2"], 2)
$ (parameter scale)r ifelse(sign(coef(mod)["wind_speed"]), "positive", "negative")
wind_speed
effect, r sprintf("%+.1f", 100 * exp(coef(mod)["wind_speed"]) - 100)
percent per
$m~s^{-1}$coef(mod)
A foehnix
object comes with generic plots for graphical model
assessment.
The following figure shows the 'log-likelihood contribution' of
The abscissa shows (by default) the logarithm of the iterations during optimization.
# Log-likelihood contribution plot(mod, which = "loglikcontribution")
In addition, the coefficient paths during optimization can be visualized:
# Coefficient path plot(mod, which = 3L)
The left plot shows the parameters of the two components ($\mu_1$, $\log(\sigma_1)$, $\mu_2$, $\log(\sigma_2)$), the right one the standardized coefficients of the concomitant model.
Last but not least a histogram with the two clusters is plotted.
which = "hist"
creates an empirical density histogram separating "no foehn"
and "foehn" events adding the estimated distribution for these two clusters.
devtools::load_all("..") plot(mod, which = "hist")
The Californian demo data set has non-standard variable names (by purpose).
Thus, when calling tsplot
(time series plot) we do have to manually specify
these names.
# Some smaller quality issues in the data (should not be a big deal) start <- as.POSIXct("2012-03-01") end <- as.POSIXct("2012-03-12") # As we dont have the standard names: re-specify variable names. # In addition, use 'style = "advanced"' to show more details. tsplot(mod, style = "advanced", diff_t = "diff_temp", rh = "relative_humidity", t = "air_temp", crest_t = "crest_air_temp", dd = "wind_direction", crest_dd = "crest_wind_direction", ff = "wind_speed", crest_ff = "crest_wind_speed", ffx = "wind_gust", crest_ffx = "crest_wind_gust", windsector = list(wind_direction = c(305, 160)), start = start, end = end)
devtools::load_all("..") windrose(mod, dd = "wind_direction", ff = "wind_speed", type = "hist", which = c("foehn", "nofoehn"), windsector = list(c(305, 160)), breaks = seq(0, 16, by = 2))
# Default image plot image(mod)
Customized plot which shows the "foehn frequency" for the interesting time period from August to April with custom colors and additional contour lines and custom aggregation period (two-weeks, 3-hourly).
# Customizing image plot image(mod, deltad = 14L, deltat = 3*3600, contours = TRUE, contour.col = "white", lwd = 2, labcex = 1.5, col = colorspace::sequential_hcl(51, "Purple-Yellow", rev = TRUE), xlim = c(212, 119), zlim = c(0, 0.4))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.