wxgenR - Lower Santa Cruz River Basin, AZ"
In wxgenR: A Stochastic Weather Generator with Seasonality

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

A weather generator is a numerical tool that resamples a daily time series of precipitation, temperature, and season many times, while preserving observed or projected characteristics of importance, such as the statistics of the transition between wet and dry days. The resulting large group, or ensemble, of likely rainfall and temperature time series represents a range of possible amounts, daily patterns, and seasonality. This weather generator is, to our knowledge, novel in that it includes seasons (up to 26) in training the simulation algorithm.

The goal of wxgenR is to provide users a tool that can simulate, with fidelity, an ensemble of precipitation and temperature based on training data that could include, for example, station based point measurements, grid cell values derived from models or remotely sensed data, or basin averages. The incorporation of seasonality as a covariate in the training algorithm allows users to examine the effects of shifts in seasonality due to climate warming (e.g., earlier snowmelt seasons or prolonged summer dry periods). wxgenR is an effective and robust scenario planning tool for a wide variety of potential futures.

Running `wxgenR`

All that is needed to run wxgenR is a single time series of precipitation, temperature, and season. Up to 20 seasons may be defined, but most users will likely only need two to four based on their study region. For example, wxgenR is provided with basin-average data from the Lower Santa Cruz River Basin (LSCRB) in Arizona, a monsoon dominated region with three distinct seasons. Within the data used to train the weather generator, these three seasons should be noted with an index of either 1, 2, or 3 for each day in the time series. The varying statistics of each season will impact the resulting simulations.

Tutorial

For example, using the Lower Santa Cruz River basin-average precipitation, temperature, and season from 1970 to 1999, we can generate simulated precipitation and temperature for any desired time length and ensemble size. Your variables must be named as the following: 'year', 'month', 'day', 'prcp', 'temp', 'season', whether they are input as a dataframe or a text file. All input variables must be contained within the same dataframe or text file. If inputting a text file, it must be comma separated (.csv). The weather generator can handle NA values for precipitation or temperature, but all other variables should be numeric values.

Step 1: Load your data

library(wxgenR)
library(lubridate)
library(dplyr)
library(tidyr)
library(ggpubr)
library(moments)
library(seas)

data(LowerSantaCruzRiverBasinAZ)

head(LowerSantaCruzRiverBasinAZ)

Step 2: Select your run settings and run the weather generator

Use the variables within the wx() function like syr and eyr (start and end year) to set the temporal boundaries from which to sample, otherwise, if left empty the start and end years will default to the beginning and end of your training data. Use nsim to set the length (in years) of your simulated weather, and nrealz to set the ensemble size (number of traces).

The variable wwidth will set the sampling window for each day of year (Jan. 1 through Dec. 31) for every year in the simulation. The sampling window for each day of year is +/- wwidth + 1, effectively sampling wwidth number of days before and after each day of year, including that day of year. A lower value for wwidth will sample fewer surrounding days and a higher value will sample more days, resulting in dampened and heightened variability, respectively. Typical setting of wwidth is between 1 and 15, resulting in a daily sampling window of 3 days and 31 days, respectively. Generally, higher and lower values of wwidth result in higher and lower variance, respectively, in the simulated data.

For example, to simulate precipitation on day 1 of the simulation (Jan. 1 of year 1), with wwidth = 1 (a 3-day window), the algorithm will sample days in the training record between (and including) December 31 and January 2 (for all years in the training record). For day 2 of the simulation (Jan. 2 of year 1), the algorithm will sample days in the training record between January 1 and January 3. Simulation day 3 (Jan. 3) will sample between January 2 and January 4, and so on. Increasing wwidth to 2 (a 5-day window) will sample between December 30 and January 3 for Jan. 1 simulations, December 31 to January 4 for Jan. 2, and January 1 to January 5 for Jan. 3, and so on.

In some cases, the wwidth will be automatically increased through an adaptive window width if precipitation occurred on a given day but there were less than two daily precipitation values over 0.01 inches during the window for that day. wwidth will adaptively increase by 1 until two or more daily precipitation values over 0.01 inches are in each window. Adaptive window width is most likely to occur in regions with high aridity, dry seasons, a small initial value of wwidth is used, or if the number of years in the training data is relatively short (e.g., less than 30 years). To display the results of the adaptive window width, set awinFlag = T.

Here, our training data spans 1970-01-01 to 1999-12-31, but we don't want to use the full historical record, so we set syr and eyr to 1970 and 1974, respectively, in the wx() function so that the training data is subset between those years. We want a simulation length of 5-years (nsim) in order to match the length of the subset training record and 2 traces in our ensemble (nrealz) for computational efficiency (although more traces, e.g. 50, are recommended). Sampling for each day of the year will sample from the preceding 3-days, the day of, and the following 3-days (wwidth) for a total window size of 7-days.

We may also want to increase the variability of our simulated precipitation by sampling outside the historical envelope with an Epanechnikov Kernel (ekflag = T). For more details on the Epanechnikov kernel and its use in a weather generator, see Rajagopalan et al. (1996). Setting tempPerturb = T will increase the variability of the simulated temperature by adding random noise from a normal distribution fit using a mean of zero and a standard deviation equal to the monthly standard deviation of simulated temperature residuals. Given that simulated daily temperature at time t is a function of temperature(t-1), cosine(t), sine(t), precipitation occurrence(t), and monthly mean temperature(t), the standard deviation of daily residuals from this model is calculated for each month and used to add random noise to the simulated temperature. The temperature simulation approach is inspired by- and adapted from- Verdin et al. (2015, 2018).

Since the training data has units of inches and degrees Fahrenheit for precipitation and temperature, respectively, we must set unitSystem = "U.S. Customary". Setting numbCores to 2 or greater will enable parallel computing for precipitation simulation which is the most computationally intensive aspect of the weather generator.

nsim = 5   #number of simulation years
nrealz = 2 #number of traces in ensemble

startTime <- Sys.time() #benchmark run time

z = wx(trainingData = LowerSantaCruzRiverBasinAZ,
       syr = 1970, eyr = 1974, smo = 10, emo = 9,
       nsim = nsim, nrealz = nrealz, aseed = 123,
       wwidth = c(7,1,3), unitSystem = "U.S. Customary", ekflag = TRUE,
       awinFlag = T, tempPerturb = TRUE, pcpOccFlag = FALSE,
       numbCores = 2)

endTime = Sys.time()

The wx() function will return a list containing both your input/training data, and a variety of processed outputs, named here as the variable z. Within z, dat.d is the original input data as well as some intermediary variables. simyr1 contains the years within your training data that were sampled to generate simulated values for each trace. X is the occurrence of daily precipitation for each trace, where 1 and 0 indicate the presence and absence of precipitation, respectively. Xseas is the season index for each day and trace. Xpdate shows which days from the training data were sampled for each simulated day and trace, if precipitation was simulated to occur on a given day. Xpamt is the simulated precipitation amount for each day and trace. Xtemp is the simulated temperature for each day and trace.

Generally, Xpamt and Xtemp will be of most interest to users as these are the desired outputs of simulated daily precipitation and temperature.

glimpse(z)

Step 3: Analyze simulated weather

First, use modified approach from writeSim function to post-process/format output

#parse variables from wx() output
dat.d = z$dat.d
simyr1 = z$simyr1
X = z$X
Xseas = z$Xseas
Xpdate = z$Xpdate
Xpamt = z$Xpamt
Xtemp = z$Xtemp

#write simulation output
#
it1 <- seq(1, length(X[,1]), 366)
it2 = it1+366-1

#initialize storage
sim.pcp = matrix(NA, nrow = nsim*366, ncol = nrealz+3)
sim.tmp = matrix(NA, nrow = nsim*366, ncol = nrealz+3)
sim.szn = matrix(NA, nrow = nsim*366, ncol = nrealz+3)

#loop through realization
irealz = 1
for (irealz in 1:nrealz){

  outmat <- vector()

  #loop through simulation years
  isim = 1
  for (isim in 1:nsim){
    leapflag = FALSE
    ayr = simyr1[isim, irealz]
    if (lubridate::leap_year(ayr)) leapflag = TRUE
    col1 = rep(isim, 366)        #column 1, simulation year
    d1 = ayr*10^4+01*10^2+01; d2 = ayr*10^4+12*10^2+31
    i1 = which(dat.d$date1 == d1)
    i2 = which(dat.d$date1 == d2)
    col2 = dat.d$date1[i1:i2]   #column 2, simulation date
    if (leapflag == FALSE) col2 = c(col2,NA)
    i1 = it1[isim]
    i2 = it2[isim]
    col3 = Xseas[i1:i2, irealz]  #column 3, simulation season
    col4 = X[i1:i2, irealz]      #column 4, precipitation occurrence
    col5 = Xpdate[i1:i2, irealz] #column 5, precipation resampling date
    col6 = Xpamt[i1:i2, irealz]  #column 6, resampled precipitation amount
    col7 = Xtemp[i1:i2, irealz]  #column7, simulated temperature

    #create time series of 'simulation day'
    sim.yr = rep(isim, length(col2))
    sim.month = month(ymd(col2))
    sim.day = day(ymd(col2))

    outmat = rbind(outmat, cbind(sim.yr, sim.month, sim.day, col6, col7, col3))
  } #isim

  colnames(outmat) = c("simulation year", "month", "day", "prcp", "temp", "season")

  if(irealz == 1){
    sim.pcp[,1:3] = outmat[,1:3]
    sim.tmp[,1:3] = outmat[,1:3]
    sim.szn[,1:3] = outmat[,1:3]
  }

    sim.pcp[,irealz+3] = outmat[,4]
    sim.tmp[,irealz+3] = outmat[,5]
    sim.szn[,irealz+3] = outmat[,6]

} #irealz

Format dataframes for simulated precip, temperature, and season

# df = sim.pcp

formatting = function(df){

  df = as.data.frame(df)

  colnames(df) = c("simulation year", "month", "day", paste0("Trace_", 1:nrealz))

  #remove 366 days for non-leap years
  df = drop_na(df, c(month, day))

  #assign simulation year to start at the same time as training data
  df$`simulation year` = df$`simulation year` + dat.d$year[1] - 1    

  #format date
  df$Date = ymd(paste(df$`simulation year`, df$month, df$day, sep = "-"))

  #remove years that aren't leap years
  # df = drop_na(df, Date)

  df = df %>%
    mutate(yday = as.numeric(yday(Date)),
           week = as.numeric(week(Date))) %>%
    relocate(c(Date,yday,week), .after = day) %>%
    pivot_longer(cols = starts_with("Trace_"), 
                 names_to = "variable", 
                 values_to = "value")
  return(df)
}

sim.pcp = formatting(sim.pcp)
sim.tmp = formatting(sim.tmp)
sim.szn = formatting(sim.szn)

Format training data

colnames(dat.d)[11] = "yday"
obs.pcp = dat.d[,c(1:3,8:9,11,4)]
obs.tmp = dat.d[,c(1:3,8:9,11,5)]

First you might want to plot the daily time series for verification

If your data contained NA values, they can propagate to simulated temperature values (NA precip values in your data are set to 0), so use na.rm = T for any subsequent analysis. You may also choose to replace NA values with daily or monthly averages. Additionally, leap years may be included in the simulated weather if they are included in your training data, so all non-leap years include a row of 'NA' values at the end of the calendar year as a book-keeping measure so that the total number of rows in each trace is the same.

#plot simulated daily data
# simDat = sim.tmp
# obsDat = obs.tmp
# Tag = "Temp"

dailyPlot = function(simDat, obsDat, Tag){

  simD = simDat %>%
    drop_na() %>%
    group_by(variable, yday) %>%
    summarise(
      mean = mean(value, na.rm = T),
      max = max(value, na.rm = T),
      sd = sd(value, na.rm = T),
      skew = skewness(value, na.rm = T)
      ) %>%
    ungroup()

  simDq <- simD %>%
    group_by(yday) %>%
    summarise(
      mean_q5 = quantile(mean, 0.05, na.rm = T),
      mean_med = median(mean, na.rm = T),
      mean_q95 = quantile(mean, 0.95, na.rm =T),
      max_q5 = quantile(max, 0.05, na.rm = T),
      max_med = median(max, na.rm = T),
      max_q95 = quantile(max, 0.95, na.rm = T),
      sd_q5 = quantile(sd, 0.05, na.rm = T),
      sd_med = median(sd),
      sd_q95 = quantile(sd, 0.95, na.rm = T),
      skew_q5 = quantile(skew, 0.05, na.rm = T),
      skew_med = median(skew, na.rm = T),
      skew_q95 = quantile(skew, 0.95, na.rm = T)
      ) %>%
      drop_na() %>%
    ungroup()

  if(Tag == "Temp"){
    obs <- obsDat %>%
      drop_na() %>%
      group_by(yday) %>%
      summarise(
        mean = mean(temp, na.rm = T),
        max = max(temp, na.rm = T),
        sd = sd(temp, na.rm = T),
        skew = skewness(temp, na.rm = T)
        ) %>%
      ungroup()
  } else if(Tag == "Precip"){
    obs <- obsDat %>%
      drop_na() %>%
      group_by(yday) %>%
      summarise(
        mean = mean(prcp, na.rm = T),
        max = max(prcp, na.rm = T),
        sd = sd(prcp, na.rm = T),
        skew = skewness(prcp, na.rm = T)
        ) %>%
      ungroup()
    }

  colnames(obs)[-1] = paste0("obs_", colnames(obs)[-1])

  df.comb = left_join(simDq, obs, by = "yday")

  #plotting --------------------------------
  lgdLoc = c(0.8, 0.9)

  if(Tag == "Temp"){
    yLabel = "Daily Temperature "
    units = "(°F)"
  } else if(Tag == "Precip"){
    yLabel = "Daily Precipitation "
    units = "(inches)"
  }

  trnAlpha = 0.65

  #daily mean
  p1 = ggplot(df.comb) +
    geom_ribbon(aes(x = yday, ymin = mean_q5, ymax = mean_q95, fill = "95% Confidence"), alpha = 0.25) +
    geom_line(aes(x = yday, y = mean_med, color = "Simulation Median"), size = 1, alpha = 0.8) +
    geom_line(aes(x = yday, y = obs_mean, color = "Training Data"), size = 0.3, alpha = trnAlpha, linetype = "solid") +
    geom_point(aes(x = yday, y = obs_mean, color = "Training Data"), size = 0.6, alpha = trnAlpha) +

    # Corrected scale_colour_manual()
    scale_colour_manual(values = c("Training Data" = "blue", "Simulation Median" = "red")) +

    # Ensure fill is mapped for the ribbon
    scale_fill_manual(values = c("95% Confidence" = "grey")) +

    theme_classic() +
    theme(axis.title = element_text(face = "bold"),
          panel.grid.major = element_line(),
          legend.title = element_blank(),
          legend.position = lgdLoc,
          legend.background = element_blank(),
          legend.box.background = element_blank(),
          legend.key = element_blank()) +
    xlab("Day of Year") + ylab(paste0("Mean ", yLabel, units))

  #daily SD
  p2 = ggplot(df.comb) +
    geom_ribbon(aes(x = yday, ymin = sd_q5, ymax = sd_q95, fill = "95% Confidence"), alpha = 0.25) +
    geom_line(aes(x = yday, y = sd_med, color = "Simulation Median"), size = 1, alpha = 0.8) +
    geom_line(aes(x = yday, y = obs_sd, color = "Training Data"), size = 0.3, alpha = trnAlpha, linetype = "solid") +
    geom_point(aes(x = yday, y = obs_sd, color = "Training Data"), size = 0.6, alpha = trnAlpha) +

    scale_colour_manual(values = c("Training Data" = "blue", "Simulation Median" = "red")) +
    scale_fill_manual(values = c("95% Confidence" = "grey")) +

    theme_classic() +
    theme(axis.title = element_text(face = "bold"),
          panel.grid.major = element_line(),
          legend.title = element_blank(),
          legend.position = lgdLoc,
          legend.background = element_blank(),
          legend.box.background = element_blank(),
          legend.key = element_blank()) +
    xlab("Day of Year") + ylab(paste0("Std. Deviation of ", yLabel, units))

  #daily skew
  p3 = ggplot(df.comb) +
    geom_ribbon(aes(x = yday, ymin = skew_q5, ymax = skew_q95, fill = "95% Confidence"), alpha = 0.25) +
    geom_line(aes(x = yday, y = skew_med, color = "Simulation Median"), size = 1, alpha = 0.8) +
    geom_line(aes(x = yday, y = obs_skew, color = "Training Data"), size = 0.3, alpha = trnAlpha, linetype = "solid") +
    geom_point(aes(x = yday, y = obs_skew, color = "Training Data"), size = 0.6, alpha = trnAlpha) +

    scale_colour_manual(values = c("Training Data" = "blue", "Simulation Median" = "red")) +
    scale_fill_manual(values = c("95% Confidence" = "grey")) +

    theme_classic() +
    theme(axis.title = element_text(face = "bold"),
          panel.grid.major = element_line(),
          legend.title = element_blank(),
          legend.position = lgdLoc,
          legend.background = element_blank(),
          legend.box.background = element_blank(),
          legend.key = element_blank()) +
    xlab("Day of Year") + ylab(paste0("Skew of ", yLabel, " (-)"))



  #daily Max
  p4 = ggplot(df.comb) +
    geom_ribbon(aes(x = yday, ymin = max_q5, ymax = max_q95, fill = "95% Confidence"), alpha = 0.25) +
    geom_line(aes(x = yday, y = max_med, color = "Simulation Median"), size = 1, alpha = 0.8) +
    geom_line(aes(x = yday, y = obs_max, color = "Training Data"), size = 0.3, alpha = trnAlpha, linetype = "solid") +
    geom_point(aes(x = yday, y = obs_max, color = "Training Data"), size = 0.6, alpha = trnAlpha) +

    scale_colour_manual(values = c("Training Data" = "blue", "Simulation Median" = "red")) +
    scale_fill_manual(values = c("95% Confidence" = "grey")) +

    theme_classic() +
    theme(axis.title = element_text(face = "bold"),
          panel.grid.major = element_line(),
          legend.title = element_blank(),
          legend.position = lgdLoc,
          legend.background = element_blank(),
          legend.box.background = element_blank(),
          legend.key = element_blank()) +
    xlab("Day of Year") + ylab(paste0("Maximum ", yLabel, units))

  p.comb = ggarrange(p1, p2, p3, p4, nrow = 2, ncol = 2, common.legend = TRUE, legend = "bottom")

  print(p.comb)

  # p.out = paste0(tempdir(), "/outputPlots/dailyStats_", Tag, ".png")
  # ggsave(filename = p.out, plot = p.comb, device = "png")

}

plot daily precipitation

dailyPlot(sim.pcp, obs.pcp, "Precip")

plot daily temperature

dailyPlot(sim.tmp, obs.tmp, "Temp")

Looking at just the daily mean may not be representative since weather may be very different depending on the season, so plot monthly statistics as well for more detail.

Boxplot whiskers are in the style of Tukey (1.5 x interquartile range)

#plot simulated daily data
# simDat = sim.tmp
# obsDat = obs.tmp
# Tag = "Temp"

monthlyPlot = function(simDat, obsDat, Tag){

  if(Tag == "Temp"){

    simM = simDat %>%
      drop_na() %>%
      group_by(variable, month, `simulation year`) %>%
      summarise(
        mean = mean(value, na.rm = T),
        max = max(value, na.rm = T),
        sd = sd(value, na.rm = T),
        skew = skewness(value, na.rm = T)
        ) %>%
      ungroup()

    simMM <- simM %>%
      group_by(variable, month) %>%
      summarise(
        mean=mean(mean),
        max=mean(max),
        sd=sqrt(mean(sd^2)),
        skew=mean(skew, na.rm=T)
        ) %>%
      ungroup()

    obs <- obsDat %>%
      drop_na() %>%
      group_by(month, year) %>%
      summarise(
        mean = mean(temp, na.rm = T),
        max = max(temp, na.rm = T),
        sd = sd(temp, na.rm = T),
        skew = skewness(temp, na.rm = T)
        ) %>%
      ungroup()

    obsMM <- obs %>%
      group_by(month) %>%
      summarise(
        mean = mean(mean, na.rm = T),
        max = mean(max, na.rm = T),
        sd = sqrt(mean(sd^2)),
        skew = mean(skew, na.rm=T)
        ) %>%
      mutate(variable = "Observed") %>%
      relocate(variable) %>%
      ungroup()

  # colnames(obsMM)[-1] = paste0("obs_", colnames(obsMM)[-1])

  }else if(Tag == "Precip"){

    simM = simDat %>%
      drop_na() %>%
      group_by(variable, month, `simulation year`) %>%
      summarise(
        sum = sum(value, na.rm = T),
        max = max(value, na.rm = T),
        sd = sd(value, na.rm = T),
        skew = skewness(value, na.rm = T)
        ) %>%
      ungroup()  

      simMM <- simM %>%
        group_by(variable, month) %>%
        summarise(
          sum=mean(sum),
          max=mean(max),
          sd=sqrt(mean(sd^2)),
          skew=mean(skew, na.rm=T)
          ) %>%
        ungroup()

      obs <- obsDat %>%
        drop_na() %>%
        group_by(month, year) %>%
        summarise(
          sum = sum(prcp, na.rm = T),
          max = max(prcp, na.rm = T),
          sd = sd(prcp, na.rm = T),
          skew = skewness(prcp, na.rm = T)
          ) %>%
        ungroup()

      obsMM <- obs %>%
        group_by(month) %>%
        summarise(
          sum = mean(sum, na.rm = T),
          max = mean(max, na.rm = T),
          sd = sqrt(mean(sd^2)),
          skew = mean(skew, na.rm=T)
          ) %>%
        mutate(variable = "Observed") %>%
        relocate(variable) %>%
        ungroup()

  # colnames(obsMM)[-1] = paste0("obs_", colnames(obsMM)[-1])

  }

  df.comb = rbind(obsMM, simMM)

  #plotting --------------------------------
  if(Tag == "Temp"){

    p1 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = month, y = mean, group = month)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = month, y = mean, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = month, y = mean, color = "Observed")) +
      xlab("Month") + ylab("Temperature (°F)") +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = 1:12) +
      ggtitle("Average Mean Monthly Temperature")

  }else if(Tag == "Precip"){

     p1 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = month, y = sum, group = month)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = month, y = sum, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = month, y = sum, color = "Observed")) +
      xlab("Month") + ylab("Precipitation (inches)") +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = 1:12) +
      ggtitle("Average Total Monthly Precipitation")

  }

  if(Tag == "Temp"){
    yLabel = "Temperature "
    units = "(°F)"
  } else if(Tag == "Precip"){
    yLabel = "Precipitation "
    units = "(inches)"
  }

  #monthly SD
  p2 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = month, y = sd, group = month)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = month, y = sd, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = month, y = sd, color = "Observed")) +
      xlab("Month") + ylab(paste0("Standard Deviation ", units)) +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = 1:12) +
      ggtitle(paste0("Average Standard Deviation in Monthly ", yLabel))


  #monthly Skew
  p3 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = month, y = skew, group = month)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = month, y = skew, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = month, y = skew, color = "Observed")) +
      xlab("Month") + ylab("Skew (-)") +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = 1:12) +
      ggtitle(paste0("Average Skew in Monthly ", yLabel))


  #monthly max
  p4 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = month, y = max, group = month)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = month, y = max, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = month, y = max, color = "Observed")) +
      xlab("Month") + ylab(paste0("Maximum ", units)) +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = 1:12) +
      ggtitle(paste0("Average Monthly Maximum ", yLabel))

  p.comb = ggarrange(p1, p2, p3, p4, nrow = 2, ncol = 2, common.legend = TRUE, legend = "bottom")

  print(p.comb)

  # p.out = paste0(tempdir(), "/outputPlots/monthlyStats_", Tag, ".png")
  # ggsave(filename = p.out, plot = p.comb, device = "png", height = 8, width = 8, units = "in")

}

plot monthly precipitation

monthlyPlot(sim.pcp, obs.pcp, "Precip")

plot monthly temperature

monthlyPlot(sim.tmp, obs.tmp, "Temp")

Weekly statistics offer a finer resolution than monthly statistics but are not as noisy as daily values.

Boxplot whiskers are in the style of Tukey (1.5 x interquartile range)

#plot simulated daily data
simDat = sim.tmp
obsDat = obs.tmp
Tag = "Temp"

weeklyPlot = function(simDat, obsDat, Tag){

  if(Tag == "Temp"){

    simW = simDat %>%
      drop_na() %>%
      group_by(variable, week, `simulation year`) %>%
      summarise(
        mean = mean(value, na.rm = T),
        max = max(value, na.rm = T),
        sd = sd(value, na.rm = T),
        skew = skewness(value, na.rm = T)
        ) %>%
      ungroup()

    simWW <- simW %>%
      group_by(variable, week) %>%
      summarise(
        mean=mean(mean),
        max=mean(max),
        sd=sqrt(mean(sd^2)),
        skew=mean(skew, na.rm=T)
        ) %>%
      ungroup()

    obs <- obsDat %>%
      drop_na() %>%
      group_by(week, year) %>%
      summarise(
        mean = mean(temp, na.rm = T),
        max = max(temp, na.rm = T),
        sd = sd(temp, na.rm = T),
        skew = skewness(temp, na.rm = T)
        ) %>%
      ungroup()

    obsWW <- obs %>%
      group_by(week) %>%
      summarise(
        mean = mean(mean, na.rm = T),
        max = mean(max, na.rm = T),
        sd = sqrt(mean(sd^2)),
        skew = mean(skew, na.rm=T)
        ) %>%
      mutate(variable = "Observed") %>%
      relocate(variable) %>%
      ungroup()

  # colnames(obsMM)[-1] = paste0("obs_", colnames(obsMM)[-1])

  }else if(Tag == "Precip"){

    simW = simDat %>%
      drop_na() %>%
      group_by(variable, week, `simulation year`) %>%
      summarise(
        sum = sum(value, na.rm = T),
        max = max(value, na.rm = T),
        sd = sd(value, na.rm = T),
        skew = skewness(value, na.rm = T)
        ) %>%
      ungroup()  

      simWW <- simW %>%
        group_by(variable, week) %>%
        summarise(
          sum=mean(sum),
          max=mean(max),
          sd=sqrt(mean(sd^2)),
          skew=mean(skew, na.rm=T)
          ) %>%
        ungroup()

      obs <- obsDat %>%
        drop_na() %>%
        group_by(week, year) %>%
        summarise(
          sum = sum(prcp, na.rm = T),
          max = max(prcp, na.rm = T),
          sd = sd(prcp, na.rm = T),
          skew = skewness(prcp, na.rm = T)
          ) %>%
        ungroup()

      obsWW <- obs %>%
        group_by(week) %>%
        summarise(
          sum = mean(sum, na.rm = T),
          max = mean(max, na.rm = T),
          sd = sqrt(mean(sd^2)),
          skew = mean(skew, na.rm=T)
          ) %>%
        mutate(variable = "Observed") %>%
        relocate(variable) %>%
        ungroup()

  # colnames(obsMM)[-1] = paste0("obs_", colnames(obsMM)[-1])

  }

  df.comb = rbind(obsWW, simWW)

  #plotting --------------------------------
  if(Tag == "Temp"){

    p1 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = week, y = mean, group = week)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = week, y = mean, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = week, y = mean, color = "Observed")) +
      xlab("Week") + ylab("Temperature (°F)") +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = seq(1,52,2)) +
      ggtitle("Average Mean Weekly Temperature")

  }else if(Tag == "Precip"){

     p1 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = week, y = sum, group = week)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = week, y = sum, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = week, y = sum, color = "Observed")) +
      xlab("Week") + ylab("Precipitation (inches)") +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = seq(1,52,2)) +
      ggtitle("Average Total Weekly Precipitation")

  }

  if(Tag == "Temp"){
    yLabel = "Temperature "
    units = "(°F)"
  } else if(Tag == "Precip"){
    yLabel = "Precipitation "
    units = "(inches)"
  }

  #weekly SD
  p2 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = week, y = sd, group = week)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = week, y = sd, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = week, y = sd, color = "Observed")) +
      xlab("Week") + ylab(paste0("Standard Deviation ", units)) +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = seq(1,52,2)) +
      ggtitle(paste0("Average Standard Deviation in Weekly ", yLabel))


  #weekly Skew
  p3 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = week, y = skew, group = week)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = week, y = skew, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = week, y = skew, color = "Observed")) +
      xlab("Week") + ylab("Skew (-)") +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = seq(1,52,2)) +
      ggtitle(paste0("Average Skew in Weekly ", yLabel))


  #weekly max
  p4 = ggplot(df.comb) + 
      geom_boxplot(data = subset(df.comb, variable != "Observed"), aes(x = week, y = max, group = week)) +
      geom_line(data = subset(df.comb, variable == "Observed"), size = 0.5, aes(x = week, y = max, color = "Observed")) +
      geom_point(data = subset(df.comb, variable == "Observed"), size = 1.5, aes(x = week, y = max, color = "Observed")) +
      xlab("Week") + ylab(paste0("Maximum ", units)) +
      theme_classic() +
      theme(axis.title = element_text(face = "bold"),
            text=element_text(size=12),
            panel.grid.major = element_line(),
            legend.title=element_blank(),
            plot.title = element_text(size=10)
            ) +
      scale_x_continuous(breaks = seq(1,52,2)) +
      ggtitle(paste0("Average Weekly Maximum ", yLabel))

  p.comb = ggarrange(p1, p2, p3, p4, nrow = 2, ncol = 2, common.legend = TRUE, legend = "bottom")

  print(p.comb)

  # p.out = paste0(tempdir(), "/outputPlots/weeklyStats_", Tag, ".png")
  # ggsave(filename = p.out, plot = p.comb, device = "png", height = 8, width = 10, units = "in")

}

plot weekly precipitation

weeklyPlot(sim.pcp, obs.pcp, "Precip")

plot weekly temperature

weeklyPlot(sim.tmp, obs.tmp, "Temp")

We can also calculate dry- and wet- spell length statistics for further verification.

Here, lower and upper boxplot whiskers are 5th and 95th percentiles, respectively.

# dat.d = z$dat.d
# X = z$Xpamt
# syr = head(dat.d$year, 1)
# eyr = tail(dat.d$year, 1)
# wLO=0.05; wHI=0.95     #Set whisker percentile for boxplots

spellstats <- function(dat.d,X,syr,eyr,nrealz,nsim,wLO,wHI){
  #get these variables after running the driver_wx#.R code
  #dat.d <- z$dat.d
  #X <- z$X
  uyr=syr:eyr
  nyr=length(uyr)

  #Training Data
  Tdat=LowerSantaCruzRiverBasinAZ %>%
    mutate(occ = if_else(prcp>=0.01, 1, 0))

  #### END DATA PREPARATION BLOCK TO RUN STATS CODE BELOW ####

  #to get month sequence
  nobs=length(unique(Tdat$year))
  lpyear=dat.d$year[min(which(lubridate::leap_year(dat.d$year)))]
  aday <- ymd(paste(lpyear,1,1,sep="-")) #jan 1 of a leap year to have a 366-day year
  it1=which(dat.d$date==aday)
  it2=it1+366-1
  jdaymth <- dat.d$month[it1:it2]
  zz <- rep(jdaymth,nsim)
  yy <- rep(1:nsim,each=366)
  X1 <- cbind(yy,zz,X)

  #get spell length stats 3 statistics - mean, var and max
  Y <- array(NA,dim=c(nrealz,3,2,nsim,12)) #save dry and wet spell lengths by sim yr
  W <- array(NA,dim=c(3,2,nobs,12))        #save dry and wet spell lengths by obs yr
  Z <- array(NA,dim=c(3,2,nrealz,12))      #save average spell length by realz
  V <- array(NA,dim=c(3,2,12))             #save average spell length for obs

  fidx=seq(1,dim(X1)[1],by=366)

  for (irealz in 1:nrealz){
    for (isim in 1:nsim){
      i1=fidx[isim]
      i2=i1+366-1
      for (imth in 1:12){
        idxlist=i1 + which(X1[i1:i2,2]==imth) - 1
        s <- na.omit(X[idxlist,irealz])
        z.f <- spellLengths(s)
        if (length(z.f$`0`)>0){
          Y[irealz,1,1,isim,imth]=mean(z.f$`0`)
          Y[irealz,2,1,isim,imth]=var(z.f$`0`)
          Y[irealz,3,1,isim,imth]=max(z.f$`0`)
        }
        if (length(z.f$`1`)>0){
          Y[irealz,1,2,isim,imth]=mean(z.f$`1`)
          Y[irealz,2,2,isim,imth]=var(z.f$`1`)
          Y[irealz,3,2,isim,imth]=max(z.f$`1`)
        }
      } #imth
    }#isim
  } #irealz

  for (isim in 1:nobs){
    for (imth in 1:12){
      s <- dplyr::filter(Tdat, year==unique(Tdat$year)[isim], month==imth)$occ
      z.f <- spellLengths(s)
      if (length(z.f$`0`)>0){
        W[1,1,isim,imth]=mean(z.f$`0`)
        W[2,1,isim,imth]=var(z.f$`0`)
        W[3,1,isim,imth]=max(z.f$`0`)
      }
      if (length(z.f$`1`)>0){
        W[1,2,isim,imth]=mean(z.f$`1`)
        W[2,2,isim,imth]=var(z.f$`1`)
        W[3,2,isim,imth]=max(z.f$`1`)
      }
    } #imth
  }#isim

  for (imth in 1:12){
    Z[1,1,,imth]=apply(Y[,1,1,,imth],1,"mean",na.rm=T)
    Z[2,1,,imth]=apply(Y[,2,1,,imth],1,"mean",na.rm=T)
    Z[3,1,,imth]=apply(Y[,3,1,,imth],1,"mean",na.rm=T)
    Z[1,2,,imth]=apply(Y[,1,2,,imth],1,"mean",na.rm=T)
    Z[2,2,,imth]=apply(Y[,2,2,,imth],1,"mean",na.rm=T)
    Z[3,2,,imth]=apply(Y[,3,2,,imth],1,"mean",na.rm=T)
    V[1,1,imth]=mean(W[1,1,,imth],na.rm=T)
    V[2,1,imth]=mean(W[2,1,,imth],na.rm=T)
    V[3,1,imth]=mean(W[3,1,,imth],na.rm=T)
    V[1,2,imth]=mean(W[1,2,,imth],na.rm=T)
    V[2,2,imth]=mean(W[2,2,,imth],na.rm=T)
    V[3,2,imth]=mean(W[3,2,,imth],na.rm=T)
  } #imth

#Boxplots  
  d01 <- as.data.frame(Z[1,1,,])  #average mean dry spell length
  d02 <- as.data.frame(sqrt(Z[2,1,,]))  #average sd dry spell length
  d03 <- as.data.frame(Z[3,1,,])  #average max dry spell length

  d01T=V[1,1,]
  d02T=sqrt(V[2,1,])
  d03T=V[3,1,]

  RecRed = "red"
  RecBlue = "blue"

  # pdf(file=paste0(tempdir(), "/outputPlots/DryWetStats.pdf"), width=9, height=4)
  oldpar = par(mfrow=c(1,3), mar=c(2,2.5,2,1),
               oma=c(2,2,0,0), mgp=c(2,1,0), cex.axis=0.8)

  par(mfrow=c(1,3), mar=c(2,2.5,2,1), oma=c(2,2,0,0), mgp=c(2,1,0),cex.axis=0.8)
  #Mean
  #ylimit=range(c(d01,d01hist),na.rm=T)
  bb=boxplot(d01, plot=F, na.rm=T, names=1:12)
  ylimit=c(range(c(d01,d01T),na.rm=T))
  out=matrix(nrow=nsim, ncol=12)
  for(b in 1:12){
    x=d01[,b]
    quants=quantile(x, c(wLO,wHI),na.rm=T)
    bb$stats[c(1,5),b] = quants
    outs=which(x < quants[1] | x > quants[2])
    out[1:length(outs), b]= x[outs]
  }
  bxp(bb, ylim=ylimit,na.rm=T, outline=F,xlab="",ylab="")
  mtext("days", side=2, outer = T)
  title(main="average mean dry spell length")
  for(m in 1:12){points(rep(m, length(which(is.na(out[,m])==F))), out[!is.na(out[,m]),m])}
  points(1:12, d01T, pch=17, cex=1, col=RecRed)
  lines(1:12, d01T, col=RecRed)

  #Standard Deviation
  bb=boxplot(d02, plot=F, na.rm=T, names=1:12)
  ylimit=c(range(c(d02,d02T),na.rm=T))
  out=matrix(nrow=nsim, ncol=12)
  for(b in 1:12){
    x=d02[,b]
    quants=quantile(x, c(wLO,wHI),na.rm=T)
    bb$stats[c(1,5),b] = quants
    outs=which(x < quants[1] | x > quants[2])
    out[1:length(outs), b]= x[outs]
  }
  bxp(bb, ylim=ylimit,na.rm=T, outline=F,xlab="",ylab="")
  title(main="average sd dry spell length")
  mtext("month", side=1, outer = T, line=0.5)
  for(m in 1:12){points(rep(m, length(which(is.na(out[,m])==F))), out[!is.na(out[,m]),m])}
  points(1:12, d02T, pch=17, cex=1, col=RecRed)
  lines(1:12, d02T, col=RecRed)

  #Max Length
  bb=boxplot(d03, plot=F, na.rm=T, names=1:12)
  ylimit=c(range(c(d03,d03T),na.rm=T))
  out=matrix(nrow=nsim, ncol=12)
  for(b in 1:12){
    x=d03[,b]
    quants=quantile(x, c(wLO,wHI),na.rm=T)
    bb$stats[c(1,5),b] = quants
    outs=which(x < quants[1] | x > quants[2])
    out[1:length(outs), b]= x[outs]
  }
  bxp(bb, ylim=ylimit,xlab="",ylab="",na.rm=T, outline=F)
  title(main="average max dry spell length")
  for(m in 1:12){points(rep(m, length(which(is.na(out[,m])==F))), out[!is.na(out[,m]),m])}
  points(1:12, d03T, pch=17, cex=1, col=RecRed)
  lines(1:12, d03T, col=RecRed)


  dev.off()

  par(oldpar)
  ##########################################
}

spellstats(dat.d = z$dat.d, X = z$Xpamt,
           syr = head(dat.d$year, 1),
           eyr = tail(dat.d$year, 1),
           nrealz = nrealz, nsim = nsim,
           wLO = 0.05, wHI = 0.95)

Step 4: Save your data to file

Save your simulated weather ensemble to a file via the writeSim function. It will conveniently save each trace to a .csv file.

# setwd() to desired location for writeSim to save .csv files containing the simulated precipitation and temperature
# setwd(tempdir())
# 
# writeSim(wxOutput = z, nsim = nsim, nrealz = nrealz, debug = TRUE)

Performance Benchmarking

Running the wx() weather generator code for a 5-year, 10-trace simulation on a laptop with the following characteristics results in the below run time. Parallel computing was enabled via numbCores = 11 in the wx() function. OS: Microsoft Windows 10 Enterprise 10.0.19044 Build 19044 Hardware: Intel(R) Core(TM) i7-10850H CPU @2.70GHz, 2712 Mhz, 6 Cores, 12 Logical Processors. 16 GB installed physical memory.

  #wxgenR weather generation run time:
  print(difftime(endTime, startTime, units='mins'))

Final notes

The weather simulations use a 366-day per year framework in order to handle leap years. During leap years, all 366 days will have precipitation and temperature values (i.e., February 29th exists and contains data), but during non-leap years February 29th does not exist and a row of NULL values is added after December 31st in order to maintain the same length between leap years and non-leap years. Other datasets and algorithms use various approaches to handle leap years, such as avoiding leap years altogether, using a 360-day year, etc.
Because leap years are acceptable in the wxgenR algorithm, it is possible (but unlikely) to have two or more leap years in a row in the weather simulations since years are sampled at random.
Please report any bugs or issues to either sgangopadhyay@usbr.gov or dwoodson@usbr.gov

Citations

For more details and examples, including analysis of the dataset used in this vignette, see the following works:

Bearup, L., Gangopadhyay, S., & Mikkelson, K. (2021). Hydroclimate Analysis Lower Santa Cruz River Basin Study (Technical Memorandum No ENV-2020-056). Bureau of Reclamation. <https://www.usbr.gov/lc/phoenix/programs/lscrbasin/LSCRBStudy.html>.

Gangopadhyay, S., Bearup, L. A., Verdin, A., Pruitt, T., Halper, E., & Shamir, E. (2019). A collaborative stochastic weather generator for climate impacts assessment in the Lower Santa Cruz River Basin, Arizona. Fall Meeting 2019, American Geophysical Union. <https://ui.adsabs.harvard.edu/abs/2019AGUFMGC41G1267G>.

Rajagopalan, B., Lall, U., and Tarboton, D. G. (1996). Nonhomogeneous Markov Model for Daily Precipitation, Journal of Hydrologic Engineering, 1, 33–40, <https://doi.org/10.1061/(ASCE)1084-0699(1996)1:1(33)>.

Verdin, A., Rajagopalan, B., Kleiber, W., and Katz, R. W. (2015). Coupled stochastic weather generation using spatial and generalized linear models, Stoch Environ Res Risk Assess, 29, 347–356, <https://doi.org/10.1007/s00477-014-0911-6>.

Verdin, A., Rajagopalan, B., Kleiber, W., Podestá, G., and Bert, F. (2018). A conditional stochastic weather generator for seasonal to multi-decadal simulations, Journal of Hydrology, 556, 835–846, <https://doi.org/10.1016/j.jhydrol.2015.12.036>.

Disclaimer

This information is preliminary and is subject to revision. It is being provided to meet the need for timely best science. The information is provided on the condition that neither the U.S. Bureau of Reclamation nor the U.S. Government may be held liable for any damages resulting from the authorized or unauthorized use of the information.

Any scripts or data that you put into this service are public.

wxgenR documentation built on April 15, 2025, 1:10 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

wxgenR
A Stochastic Weather Generator with Seasonality

wxgenR - Lower Santa Cruz River Basin, AZ"
In wxgenR: A Stochastic Weather Generator with Seasonality

Running `wxgenR`

Tutorial

Step 1: Load your data

Step 2: Select your run settings and run the weather generator

Step 3: Analyze simulated weather

First, use modified approach from writeSim function to post-process/format output

Format dataframes for simulated precip, temperature, and season

Format training data

First you might want to plot the daily time series for verification

plot daily precipitation

plot daily temperature

Looking at just the daily mean may not be representative since weather may be very different depending on the season, so plot monthly statistics as well for more detail.

plot monthly precipitation

plot monthly temperature

Weekly statistics offer a finer resolution than monthly statistics but are not as noisy as daily values.

plot weekly precipitation

plot weekly temperature

We can also calculate dry- and wet- spell length statistics for further verification.

Step 4: Save your data to file

Performance Benchmarking

Final notes

Citations

Disclaimer

Try the wxgenR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

wxgenR A Stochastic Weather Generator with Seasonality

wxgenR - Lower Santa Cruz River Basin, AZ" In wxgenR: A Stochastic Weather Generator with Seasonality

Running wxgenR

Tutorial

Step 1: Load your data

Step 2: Select your run settings and run the weather generator

Step 3: Analyze simulated weather

First, use modified approach from writeSim function to post-process/format output

Format dataframes for simulated precip, temperature, and season

Format training data

First you might want to plot the daily time series for verification

plot daily precipitation

plot daily temperature

Looking at just the daily mean may not be representative since weather may be very different depending on the season, so plot monthly statistics as well for more detail.

plot monthly precipitation

plot monthly temperature

Weekly statistics offer a finer resolution than monthly statistics but are not as noisy as daily values.

plot weekly precipitation

plot weekly temperature

We can also calculate dry- and wet- spell length statistics for further verification.

Step 4: Save your data to file

Performance Benchmarking

Final notes

Citations

Disclaimer

Try the wxgenR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

wxgenR
A Stochastic Weather Generator with Seasonality

wxgenR - Lower Santa Cruz River Basin, AZ"
In wxgenR: A Stochastic Weather Generator with Seasonality

Running `wxgenR`