knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(eval = FALSE)
library(swnsmodelr)
library(ggplot2)

Introduction

Exract Rasters Values to Weather Station Data

The first step is to extract raster values to corresponding weather stations, in the weather station data frame. For many rasters, (ex: elevation, proximity to coast, aspect) the value will be the same every day at one station. Other rasters (ex: solar radiation) change on a temporal basis, and will have a unique value every day at one station. The two type of rasters, constant, and temporally changing, will be treated differently in the value extraction process.

The swnsmodelr package was designed for users to easily model SWNS weather station data with rasters, stations, input variable(s) and models of their chosing. The package contains functions and weather station data, and input rasters are contained within the package's R project. This document follows a workflow of preparing data, generating models, and generating outputs. The objects generated at the various steps in the example are stored in swnsmodelr.

Weather Stations Data Frame

The data frame, swns_stations_df, stored within swnsmodelr, contains SWNS daily temperature data from 2012 - 2017. The data frame has columns for: station I.D. (stationid), date (date_time), minimum temperature (temp_min), maximum temperature (temp_max), and mean temperature (temp_mean). One row of data represents one daily record for one station. Use str(), to see the column names and variables types.

str(swns_stations_df)

Constant Rasters

The constant rasters should be stored as a list of raster objects. The code below is an example of how to create a list of raster objects from rasters available in the package project. The rasters are from the 20m subdirectory within the Rasters folder. In this subdirectory, rasters have a spatial resolution of 20m.

# Names of rasters in Rasters directory (everything before extension)
rasters_names_list <- list("dem20m",  # elevation
                          "ptoc20m", # proximity to coast
                          "east20m", # easting
                          "north20m",# northing
                          "aspg20m",  # aspect
                           "tpi300m20m", # topographic position index
                          "slp20m") # slope
# Make into rasters objects list
rasters_list <- lapply(FUN = raster, X   = paste0("d:\\data\\rasters\\20m\\",rasters_names_list, ".tif"))

# Plot brick of rasters
#rasters_list %>% brick() %>% plot()

The constant raster values are added to the swns_stations_df table with extract_constant_rasters_values(). A new frame will be made called swns_stations_df_200, which is the original table plus constant raster values at 200m resolution

swns_stations_df_20 <- extract_constant_raster_values(swns_stations_df, rasters_list)
str(swns_stations_df_20)

Temporally Changing Rasters

In this modelling procedure, a solar radiation raster for each day is used an input into the model. The large quantity of rasters makes it problematic to store rasters as a list of raster objects, as with the constant rasters. To overcome this issue, there are two steps to adding temporally changing rasters to swns_stations_df. In the first step, the file path to each raster is stored as a row in a new dataframe. The date is then parsed from the file path name, to a new column of corresponding dates. This way, rasters can be selected and converted to rasters objects based on dates. The data frame, referred to as a 'temporal rasters data frame', is made with make_temporal_rasters_df(). The user must define the "date characters" or date_chars argrument, which is the index of characters that contain the date in each file path. The date format (date_format agrument) must also be defined (see ?as.Date). The user also defines a start and end date of rasters to include, and the extension of the rasters. Using the file "Sum_Irradiance_2012_1.tif" as an example: the date is formed with the characters 16 to -5, and the format is "%Y_%j". A sample use of the function for the solar radiation rasters is shown below:

solar_irradiance_rasters_df <- make_temporal_raster_df(
  in_folder = "F:\\rasters\\GOES_200m_2",
  start_date = ymd('2012-01-01'),
  end_date   = ymd('2012-12-31'),
  date_chars = c(16,-5),
  date_format = "%Y_%j",
  extension = ".tif")

head(solar_irradiance_rasters_df)

Finally, to add the values of each temporal raster to the weather station data, use extract_temporal_raster_values(). This may take a while if there are many temporal rasters. The function add a column for the set of temporally changing rasters. The argument col_name is used to specify the name of the column. By setting verbose to true, the function will print a completion report for every raster that has been succesfully extracted to the weather stations data frame. The output data frame was named swns_stations_df_200, because it has the raster values extracted from 200m resolution.

swns_stations_df_200  <- extract_temporal_raster_values(temporal_rasters_df = solar_irradiance_rasters_df,
                               temperatures_df = swns_stations_df_200,
                               col_name = "sum_irradiance",
                               verbose = FALSE)
head(swns_stations_df_200)

Separate Stations into Modelling and Validating Stations

There are weather stations from AGRG (study stations), ECAN and DNR (external stations) in the data frame. Ideally, the study stations would be used to model the data, and the external stations to validate the models. To be able to differeniate at any time between study (AGRG) and external (ECAN and DNR) stations, the station I.D.'s were sorted into two lists: study_stations_list and ext_stations_list. In the code below, the lists were used to separate the stations, and they were plotted to show their locations.

ggplot(data = swns_stations_df %>%
         filter(!duplicated(stationid)) %>%
         dplyr::mutate(origin = if_else(stationid %in% nscc_stations_list,
                                                               "study",
                                                               "ext"))) +
  geom_point(aes(x = EASTING, y = NORTHING, colour = origin)) +
  coord_fixed()

The following code was used to separate the stations, and can be easily altered to exclude or include stations in either category:

# Choose modelling stations
model_stations_df <- swns_stations_df_200 %>%
          # 1. Include all study stations
  filter(stationid %in% nscc_stations_list )

# Choose validation stations
val_stations_df   <- swns_stations_df_200 %>% 
          # 1. Include all ext stations
  filter(stationid %in% ext_stations_list  ) 

This is the resulting modelling and validation stations:

ggplot(data = swns_stations_df %>%
         filter(!duplicated(stationid)) %>%
         dplyr::mutate(role = if_else(stationid %in% unique(model_stations_df$stationid),
                                                               "model",
                                                               "validate"))) +
  geom_point(aes(x = EASTING, y = NORTHING, colour = role)) +
  coord_fixed()

Examine Raster Variable Relationship with Weather Station Temperature

Histograms of input rasters

lapply(rasters_list, hist)

Weather Station Coverage of Raster Variability

model_stations_sp <- model_stations_df %>% 
  filter(!duplicated(stationid)) %>%
  dplyr::select(stationid, EASTING, NORTHING)
coordinates(model_stations_sp) = ~ EASTING + NORTHING

par(mfrow = c(2,2))
for(i in seq_along(rasters_list)){
  raster_values <- raster::extract(rasters_list[[i]], model_stations_sp) 
  if(names(rasters_list[[i]]) == "slp20"){
    plot(x = 1:(length(raster_values)),
       y = raster_values[sort.list(raster_values)], 
       ylab = names(rasters_list[[i]]),
       xlab = "Stations",
       ylim = c(0,90))
  }else{
  plot(x = 1:(length(raster_values)),
       y = raster_values[sort.list(raster_values)], 
       ylab = names(rasters_list[[i]]),
       xlab = "Stations",
       ylim = c(minValue(rasters_list[[i]]), maxValue(rasters_list[[i]])))
}

}

Generate and Validate Models

A set of functions in swnsmodelr generate GAMs based on a model station data, then apply the models to validation station data, and computes the error at each validation record. The output from the functions is the validation station data frame with columns for the error (resid), GCV score(gcv), and adjusted R square () added. There is a function written for the three timeframes, daily, weekly, and monthly, respectively: validate_daily_GAMs(), validate_weekly_GAMs(), validate_monthly_GAMs().

The functions allow the user to specify an alternative formula, with the argument alt_formula. This feature was specifically implemented for the case that solar radiation data is missing for a particular date. The function will fail if the data is missing, and fall back on the alternative formula. If there is no suspiscion that a missing variable will cause the function to fail, this argument can be left blank.

daily_val_df <- validate_daily_GAMs(model_stations_df,
                                     val_stations_df,
                                     years = 2012,
                                     days = 1:3,
                                    formula = "temp_mean ~ s(dem)",
                                    alt_formula ="temp_mean ~ s(dem)")

Test Dependent Variable/Independent Variable Combinations at Different Timeframes

In the swns_stations_df_200 data frame there are three dependent variables: daily minimum, maxmimum, and mean temperatures, and seven independent variables extracted from rasters.

Two ways of visualizing the errors are by histogram, and a scatterplot against the dependent variable.

ggplot(data = daily_val_df, aes(resid))+
  labs(x = "Residuals",y = "Count") +
  geom_histogram(position = "dodge", colour = "black", fill = "pink",bins = 30)

ggplot(data = daily_val_df, aes(x = temp_mean, y = resid)) +
  geom_point() +
  geom_smooth()

Generating Output

Overview

With the combination of a model stations data frame, model formula, and input rasters, functions in swnsmodelr generate different types of raster outputs. The functions daily_model_predict_rasters(), and weekly_model_predict_rasters() generates daily rasters of the dependent variable with the specified model applied in daily or weekly timeframes, respectively.

The prediction functions will fail if the rasters in the rasters_list, and the rasters pointed to in the temporal_rasters_df, do not have the same extents, and number of rows and columns. The function raster::compareRaster() will be return TRUE for the rasters_list if the properties are the same. If not, the function raster::resample() can be used to correct the issue.

The function generate_mean_rasters(), calculates mean rasters for sets of maximum and minimum rasters that correpsond by date. This function was written incase a user would rather model daily minimum and maximum temperatures before calculating daily mean temperature.

Finally, the function generate_output_gdd(), generates GDD rasters from daily mean temperature rasters. The user can specify the GDD base, and frequency of outputs (daily, weekly, monthly or yearly). There is also an agrument called growing_season that when set to TRUE, only accumulates GDD for April to November.

Predict Daily Mean Temperature Rasters

Continuing from the previous section, the code below demonstrates how to use daily_model_predict_rasters().As with the model validation, a formula and an alternative formula (alt_formula) can be specified. The alt_formula is used if the formula causes gam() to fail. This would occur is a variable is a missing for the date being modelling, which is the case randomly with sum_irradiance.

# Predict daily mean temperature
daily_model_predict_rasters(start_date = ymd('2012-01-01'),
                               end_date = ymd('2017-12-31'),
                               formula = "temp_mean ~
                                          s(east,north) +
                                          s(dem)",
                               alt_formula = "temp_mean ~
                                          s(east,north) +
                                          s(dem)",
                               temperatures_df = model_stations_df,
                               input_rasters = rasters_list,
                               temporal_rasters_df = solar_irradiance_rasters_df,
                               output_folder = "F://daily_models//mean_temperature",
                               output_ext = "tif",
                               verbose = TRUE)

Calculate Growing Degree Day (GDD) Rasters

Rasters of GDD are calculated by raster algebra with the daily mean temperature rasters. The first step is to make a data frame with make_temporal_rasters_df() that contains the file paths to the daily mean temperature rasters, and their corresponding date. The function output_gdd_rasters() applies the specified GDD base to the mean temperature rasters, accumulates them daily, and sends them to the output folder at the specified timeframe, output_timeframe. The accumulation only occurs within a year, and can be set to only include April 1st - November 30th by setting growing_season to TRUE. The code below demonstrates how to use the two functions to generate GDD rasters.

temp_mean_df <- make_temporal_raster_df("F://daily_models//mean_temperature",
                                        start_date = ymd('2012-01-01'),
                                        end_date   = ymd('2017-12-31'),
                                        date_chars = c(10,19),
                                        date_format = "%Y-%m-%d")

generate_gdd_output(temp_mean_df,
                    gdd_base = 5,
                    start_date = start_date,
                    end_date = end_date,
                    output_time_slice = "monthly",
                    growing_season = TRUE,
                    output_folder = "F:\\temp",
                    plot_gdd_raster = TRUE)


danamelamed/swnsmodelr documentation built on May 13, 2023, 5:09 a.m.