downscaleTrain: Calibration of downscaling methods


downscaleTrain {downscaleR}    R Documentation

Calibration of downscaling methods

Description

Calibration of downscaling methods. Currently, analogs, generalized linear models (GLM) and neural networks (NN) are available.

Usage

downscaleTrain(
  obj,
  method,
  condition = NULL,
  threshold = NULL,
  model.verbose = TRUE,
  predict = TRUE,
  simulate = FALSE,
  ...
)

Arguments

obj

The object as returned by prepareData.

method

Character string indicating the type of method/transfer function. Currently accepted values are "analogs", "GLM" or "NN".

condition

Inequality operator to be applied to the given "threshold". Only the days that satisfy the condition will be used for training the model. "GT" = greater than the value of threshold, "GE" = greater than or equal to, "LT" = lower than, "LE" = lower than or equal to.

threshold

Numeric value. Threshold used as reference for the condition. Default is NULL. If a threshold value is supplied without specifying the argument condition, then condition is set to "GE".

model.verbose

A logical value. Indicates whether the information returned about the inferred model is limited to the essential information (model.verbose = FALSE) or is more detailed (model.verbose = TRUE, the default). Setting model.verbose = FALSE is recommended when you want to save memory. Only operates for GLM.

predict

A logical value. Should the prediction on the training set be returned? Default is TRUE.

simulate

A logical value indicating whether to simulate from the GLM distributional parameters when predicting on the training set. Only relevant when predicting with a GLM. Default is FALSE.

...

Optional parameters, which depend on the selected method. Every parameter has a default value, set in the corresponding atomic function, that is used when no value is supplied. These parameters are explained in the Details section. The atomic functions themselves are analogs.train, glm.train and nn.train.
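The condition and threshold arguments act as a filter on the training days. A minimal base-R sketch of that rule follows, assuming the filter is applied to the predictand values; the helper select_train_days is hypothetical and not part of downscaleR.

```r
# Hypothetical helper (not part of downscaleR): flags the days that satisfy
# `condition` with respect to `threshold`, following the documented rules.
select_train_days <- function(values, condition = NULL, threshold = NULL) {
  if (is.null(threshold)) return(rep(TRUE, length(values)))  # no filtering
  if (is.null(condition)) condition <- "GE"                   # documented default
  op <- switch(condition,
               GT = `>`, GE = `>=`, LT = `<`, LE = `<=`,
               stop("condition must be one of 'GT', 'GE', 'LT', 'LE'"))
  op(values, threshold)
}

# Example: daily precipitation (mm); keep days with at least 1 mm
precip <- c(0, 0.4, 1, 3.2, 0, 12.5)
wet <- select_train_days(precip, threshold = 1)  # condition defaults to "GE"
precip[wet]                                      # keeps 1, 3.2 and 12.5
```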

Details

The function can downscale in both global and local mode, though not simultaneously. If there is perfect collinearity among the predictors, the corresponding matrix cannot be inverted and the downscaling will fail. We recommend removing NaN/NA values before calling the function.
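Both problems mentioned above can be checked beforehand. A minimal base-R sketch, using a hypothetical helper check_predictors (not part of downscaleR) that drops the days with NA/NaN values and detects perfect collinearity through the rank of a QR decomposition; it assumes the model includes an intercept.

```r
# Hypothetical pre-check (not part of downscaleR): remove days with missing
# values and test whether the predictors (plus an intercept) have full rank.
check_predictors <- function(X) {
  X <- as.matrix(X)
  keep <- complete.cases(X)              # FALSE for rows with NA or NaN
  Xc <- X[keep, , drop = FALSE]
  rank <- qr(cbind(1, Xc))$rank          # column of ones = assumed intercept
  list(n.removed = sum(!keep),
       full.rank = rank == ncol(Xc) + 1)
}

set.seed(1)
X <- cbind(a = rnorm(10), b = rnorm(10))
X <- cbind(X, c = 2 * X[, "a"] - X[, "b"])  # c is an exact linear combination
X[3, "b"] <- NA                             # one missing value
res <- check_predictors(X)
res  # one day removed, and the predictors are not of full rank
```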

Analogs The optional parameters of this method are:

  • n.analogs An integer. Number of analogs. Default is 4.

  • sel.fun A string. Function applied to the analogs selected for a given observation. Options are "mean", "wmean" (i.e., weighted mean), "max", "min", "median" and "prcXX" (e.g., "prc85" is the 85th percentile of the distribution of the analog values). Default is "mean".

  • window An integer. Window of days removed when selecting analogs. If window = 7, the 7 days before and the 7 days after the observation date are removed. Default is 0.

  • n.random An integer. Number of analogs chosen at random among the n.analogs closest ones. Default is NULL.

More information can be found in analogs.train.
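The analog options can be illustrated with a minimal base-R sketch. It is a hypothetical illustration, not the code of analogs.train: the function name analog_predict is made up, the distance is assumed to be Euclidean, the "wmean" weights are assumed to be inverse distances, and the window exclusion is assumed to also drop the target day itself.

```r
# Hypothetical analog predictor (NOT the code of analogs.train).
# x.train: predictors (days x variables); y.train: predictand on those days;
# x.new: predictor vector of the day to predict; i.day: position of that day
# in the training period (used for the window exclusion), or NA.
analog_predict <- function(x.train, y.train, x.new, n.analogs = 4,
                           sel.fun = "mean", window = 0, n.random = NULL,
                           i.day = NA) {
  d <- sqrt(colSums((t(x.train) - x.new)^2))   # Euclidean distance to each day
  if (!is.na(i.day)) {
    # assumption: the target day and `window` days on each side are excluded
    excl <- max(1, i.day - window):min(length(d), i.day + window)
    d[excl] <- Inf
  }
  idx <- order(d)[seq_len(n.analogs)]          # the n.analogs closest days
  if (!is.null(n.random)) idx <- idx[sample.int(length(idx), n.random)]
  vals <- y.train[idx]
  if (sel.fun == "wmean") {
    w <- 1 / pmax(d[idx], .Machine$double.eps) # assumed inverse-distance weights
    return(sum(w * vals) / sum(w))
  }
  if (grepl("^prc[0-9]+$", sel.fun)) {         # e.g. "prc85" -> 85th percentile
    p <- as.numeric(sub("^prc", "", sel.fun)) / 100
    return(unname(quantile(vals, probs = p)))
  }
  if (sel.fun %in% c("mean", "max", "min", "median")) {
    return(match.fun(sel.fun)(vals))
  }
  stop("unknown sel.fun: ", sel.fun)
}

# Toy data: one predictor, six training days
x.train <- matrix(0:5, ncol = 1)
y.train <- c(10, 20, 30, 40, 50, 60)
analog_predict(x.train, y.train, x.new = 2.2, n.analogs = 2)  # mean of 30 and 40
# days 2 to 4 are excluded, so day 5 (value 50) becomes the closest analog
analog_predict(x.train, y.train, x.new = 2.2, n.analogs = 1, window = 1, i.day = 3)
```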

Generalized Linear Models (GLM) The optional parameters depend on the optional parameter fitting:

  • fitting A string indicating the objective function and how the linear model is fitted.

    • fitting = NULL The model is fitted with the glm function. This is the default option. The optional parameters when fitting = NULL are:

      • family A string indicating a description of the error distribution. Options are family = c("gaussian","binomial","Gamma","inverse.gaussian","poisson","quasi","quasibinomial","quasipoisson"). The link can also be specified; the available links are listed in family.

      • na.action A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

    • fitting = "stepwise" Performs a stepwise regression via glm and step. The optional parameters are the same as for fitting = NULL. The search can be performed backward or forward, and the number of steps can be limited, through the additional optional parameter stepwise.arg: a list containing two arguments of step, steps and direction. For example: stepwise.arg = list(steps = 5, direction = "backward"). Default is NULL, which indicates an unlimited forward stepwise search.

    • fitting = c("L1","L2","L1L2","gLASSO"). These four options refer, respectively, to ridge regression, lasso regression, elastic-net regression and group lasso regression. Note that, in standard terminology, ridge regression uses an L2 penalty and lasso an L1 penalty, elastic net combines both, and group lasso penalizes the L2 norm of groups of coefficients. The model is fitted via glmnet and the penalty strength is selected by cross-validation with cv.glmnet. By default, glmnet standardizes the predictors; here this is changed to standardize = FALSE, so standardization should be done prior to the downscaling process. The optional parameters when fitting = c("L1","L2","L1L2","gLASSO") are:

      • family A string indicating a description of the error distribution. Options are family = c("gaussian","binomial","Gamma","inverse.gaussian","poisson","quasi","quasibinomial","quasipoisson"). The links cannot be specified, since glmnet has not been programmed here to handle them; the default ones can be found in family. If fitting = "gLASSO", then family must be "mgaussian".

      • offset A vector of length nobs that is included in the linear predictor (a nobs x nc matrix for the "multinomial" family). Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function.

    Two things should be considered: 1) if family = "binomial", then type = "response" is used when predicting values; 2) except for fitting = "MP", the parameter site must be TRUE for all fitting options, unless a gLASSO is wanted, in which case site must be FALSE.
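The fitting = NULL and fitting = "stepwise" options, as well as the simulate argument, can be illustrated with base R's glm and step on synthetic data. This is a hypothetical sketch, not glm.train itself: it assumes a gaussian family, and it reads simulate = TRUE as drawing from a normal distribution centred on the fitted mean with the estimated dispersion as variance.

```r
# Hypothetical illustration on synthetic data (NOT glm.train itself).
set.seed(42)
n <- 200
train <- data.frame(p1 = rnorm(n), p2 = rnorm(n), p3 = rnorm(n))  # 3 predictors
train$y <- 1 + 2 * train$p1 - 0.5 * train$p2 + rnorm(n, sd = 0.3) # p3 irrelevant

# fitting = NULL: a plain GLM with the chosen family (here gaussian)
fit <- glm(y ~ p1 + p2 + p3, family = gaussian, data = train)

# fitting = "stepwise": forward search from the intercept-only model, at most
# 5 steps, mirroring stepwise.arg = list(steps = 5, direction = "forward")
null.fit <- glm(y ~ 1, family = gaussian, data = train)
step.fit <- step(null.fit, scope = y ~ p1 + p2 + p3,
                 direction = "forward", steps = 5, trace = 0)

# simulate = FALSE: deterministic prediction (the fitted conditional mean)
pred.mean <- predict(step.fit, type = "response")

# simulate = TRUE (assumed reading, gaussian case): random draws around the
# mean, using the estimated dispersion as the variance
disp <- summary(step.fit)$dispersion
pred.sim <- rnorm(length(pred.mean), mean = pred.mean, sd = sqrt(disp))
```

For family = "binomial", predict(..., type = "response") returns probabilities instead of values on the link (logit) scale.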

Neural Networks The neural network implementation is based on the package deepnet. The optional parameters correspond to those in nn.train and are: initW = NULL, initB = NULL, hidden = c(10), activationfun = "sigm", learningrate = 0.001, momentum = 0.5, learningrate_scale = 1, output = "sigm", numepochs = 5000, batchsize = 100, hidden_dropout = 0, visible_dropout = 0. The values indicated are the default values.
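To give an intuition of what some of these hyperparameters control, the following is a toy base-R sketch of a network with a single hidden layer of sigmoid units (as with activationfun = "sigm") and a linear output (as with output = "linear"), trained by mini-batch gradient descent with momentum on a squared-error loss. It is a hypothetical illustration and not deepnet's implementation: the functions toy_nn_train and toy_nn_predict are made up, only one hidden layer is supported, and only the argument names hidden, learningrate, momentum, numepochs and batchsize mirror nn.train (with different default values).

```r
# Toy one-hidden-layer network (hypothetical; NOT deepnet's implementation).
sigm <- function(z) 1 / (1 + exp(-z))

toy_nn_train <- function(x, y, hidden = 10, learningrate = 0.1, momentum = 0.5,
                         numepochs = 500, batchsize = 50) {
  x <- as.matrix(x); y <- as.matrix(y)
  n <- nrow(x)
  # small random initial weights, zero biases
  W1 <- matrix(rnorm(ncol(x) * hidden, sd = 0.1), ncol(x), hidden)
  W2 <- matrix(rnorm(hidden * ncol(y), sd = 0.1), hidden, ncol(y))
  b1 <- rep(0, hidden); b2 <- rep(0, ncol(y))
  vW1 <- W1 * 0; vW2 <- W2 * 0; vb1 <- b1 * 0; vb2 <- b2 * 0  # momentum buffers
  for (epoch in seq_len(numepochs)) {
    idx <- sample.int(n)                                  # reshuffle each epoch
    for (start in seq(1, n, by = batchsize)) {
      rows <- idx[start:min(start + batchsize - 1, n)]
      xb <- x[rows, , drop = FALSE]; yb <- y[rows, , drop = FALSE]
      # forward pass: sigmoid hidden layer, linear output layer
      h <- sigm(sweep(xb %*% W1, 2, b1, "+"))
      out <- sweep(h %*% W2, 2, b2, "+")
      # backward pass for the mean squared error of the batch
      d.out <- (out - yb) / length(rows)
      d.h <- (d.out %*% t(W2)) * h * (1 - h)
      # gradient step with momentum
      vW2 <- momentum * vW2 - learningrate * crossprod(h, d.out)
      vb2 <- momentum * vb2 - learningrate * colSums(d.out)
      vW1 <- momentum * vW1 - learningrate * crossprod(xb, d.h)
      vb1 <- momentum * vb1 - learningrate * colSums(d.h)
      W1 <- W1 + vW1; W2 <- W2 + vW2; b1 <- b1 + vb1; b2 <- b2 + vb2
    }
  }
  list(W1 = W1, b1 = b1, W2 = W2, b2 = b2)
}

toy_nn_predict <- function(nn, x) {
  h <- sigm(sweep(as.matrix(x) %*% nn$W1, 2, nn$b1, "+"))
  sweep(h %*% nn$W2, 2, nn$b2, "+")
}

# Usage on synthetic data: 200 days, 2 predictors
set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
y <- 2 * x[, 1] - x[, 2] + rnorm(200, sd = 0.1)
nn <- toy_nn_train(x, y, hidden = 10)
pred <- toy_nn_predict(nn, x)
mean((pred - y)^2)  # training mean squared error
```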

Help

For further details on the optional parameters, see the atomic functions analogs.train, glm.train and nn.train.

Value

A list containing the prediction on the training dataset and the model:

  • pred: An object with the same structure as the predictands input parameter, but with pred$Data being the predictions and not the observations.

  • model: A list with information about the model: method, coefficients, fitting ...

downscaleR Wiki for downscaling seasonal forecasting and climate projections.

Author(s)

J. Bano-Medina

See Also

Other downscaling.functions: downscaleCV(), downscaleChunk(), downscalePredict(), downscale()

Examples


# Loading packages and data
library(downscaleR)
library(transformeR)
library(climate4R.datasets)
data("VALUE_Iberia_tas")
y <- VALUE_Iberia_tas
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850")
x <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)
# Preparing the predictors
data <- prepareData(x = x, y = y, spatial.predictors = list(v.exp = 0.95))
# Training downscaling methods
model.analogs <- downscaleTrain(data, method = "analogs", n.analogs = 1)
model.regression <- downscaleTrain(data, method = "GLM", family = gaussian)
model.nnets <- downscaleTrain(data, method = "NN", hidden = c(10, 5), output = "linear")
# Plotting observed vs. predicted values (analogs) for station 5
plot(y$Data[, 5], model.analogs$pred$Data[, 5], xlab = "obs", ylab = "pred")

SantanderMetGroup/downscaleR documentation built on July 4, 2023, 4:28 a.m.