hxlr: Heteroscedastic Extended Logistic Regression

View source: R/hxlr.R

hxlrR Documentation

Heteroscedastic Extended Logistic Regression

Description

This is a wrapper function for clm (from package ordinal) to fit (heteroscedastic) extended logistic regression (HXLR) models (Messner et al. 2013).

Usage

hxlr(formula, data, subset, na.action, weights, thresholds, link, control, ...)

Arguments

formula

a formula expression of the form y ~ x | z where y is the response and x and z are regressor variables for the location and the scale of the latend distribution respectively. Response can either be a continuous variable or a factor.

data

an optional data frame containing the variables occurring in the formulas.

subset

an optional vector specifying a subset of observations to be used for fitting.

na.action

a function which indicates what should happen when the data contain NAs. Default is na.omit

weights

optional case weights in fitting.

thresholds

vector of (transformed) thresholds that are used to cut the continuous response into categories. Data frames or matrices with multiple columns are allowed as well. Then each column is used as separate predictor variable for the intercept model.

link

link function, i.e., the type of location-scale distribution assumed for the latent distribution. Default is logit.

control

a list of control parameters passed to optim. Default is hxlr.control

...

arguments to be used to form the default control argument if it is not supplied directly.

Details

Extended logistic regression (Wilks 2009) extends binary logistic regression to multi-category responses by including the thresholds, that are used to cut a continuous variable into categories, in the regression equation. Heteroscedastic extended logistic regression (Messner et al. 2013) extends this model further and allows to add additional predictor variables that are used to predict the scale of the latent logistic distribution.

Value

An object of class "hxlr", i.e., a list with the following elements.

coefficients

list of CLM coefficients for intercept, location, and scale model.

fitted.values

list of fitted location and scale parameters.

optim

output from optimization from optim.

method

Optimization method used for optim.

control

list of control parameters passed to optim

start

starting values of coefficients used in the optimization.

weights

case weights used for fitting.

n

number of observations.

nobs

number of observations with non-zero weights.

loglik

log-likelihood.

vcov

covariance matrix.

converged

logical variable whether optimization has converged or not.

iterations

number of iterations in optimization.

call

function call.

scale

the formula supplied.

terms

the terms objects used.

levels

list of levels of the factors used in fitting for location and scale respectively.

thresholds

the thresholds supplied.

References

Messner JW, Mayr GJ, Zeileis A, Wilks DS (2014). Extending Extended Logistic Regression to Effectively Utilize the Ensemble Spread. Monthly Weather Review, 142, 448–456. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1175/MWR-D-13-00271.1")}.

Wilks DS (2009). Extending Logistic Regression to Provide Full-Probability-Distribution MOS Forecasts. Meteorological Applications, 368, 361–368.

See Also

predict.hxlr, clm

Examples

data("RainIbk")
## mean and standard deviation of square root transformed ensemble forecasts
RainIbk$sqrtensmean <- 
  apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean)
RainIbk$sqrtenssd <- 
  apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd)

## climatological deciles
q <- unique(quantile(RainIbk$rain, seq(0.1, 0.9, 0.1)))

## fit ordinary extended logistic regression with ensemble mean as 
## predictor variable
XLR <- hxlr(sqrt(rain) ~ sqrtensmean, data = RainIbk, thresholds = sqrt(q))
## print
XLR
## summary
summary(XLR)


## fit ordinary extended logistic regression with ensemble mean 
## and standard deviation as predictor variables
XLRS <- hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = RainIbk, 
  thresholds = sqrt(q))
## fit heteroscedastic extended logistic regression with ensemble 
## standard deviation as predictor for the scale
HXLR <- hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = RainIbk, 
  thresholds = sqrt(q))

## compare AIC of different models
AIC(XLR, XLRS, HXLR)

## XLRS and HXLR are nested in XLR -> likelihood-ratio-tests
if(require("lmtest")) {
  lrtest(XLR, XLRS)
  lrtest(XLR, HXLR)
}

## Not run: 
###################################################################
## Cross-validation and bootstrapping RPS for different models 
## (like in Messner 2013). 
N <- NROW(RainIbk)
## function that returns model fits
fits <- function(data, weights = rep(1, N)) {
  list(
    "XLR"    = hxlr(sqrt(rain) ~ sqrtensmean, data = data, 
      weights = weights, thresholds = sqrt(q)),
    "XLR:S"  = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd, data = data, 
      weights = weights, thresholds = sqrt(q)),
    "XLR:SM" = hxlr(sqrt(rain) ~ sqrtensmean + I(sqrtensmean*sqrtenssd), 
      data = data, weights = weights, thresholds = sqrt(q)),
    "HXLR"   = hxlr(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = data, 
      weights = weights, thresholds = sqrt(q)),
    "HXLR:S" = hxlr(sqrt(rain) ~ sqrtensmean + sqrtenssd | sqrtenssd, 
      data = data, weights = weights, thresholds = sqrt(q))
  )
}


## cross validation
id <- sample(1:10, N, replace = TRUE)
obs <- NULL
pred <- list(NULL)
for(i in 1:10) {
  ## splitting into test and training data set
  trainIndex <- which(id != i)     
  testIndex <- which(id == i)			     
  ## weights that are used for fitting the models
  weights <- as.numeric(table(factor(trainIndex, levels = c(1:N))))
  ## testdata
  testdata <- RainIbk[testIndex,]
  ## observations    
  obs <- c(obs, RainIbk$rain[testIndex])
  ## estimation
  modelfits <- fits(RainIbk, weights)
  ## Prediction
  pred2 <- lapply(modelfits, predict, newdata = testdata, type = "cumprob")
  pred <- mapply(rbind, pred, pred2, SIMPLIFY = FALSE)
}
names(pred) <- c(names(modelfits))

## function to compute RPS
rps <- function(pred, obs) {
  OBS <- NULL
  for(i in 1:N) 
    OBS <- rbind(OBS, rep(0:1, c(obs[i] - 1, length(q) - obs[i] + 1)))
  apply((OBS-pred)^2, 1, sum)
}

## compute rps
RPS <- lapply(pred, rps, obs = as.numeric(cut(obs, c(-Inf, q, Inf))))

## bootstrapping mean rps 
rpsall <- NULL
for(i in 1:250) {
  index <- sample(length(obs), replace = TRUE)
  rpsall <- rbind(rpsall, sapply(RPS, function(x) mean(x[index])))
}
  
rpssall <- 1 - rpsall/rpsall[,1]
boxplot(rpssall[,-1], ylab = "RPSS", main = "RPSS relative to XLR")
abline(h = 0, lty = 2)

## End(Not run)

crch documentation built on March 31, 2023, 11:08 p.m.