The shash family implements the four-parameter sinh-arcsinh (shash) distribution of
Jones and Pewsey (2009). The location, scale, skewness and kurtosis of the density can depend
on additive smooth predictors. The family is usable only with gam, and the linear predictors are specified
via a list of formulae.
link    vector of four characters indicating the link function for location, scale, skewness and kurtosis parameters.
b       positive parameter of the logeb link function, see Details.
phiPen  positive multiplier of a ridge penalty on kurtosis parameter. Do not touch it unless you know what you are doing, see Details.
The density function of the shash family is
p(y|μ,σ,ε,δ) = C(z) exp{-S(z)^2/2} / [σ {2π(1+z^2)}^(1/2)],
where C(z) = {1+S(z)^2}^(1/2), S(z) = sinh{δ sinh^(-1)(z) - ε} and
z = (y-μ)/(σδ). Here μ and σ > 0 control, respectively, location and
scale, ε determines skewness, while δ > 0 controls tailweight.
shash can model skewness to either side, depending on the sign of ε.
Also, shash can have tails that are lighter (δ>1) or heavier (0<δ<1) than those of a normal.
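The density formula can be checked numerically with a short sketch. The helpers dshash_sketch and qshash_sketch below are hypothetical names introduced only for illustration, not functions exported by any package; the quantile formula is the same transformation used to simulate data in the Examples. The parameter values and the finite integration range are arbitrary assumptions, chosen so that essentially all of the probability mass is covered.

```r
# Sketch of the shash density as written in Details
# (hypothetical helper, not part of any package).
dshash_sketch <- function(y, mu = 0, sigma = 1, eps = 0, del = 1) {
  z <- (y - mu) / (sigma * del)
  S <- sinh(del * asinh(z) - eps)
  C <- sqrt(1 + S^2)
  C * exp(-S^2 / 2) / (sigma * sqrt(2 * pi * (1 + z^2)))
}

# Sketch of the quantile function, using the transformation that the
# Examples use to simulate shash data (hypothetical helper).
qshash_sketch <- function(p, mu = 0, sigma = 1, eps = 0, del = 1) {
  mu + (del * sigma) * sinh((1 / del) * asinh(qnorm(p)) + (eps / del))
}

# With eps = 0 and del = 1 the density reduces to the normal density.
normal_check <- all.equal(dshash_sketch(0.3, mu = 1, sigma = 2),
                          dnorm(0.3, mean = 1, sd = 2))

# Total mass over a wide finite range; it should be close to one.
total_mass <- integrate(dshash_sketch, -50, 60,
                        mu = 1, sigma = 2, eps = 0.5, del = 0.8)$value

# Integrating the density up to the 0.9 quantile should give roughly 0.9.
q90 <- qshash_sketch(0.9, mu = 1, sigma = 2, eps = 0.5, del = 0.8)
cdf_at_q <- integrate(dshash_sketch, -50, q90,
                      mu = 1, sigma = 2, eps = 0.5, del = 0.8)$value

print(c(total_mass = total_mass, cdf_at_q = cdf_at_q))
```

Positive eps shifts mass to the right in this parameterisation, which is why the integration range extends further above mu than below it.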
For fitting purposes, we use τ = log(σ) and φ = log(δ).
The link function used for τ is logeb, which is η = log{exp(τ)-b}, so that the inverse link is τ = log(σ) = log{exp(η)+b}. The point is that σ is not allowed to become smaller than a small constant b. The likelihood includes a ridge penalty - phiPen * φ^2, which shrinks φ toward zero. When sufficient data are available the ridge penalty does not change the fit much, but it is useful to include it when fitting the model to small data sets, to avoid φ diverging to +infinity (a problem already identified by Jones and Pewsey (2009)).
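The effect of the logeb link can be seen in a minimal sketch. The functions logeb_link and logeb_inv are hypothetical helpers written directly from the formulas above, and b = 0.01 is an assumed value chosen for illustration only.

```r
# Hypothetical helpers implementing the logeb link and its inverse
# exactly as given in Details; b = 0.01 is an assumed illustrative value.
b <- 0.01
logeb_link <- function(tau, b) log(exp(tau) - b)  # eta = log{exp(tau) - b}
logeb_inv  <- function(eta, b) log(exp(eta) + b)  # tau = log{exp(eta) + b}

# Even for a very negative linear predictor, sigma = exp(eta) + b
# cannot fall below b.
eta_grid <- c(-50, -5, 0, 3)
sigma <- exp(eta_grid) + b
print(sigma)

# Link and inverse link undo each other (checked away from the extreme
# value, where exp(eta) is negligible relative to b in floating point).
round_trip <- logeb_link(logeb_inv(eta_grid[-1], b), b)
print(all.equal(round_trip, eta_grid[-1]))
```

As the linear predictor decreases, σ approaches b from above rather than zero, which is the lower bound described in the text.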
An object inheriting from class general.family.
Matteo Fasiolo <matteo.fasiolo@gmail.com> and Simon N. Wood.
Jones, M. and A. Pewsey (2009). Sinh-arcsinh distributions. Biometrika 96 (4), 761–780.
Wood, Simon N., Pya, N. and Säfken, B. (2017). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association.
library(mgcv) # provides gam(); assumed here, load instead the package that supplies shash if it differs
#########
# Shash dataset
#########
## Simulate some data from shash
set.seed(847)
n <- 1000
x <- seq(-4, 4, length.out = n)
X <- cbind(1, x, x^2)
beta <- c(4, 1, 1)
mu <- X %*% beta
sigma = .5+0.4*(x+4)*.5 # Scale
eps = 2*sin(x) # Skewness
del = 1 + 0.2*cos(3*x) # Kurtosis
dat <- mu + (del * sigma) * sinh((1/del) * asinh(qnorm(runif(n))) + (eps/del))
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")
plot(x, dat, xlab = "x", ylab = "y")
## Fit model
fit <- gam(list(y ~ s(x),           # <- model for location
                  ~ s(x),           # <- model for log-scale
                  ~ s(x),           # <- model for skewness
                  ~ s(x, k = 20)),  # <- model for log-kurtosis
           data = dataf,
           family = shash,          # <- new family
           optimizer = "efs")       # Here we are using a new optimizer
## Plotting truth and estimates for each parameter of the density
muE <- fit$fitted[ , 1]
sigE <- exp(fit$fitted[ , 2])
epsE <- fit$fitted[ , 3]
delE <- exp(fit$fitted[ , 4])
par(mfrow = c(2, 2))
plot(x, muE, type = 'l', ylab = expression(mu(x)), lwd = 2)
lines(x, mu, col = 2, lty = 2, lwd = 2)
legend("top", c("estimated", "truth"), col = 1:2, lty = 1:2, lwd = 2)
plot(x, sigE, type = 'l', ylab = expression(sigma(x)), lwd = 2)
lines(x, sigma, col = 2, lty = 2, lwd = 2)
plot(x, epsE, type = 'l', ylab = expression(epsilon(x)), lwd = 2)
lines(x, eps, col = 2, lty = 2, lwd = 2)
plot(x, delE, type = 'l', ylab = expression(delta(x)), lwd = 2)
lines(x, del, col = 2, lty = 2, lwd = 2)
## Plotting true and estimated conditional density
par(mfrow = c(1, 1))
plot(x, dat, pch = '.', col = "grey", ylab = "y", ylim = c(-35, 70))
for(qq in c(0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999)){
  est <- fit$family$qf(p=qq, mu = fit$fitted)
  true <- mu + (del * sigma) * sinh((1/del) * asinh(qnorm(qq)) + (eps/del))
  lines(x, est, type = 'l', col = 1, lwd = 2)
  lines(x, true, type = 'l', col = 2, lwd = 2, lty = 2)
}
legend("topleft", c("estimated", "truth"), col = 1:2, lty = 1:2, lwd = 2)
##########
## Motorcycle example
##########
# Here shash is overkill; in fact the fit is not good relative
# to what we would get with mgcv::gaulss
library(MASS)
b <- gam(list(accel~s(times, k=20, bs = "ad"), ~ s(times, k = 10), ~ 1, ~ 1),
         data=mcycle, family=shash)
par(mfrow = c(1, 1))
xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3)))
pred <- predict(b, newdata = xSeq)
plot(mcycle$times, mcycle$accel, ylim = c(-180, 100))
for(qq in c(0.1, 0.3, 0.5, 0.7, 0.9)){
  est <- b$family$qf(p=qq, mu = pred)
  lines(xSeq$times, est, type = 'l', col = 2)
}
plot(b, pages = 1, scale = FALSE)
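The comment in the motorcycle example compares shash with mgcv::gaulss. One possible way to make that comparison concrete is sketched below. It assumes a version of mgcv that provides both the shash and gaulss families, and the object names b_shash, b_gauss and aic_tab are hypothetical. AIC is only one of several reasonable criteria and is used here purely for illustration.

```r
library(mgcv)  # gam(), plus the shash and gaulss families (assumed available)
library(MASS)  # mcycle data

# Four-parameter shash fit, specified as in the motorcycle example
b_shash <- gam(list(accel ~ s(times, k = 20, bs = "ad"),
                    ~ s(times, k = 10), ~ 1, ~ 1),
               data = mcycle, family = shash)

# Gaussian location-scale fit using the same smooths for mean and scale
b_gauss <- gam(list(accel ~ s(times, k = 20, bs = "ad"),
                    ~ s(times, k = 10)),
               data = mcycle, family = gaulss())

# Lower AIC suggests a better trade-off between fit and complexity
aic_tab <- AIC(b_shash, b_gauss)
print(aic_tab)
```

If the two extra shash parameters do not improve the fit, the gaulss model would be expected to be competitive on this criterion.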