Description Usage Arguments Details Value References See Also Examples
GGeDS
constructs a Geometrically Designed (univariate) variable
knots spline regression model for the predictor in the context of Generalized (Non-)Linear Models,
referred to as a GeDS model, for a response with a pre-specified distribution from
the Exponential Family.
1 2 3 |
formula |
a description of the structure of the predictor model to be fitted, including the
dependent and independent variables. See |
data |
an optional data frame, list or environment containing the variables of the predictor model.
In case the variables are not found in |
family |
a description of the error distribution and link function to be used in the model. This can be a
character string naming a family function (e.g. |
weights |
an optional vector of ‘prior weights’ to be put on
the observations in the fitting
process in case the user requires weighted GeDS fitting.
It is |
beta |
numeric parameter in the interval [0,1] tuning the knot placement in stage A of GeDS. See details below. |
phi |
numeric parameter in the interval [0,1] specifying the threshold for
the stopping rule (model selector) in stage A of GeDS. See also |
min.intknots |
optional parameter allowing the user to set a minimum number of internal knots required. By default equal to zero. |
max.intknots |
optional parameter allowing the user to set a maximum number of internal knots to be added by the GeDS estimation algorithm. By default equal to the number of knots for the saturated GeDS model (i.e. N-2, where N is the number of observations). |
q |
numeric parameter which allows to fine-tune the stopping rule of stage A of GeDS, by default equal to 2. See details below. |
Xextr |
numeric vector of 2 elements representing the left-most and right-most limits of the interval embedding the observations of the independent variable. See details. |
show.iters |
logical variable indicating whether or not to print
information at each step. By default equal to |
stoptype |
a character string indicating the type of GeDS stopping rule to
be used. It should be either one of |
The GGeDS
function extends the GeDS methodology, recently developed by Kaishev et al. (2016) and implemented
in the NGeDS
function for the Normal case,
to the more general GNM (GLM) context, allowing for the response to have any
distribution from the Exponential Family.
Under the GeDS-GNM approach the (non-)linear predictor is viewed as a spline with
variable knots which are estimated along with the regression coefficients and the
order of the spline, using a two stage procedure.
In stage A, a linear variable-knot spline is fitted to the data applying iteratively
re-weighted least squares (see IRLSfit
function). In stage B, a Schoenberg variation
diminishing spline approximation to the fit
from stage A is constructed, thus simultaneously producing spline fits of order 2,
3 and 4, all of which are included in the output, a GeDS-Class
object.
A detailed description of the underlying algorithm can be found in Dimitrova et al. (2017).
As noted in formula
, the argument formula
allows the user to specify predictor models
with two components, a
spline regression (non-parametric) component involving part of the independent variables identified through
the function f
and an optional parametric component involving the remaining independent variables.
For GGeDS
only one independent variable is allowed for the spline component and arbitrary many independent variables for the
parametric component of the predictor.
Failure to specify the independent variable for the spline regression component through
the function f
will return an error.
See formula
.
Within the argument formula
, similarly as in other R functions, it is possible to
specify one or more offset variables, i.e. known terms with fixed regression coefficients equal to 1.
These terms should be identified via the function offset
.
The parameter beta
tunes the placement of a new knot in stage A of the algorithm.
Once a current second-order spline is fitted to the data the
'working' residuals (see IRLSfit
) are computed and grouped by their sign.
A new knot is placed at a location defined by the group
for which a certain measure attains its maximum.
The latter measure is defined as a weighted linear combination of the range
of each group and the mean of the absolute residuals within it.
The parameter beta
determines the weights in this measure correspondingly as beta
and 1 - beta
.
The higher it is, the more weight is put to the mean of the residuals and the less to the range
of their corresponding x-values (see Kaishev et al., 2016, for further details).
The default values of beta
are beta = 0.5
if
the response is assumed to be Gaussian, beta = 0.2
if it is Poisson (or Quasipoisson),
while if it is Binomial, Quasibinomial or Gamma beta = 0.1
, which reflect our experience of
running GeDS for different underlying functional dependencies.
The argument stoptype
allows to choose between three alternative stopping rules for the knot selection
in stage A of GeDS, the "RD"
, that stands for Ratio of Deviances, the "SR"
,
that stands for Smoothed Ratio of deviances and the "LR"
, that stands for Likelihood Ratio.
The latter is based on the difference of deviances rather than on their ratio as in the case of
"RD"
and "SR"
. Therefore "LR"
can be viewed as
a log likelihood ratio test performed at each iteration of the knot placement.
In each of these cases the corresponding stopping criterion is compared with a threshold value
phi
(see below).
The argument phi
provides a threshold value required for the stopping rule to exit
the knot placement in stage A of GeDS.
The higher the value of phi
, the more knots are added under the "RD"
and "SR"
stopping rules
contrary to the case of the stopping rule "LR"
where the lower phi
is,
more knots are included in the spline regression. Further details for
each of the three alternative stopping rules can be found in Dimitrova et al. (2017).
The argument q
is an input parameter that allows to fine-tune the stopping rule in stage A.
It identifies the number of consecutive iterations
over which the deviance should exhibit stable convergence so as the knot placement in stage A is terminated.
More precisely,
under any of the rules "RD"
, "SR"
or "LR"
the deviance at the current iteration is compared to the deviance computed
q
iterations before, i.e. before selecting
the last q
knots. Setting a higher q
will lead to more knots
being added before exiting stage A of GeDS.
A GeDS-Class
object, i.e. a list of items that summarizes the main
details of the fitted GeDS regression. See GeDS-Class
for details.
Some S3 methods are available in order to make these objects tractable, such as coef
,
deviance
, knots
, predict
and print
as well as S4 methods for lines
and plot
.
Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
Geometrically designed, variable knot regression splines.
Computational Statistics, 31, 1079–1105.
DOI: doi.org/10.1007/s00180-015-0621-7
Dimitrova, D.S., Kaishev, V.K., Lattuada A. and Verrall, R.J. (2017). Geometrically designed, variable knot splines in Generalized (Non-)Linear Models. Available at openaccess.city.ac.uk
NGeDS
; GeDS-Class
; S3 methods such as coef.GeDS
, deviance.GeDS
,
knots.GeDS
, print.GeDS
and predict.GeDS
; Integrate
and Derive
; PPolyRep
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 | ######################################################################
# Generate a data sample for the response variable Y and the covariate
# X assuming Poisson distributed error and log link function
# See section 4.1 in Dimitrova et al. (2017)
set.seed(123)
N <- 500
f_1 <- function(x) (10*x/(1+100*x^2))*4+4
X <- sort(runif(N, min = -2, max = 2))
# Specify a model for the mean of Y to include only a component
# non-linear in X, defined by the function f_1
means <- exp(f_1(X))
# Generate Poisson distributed Y according to the mean model
Y <- rpois(N, means)
# Fit a Poisson GeDS regression using GGeDS
(Gmod <- GGeDS(Y ~ f(X), beta = 0.2, phi = 0.995, family = poisson(),
Xextr = c(-2,2)))
# Plot the quadratic and cubic GeDS fits
plot(X,log(Y),xlab = "x", ylab = expression(f[1](x)))
lines(Gmod, n = 3, col = "red")
lines(Gmod, n = 4, col = "blue", lty = 2)
legend("topleft", c("Quadratic", "Cubic"),
col = c("red", "blue"), lty = c(1,2))
# Generate GeDS prediction at X=0, first on the response scale and then on
# the predictor scale
predict(Gmod, n = 3, newdata = data.frame(X = 0))
predict(Gmod, n = 3, newdata = data.frame(X = 0), type = "link")
# Apply some of the other available methods, e.g.
# knots, coefficients and deviance extractions for the
# quadratic GeDS fit
knots(Gmod)
coef(Gmod)
deviance(Gmod)
# the same but for the cubic GeDS fit
knots(Gmod, n = 4)
coef(Gmod, n = 4)
deviance(Gmod, n = 4)
##########################################
# A real data example
# See Dimitrova et al. (2017), Section 4.2
data("coalMining")
(Gmod2 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.98,
family = poisson(), data = coalMining))
(Gmod3 <- GGeDS(formula = accidents ~ f(years), beta = 0.1, phi = 0.985,
family = poisson(), data = coalMining))
plot(coalMining$years, coalMining$accidents, type = "h", xlab = "Years",
ylab = "Accidents")
lines(Gmod2, tr = exp, n = 4, col = "red")
lines(Gmod3, tr = exp, n = 4, col = "blue", lty = 2)
legend("topright", c("phi = 0.98","phi = 0.985"), col = c("red", "blue"),
lty=c(1, 2))
## Not run:
##########################################
# The same regression in the example of GeDS
# but assuming Gamma and Poisson responses
# See Dimitrova et al. (2017), Section 4.2
data('BaFe2As2')
(Gmod4 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.6, phi = 0.995, q = 3,
family = Gamma(log), stoptype = "RD"))
plot(Gmod4)
(Gmod5 <- GGeDS(intensity ~ f(angle), data = BaFe2As2, beta = 0.1, phi = 0.995, q = 3,
family = poisson(), stoptype = "SR"))
plot(Gmod5)
## End(Not run)
##########################################
# Life tables
# See Dimitrova et al. (2017), Section 4.2
data(EWmortality)
attach(EWmortality)
(M1 <- GGeDS(formula = Deaths ~ f(Age) + offset(log(Exposure)),
family = poisson(), phi = 0.99, beta = 0.1, q = 3,
stoptype = "LR"))
Exposure_init <- Exposure + 0.5 * Deaths
Rate <- Deaths / Exposure_init
(M2 <- GGeDS(formula = Rate ~ f(Age), weights = Exposure_init,
family = quasibinomial(), phi = 0.99, beta = 0.1,
q = 3, stoptype = "LR"))
op <- par(mfrow=c(2,2))
plot(Age, Deaths/Exposure, ylab = expression(mu[x]), xlab = "Age")
lines(M1, n = 3, tr = exp, lwd = 1, col = "red")
plot(Age, Rate, ylab = expression(q[x]), xlab = "Age")
lines(M2, n = 3, tr = quasibinomial()$linkinv, lwd = 1, col = "red")
plot(Age, log(Deaths/Exposure), ylab = expression(log(mu[x])), xlab = "Age")
lines(M1, n = 3, lwd = 1, col = "red")
plot(Age, quasibinomial()$linkfun(Rate), ylab = expression(logit(q[x])), xlab = "Age")
lines(M2, n = 3, lwd = 1, col = "red")
par(op)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.