Description Usage Arguments Details Value Note Author(s) References See Also Examples
View source: R/PointProcessModel.R
The function pointProcessModel
constructs objects of class
PointProcessModel
or MultivariatePointProcess
. These
models are generalized linear point processes with the intensity function
given in terms of a linear predictor process, that is, in terms of a linear
combination of linear filters. The linear filters are specified through a symbolic
description of the linear predictor process. The function
ppmFit
fits the model to the data.
1 2 3 4 5 6 7 8 9 | pointProcessModel(formula, data, family, support = 1, N = 200, Delta,
lambda, coefficients, modelMatrix = TRUE, fit = modelMatrix,
varMethod = 'Fisher', basisEnv, selfStart = TRUE, ...)
## S4 method for signature 'PointProcessModel'
ppmFit(model, control = list(), optim = 'optim', selfStart = TRUE, ...)
## S4 method for signature 'MultivariatePointProcess'
ppmFit(model, control = list(), ...)
|
formula |
an object of class |
data |
an object of class |
family |
an object of class
|
support |
a |
N |
a |
Delta |
a |
lambda |
an optional vector of penalization weights in a quadratic penalization of the parameter vector. If missing, no penalization is added to the negative log-likelihood when estimates are computed. |
coefficients |
an optional specification of the initial parameters used for the numerical optimization. |
modelMatrix |
a |
fit |
a |
varMethod |
a |
basisEnv |
an optional |
model |
a point process model as an object of class
|
control |
a |
optim |
a |
selfStart |
a |
... |
additional parameters that are passed on to |
pointProcessModel
is the main function for estimation of a generalized
linear point process model. The function sets up the model
specified in terms of a formula, and, if required, computes the
model matrix and fits the model to the data.
When calling pointProcessModel
the three arguments
formula
, data
and family
must be
specified. In addition, a specification of the argument support
and
optionally N
or Delta
should be considered.
The formula
specifies together with the family
the
conditional intensity for a point process in terms of the history of
the point process itself as well as additional processes and
variables. The formula
alone specifies a linear predictor
process, which consists of a linear combination of stochastic
processes determined by the terms in the formula. The parameters to be
estimated in the model are the coefficients in this linear
combination. The stochastic processes are computed and stored in the
model matrix. The family
specifies a phi function
that transforms the linear predictor process into the conditional
intensity.
The possibilities for specifying the linear predictor process in terms of the formula are
quite extensive. For instance, any variable corresponding to
continuous process observations, unit specific variables, the unit
identification variable or the position/time variable can enter in the
formula specification by name just as if we did an ordinary regression
on these variables. The variables enter directly, as specified by the
formula, into the intensity through the inverse link function of the
linear predictor process. Interactions can be specified, e.g. if
there is a column named V
and the unit identification variable
is id
, the inclusion of id*V
in the formula
specification will result in unit specific parameters for the
effect of V
on the intensity. Standard rules for formula
specification applies, e.g. id*V
expands to id + V +
id:V
, and id:V
by itself results in a different
parametrization. In addition, variables can be transformed using either
a single function or basis expansions. A model formula specification
of the intensity that only relies on the variables listed above
results in a (conditional) inhomogeneous Poisson process model.
Other possible terms in the formula specification are terms involving the mark variables, that is, the point process data. The response (the left hand side of the formula) must always be one of the mark variables. If the response is not specified, the model is set up, and the model matrix is computed - if requested - but the model is not fitted.
How a point process, as given by a term in the formula involving
a mark variable, enters in the model is partly determined by the family.
For the Hawkes
-family, point processes enter in the model
through linear filters specified by filter functions. That is, if points for
the mark variable M
are observed at s_1, ...,
s_n, and f(M)
is a term in the formula, the corresponding
linear filter
sum_{i: t-s_i in support} f(t-s_i)
then enters as a term in the linear predictor process. Expansions of
a filter function in terms of basis functions is possible, using
e.g. bSpline
to generate an expansion in terms of
B-splines. See also Family
for details on the
family and phi function specification. The term M
results in a linear filter function on the support, whereas the term
const(M)
results in a constant function on the support. Thus
an affine filter function can be obtained by including I(const(M)
+ M)
in the formula.
As described above, there is a fundamental difference in the ways that
a point process and a continuous process enter in the model
specification. A continuous process enters as is
whereas a point process enters through a linear filter. This is
considered to be natural in most cases - the point history needs to be
aggregated through a linear filter before it enters in the
specification of the conditional intensity, whereas it is the
instantaneous value of the continuous process that determines the
conditional intensity. However, it may be desirable to apply a linear
filter to a continuous process as well. This is possible by including
terms involving a continuous process variable name concatenated with a
".d". That is, if V
is a continuous process variable and
f(V.d)
is a term in the formula, the corresponding
linear filter
int_{support} f(s) V((t-s)-)ds
then enters as a term in the linear predictor process. As for point processes, the filter function can be expanded using any set of basis functions. Currently, the implementation only supports causal filters. Note that the specified filter functions are normalized to integrate, numerically, to 1 and that the process is padded with its initial value to the left.
The left hand side of the formula can be a point process variable or a sum of point process variables, in which case a univariate point process model is fitted. It can also be a vector of point process variables in which case a multivariate point process model is fitted.
ppmFit
is called for the estimation of the model. If the model
is set up with fit = FALSE
initially, the model parameters can
subsequently be estimated by calling ppmFit
. It is usually not
called directly by the user. Depending on the optim
argument
it calls the appropriate internal function for the actual
optimization. The current default does numerical minimization of
the (time-discretized) minus-log-likelihood function using optim
- in general
using the BFGS-algorithm. However, if the identity phi
function is used (choose link = "identity"
in the family
specification) the L-BFGS-B-algorithm is used instead, with the
default lower bound sqrt(Machine$double.eps)
on the intercept
parameter and 0 on other parameters. The alternatives are described
below.
Control parameters for the numerical optimization may be passed to the
optim
function, or the other optimization functions, via the
control
argument for ppmFit
. The maximal number of
iterations for the algorithm is set to 1000 when used in ppmFit
in contrast to the default for optim
, which is 100. See
optim
for details. A control
argument in a call
of pointProcessModel
is automatically passed on to the call of
ppmFit
.
The optim
method can always be specified via a named
method
argument, which is passed to optim
. The L-BFGS-B
algorithm allows for box-constraints on the parameters. This may be
required, e.g. if phi is the identity, as mentioned
above. The lower and upper bounds are specified in the call by setting
the additional arguments lower
and upper
that are passed
to the optim
function. See optim
for details.
Note that for estimates computed using box-constraints with one or
more parameter value(s) on the boundary of the parameter space, the
estimated standard errors based on the inverse Fisher information are
unreliable. This holds true for the default method with the identity
phi function as well.
Alternatives to optim = 'optim'
, all experimental, include
optim = 'poisson'
based on glm.fit
for the Poisson
family. The implementation is based on the same time-discretized
approximation to the minus-log-likelihood as with the optim
function and gives comparable results. For large sparse model matrices
the optim
function is usually faster because function and
gradient evaluations exploit sparse matrix operations. With
optim = 'poisson'
the model matrix is coerced to a dense
matrix. With optim = 'iwls'
an R implementation exploiting
sparse matrix operations of the iterative weighted least squares
algorithm is used as with optim = 'poisson'
. The result should
be comparable to that of the two methods described above. With
optim = 'ls'
a least squares criterion is used, which only
works with phi the identity. In this case the resulting
quadratic optimization problem is faster to solve than minimizing the
minus-log-likelihood. Technically, for the resulting model to be a
well specified point process, the solution has to fulfill a positivity
contraint, which is not enforced. Finally, with optim
= 'glmnet'
the parameters are penalized with an l1-norm
and the solution is computed using the glmnet
function. This
is implemented with phi the identity, in which case a least squares
criterion is used, and with phi the exponential (that is,
with a log-link function), in which case the minus-log-likelihood is
used.
An object of class PointProcessModel
or
MultivariatePointProcess
containing the estimated parameters.
Use summary
to get a standard summary output from the
fitted model.
Use termPlot
on the resulting model to plot the estimated
filter function(s).
The current implementation supports linear filters of continuous
processes through the .d
notation in the formula. However, the
data structure allows for non-equidistant sampling, and the filter
computations are done using numerical integration and not
FFT. Consequently, the computation of the model matrix for large data
sets can be prohibitively slow.
Niels Richard Hansen Niels.R.Hansen@math.ku.dk.
Maintainer: Niels.R.Hansen@math.ku.dk
Hansen, N. R. Penalized maximum likelihood estimation for generalized linear point processes. arXiv:1003.0848v1. http://arxiv.org/abs/1003.0848
PointProcessModel
, update
,
optim
, bSpline
, termPlot
,
registerParBackend
, glm.fit
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | data(toyData)
toyPPM <- pointProcessModel(BETA ~ const(ALPHA),
data = toyData,
family = Hawkes(),
support = 2)
summary(toyPPM)
## Not run: ## Plot of the filter function
termPlot(toyPPM)
## End(Not run)
toyPPM <- pointProcessModel(BETA ~ cut(ALPHA, c(0,1,2),
include.lowest = TRUE),
data = toyData,
family = Hawkes(),
support = 2)
summary(toyPPM)
## Not run: ## Plot of the filter function
termPlot(toyPPM)
## End(Not run)
## Removing the baseline intensity.
toyPPM <- update(toyPPM, .~. -1)
summary(toyPPM)
## Different baselines for each unit. This 'update' adds a term
## not originally in the model, thus the model matrix will be
## recomputed.
toyPPM <- update(toyPPM, .~. + id)
summary(toyPPM)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.