mleDb: Maximum likelihood estimates of db parameters.
In dbd: Discretised Beta Distribution

Description Usage Arguments Details Value Author(s) See Also Examples

Calculates maximum likelihood estimates of the alpha and beta parameters of a db distribution. Calls upon optim() with the "BFGS" method.

1 2	mleDb(x, ntop, zeta=FALSE, par0=NULL, UB=10, maxit=1000, covmat=TRUE, useGinv=FALSE)

`x`	A random sample from the db distribution whose parameters are being estimated. Missing values are allowed.
`ntop`	The `ntop` parameter of the db distribution whose parameters are being estimated. I.e. it is the maximum possible value of the distribution, whose values are integers between 1 and `ntop`, or between 0 and `ntop` if `zeta` (see below) is `TRUE`.
`zeta`	See `ddb()`.
`par0`	Optional starting values for the iterative estimation procedure. A vector with entries `alpha` and `beta`. Ideally this vector should be named; if not it is assumed that the entries are in the order `alpha`, `beta`. If not supplied starting values are calculated using `meDb()`.
`UB`	Positive numeric scalar, providing an upper bound on the starting values used by `mleDb()`. It appears that if these starting values are too large (it is not clear how large) then `optim()` will throw an error. This bound is ignored if `par0` is supplied.
`maxit`	Integer scalar. The maximum number of iterations to be undertaken by `optim()`. What happens if this number is exceeded depends on the value of `options()[["maxitErrorOrWarning"]]`. This may be `"error"` (in which case an error is thrown if `maxit` is exceeded) or `"warning"` (in which case a warning is issued). The values is set equal to `"error"` at startup. It may be switched, from on possibility to the other, by means of the function `set.eow()`.
`covmat`	Logical scalar. Should the covariance matrix of the parameter estimates be calculated? In simulation studies, in which the covariance matrix is not of interest, calculations might be speeded up a bit by setting `covmat=FALSE`.
`useGinv`	Logical scalar. Should the `ginv()` (generalised inverse) function from the `MASS` package be used to calculate a surrogate covariance matrix if the hessian is numerically singular? This is probably not advisable; the possibility of using the generalised inverse is provided for the sake of completeness. Caveat utilitor. This argument is ignored if `covmat` is `FALSE`.

The ntop and zeta parameters must be supplied; they are not formally estimated (although the choice of ntop may be influenced by the data — see below). The parameter zeta has a default value, FALSE.

If the generating mechanism from which the observed data x arose has a (known) theoretical least upper bound, then ntop should probably be set equal to this upper bound. If the data are theoretically unbounded, then ntop should probably be set equal to 1 + max(x). In this case Pr(X=ntop) should probably be interpreted as Pr(X >= ntop). Otherwise ntop should should probably be set equal to max(x). The choice depends on circumstances and is up to the user.

Missing values are removed from x before it is passed to optim(). (Note that ddb() doesn't mind missing values but returns missing values when evaluated at them. This in turn produces a missing value for the log likelihood.)

In previous versions of this package (0.1-17 and earlier) optim() was called with method "L-BFGS-B". The change was made possible by the fact that, with the new “direct” version of ddb(), it is no longer necessary to bound the parameters away from (above) zero.

An object of class "mleDb". Such an object consists of a named vector with entries "alpha" and "beta", which are the estimates of the corresponding parameters. It has a number of attributes:

"ntop" The value of the ntop argument.
"zeta" The value of the zeta argument.
"log.like" The (maximised) value of the log likelihood of the data.
"covMat" An estimate of the (2 x 2) covariance matrix of the parameter estimates. This is formed as the inverse of the hessian (of the negative log likelihood) calculated by aHess().
ndata The number of non-missing values in the data set for which the likelihood was maximised, i.e. sum(!is.na(x)).

Rolf Turner r.turner@auckland.ac.nz

ddb meDb() optim() aHess() vcov.mleDb()

set.seed(42)
x <- rdb(500,3,5,2)
ests <- mleDb(x,2)  # Bad!  Mind you, 2 is a "bad" value for ntop!
                    # Hessian is singular; covMat is NA.
# Get much better results using true parameter values
# as starting values; pity we can't do this in real life!
ests <- mleDb(x,2,par0=c(alpha=3,beta=5))
x <- rdb(500,3,5,20)
ests <- mleDb(x,20) # Pretty good.
print(vcov(ests))

# Binomial, n = 10, p = 0.3.
set.seed(42)
x   <- rbinom(1000,10,0.3)
fit <- mleDb(x,10,zeta=TRUE)
print(vcov(fit))
p1  <- dbinom(0:10,10,0.3)
p2  <- dbinom(0:10,10,mean(x)/10)
p3  <- table(factor(x,levels=0:10))/1000
plot(fit,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,
     ddb(0:10,fit[1],fit[2],10,zeta=TRUE))))
lines(0.2+(0:10),p1,col="orange",type="h",ylim=c(0,max(p1,p2)))
lines(0.3+(0:10),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green"),
       legend=c("db","observed","true binomial","fitted binomial"),bty="n")
print(attr(fit,"log.like")) # -1778.36
print(sum(dbinom(x,10,mean(x)/10,log=TRUE))) # -1777.36
# Slightly better fit with only one estimated parameter,
# but then binomial is the true distribution, so you'd
# kind of expect a better fit.
print(sum(dbinom(x,10,0.3,log=TRUE))) # -1778.37

# Poisson mean = 5
set.seed(42)
x    <- rpois(1000,5)
fit  <- mleDb(x,14,zeta=TRUE) # max(x) = 13, take ntop = 1+13
print(vcov(fit))
p1   <- c(dpois(0:13,5),1-ppois(13,5))
lhat <- mean(x)
p2   <- c(dpois(0:13,lhat),1-ppois(13,lhat))
plot(fit,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,
     ddb(0:14,fit[1],fit[2],14,zeta=TRUE))))
lines(0.2+0:14,p1,col="orange",type="h")
lines(0.3+(0:14),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green"),
       legend=c("db","observed","true Poisson","fitted Poisson"),bty="n")
print(attr(fit,"log.like")) # -2198.594
print(sum(dpois(x,lhat,log=TRUE))) # -2197.345
# Slightly better fit with only one estimated parameter,
# but then Poisson is the true distribution, so you'd
# kind of expect a better fit.
print(sum(dpois(x,5,log=TRUE))) # -2198.089