Description Usage Arguments Details Value Notes Remark on the Examples Author(s) References See Also Examples

Either a chi-squared or a Monte Carlo test of goodness of fit of a db distribution.

1 2 3 4 5 6 7 |

`object` |
An object of class |

`obsd` |
The data to which |

`...` |
Not used. |

`test` |
Logical scalar. Should a hypothesis test be carried out? If |

`MC` |
Logical scalar. Should a Monte Carlo test be used rather than a chi squared test? |

`seed` |
Integer scalar. The seed for the random number generator used
when |

`nsim` |
The number of simulated replicates on which the Monte Carlo test is
to be based. Ignored if |

`maxit` |
Integer scalar. The maximum number of iterations to be undertaken
by |

`verb` |
Logical scalar. Should rudimentary “progress reports” be
issued during the course of the simulations invoked by the Monte
Carlo test procedure? Ignored if |

The function `gof()`

is a generic function with two methods,
`gof.mleDb()`

and `gof.mleBb()`

.

The test statistic is calculated as

*Sum((O-E)^2/E)*

where *O* means “observed” and *E* means “expected”.
If the mean of *E* is less than 5 or if any of the entries of *E*
is less than 1, then the chi squared test is invalid and a warning to this
effect is issued. In this case the expected values are returned as an
attribute of the value returned by `gof()`

. The foregoing applies
of course only if a chi squared test (as opposed to a Monte Carlo test)
is being used.

The degrees of freedom for the chi squared test are `length(E) - 3`

.
The value 3 is equal to 2 (for the number of parameters estimated) plus
1 (for the costraint that the probabilities of the values sum to 1).

If it were actually true that, under the null hypothesis, the
observed test statistic and those calculated from simulated
data are *exchangeable*, the Monte Carlo test would
be *exact*. However the real data are distributed as
*f(x,theta)* whereas the simulated data
are distributed as *f(x,theta.hat)*
where *theta.hat* is the estimate of
*theta* based on the observed data. Consequently the
observed test statistic and simulated test statistics are
“not quite” exchangeable. Nevertheless it appears that
in practice the Monte Carlo test is very close to being exact.

The meaning of “exact” here is that if the null hypothesis
is true then, over the set of instances of collecting the data
**and** simulating the required replicates, the *p*-value
is uniformly distributed on the set *\{1/N, 2/N, …,
(N-1)/N, 1\}* where *N* is equal to `nsim`

.

A list with components

`stat` |
The test statistic. |

`pval` |
The p-value of the test. |

`degFree` |
The degrees of freedom of the chi squared test. |

The last component is present only if a chi squared test (rather than a Monte Carlo test) is used.

If a chi squared test is used and turns out to be invalid, then
the returned value has an attribute `"expVals"`

, consisting
of the (problematic) expected values.

If a Monte Carlo test is used the returned value has an attribute
`"seed"`

which is equal to the `seed`

argument or to the
random value selected to replace it if the `seed`

argument was
not supplied.

The Monte Carlo *p*-value is calculated as
`(m+1)/(nsim+1)`

where `m`

is the number of simulated
statistics which greater than or equal to the observed statistic
(computed from the “real” data.

The *smallest* that the Monte Carlo
*p*-value can be is `1/(nsim + 1)`

, e.g. 0.01 when
`nsim`

is 99. For “finer distinctions” you must use
larger values of `nsim`

, such as 999 or 9999.

The *p*-value is *random*; if you repeat the test (with
the same data) you may well get a different *p*-value.
Resist the temptation to repeat the test until you get a
*p*-value that you like!!! This invalidates your inference!

In the **Examples**, db and beta binomial distributions are
fitted to the *Parsonnet scores* from the `cardiacsurgery`

data set which comes from the `spcadjust`

package. It is not
completely clear what the value of `ntop`

(db distribution)
or `size`

(beta binomial distribution) should be. The data
are not actually counts, and in particular they are not counts
of successes out of a given number (“`size`

”) of trials.
In the event I chose to use the value 71, the maximium value of the
Parsonnet scores, for the value of both `ntop`

and `size`

.
This was the value chosen for use as `size`

by Wittenberg
(2021) when he fitted a beta binomial distribution to these data.

Rolf Turner r.turner@auckland.ac.nz

Philipp Wittenberg (2021). Modeling the patient mix for
risk-adjusted CUSUM charts. To appear in *Statistical
Methods in Medical Research*.

Axel Gandy and Jan Terje Kvaloy (2013). Guaranteed
conditional performance of control charts via bootstrap
methods. *Scandinavian Journal of Statistics* **40**,
pp. 647–668. (Reference for `spcadjust`

package.)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | ```
X <- hmm.discnp::Downloads
f <- mleDb(X,15,TRUE)
tst1 <- gof(f,X) # Gives warning that the chi squared test is invalid.
tst2 <- gof(f,X,MC=TRUE,seed=42)
# The p-value is 0.03 so we reject the adequacy of the fit at the 0.05
# significance level. Note that the p-value that we get, when the
# random number generator seed is set equal to 42, is very similar in
# value to the p-value (0.0347) from the "invalid" chi squared test.
#
## Not run: # Takes too long.
if(requireNamespace("spcadjust")) {
data("cardiacsurgery", package = "spcadjust")
xxx <- cardiacsurgery$Parsonnet
fit1 <- mleDb(xxx,ntop=71,zeta=TRUE)
g1 <- gof(fit1,obsd=xxx,MC=TRUE,verb=TRUE,seed=42)
fit2 <- mleBb(xxx,size=71)
g2 <- gof(fit2,obsd=xxx,MC=TRUE,verb=TRUE,seed=17)
}
## End(Not run)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.