MDgof-hybrid"

knitr::opts_chunk$set(error=TRUE,
  collapse = TRUE,
  comment = "#>"
)
library(MDgof)
B=200 
st=function() knitr::knit_exit()
examples=MDgof::hybrid.mdgof.vignette

Note the examples are not actually run but their results are stored in the included data set hybrid.mdgof.vignette. This is done to satisfy CRAN submission rules regarding the execution time of vignettes. If you want to run the Rmarkdown file yourself, set ReRunExamples=TRUE.

ReRunExamples=FALSE

In a standard goodness-of-fit problem we have a data set $x$ and a probability model, given in form of the cumulative distribution function $F$, and want to test $H_0: X\sim F$, that is the data set was generated by $F$.

There are however circumstances where evaluation of the function $F(x)$ is difficult or even impossible, especially for high-dimensional data. It is however possible to generate data from $F$. Then a test can be run as follows:

  1. Generate a Monte Carlo data set $y$ from $F$.
  2. Run a two-sample test n $x$ and $y$.

In what follows we will call this a hybrid test.

One additional "feature" of this approach is that often we are free to choose the sample size of $y$. Generally on would expect the power of the tests to increase.

The companion R package MD2sample has a large number of methods already implemented and is used in routines that follow.

Example 1: Continuous data without parameter estimation

We generate a two-dimensional data set of size 100 from a multivariate standard normal distribution and run the test with the null hypothesis $H_0:X\sim N(\mathbf{0}, \mathbf{I_2})$:

set.seed(123)
rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
x=rnull()
examples[["ex1a"]]=hybrid_test(x, rnull, B=B)
examples[["ex1a"]]

$B$ is the number of simulation runs to estimated the p-value, $B=5000$ is the default but if the data set is large a smaller value should suffice.

Arguments of hybrid_test for continuous data

Example 2: Continuous data with parameter estimation

In this example we will test whether the data comes from bivariate normal distribution, with means and variance-covariance estimated from the data.

rnull=function(p) mvtnorm::rmvnorm(200, 
      mean=p[1:2],
      sigma=matrix(c(p[3], p[5], p[5], p[4]), 2, 2))
phat=function(x) 
   c(apply(x, 2, mean), c(apply(x, 2, var), cov(x[,1],x[,2])))
x=rnull(c(1, 2, 4, 9,0.6))
phat(x)
examples[["ex2a"]]=hybrid_test(x, rnull, phat, B=B)
examples[["ex2a"]]

In contrast in the next case we will generate data from a multivariate t distribution with 5 degrees of freedom. We also use a larger sample size.

y=mvtnorm::rmvt(300, sigma=diag(2), df=5)
examples[["ex2b"]]=hybrid_test(y, rnull, phat, B=B)
examples[["ex2b"]]

New Tests

hybrid_test allows the user to supply their own (two-sample) test method, either one that finds its own p-value or a method that requires simulation.

The routine chiTS.cont (included in the MD2sample package) finds either the test statistic or the p value of a chi square test in two dimensions. First we need to set

TSextra=list(which="statistic", nbins=cbind(c(3,3), c(3,4)))

this list is passed to chiTS.cont and tells the routine to

Example 3

In this example we use the same setup as in example 2.

examples[["ex3a"]]=hybrid_test(x, rnull, phat, 
         TS=MD2sample::chiTS.cont, TSextra=TSextra,  
         B=B)
examples[["ex3a"]]
examples[["ex3b"]]=hybrid_test(y, rnull, phat, 
         TS=MD2sample::chiTS.cont, TSextra=TSextra,  
         B=B)
examples[["ex3b"]]

Arguments and output of new test routine for continuous data

see the vignette MD2sample.

Power Estimation

The routine hybrid_power allows the user to estimate the power of the tests via two-sample methods.

Example 4

Let's say we want to estimate the power when the null hypothesis specifies a standard bivariate normal distribution, but the data actually comes from a bivariate normal distribution with mean vector $(0,0)$, variances equal to 1 and a covariance $\rho$. We first need a function that generates these data sets:

rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
ralt=function(s) mvtnorm::rmvnorm(100, c(0, 0),
                sigma=matrix(c(1, s, s, 1), 2, 2))

Now we can run

examples[["ex4a"]]=hybrid_power(rnull, ralt, 
      c(0, 0.8), B=B, maxProcessor=1)
examples[["ex4a"]]

Arguments of hybrid_power

Again the user can provide their own routine:

TSextra=list(which="pvalue", nbins=c(3,4))
examples[["ex4b"]]=hybrid_power(rnull, ralt, c(0, 0.8), TS=MD2sample::chiTS.cont, 
          TSextra=TSextra, B=500, 
          With.p.value=TRUE)
examples[["ex4b"]]

Discrete Data

Example 5

We consider the case of data from binomial distributions:

rnull=function() {
   x=rbinom(1000, 5, 0.5)
   y=rbinom(1000, 3, 0.5+x/20)
   MDgof::sq2rec(table(x, y))
}
x=rnull()
examples[["ex5"]]=hybrid_test(x, rnull, B=B)
examples[["ex5"]]

The MDgof routine sq2rec above changes the format of the data from that output by the table command to what is required by hybrid_test.

The arguments of hybrid_test for discrete data are the same as those for continuous data. The routines determine that the data is discrete if x is a matrix with three columns and the the values in the third column are integers.

User supplied tests

The routine chiTS.disc (included in the MD2sample package) does a chi-square test for discrete data:

TSextra=list(which="statistic")
examples[["ex6c"]]=hybrid_test(x, rnull,  TS=MD2sample::chiTS.disc, TSextra=TSextra,  B=B)
examples[["ex6c"]]

Power Estimation

Power estimation for discrete data is done the same way as for continuous data:

Example 7

We consider again the case of data from binomial distributions as in example 5. As an alternative we change the distribution of Y.

rnull=function() {
   x=rbinom(1000, 5, 0.5)
   y=rbinom(1000, 3, 0.5+x/20)
   MDgof::sq2rec(table(x, y))
}
ralt=function(p) {
   x=rbinom(1000, 5, p)
   y=rbinom(1000, 3, 0.5+x/20)
   MDgof::sq2rec(table(x, y))
}
x=ralt(0.475)
examples[["ex7b"]]= hybrid_power(rnull, ralt, c(0.5, 0.475), B=200)
examples[["ex7b"]]

or using a user-supplied test that find a p value:

TSextra=list(which="pvalue")
examples[["ex7d"]]=hybrid_power(rnull, ralt, c(0.5, 0.475), TS=MD2sample::chiTS.disc, TSextra=TSextra,  
         With.p.value = TRUE, B=B)
examples[["ex7d"]]
saveRDS(examples, "hybrid.mdgof.vignette.rds")


Try the MDgof package in your browser

Any scripts or data that you put into this service are public.

MDgof documentation built on Feb. 13, 2026, 1:06 a.m.