MDgof-hybrid"
In MDgof: Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions

knitr::opts_chunk$set(error=TRUE,
  collapse = TRUE,
  comment = "#>"
)

library(MDgof)
B=200 
st=function() knitr::knit_exit()
examples=MDgof::hybrid.mdgof.vignette

Note the examples are not actually run but their results are stored in the included data set hybrid.mdgof.vignette. This is done to satisfy CRAN submission rules regarding the execution time of vignettes. If you want to run the Rmarkdown file yourself, set ReRunExamples=TRUE.

ReRunExamples=FALSE

In a standard goodness-of-fit problem we have a data set $x$ and a probability model, given in form of the cumulative distribution function $F$, and want to test $H_0: X\sim F$, that is the data set was generated by $F$.

There are however circumstances where evaluation of the function $F(x)$ is difficult or even impossible, especially for high-dimensional data. It is however possible to generate data from $F$. Then a test can be run as follows:

Generate a Monte Carlo data set $y$ from $F$.
Run a two-sample test n $x$ and $y$.

In what follows we will call this a hybrid test.

One additional "feature" of this approach is that often we are free to choose the sample size of $y$. Generally on would expect the power of the tests to increase.

The companion R package MD2sample has a large number of methods already implemented and is used in routines that follow.

Example 1: Continuous data without parameter estimation

We generate a two-dimensional data set of size 100 from a multivariate standard normal distribution and run the test with the null hypothesis $H_0:X\sim N(\mathbf{0}, \mathbf{I_2})$:

set.seed(123)

rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
x=rnull()

examples[["ex1a"]]=hybrid_test(x, rnull, B=B)

examples[["ex1a"]]

$B$ is the number of simulation runs to estimated the p-value, $B=5000$ is the default but if the data set is large a smaller value should suffice.

Arguments of hybrid_test for continuous data

x a matrix of numbers with different observations in different rows (for continuous data).
rnull routine that generates data under the null hypothesis.
phat routine that finds parameter estimates.
TS a routine to calculate test statistics or p values for new tests.
TSextra additional info passed to TS, if necessary.
maxProcessor maximum number of cores to use. If set to 1 no parallel processing is used.
doMethods="all" names of method to include.
B=5000 number of simulation runs.

Example 2: Continuous data with parameter estimation

In this example we will test whether the data comes from bivariate normal distribution, with means and variance-covariance estimated from the data.

rnull=function(p) mvtnorm::rmvnorm(200, 
      mean=p[1:2],
      sigma=matrix(c(p[3], p[5], p[5], p[4]), 2, 2))
phat=function(x) 
   c(apply(x, 2, mean), c(apply(x, 2, var), cov(x[,1],x[,2])))
x=rnull(c(1, 2, 4, 9,0.6))
phat(x)

examples[["ex2a"]]=hybrid_test(x, rnull, phat, B=B)

examples[["ex2a"]]

In contrast in the next case we will generate data from a multivariate t distribution with 5 degrees of freedom. We also use a larger sample size.

y=mvtnorm::rmvt(300, sigma=diag(2), df=5)

examples[["ex2b"]]=hybrid_test(y, rnull, phat, B=B)

examples[["ex2b"]]

New Tests

hybrid_test allows the user to supply their own (two-sample) test method, either one that finds its own p-value or a method that requires simulation.

The routine chiTS.cont (included in the MD2sample package) finds either the test statistic or the p value of a chi square test in two dimensions. First we need to set

TSextra=list(which="statistic", nbins=cbind(c(3,3), c(3,4)))

this list is passed to chiTS.cont and tells the routine to

return the test statistic (and not the p value)
Run two tests, first with 3 and 3 bins in the x and y direction and then 3 and 4 bins.

Example 3

In this example we use the same setup as in example 2.

examples[["ex3a"]]=hybrid_test(x, rnull, phat, 
         TS=MD2sample::chiTS.cont, TSextra=TSextra,  
         B=B)

examples[["ex3a"]]

examples[["ex3b"]]=hybrid_test(y, rnull, phat, 
         TS=MD2sample::chiTS.cont, TSextra=TSextra,  
         B=B)

examples[["ex3b"]]

Arguments and output of new test routine for continuous data

see the vignette MD2sample.

Power Estimation

The routine hybrid_power allows the user to estimate the power of the tests via two-sample methods.

Example 4

Let's say we want to estimate the power when the null hypothesis specifies a standard bivariate normal distribution, but the data actually comes from a bivariate normal distribution with mean vector $(0,0)$, variances equal to 1 and a covariance $\rho$. We first need a function that generates these data sets:

rnull=function() mvtnorm::rmvnorm(100, c(0, 0))
ralt=function(s) mvtnorm::rmvnorm(100, c(0, 0),
                sigma=matrix(c(1, s, s, 1), 2, 2))

Now we can run

examples[["ex4a"]]=hybrid_power(rnull, ralt, 
      c(0, 0.8), B=B, maxProcessor=1)

examples[["ex4a"]]

Arguments of hybrid_power

rnull routine that generates data under the null hypothesis.
ralt routine that generates data under the alternative hypothesis.
param_alt vector of parameter values passed to ralt.
phat routine that finds parameter estimates.
TS a routine to calculate test statistics or p values for new tests.
TSextra additional info passed to TS, if necessary.
With.p.value=FALSE does user supplied routine return p values?
alpha=0.05 desired type I error probability.
maxProcessor maximum number of cores to use. If set to 1 no parallel processing is used.
B=1000 number of simulation runs.

Again the user can provide their own routine:

TSextra=list(which="pvalue", nbins=c(3,4))

examples[["ex4b"]]=hybrid_power(rnull, ralt, c(0, 0.8), TS=MD2sample::chiTS.cont, 
          TSextra=TSextra, B=500, 
          With.p.value=TRUE)

examples[["ex4b"]]

Discrete Data

Example 5

We consider the case of data from binomial distributions:

rnull=function() {
   x=rbinom(1000, 5, 0.5)
   y=rbinom(1000, 3, 0.5+x/20)
   MDgof::sq2rec(table(x, y))
}
x=rnull()

examples[["ex5"]]=hybrid_test(x, rnull, B=B)

examples[["ex5"]]

The MDgof routine sq2rec above changes the format of the data from that output by the table command to what is required by hybrid_test.

The arguments of hybrid_test for discrete data are the same as those for continuous data. The routines determine that the data is discrete if x is a matrix with three columns and the the values in the third column are integers.

User supplied tests

The routine chiTS.disc (included in the MD2sample package) does a chi-square test for discrete data:

TSextra=list(which="statistic")

examples[["ex6c"]]=hybrid_test(x, rnull,  TS=MD2sample::chiTS.disc, TSextra=TSextra,  B=B)

examples[["ex6c"]]

Power Estimation

Power estimation for discrete data is done the same way as for continuous data:

Example 7

We consider again the case of data from binomial distributions as in example 5. As an alternative we change the distribution of Y.

rnull=function() {
   x=rbinom(1000, 5, 0.5)
   y=rbinom(1000, 3, 0.5+x/20)
   MDgof::sq2rec(table(x, y))
}
ralt=function(p) {
   x=rbinom(1000, 5, p)
   y=rbinom(1000, 3, 0.5+x/20)
   MDgof::sq2rec(table(x, y))
}
x=ralt(0.475)

examples[["ex7b"]]= hybrid_power(rnull, ralt, c(0.5, 0.475), B=200)

examples[["ex7b"]]

or using a user-supplied test that find a p value:

TSextra=list(which="pvalue")

examples[["ex7d"]]=hybrid_power(rnull, ralt, c(0.5, 0.475), TS=MD2sample::chiTS.disc, TSextra=TSextra,  
         With.p.value = TRUE, B=B)

examples[["ex7d"]]

saveRDS(examples, "hybrid.mdgof.vignette.rds")

Any scripts or data that you put into this service are public.

MDgof documentation built on Feb. 13, 2026, 1:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MDgof
Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions

MDgof-hybrid"
In MDgof: Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions

Example 1: Continuous data without parameter estimation

Example 2: Continuous data with parameter estimation

New Tests

Example 3

Power Estimation

Example 4

Discrete Data

Example 5

User supplied tests

Power Estimation

Example 7

Try the MDgof package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

MDgof Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions

MDgof-hybrid" In MDgof: Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions

Example 1: Continuous data without parameter estimation

Example 2: Continuous data with parameter estimation

New Tests

Example 3

Power Estimation

Example 4

Discrete Data

Example 5

User supplied tests

Power Estimation

Example 7

Try the MDgof package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

MDgof
Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions

MDgof-hybrid"
In MDgof: Various Methods for the Goodness-of-Fit Problem in D>1 Dimensions