# dmixnorm: The Normal Mixture Distribution In KScorrect: Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests

## Description

Density, distribution function, quantile function, and random generation for a univariate (one-dimensional) distribution composed of a mixture of normal distributions with means equal to `mean`, standard deviations equal to `sd`, and mixing proportion of the components equal to `pro`.

## Usage

 ```1 2 3 4 5 6 7``` ```dmixnorm(x, mean, sd, pro) pmixnorm(q, mean, sd, pro) qmixnorm(p, mean, sd, pro, expand = 1) rmixnorm(n, mean, sd, pro) ```

## Arguments

 `x` Vector of quantiles. `mean` Vector of means, one for each component. `sd` Vector of standard deviations, one for each component. If a single value is provided, an equal-variance mixture model is implemented. Must be non-negative. `pro` Vector of mixing proportions, one for each component. If missing, an equal-proportion model is implemented, with a warning. If proportions do not sum to unity, they are rescaled to do so. Must be non-negative. `q` Vector of quantiles. `p` Vector of probabilities. `expand` Value to expand the range of probabilities for quantile approximation. `Default = 1.0`. See `details` below. `n` Number of observations.

## Details

These functions use, modify, and wrap around those from the `mclust` package, especially `dens`, and `sim`. Functions are slightly faster than the corresponding `mclust` functions when used with univariate distributions.

Unlike `mclust`, which primarily focuses on parameter estimation based on mixture samples, the functions here are modified to calculate PDFs, CDFs, approximate quantiles, and random numbers for mixture distributions with user-specified parameters. The functions are written to emulate the syntax of other R distribution functions (e.g., `Normal`).

The number of mixture components (argument `G` in `mclust`) is specified from the length of the `mean` vector. If a single `sd` value is provided, an equal-variance mixture model (`modelNames="E"` in `mclust`) is implemented; if multiple values are provided, a variable-variance model (`modelNames="V"` in `mclust`) is implemented. If mixing proportion `pro` is missing, all components are assigned equal mixing proportions, with a warning. Mixing proportions are rescaled to sum to unity. If the lengths of supplied means, standard deviations, and mixing proportions conflict, an error is called.

Analytical solutions are not available to calculate a quantile function for all combinations of mixture parameters. `qmixnorm` approximates the quantile function using a spline function calculated from cumulative density functions for the specified mixture distribution. Quantile values for probabilities near zero and one are approximated by taking a randomly generated sample (with sample size equal to the product of 1000 and the number of mixture components), and expanding that range positively and negatively by a multiple (specified by `(default) expand = 1`) of the observed range in the random sample. In cases where the distribution range is large (such as when mixture components are discrete or there are large distances between components), resulting extreme probability values will be very close to zero or one and can result in non-calculable (`NaN`) quantiles (and a warning). Use of other `expand` values (especially `expand < 1.0` that expand the ranges by smaller multiples) often will yield improved approximations. Note that `expand` values equal to or close to 0 may result in inaccurate approximation of extreme quantiles. In situations requiring extreme quantile values, it is recommended that the largest `expand` value that does not result in a non-calculable quantile (i.e., no warning called) be used. See `examples` for confirmation that approximations are accurate, comparing the approximate quantiles from a single 'mixture' distribution to those calculated for the same distribution using `qnorm`, and demonstrating cases in which using non-default `expand` values will allow correct approximation of quantiles.

## Value

`dmixnorm` gives the density, `pmixnorm` gives the distribution function, `qmixnorm` approximates the quantile function, and `rmixnorm` generates random numbers.

## Author(s)

Phil Novack-Gottshall pnovack-gottshall@ben.edu and Steve Wang scwang@swarthmore.edu, based on functions written by Luca Scrucca.

`Distributions` for other standard distributions, and `mclust::dens`, `sim`, and `cdfMclust` for alternative density, quantile, and random number functions for multivariate mixture distributions.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53``` ```# Mixture of two normal distributions mean <- c(3, 6) pro <- c(.25, .75) sd <- c(.5, 1) x <- rmixnorm(n=5000, mean=mean, pro=pro, sd=sd) hist(x, n=20, main="random bimodal sample") ## Not run: # Requires functions from the 'mclust' package require(mclust) # Confirm 'rmixnorm' above produced specified model mod <- mclust::Mclust(x) mod # Best model (correctly) has two-components with unequal variances mod\$parameters # and approximately same parameters as specified above sd^2 # Note reports var (sigma-squared) instead of sd used above ## End(Not run) # Density, distribution, and quantile functions plot(seq(0, 10, .1), dmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro), type="l", main="Normal mixture density") plot(seq(0, 10, .1), pmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro), type="l", main="Normal mixture cumulative") plot(stats::ppoints(100), qmixnorm(stats::ppoints(100), mean=mean, sd=sd, pro=pro), type="l", main="Normal mixture quantile") # Any number of mixture components are allowed plot(seq(0, 50, .01), pmixnorm(seq(0, 50, .01), mean=1:50, sd=.05, pro=rep(1, 50)), type="l", main="50-component normal mixture cumulative") # 'expand' can be specified to prevent non-calculable quantiles: q1 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1)) q1 # Calls a warning because of NaNs # Reduce 'expand'. (Values < 0.8 allow correct approximation) q2 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1), expand=.5) plot(stats::ppoints(30), q2, type="l", main="Quantile with reduced range") ## Not run: # Requires functions from the 'mclust' package # Confirmation that qmixnorm approximates correct solution # (single component 'mixture' should mimic qnorm): x <- rmixnorm(n=5000, mean=0, pro=1, sd=1) mpar <- mclust::Mclust(x)\$param approx <- qmixnorm(p=ppoints(100), mean=mpar\$mean, pro=mpar\$pro, sd=sqrt(mpar\$variance\$sigmasq)) known <- qnorm(p=ppoints(100), mean=mpar\$mean, sd=sqrt(mpar\$variance\$sigmasq)) cor(approx, known) # Approximately the same plot(approx, main="Quantiles for (unimodal) normal") lines(known) legend("topleft", legend=c("known", "approximation"), pch=c(NA,1), lty=c(1, NA), bty="n") ## End(Not run) ```

### Example output       ```Loading required package: mclust
Package 'mclust' version 5.3
Type 'citation("mclust")' for citing this R package in publications.
'Mclust' model object:
best model: univariate, unequal variance (V) with 2 components
\$pro
 0.2440519 0.7559481

\$mean
1        2
2.982775 5.995598

\$variance
\$variance\$modelName
 "V"

\$variance\$d
 1

\$variance\$G
 2

\$variance\$sigmasq
 0.241333 1.026897

\$variance\$scale
 0.241333 1.026897

 0.25 1.00
Warning message:
In pmixnorm(seq(0, 50, 0.01), mean = 1:50, sd = 0.05, pro = rep(1,  :
'equal variance model' implemented. If want 'variable-variance model', specify remaining 'sd's.
Warning message:
In qmixnorm(stats::ppoints(30), mean = c(1, 20), sd = c(1, 1), pro = c(1,  :
Some quantile values could not be calculated. If all 'p's are within [0,1], try reducing the value of 'expand' and try again.
 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
 1
```

KScorrect documentation built on July 4, 2019, 1:02 a.m.