dmixnorm: The Normal Mixture Distribution


Description

Density, distribution function, quantile function, and random generation for a univariate (one-dimensional) normal mixture distribution, with component means equal to mean, component standard deviations equal to sd, and component mixing proportions equal to pro.
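As a rough illustration of what such a mixture is, the density and distribution function can be written as weighted sums of the component normal densities and distribution functions, with weights pro. The sketch below uses only base R; the helper names mix_sum, mix_density, and mix_cdf are hypothetical and are not part of this package, whose own functions wrap mclust.

```r
# Hypothetical helpers (not part of KScorrect): evaluate a normal mixture
# as a proportion-weighted sum of its components.
mix_sum <- function(f, x, mean, sd, pro) {
  sd  <- rep_len(sd, length(mean))   # allow a single shared sd
  pro <- pro / sum(pro)              # rescale proportions to sum to 1
  # One column per component, one row per value of x
  comp <- vapply(seq_along(mean),
                 function(k) f(x, mean[k], sd[k]),
                 numeric(length(x)))
  drop(matrix(comp, nrow = length(x)) %*% pro)
}

mix_density <- function(x, mean, sd, pro) mix_sum(stats::dnorm, x, mean, sd, pro)
mix_cdf     <- function(q, mean, sd, pro) mix_sum(stats::pnorm, q, mean, sd, pro)

# Two-component mixture evaluated on a grid
x <- seq(0, 10, 0.1)
d <- mix_density(x, mean = c(3, 6), sd = c(0.5, 1), pro = c(0.25, 0.75))
p <- mix_cdf(x, mean = c(3, 6), sd = c(0.5, 1), pro = c(0.25, 0.75))
```

With a single component and pro = 1, both helpers reduce to dnorm and pnorm.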

Usage

dmixnorm(x, mean, sd, pro)

pmixnorm(q, mean, sd, pro)

qmixnorm(p, mean, sd, pro, expand = 1)

rmixnorm(n, mean, sd, pro)

Arguments

x

Vector of quantiles.

mean

Vector of means, one for each component.

sd

Vector of standard deviations, one for each component. If a single value is provided, an equal-variance mixture model is implemented. Must be non-negative.

pro

Vector of mixing proportions, one for each component. If missing, an equal-proportion model is implemented, with a warning. If proportions do not sum to unity, they are rescaled to do so. Must be non-negative.

q

Vector of quantiles.

p

Vector of probabilities.

expand

Value to expand the range of probabilities for quantile approximation. Default = 1.0. See details below.

n

Number of observations.

Details

These functions use, modify, and wrap around functions from the mclust package, especially dens and sim. They are slightly faster than the corresponding mclust functions when used with univariate distributions.

Unlike mclust, which primarily focuses on parameter estimation based on mixture samples, the functions here are modified to calculate PDFs, CDFs, approximate quantiles, and random numbers for mixture distributions with user-specified parameters. The functions are written to emulate the syntax of other R distribution functions (e.g., Normal).
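To make the idea of generating random numbers from user-specified parameters concrete, here is a minimal base-R sketch of two-stage sampling: draw a component label for each observation according to pro, then draw from that component's normal distribution. The name mix_random is hypothetical; this is not the package's implementation, which wraps mclust::sim.

```r
# Hypothetical helper (not part of KScorrect): two-stage sampling
# from a univariate normal mixture.
mix_random <- function(n, mean, sd, pro) {
  G   <- length(mean)                # number of components
  sd  <- rep_len(sd, G)              # allow a single shared sd
  pro <- pro / sum(pro)              # rescale proportions to sum to 1
  # Stage 1: pick a component for each observation
  k <- sample.int(G, size = n, replace = TRUE, prob = pro)
  # Stage 2: draw from the chosen component
  stats::rnorm(n, mean = mean[k], sd = sd[k])
}

set.seed(1)
x <- mix_random(5000, mean = c(3, 6), sd = c(0.5, 1), pro = c(0.25, 0.75))
mean(x)  # theoretical mixture mean is 0.25 * 3 + 0.75 * 6 = 5.25
```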

The number of mixture components (argument G in mclust) is taken from the length of the mean vector. If a single sd value is provided, an equal-variance mixture model (modelNames="E" in mclust) is implemented; if multiple values are provided, a variable-variance model (modelNames="V" in mclust) is implemented. If the mixing proportion pro is missing, all components are assigned equal mixing proportions, with a warning. Mixing proportions are rescaled to sum to unity. If the lengths of the supplied means, standard deviations, and mixing proportions conflict, an error is raised.
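The parameter rules above can be sketched as a small validation helper. This is an illustration only, written in base R under the assumptions stated in the text; the name check_mix_params and the exact messages are hypothetical and do not reflect the package's internal code.

```r
# Hypothetical helper (not part of KScorrect): normalise mixture parameters
# following the rules described in the Details text.
check_mix_params <- function(mean, sd, pro) {
  G <- length(mean)                       # number of components
  if (missing(pro)) {
    warning("'pro' missing: assuming equal mixing proportions")
    pro <- rep(1 / G, G)
  }
  if (length(sd) == 1L) sd <- rep(sd, G)  # equal-variance model
  if (length(sd) != G || length(pro) != G)
    stop("lengths of 'mean', 'sd', and 'pro' conflict")
  if (any(sd < 0) || any(pro < 0))
    stop("'sd' and 'pro' must be non-negative")
  list(G = G, mean = mean, sd = sd, pro = pro / sum(pro))
}

# Proportions c(1, 1) are rescaled to c(0.5, 0.5); the single sd is recycled
params <- check_mix_params(mean = c(1, 20), sd = 1, pro = c(1, 1))
```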

Analytical solutions are not available to calculate a quantile function for all combinations of mixture parameters. qmixnorm approximates the quantile function using a spline function calculated from the cumulative distribution function of the specified mixture distribution. Quantile values for probabilities near zero and one are approximated by taking a randomly generated sample (with sample size equal to 1000 times the number of mixture components) and expanding the range of that sample in both directions by a multiple of the observed range, specified by expand (default expand = 1).

In cases where the distribution range is large (such as when mixture components are discrete or there are large distances between components), the resulting extreme probability values will be very close to zero or one and can result in non-calculable (NaN) quantiles, with a warning. Other expand values (especially expand < 1.0, which expands the range by a smaller multiple) will often yield improved approximations. Note that expand values equal to or close to 0 may result in inaccurate approximation of extreme quantiles. In situations requiring extreme quantile values, it is recommended to use the largest expand value that does not result in a non-calculable quantile (i.e., that does not call a warning).

See the examples for confirmation that the approximations are accurate: they compare the approximate quantiles from a single-component 'mixture' distribution to those calculated for the same distribution using qnorm, and demonstrate cases in which non-default expand values allow correct approximation of quantiles.
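The general approach described above can be sketched in base R: take a random sample to find a working range, widen it by expand times the observed range, evaluate the mixture CDF on a grid, and fit a spline that maps probabilities back to values. The function name approx_mix_quantile, the grid size n_grid, the removal of tied CDF values, and the choice of a monotone spline method are all assumptions made for this illustration; qmixnorm's actual internals may differ.

```r
# Hypothetical sketch (not the KScorrect implementation): approximate the
# quantile function of a normal mixture by inverting its CDF with a spline.
approx_mix_quantile <- function(p, mean, sd, pro, expand = 1, n_grid = 1000) {
  G   <- length(mean)
  sd  <- rep_len(sd, G)
  pro <- pro / sum(pro)
  # Random sample of size 1000 * G to find a working range
  k    <- sample.int(G, size = 1000 * G, replace = TRUE, prob = pro)
  samp <- stats::rnorm(1000 * G, mean = mean[k], sd = sd[k])
  span <- diff(range(samp))
  grid <- seq(min(samp) - expand * span, max(samp) + expand * span,
              length.out = n_grid)
  # Mixture CDF on the grid: weighted sum of component CDFs
  cdf <- vapply(grid, function(q) sum(pro * stats::pnorm(q, mean, sd)),
                numeric(1))
  # Drop tied CDF values (exactly 0 or 1 far in the tails) before fitting
  keep <- !duplicated(cdf)
  inv  <- stats::splinefun(cdf[keep], grid[keep], method = "monoH.FC")
  inv(p)
}

set.seed(1)
probs  <- stats::ppoints(100)
approx_q <- approx_mix_quantile(probs, mean = 0, sd = 1, pro = 1)
known_q  <- stats::qnorm(probs)
max(abs(approx_q - known_q))  # small for a single-component 'mixture'
```

Dropping tied CDF values is one simple way to avoid the degenerate fits that the text associates with NaN quantiles; reducing expand, as recommended above, addresses the same problem by shrinking the grid range.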

Value

dmixnorm gives the density, pmixnorm gives the distribution function, qmixnorm approximates the quantile function, and rmixnorm generates random numbers.

Author(s)

Phil Novack-Gottshall pnovack-gottshall@ben.edu and Steve Wang scwang@swarthmore.edu, based on functions written by Luca Scrucca.

See Also

Distributions for other standard distributions, and mclust::dens, sim, and cdfMclust for alternative density, quantile, and random number functions for multivariate mixture distributions.

Examples

# Mixture of two normal distributions
mean <- c(3, 6)
pro <- c(.25, .75)
sd <- c(.5, 1)
x <- rmixnorm(n=5000, mean=mean, pro=pro, sd=sd)
hist(x, breaks=20, main="random bimodal sample")

## Not run: 
# Requires functions from the 'mclust' package
library(mclust)
# Confirm 'rmixnorm' above produced specified model
mod <- mclust::Mclust(x)
mod             # Best model (correctly) has two components with unequal variances
mod$parameters  # and approximately the same parameters as specified above
sd^2            # Note that mclust reports var (sigma-squared) instead of the sd used above

## End(Not run)

# Density, distribution, and quantile functions
plot(seq(0, 10, .1), dmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro),
     type="l", main="Normal mixture density")
plot(seq(0, 10, .1), pmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro),
     type="l", main="Normal mixture cumulative")
plot(stats::ppoints(100), qmixnorm(stats::ppoints(100), mean=mean, sd=sd, pro=pro),
     type="l", main="Normal mixture quantile")

# Any number of mixture components are allowed
plot(seq(0, 50, .01), pmixnorm(seq(0, 50, .01), mean=1:50, sd=.05, pro=rep(1, 50)),
     type="l", main="50-component normal mixture cumulative")

# 'expand' can be specified to prevent non-calculable quantiles:
q1 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1))
q1 # Calls a warning because of NaNs
# Reduce 'expand'. (Values < 0.8 allow correct approximation)
q2 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1), expand=.5)
plot(stats::ppoints(30), q2, type="l", main="Quantile with reduced range")

## Not run: 
# Requires functions from the 'mclust' package
# Confirmation that qmixnorm approximates correct solution
#   (single component 'mixture' should mimic qnorm):
x <- rmixnorm(n=5000, mean=0, pro=1, sd=1)
mpar <- mclust::Mclust(x)$parameters
approx <- qmixnorm(p=ppoints(100), mean=mpar$mean, pro=mpar$pro,
     sd=sqrt(mpar$variance$sigmasq))
known <- qnorm(p=ppoints(100), mean=mpar$mean, sd=sqrt(mpar$variance$sigmasq))
cor(approx, known)  # Approximately the same
plot(approx, main="Quantiles for (unimodal) normal")
lines(known)
legend("topleft", legend=c("known", "approximation"), pch=c(NA,1),
     lty=c(1, NA), bty="n")

## End(Not run)

pnovack-gottshall/KScorrect documentation built on July 6, 2019, 10:32 a.m.