contmixvar1: Generation of One Continuous Variable with a Mixture...
In SimCorrMix: Simulation of Correlated Data with Multiple Variable Types Including Continuous and Count Mixture Distributions

Description Usage Arguments Value Overview of Simulation Process Reasons for Function Errors References See Also Examples

View source: R/contmixvar1.R

This function simulates one continuous mixture variable. Mixture distributions describe random variables that are drawn from more than one component distribution. For a random variable Y_{mix} from a finite continuous mixture distribution with k components, the probability density function (PDF) can be described by:

h_Y(y) = ∑_{i=1}^{k} π_i f_{Yi}(y), ∑_{i=1}^{k} π_i = 1.

The π_i are mixing parameters which determine the weight of each component distribution f_{Yi}(y) in the overall probability distribution. As long as each component has a valid PDF, the overall distribution h_Y(y) has a valid PDF. The main assumption is statistical independence between the process of randomly selecting the component distribution and the distributions themselves. Each component Y_i is generated using either Fleishman's third-order (method = "Fleishman", doi: 10.1007/BF02293811) or Headrick's fifth-order (method = "Polynomial", doi: 10.1016/S0167-9473(02)00072-5) power method transformation (PMT). It works by matching standardized cumulants – the first four (mean, variance, skew, and standardized kurtosis) for Fleishman's method, or the first six (mean, variance, skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's method. The transformation is expressed as follows:

Y = c_0 + c_1 * Z + c_2 * Z^2 + c_3 * Z^3 + c_4 * Z^4 + c_5 * Z^5, Z \sim N(0,1),

where c_4 and c_5 both equal 0 for Fleishman's method. The real constants are calculated by
find_constants. These components are then transformed to the desired mixture variable using a random multinomial variable generated based on the mixing probabilities. There are no parameter input checks in order to decrease simulation time. All inputs should be checked prior to simulation with validpar. Summaries for the simulation results can be obtained with summary_var.

Mixture distributions provide a useful way for describing heterogeneity in a population, especially when an outcome is a composite response from multiple sources. The vignette Variable Types provides more information about simulation of mixture variables and the required parameters. The vignette Expected Cumulants and Correlations for Continuous Mixture Variables gives the equations for the expected cumulants of a mixture variable. In addition, Headrick & Kowalchuk (2007, doi: 10.1080/10629360600605065) outlined a general method for comparing a simulated distribution Y to a given theoretical distribution Y^*. These steps can be found in the Continuous Mixture Distributions vignette.

contmixvar1(n = 10000, method = c("Fleishman", "Polynomial"), means = 0,
  vars = 1, mix_pis = NULL, mix_mus = NULL, mix_sigmas = NULL,
  mix_skews = NULL, mix_skurts = NULL, mix_fifths = NULL,
  mix_sixths = NULL, mix_Six = list(), seed = 1234, cstart = list(),
  quiet = FALSE)

`n`	the sample size (i.e. the length of the simulated variable; default = 10000)
`method`	the method used to generate the component variables. "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation.
`means`	mean for the mixture variable (default = 0)
`vars`	variance for the mixture variable (default = 1)
`mix_pis`	a vector of mixing probabilities that sum to 1 for the component distributions
`mix_mus`	a vector of means for the component distributions
`mix_sigmas`	a vector of standard deviations for the component distributions
`mix_skews`	a vector of skew values for the component distributions
`mix_skurts`	a vector of standardized kurtoses for the component distributions
`mix_fifths`	a vector of standardized fifth cumulants for the component distributions; keep NULL if using `method` = "Fleishman" to generate continuous variables
`mix_sixths`	a vector of standardized sixth cumulants for the component distributions; keep NULL if using `method` = "Fleishman" to generate continuous variables
`mix_Six`	a list of vectors of sixth cumulant correction values for the component distributions of Y_{mix}; use `NULL` if no correction is desired for a given component; if no correction is desired for any component keep as `mix_Six = list()` (not necessary for `method` = "Fleishman")
`seed`	the seed value for random number generation (default = 1234)
`cstart`	a list of length equal to the total number of mixture components containing initial values for root-solving algorithm used in `find_constants`. If user specified, each list element must be input as a matrix. For `method` = "Fleishman", each should have 3 columns for c_1, c_2, c_3; for `method` = "Polynomial", each should have 5 columns for c_1, c_2, c_3, c_4, c_5. If no starting values are specified for a given component, that list element should be `NULL`.
`quiet`	if FALSE prints total simulation time

A list with the following components:

constants a data.frame of the constants

Y_comp a data.frame of the components of the mixture variable

Y_mix a data.frame of the generated mixture variable

sixth_correction the sixth cumulant correction values for Y_comp

valid.pdf "TRUE" if constants generate a valid PDF, else "FALSE"

Time the total simulation time in minutes

1) A check is performed to see if any distributions are repeated within the parameter inputs, i.e. if the mixture variable contains 2 components with the same standardized cumulants. These are noted so that the constants are only calculated once.

2) The constants are calculated for each component variable using find_constants. If no solutions are found that generate a valid power method PDF, the function will return constants that produce an invalid PDF (or a stop error if no solutions can be found). Possible solutions include: 1) changing the seed, or 2) using a mix_Six list with vectors of sixth cumulant correction values (if method = "Polynomial"). Errors regarding constant calculation are the most probable cause of function failure.

3) A matrix X_cont of dim n x length(mix_pis) of standard normal variables is generated and singular-value decomposition is done to remove any correlation. The constants are applied to X_cont to create the component variables Y with the desired distributions.

4) A random multinomial variable M = rmultinom(n, size = 1, prob = mix_pis) is generated using stats::rmultinom. The continuous mixture variable Y_mix is created from the component variables Y based on this multinomial variable. That is, if M[i, k_i] = 1, then Y_mix[i] = Y[i, k_i]. A location-scale transformation is done on Y_mix to give it mean means and variance vars.

1) The most likely cause for function errors is that no solutions to fleish or poly converged when using find_constants. If this happens, the simulation will stop. It may help to first use find_constants for each component variable to determine if a sixth cumulant correction value is needed. The solutions can be used as starting values (see cstart below). If the standardized cumulants are obtained from calc_theory, the user may need to use rounded values as inputs (i.e. skews = round(skews, 8)). For example, in order to ensure that skew is exactly 0 for symmetric distributions.

2) The kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use calc_lower_skurt to determine the boundary for a given set of cumulants.

See references for SimCorrMix.

find_constants, validpar, summary_var

# Mixture of Normal(-2, 1) and Normal(2, 1)
Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1,
  mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1),
  mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0),
  mix_sixths = c(0, 0))
## Not run: 
# Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20)
Stcum1 <- calc_theory("Beta", c(6, 3))
Stcum2 <- calc_theory("Beta", c(4, 1.5))
Stcum3 <- calc_theory("Beta", c(10, 20))
mix_pis <- c(0.5, 0.2, 0.3)
mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1])
mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2])
mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3])
mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4])
mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5])
mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6])
mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03),
  seq(0.01, 10, 0.01))
Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews,
  mix_skurts, mix_fifths, mix_sixths)
Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2,
  mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
  mix_sixths, mix_Six)
Bsum <- summary_var(Y_comp = Bmix$Y_comp, Y_mix = Bmix$Y_mix, means = means,
  vars = vars, mix_pis = mix_pis, mix_mus = mix_mus,
  mix_sigmas = mix_sigmas, mix_skews = mix_skews, mix_skurts = mix_skurts,
  mix_fifths = mix_fifths, mix_sixths = mix_sixths)

## End(Not run)