Description Usage Arguments Value Choice of Fleishman's third-order or Headrick's fifth-order method Overview of Simulation Process Reasons for Function Errors References See Also Examples
This function simulates one non-normal continuous variable using either Fleishman's Third-Order (method
= "Fleishman",
doi: 10.1007/BF02293811) or Headrick's Fifth-Order (method
= "Polynomial", doi: 10.1016/S0167-9473(02)00072-5) Polynomial
Transformation. If only one variable is desired and that variable is continuous, this function should be used. The power method
transformation is a computationally efficient algorithm that simulates continuous distributions through the method of moments.
It works by matching standardized cumulants – the first four (mean, variance, skew, and standardized kurtosis) for Fleishman's
method, or the first six (mean, variance, skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's
method. The transformation is expressed as follows:
Y = c_{0} + c_{1} * Z + c_{2} * Z^2 + c_{3} * Z^3 + c_{4} * Z^4 + c_{5} * Z^5,
where Z ~ N(0,1), and c_{4} and c_{5} both equal 0 for Fleishman's method. The real constants are calculated by
find_constants
. All variables are simulated with mean 0 and variance 1, and then
transformed to the specified mean and variance at the end.
The required parameters for simulating continuous variables include: mean, variance, skewness, standardized kurtosis (kurtosis - 3), and
standardized fifth and sixth cumulants (for method
= "Polynomial"). If the goal is to simulate a theoretical distribution
(i.e. Gamma, Beta, Logistic, etc.), these values can be obtained using calc_theory
. If the goal is to
mimic an empirical data set, these values can be found using calc_moments
(using the method of moments) or
calc_fisherk
(using Fisher's k-statistics). If the standardized cumulants
are obtained from calc_theory
, the user may need to use rounded values as inputs (i.e. skews = round(skews, 8)
). Due to the nature
of the integration involved in calc_theory
, the results are approximations. Greater accuracy can be achieved by increasing the number of
subdivisions (sub
) used in the integration process. For example, in order to ensure that skew is exactly 0 for symmetric distributions.
For some sets of cumulants, it is either not possible to find power method constants or the
calculated constants do not generate valid power method pdfs. In these situations, adding a value to the sixth cumulant may
provide solutions (see find_constants
). If simulation results indicate that a continuous variable
does not generate a valid pdf, the user can try find_constants
with various sixth cumulant correction
vectors to determine if a valid pdf can be found.
Headrick & Kowalchuk (2007, doi: 10.1080/10629360600605065) outlined a general method for comparing a simulated distribution Y to a given theoretical distribution Y^*. These steps can be found in the example and the Comparison of Simulated Distribution to Theoretical Distribution or Empirical Data vignette.
1 2 3 | nonnormvar1(method = c("Fleishman", "Polynomial"), means = 0, vars = 1,
skews = 0, skurts = 0, fifths = 0, sixths = 0, Six = NULL,
cstart = NULL, n = 10000, seed = 1234)
|
method |
the method used to generate the continuous variable. "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation. |
means |
mean for the continuous variable (default = 0) |
vars |
variance (default = 1) |
skews |
skewness value (default = 0) |
skurts |
standardized kurtosis (kurtosis - 3, so that normal variables have a value of 0; default = 0) |
fifths |
standardized fifth cumulant (not necessary for |
sixths |
standardized sixth cumulant (not necessary for |
Six |
a vector of correction values to add to the sixth cumulant if no valid pdf constants are found,
ex: |
cstart |
initial values for root-solving algorithm (see |
n |
the sample size (i.e. the length of the simulated variable; default = 10000) |
seed |
the seed value for random number generation (default = 1234) |
A list with the following components:
constants
a data.frame of the constants
continuous_variable
a data.frame of the generated continuous variable
summary_continuous
a data.frame containing a summary of the variable
summary_targetcont
a data.frame containing a summary of the target variable
sixth_correction
the sixth cumulant correction value
valid.pdf
"TRUE" if constants generate a valid pdf, else "FALSE"
Constants_Time
the time in minutes required to calculate the constants
Simulation_Time
the total simulation time in minutes
Using the fifth-order approximation allows additional control over the fifth and sixth moments of the generated distribution, improving
accuracy. In addition, the range of feasible standardized kurtosis values, given skew and standardized fifth (γ_{3}) and sixth
(γ_{4}) cumulants, is larger than with Fleishman's method (see calc_lower_skurt
).
For example, the Fleishman method can not be used to generate a
non-normal distribution with a ratio of γ_{3}^2/γ_{4} > 9/14 (see Headrick & Kowalchuk, 2007). This eliminates the
Chi-squared family of distributions, which has a constant ratio of γ_{3}^2/γ_{4} = 2/3. However, if the fifth and
sixth cumulants do not exist, the Fleishman approximation should be used.
1) The constants are calculated for the continuous variable using find_constants
. If no
solutions are found that generate a valid power method pdf, the function will return constants that produce an invalid pdf
(or a stop error if no solutions can be found). Possible solutions include: 1) changing the seed, or 2) using a Six
vector
of sixth cumulant correction values (if method
= "Polynomial"). Errors regarding constant
calculation are the most probable cause of function failure.
2) An intermediate standard normal variate X of length n is generated.
3) Summary statistics are calculated.
1) The most likely cause for function errors is that no solutions to fleish
or
poly
converged when using find_constants
. If this happens,
the simulation will stop. It may help to first use find_constants
for each continuous variable to
determine if a vector of sixth cumulant correction values is needed. The solutions can be used as starting values (see cstart
below).
If the standardized cumulants are obtained from calc_theory
, the user may need to use rounded values as inputs (i.e.
skews = round(skews, 8)
).
2) In addition, the kurtosis may be outside the region of possible values. There is an associated lower boundary for kurtosis associated
with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
calc_lower_skurt
to determine the boundary for a given set of cumulants.
Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43, 521-532. doi: 10.1007/BF02293811.
Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi: 10.1016/S0167-9473(02)00072-5. (ScienceDirect)
Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1), 65-71. doi: 10.22237/jmasm/1083370080.
Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77, 229-249. doi: 10.1080/10629360600605065.
Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64, 25-35. doi: 10.1007/BF02294317.
Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using Mathematica. Journal of Statistical Software, 19(3), 1 - 17. doi: 10.18637/jss.v019.i03.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | # Normal distribution with Headrick's fifth-order PMT:
N <- nonnormvar1("Polynomial", 0, 1, 0, 0, 0, 0)
## Not run:
# Use Headrick & Kowalchuk's (2007) steps to compare a simulated exponential
# (mean = 2) variable to the theoretical exponential(mean = 2) density:
# 1) Obtain the standardized cumulants:
stcums <- calc_theory(Dist = "Exponential", params = 0.5) # rate = 1/mean
# 2) Simulate the variable:
H_exp <- nonnormvar1("Polynomial", means = 2, vars = 2, skews = stcums[3],
skurts = stcums[4], fifths = stcums[5],
sixths = stcums[6], n = 10000, seed = 1234)
H_exp$constants
# c0 c1 c2 c3 c4 c5
# 1 -0.3077396 0.8005605 0.318764 0.03350012 -0.00367481 0.0001587076
# 3) Determine whether the constants produce a valid power method pdf:
H_exp$valid.pdf
# [1] "TRUE"
# 4) Select a critical value:
# Let alpha = 0.05.
y_star <- qexp(1 - 0.05, rate = 0.5) # note that rate = 1/mean
y_star
# [1] 5.991465
# 5) Solve m_{2}^{1/2} * p(z') + m_{1} - y* = 0 for z', where m_{1} and
# m_{2} are the 1st and 2nd moments of Y*:
# The exponential(2) distribution has a mean and standard deviation equal
# to 2.
# Solve 2 * p(z') + 2 - y_star = 0 for z'
# p(z') = c0 + c1 * z' + c2 * z'^2 + c3 * z'^3 + c4 * z'^4 + c5 * z'^5
f_exp <- function(z, c, y) {
return(2 * (c[1] + c[2] * z + c[3] * z^2 + c[4] * z^3 + c[5] * z^4 +
c[6] * z^5) + 2 - y)
}
z_prime <- uniroot(f_exp, interval = c(-1e06, 1e06),
c = as.numeric(H_exp$constants), y = y_star)$root
z_prime
# [1] 1.644926
# 6) Calculate 1 - Phi(z'), the corresponding probability for the
# approximation Y to Y* (i.e. 1 - Phi(z') = 0.05), and compare to target
# value alpha:
1 - pnorm(z_prime)
# [1] 0.04999249
# 7) Plot a parametric graph of Y* and Y:
plot_sim_pdf_theory(sim_y = as.numeric(H_exp$continuous_variable[, 1]),
Dist = "Exponential", params = 0.5)
# Note we can also plot the empirical cdf and show the cumulative
# probability up to y_star:
plot_sim_cdf(sim_y = as.numeric(H_exp$continuous_variable[, 1]),
calc_cprob = TRUE, delta = y_star)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.