# fitGSMAR: Estimate Gaussian or Student's t Mixture Autoregressive model

In uGMAR: Estimate Univariate Gaussian and Student's t Mixture Autoregressive Models

## Description

`fitGSMAR` estimates a GMAR, StMAR, or G-StMAR model in two phases. In the first phase, a genetic algorithm is employed to find starting values for a gradient-based method. In the second phase, the gradient-based variable metric algorithm is utilized to accurately converge to a local maximum or a saddle point near each starting value. Parallel computing is used to conduct multiple rounds of estimation simultaneously.

## Usage

```
fitGSMAR(
  data,
  p,
  M,
  model = c("GMAR", "StMAR", "G-StMAR"),
  restricted = FALSE,
  constraints = NULL,
  conditional = TRUE,
  parametrization = c("intercept", "mean"),
  ncalls = round(10 + 9 * log(sum(M))),
  ncores = 2,
  maxit = 300,
  seeds = NULL,
  print_res = TRUE,
  ...
)
```

## Arguments

`data`
a numeric vector or class `'ts'` object containing the data. `NA` values are not supported.

`p`
a positive integer specifying the autoregressive order of the model.

`M`
• For GMAR and StMAR models: a positive integer specifying the number of mixture components.
• For G-StMAR models: a size (2x1) integer vector specifying the number of GMAR type components `M1` in the first element and StMAR type components `M2` in the second element. The total number of mixture components is `M=M1+M2`.

`model`
is a "GMAR", "StMAR", or "G-StMAR" model considered? In the G-StMAR model, the first `M1` components are GMAR type and the remaining `M2` components are StMAR type.

`restricted`
a logical argument stating whether the AR coefficients φ_{m,1},...,φ_{m,p} are restricted to be the same for all regimes.

`constraints`
specifies linear constraints imposed on each regime's autoregressive parameters separately.
• For non-restricted models: a list of size (p x q_{m}) constraint matrices C_{m} of full column rank satisfying φ_{m}=C_{m}ψ_{m} for all m=1,...,M, where φ_{m}=(φ_{m,1},...,φ_{m,p}) and ψ_{m}=(ψ_{m,1},...,ψ_{m,q_{m}}).
• For restricted models: a size (p x q) constraint matrix C of full column rank satisfying φ=Cψ, where φ=(φ_{1},...,φ_{p}) and ψ=(ψ_{1},...,ψ_{q}).

The symbol φ denotes an AR coefficient. Note that regardless of any constraints, the autoregressive order is always `p` for all regimes. Ignore or set to `NULL` if applying linear constraints is not desired.

`conditional`
a logical argument specifying whether the conditional or exact log-likelihood function should be used.

`parametrization`
is the model parametrized with the "intercepts" φ_{m,0} or "means" μ_{m} = φ_{m,0}/(1-∑_{i=1}^{p} φ_{m,i})?

`ncalls`
a positive integer specifying how many rounds of estimation should be conducted. The estimation results may vary from round to round because of multimodality of the log-likelihood function and the randomness associated with the genetic algorithm.

`ncores`
the number of CPU cores to be used in the estimation process.

`maxit`
the maximum number of iterations for the variable metric algorithm.

`seeds`
a length `ncalls` vector containing the random number generator seed for each call to the genetic algorithm, or `NULL` for not initializing the seed. Exists for the purpose of creating reproducible results.

`print_res`
should the estimation results be printed?

`...`
additional settings passed to the function `GAfit` employing the genetic algorithm.
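The argument definitions above can be made concrete with a little base R arithmetic. The sketch below does not call uGMAR. The helper name `default_ncalls` and all numeric values are illustrative assumptions. It evaluates the default `ncalls` expression from the Usage section, builds a constraint matrix satisfying φ=Cψ, and converts between the intercept and mean parametrizations for a single regime.

```r
# Default number of estimation rounds, copied from the Usage section.
# default_ncalls is a hypothetical helper name, not a uGMAR function.
default_ncalls <- function(M) round(10 + 9 * log(sum(M)))
default_ncalls(2)        # GMAR or StMAR model with two regimes
default_ncalls(c(1, 1))  # G-StMAR model with M1 = 1 and M2 = 1

# Constraint matrix for p = 3 that forces the second AR coefficient to zero:
# phi = C %*% psi, where psi holds the q = 2 free parameters.
C <- matrix(c(1, 0, 0,
              0, 0, 1), ncol = 2)
psi <- c(0.5, -0.2)           # illustrative values only
phi <- as.vector(C %*% psi)   # AR coefficients implied by the constraint

# Intercept <-> mean parametrization for one regime.
phi0 <- 1.4                        # illustrative intercept
mu <- phi0 / (1 - sum(phi))        # regime mean
phi0_back <- mu * (1 - sum(phi))   # back to the intercept
```

Because `sum(M)` is used, a G-StMAR specification `M=c(1, 1)` gets the same default number of rounds as `M=2`.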

## Details

Because of the complexity and multimodality of the log-likelihood function, it's not guaranteed that the estimation algorithm will end up in the global maximum point. It's often expected that most of the estimation rounds will end up in some local maximum point instead, and therefore a number of estimation rounds is required for reliable results. Because of the nature of the models, the estimation may fail, particularly when the number of mixture components is chosen too large. Note that the genetic algorithm is designed to avoid solutions in which the mixing weights of some regimes are too close to zero at almost all times ("redundant regimes"), but the settings can be adjusted (see `?GAfit`).
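The reason several estimation rounds are needed can be sketched with a toy problem in base R. This is not the estimation code of `fitGSMAR`. The objective `toy_loglik` is a hypothetical multimodal function standing in for a log-likelihood, and uniformly drawn starting values stand in for the genetic algorithm. Each round runs the quasi-Newton (BFGS) method of `optim` from its own starting value, and the best round is kept.

```r
# Toy multimodal objective standing in for a log-likelihood (hypothetical).
toy_loglik <- function(x) sin(3 * x) - 0.1 * x^2

set.seed(1)
n_rounds <- 8
# Random starting values; in fitGSMAR a genetic algorithm produces these instead.
starts <- runif(n_rounds, min = -6, max = 6)

# One BFGS run per starting value; fnscale = -1 makes optim maximize.
rounds <- lapply(starts, function(x0) {
  optim(par = x0, fn = toy_loglik, method = "BFGS",
        control = list(fnscale = -1, maxit = 300))
})

all_values <- vapply(rounds, function(r) r$value, numeric(1))
all_pars <- vapply(rounds, function(r) r$par, numeric(1))
best <- which.max(all_values)

# Compare the rounds: different starts may end in different local maxima.
print(round(cbind(start = starts, estimate = all_pars, value = all_values), 3))
```

Keeping the results of every round, rather than only the best one, makes it possible to inspect the next-best local maxima afterwards.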

If the iteration limit for the variable metric algorithm (`maxit`) is reached, one can continue the estimation by iterating more with the function `iterate_more`.

The core of the genetic algorithm is mostly based on the description by Dorsey and Mayer (1995). It utilizes a slightly modified version of the individually adaptive crossover and mutation rates described by Patnaik and Srinivas (1994) and employs (50%) fitness inheritance discussed by Smith, Dike and Stegmann (1995). Large (in absolute value) but stationary AR parameter values are generated with the algorithm proposed by Monahan (1984).
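One standard way to generate large but stationary AR coefficients is to draw partial autocorrelations inside (-1, 1) and map them to AR coefficients with the Durbin-Levinson recursion. This is the reparametrization associated with Monahan (1984). The base R sketch below illustrates that idea only. The function names `pacf_to_ar` and `is_stationary` are hypothetical, and whether `GAfit` implements the mapping in exactly this form is an assumption.

```r
# Map partial autocorrelations in (-1, 1) to AR coefficients of a stationary
# AR(p) process with the Durbin-Levinson recursion.
pacf_to_ar <- function(pacs) {
  stopifnot(all(abs(pacs) < 1))
  phi <- numeric(0)
  for (k in seq_along(pacs)) {
    r <- pacs[k]
    # phi_i^(k) = phi_i^(k-1) - r_k * phi_{k-i}^(k-1), and phi_k^(k) = r_k
    phi <- c(phi - r * rev(phi), r)
  }
  phi
}

# Stationarity check: all roots of 1 - phi_1 z - ... - phi_p z^p lie outside
# the unit circle.
is_stationary <- function(phi) all(Mod(polyroot(c(1, -phi))) > 1)

# Partial autocorrelations that are large in absolute value give large AR
# coefficients that are still stationary.
set.seed(2)
pacs <- sample(c(-1, 1), 4, replace = TRUE) * runif(4, min = 0.8, max = 0.95)
phi <- pacf_to_ar(pacs)
print(phi)
print(is_stationary(phi))
```

The mapping is useful in a random search because any draw from the open cube (-1, 1)^p yields a valid parameter vector, so no candidate has to be rejected for violating stationarity.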

The variable metric algorithm (or quasi-Newton method, Nash (1990, algorithm 21)) used in the second phase is implemented with the function `optim` from the package `stats`.

Sometimes the found MLE is very close to the boundary of the stationarity region of some regime, the related variance parameter is very small, and the associated mixing weights are "spiky". Such estimates often maximize the log-likelihood function for a technical reason induced by the endogenously determined mixing weights. In such cases, it might be more appropriate to consider the next-best local maximum point of the log-likelihood function that is well inside the parameter space. Models based on local-only maximum points can be built with the function `alt_gsmar` by adjusting the argument `which_largest` accordingly.

Some mixture components of the StMAR model may sometimes get very large estimates for the degrees of freedom parameters. Such parameters are weakly identified and induce various numerical problems. However, mixture components with large degrees of freedom parameters are similar to the mixture components of the GMAR model. It's hence advisable to further estimate a G-StMAR model by allowing the mixture components with large degrees of freedom parameter estimates to be GMAR type with the function `stmar_to_gstmar`.
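The similarity between a Student's t component with a large degrees of freedom parameter and a Gaussian component can be checked numerically. The base R sketch below is a simplification: it compares the standard t density `dt` with the standard normal density `dnorm`, not the particular t parametrization used in the StMAR model, and the grid and degrees of freedom values are illustrative choices.

```r
# Largest pointwise gap between the standard t density and the standard
# normal density on a grid, for increasing degrees of freedom.
x <- seq(-5, 5, by = 0.01)
dfs <- c(3, 10, 100, 1000)
max_gap <- vapply(dfs, function(nu) max(abs(dt(x, df = nu) - dnorm(x))),
                  numeric(1))
names(max_gap) <- paste0("df=", dfs)
print(signif(max_gap, 3))
```

As the gap shrinks, the likelihood becomes nearly flat in the degrees of freedom parameter, which is why very large estimates are weakly identified.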

## Value

Returns an object of class `'gsmar'` defining the estimated GMAR, StMAR or G-StMAR model. The returned object contains estimated mixing weights, some conditional and unconditional moments, and quantile residuals. Note that the first `p` observations are taken as the initial values, so the mixing weights, conditional moments, and quantile residuals start from the `p+1`:th observation (interpreted as t=1). In addition, the returned object contains the estimates and log-likelihoods from all of the estimation rounds. See `?GSMAR` for the form of the parameter vector, if needed.

## S3 methods

The following S3 methods are supported for class `'gsmar'` objects: `print`, `summary`, `plot`, `predict`, `simulate`, `logLik`, `residuals`.

## References

• Dorsey R. E. and Mayer W. J. 1995. Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business & Economic Statistics, 13, 53-66.

• Kalliovirta L., Meitz M. and Saikkonen P. 2015. Gaussian Mixture Autoregressive model for univariate time series. Journal of Time Series Analysis, 36, 247-266.

• Meitz M., Preve D., Saikkonen P. 2021. A mixture autoregressive model based on Student's t-distribution. Communications in Statistics - Theory and Methods, doi: 10.1080/03610926.2021.1916531

• Monahan J.F. 1984. A Note on Enforcing Stationarity in Autoregressive-Moving Average Models. Biometrika 71, 403-404.

• Nash J. 1990. Compact Numerical Methods for Computers. Linear algebra and Function Minimization. Adam Hilger.

• Patnaik L.M. and Srinivas M. 1994. Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms. Transactions on Systems, Man and Cybernetics 24, 656-667.

• Smith R.E., Dike B.A., Stegmann S.A. 1995. Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, 345-350.

• Virolainen S. 2021. A mixture autoregressive model based on Gaussian and Student's t-distributions. Studies in Nonlinear Dynamics & Econometrics, doi: 10.1515/snde-2020-0060

## See Also

`GSMAR`, `iterate_more`, `stmar_to_gstmar`, `add_data`, `profile_logliks`, `swap_parametrization`, `get_gradient`, `simulate.gsmar`, `predict.gsmar`, `diagnostic_plot`, `quantile_residual_tests`, `cond_moments`, `uncond_moments`, `LR_test`, `Wald_test`

## Examples

```
## These are long running examples that use parallel computing.
## The below examples take approximately 90 seconds to run.

## Note that the number of estimation rounds (ncalls) is relatively small
## in the below examples to reduce the time required for running the examples.
## For reliable results, a large number of estimation
## rounds is recommended!

library(uGMAR) # provides fitGSMAR and the datasets simudata and M10Y1Y

# GMAR model
fit12 <- fitGSMAR(data=simudata, p=1, M=2, model="GMAR", ncalls=4, seeds=1:4)
summary(fit12)
plot(fit12)
profile_logliks(fit12)
diagnostic_plot(fit12)

# StMAR model (boundary estimate + large degrees of freedom)
fit42t <- fitGSMAR(data=M10Y1Y, p=4, M=2, model="StMAR", ncalls=2, seeds=c(1, 6))
summary(fit42t, digits=4) # Four almost-unit roots in the 2nd regime!
plot(fit42t) # Spiking mixing weights!
fit42t_alt <- alt_gsmar(fit42t, which_largest=2) # The second largest local max
summary(fit42t_alt) # Overly large 2nd regime degrees of freedom estimate!
fit42gs <- stmar_to_gstmar(fit42t_alt) # Switch to G-StMAR model
summary(fit42gs) # Finally, an appropriate model!
plot(fit42gs)

# Restricted StMAR model
fit42r <- fitGSMAR(M10Y1Y, p=4, M=2, model="StMAR", restricted=TRUE,
                   ncalls=2, seeds=1:2)
fit42r

# G-StMAR model with one GMAR type and one StMAR type regime
fit42gs <- fitGSMAR(M10Y1Y, p=4, M=c(1, 1), model="G-StMAR",
                    ncalls=1, seeds=4)
fit42gs

# The following three examples demonstrate how to apply linear constraints
# to the autoregressive (AR) parameters.

# Two-regime GMAR p=2 model with the second AR coefficient
# of the second regime constrained to zero.
C22 <- list(diag(1, ncol=2, nrow=2), as.matrix(c(1, 0)))
fit22c <- fitGSMAR(M10Y1Y, p=2, M=2, constraints=C22, ncalls=1, seeds=6)
fit22c

# StMAR(3, 1) model with the second order AR coefficient constrained to zero.
C31 <- list(matrix(c(1, 0, 0, 0, 0, 1), ncol=2))
fit31tc <- fitGSMAR(M10Y1Y, p=3, M=1, model="StMAR", constraints=C31,
                    ncalls=1, seeds=1)
fit31tc

# Such StMAR(3, 2) model that the AR coefficients are restricted to be
# the same for both regimes and the second AR coefficients are
# constrained to zero.
fit32rc <- fitGSMAR(M10Y1Y, p=3, M=2, model="StMAR", restricted=TRUE,
                    constraints=matrix(c(1, 0, 0, 0, 0, 1), ncol=2),
                    ncalls=1, seeds=1)
fit32rc
```