library(knitr) opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", fig.align = "center", fig.retina = 2, out.width = "75%", dpi = 96 ) knit_hooks$set(pngquant = hook_pngquant)
Robust estimation methods for the mean vector, scatter matrix, and covariance matrix (if it exists) from data (possibly containing NAs) under multivariate heavy-tailed distributions such as angular Gaussian (via Tyler's method), Cauchy, and Student's t distributions. Additionally, a factor model structure can be specified for the covariance matrix. The latest revision also includes the multivariate skewed t distribution.
The package can be installed from CRAN or GitHub:
# install stable version from CRAN install.packages("fitHeavyTail") # install development version from GitHub devtools::install_github("convexfi/fitHeavyTail")
To get help:
library(fitHeavyTail) help(package = "fitHeavyTail") ?fit_mvt
To cite fitHeavyTail
in publications:
citation("fitHeavyTail")
To illustrate the simple usage of the package fitHeavyTail
, let's start by generating some multivariate data under a Student's $t$ distribution with significant heavy tails (degrees of freedom $\nu=4$):
library(mvtnorm) # package for multivariate t distribution N <- 10 # number of variables T <- 80 # number of observations nu <- 4 # degrees of freedom for heavy tails set.seed(42) mu <- rep(0, N) U <- t(rmvnorm(n = round(0.3*N), sigma = 0.1*diag(N))) Sigma_cov <- U %*% t(U) + diag(N) # covariance matrix with factor model structure Sigma_scatter <- (nu-2)/nu * Sigma_cov X <- rmvt(n = T, delta = mu, sigma = Sigma_scatter, df = nu) # generate data
We can first estimate the mean vector and covariance matrix via the traditional sample estimates (i.e., sample mean and sample covariance matrix):
mu_sm <- colMeans(X) Sigma_scm <- cov(X)
Then we can compute the robust estimates via the package fitHeavyTail
:
library(fitHeavyTail) fitted <- fit_mvt(X)
We can now compute the estimation errors and see the significant improvement:
sum((mu_sm - mu)^2) sum((fitted$mu - mu)^2) sum((Sigma_scm - Sigma_cov)^2) sum((fitted$cov - Sigma_cov)^2)
# fitting with factor model fitted_3factors <- fit_mvt(X, factors = 3) sum((fitted_3factors$mu - mu)^2) sum((fitted_3factors$cov - Sigma_cov)^2)
To get a visual idea of the robustness, we can plot the shapes of the covariance matrices (true and estimated ones) on two dimensions. Observe how the heavy-tailed estimation follows the true one more closely than the sample covariance matrix:
# fig.cap="Sample covariance matrix vs robust estimator." library(mvtnorm) library(ellipse) library(ggplot2) N <- 2 T <- 50 nu <- 4 mu <- rep(0, N) Sigma <- matrix(c(0.00125, 0.00112, 0.00112, 0.00125), N, N) set.seed(42) X <- rmvt(n = 200, delta = mu, sigma = (nu-2)/nu * Sigma, df = nu) X_ <- X[1:T, ] Sigma_scm <- cov(X_) Sigma_heavytail <- fit_mvt(X_, optimize_mu = FALSE, nu = 4, scale_covmat = TRUE)$cov colnames(Sigma_heavytail) <- rownames(Sigma_heavytail) <- NULL # scatter plot ggplot(data.frame(x = X[, 1], y = X[, 2]), aes(x, y)) + geom_point(alpha = 1, size = 0.8) + geom_path(data = data.frame(ellipse(Sigma)), aes(x, y, col = "1"), linewidth = 1) + geom_path(data = data.frame(ellipse(Sigma_scm)), aes(x, y, col = "2"), linewidth = 1) + geom_path(data = data.frame(ellipse(Sigma_heavytail)), aes(x, y, col = "3"), linewidth = 1) + coord_cartesian(xlim = c(-0.15, 0.15), ylim = c(-0.15, 0.15)) + scale_color_manual(name = "ellipses", values = c("1" = "black", "2" = "#c0392b", "3" = "#2980b9"), labels = c("true", "SCM estimation", "heavy-tailed estimation")) + labs(title = "Shape of covariance matrices", x = NULL, y = NULL)
For more detailed information, please check the vignette.
README file: GitHub-readme.
Vignette: CRAN-vignette and GitHub-vignette.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.