knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = 'center', fig.width=6, fig.height=5 )
library(fdaoutlier)
The following are simulation models included in the fdaoutlier package. Some of these models were curated from research work related to functional depths and outlier detection for functional data. This document presents the model equations together with their corresponding functions and parameters in fdaoutlier. The parameters of the fdaoutlier functions have been set to reasonable default values for ease of use.
This is a typical magnitude model in which outliers are shifted from the 'normal' non-outlying observations. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
and the contamination model is of the form:
$$X_i(t) = \mu t + qk_i + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
This model can be accessed with the simulation_model1() function in fdaoutlier.
library(fdaoutlier)
dtss <- simulation_model1(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
The returned object is a list containing a matrix of the data and a vector of the indices of the true outliers:
dim(dtss$data)
dtss$true_outliers
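As a small illustration of working with the returned list, the sketch below splits the simulated curves into outliers and non-outliers. It assumes that each row of data is one curve and that true_outliers holds row indices; the names is_outlier, outl and normal are hypothetical and used only for illustration.

```r
library(fdaoutlier)

dtss <- simulation_model1(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

# Assumption: rows of dtss$data are curves and true_outliers are row indices.
# A logical mask is used so the split also works when there are no outliers.
is_outlier <- seq_len(nrow(dtss$data)) %in% dtss$true_outliers
outl   <- dtss$data[is_outlier, , drop = FALSE]   # contaminated curves
normal <- dtss$data[!is_outlier, , drop = FALSE]  # curves from the main model

# Every curve lands in exactly one of the two groups
nrow(outl) + nrow(normal) == nrow(dtss$data)
```

The same split can be reused with the data returned by any of the other simulation functions.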
The data generated by simulation_model1() can be tuned with the following additional parameters:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
q: the shift parameter $q$ in the contamination model which controls how far the outliers are from the mean function.
kprob: the probability that $k_i = 1$, i.e., $P(k_i=1)$ in the contamination model.
cov_alpha: the coefficient $\alpha$ in the covariance function.
cov_beta: the coefficient $\beta$ in the covariance function.
cov_nu: the coefficient $\nu$ in the covariance function.
Additional plotting parameters allow for modifying the plot title (plot_title), the font size of the title (title_cex), the display of the legend (show_legend), the y-axis label (ylabel) and the x-axis label (xlabel).
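To make the roles of these parameters concrete, the following base R sketch simulates the main and contamination models directly. It is not the package's implementation: it assumes a covariance function of the form $\gamma(s,t) = \alpha\exp(-\beta|s-t|^\nu)$, which is suggested by the parameter names but not stated in the text above, and the numeric values are illustrative rather than the package defaults.

```r
set.seed(1)
n <- 100; p <- 50                  # number of curves, grid points per curve
outlier_rate <- 0.1
mu <- 4; q <- 8; kprob <- 0.5      # illustrative values (assumptions)
cov_alpha <- 1; cov_beta <- 1; cov_nu <- 1

tt <- seq(0, 1, length.out = p)    # evaluation grid on [0, 1]

# Assumed covariance: gamma(s, t) = alpha * exp(-beta * |s - t|^nu)
sigma <- cov_alpha * exp(-cov_beta * abs(outer(tt, tt, "-"))^cov_nu)

# Zero-mean Gaussian process paths via the Cholesky factor (rows are curves)
L <- chol(sigma)                   # upper triangular, t(L) %*% L equals sigma
e <- matrix(rnorm(n * p), n, p) %*% L

# Main model: X_i(t) = mu * t + e_i(t)
x <- matrix(mu * tt, n, p, byrow = TRUE) + e

# Contamination model: add q * k_i, with P(k_i = 1) = kprob
is_out <- rbinom(n, 1, outlier_rate) == 1
k <- sample(c(1, -1), n, replace = TRUE, prob = c(kprob, 1 - kprob))
x[is_out, ] <- x[is_out, ] + q * k[is_out]
true_outliers <- which(is_out)
```

Increasing q moves the outlying curves further from the bulk of the data, while kprob controls how many of them are shifted upwards rather than downwards.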
This model generates non-persistent magnitude outliers, i.e., the outliers are magnitude outliers for only a portion of the domain of the functional data. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu t + qk_iI_{T_i \le t\le T_i+l } + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model2() generates data from this model:
dtss <- simulation_model2(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model2() to which arguments can be passed are:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
q: the shift parameter $q$ in the contamination model which controls how far the outliers are from the mean function.
kprob: the probability that $k_i = 1$, i.e., $P(k_i=1)$ in the contamination model.
a, b: values specifying the interval $[a,b]$ from which $T_i$ is drawn in the contamination model.
l: the value of $l$ in the contamination model.
cov_alpha: the coefficient $\alpha$ in the covariance function.
cov_beta: the coefficient $\beta$ in the covariance function.
cov_nu: the coefficient $\nu$ in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
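The effect of the indicator term can be seen in a short base R sketch of the noise-free part of one contaminated curve. This is only an illustration: it assumes $T_i$ is drawn uniformly from $[a,b]$, and the numeric values are illustrative rather than the package defaults.

```r
set.seed(2)
p <- 50
tt <- seq(0, 1, length.out = p)       # evaluation grid on [0, 1]

# Illustrative values (assumptions, not necessarily the package defaults)
mu <- 4; q <- 8; l <- 0.05; a <- 0.1; b <- 0.9

T_i <- runif(1, a, b)                 # assumed: T_i ~ Uniform(a, b)
k_i <- sample(c(-1, 1), 1)            # sign of the shift, here with kprob = 0.5

# Indicator I_{T_i <= t <= T_i + l}: TRUE inside the window, FALSE elsewhere
in_window <- tt >= T_i & tt <= T_i + l

# Mean of a contaminated curve: shifted by q * k_i only inside the window
outlier_mean <- mu * tt + q * k_i * in_window
```

Only the grid points in [T_i, T_i + l] are shifted, which is why a larger l produces a longer-lasting peak.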
This model generates outliers that are magnitude outliers for a part of the domain. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu t + qk_iI_{T_i \le t } + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model3() generates data from this model:
dtss <- simulation_model3(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model3() to which arguments can be passed are:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
q: the shift parameter $q$ in the contamination model which controls how far the outliers are from the mean function.
kprob: the probability that $k_i = 1$, i.e., $P(k_i=1)$ in the contamination model.
a, b: values specifying the interval $[a,b]$ from which $T_i$ is drawn in the contamination model.
cov_alpha: the coefficient $\alpha$ in the covariance function.
cov_beta: the coefficient $\beta$ in the covariance function.
cov_nu: the coefficient $\nu$ in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
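In this model the shift starts at $T_i$ and persists to the end of the domain. A minimal base R sketch of the noise-free part of one contaminated curve, assuming $T_i$ is uniform on $[a,b]$ and using illustrative values, could look like this:

```r
set.seed(3)
p <- 50
tt <- seq(0, 1, length.out = p)       # evaluation grid on [0, 1]

# Illustrative values (assumptions, not necessarily the package defaults)
mu <- 4; q <- 6; a <- 0.6; b <- 0.8

T_i <- runif(1, a, b)                 # assumed: T_i ~ Uniform(a, b)
k_i <- sample(c(-1, 1), 1)            # sign of the shift, here with kprob = 0.5

# Indicator I_{T_i <= t}: FALSE before T_i, TRUE from T_i onwards
shifted <- tt >= T_i
outlier_mean <- mu * tt + q * k_i * shifted

mean(shifted)                         # share of grid points that are shifted
```

Moving the interval [a, b] towards 0 makes the outliers deviate over a larger part of the domain.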
This model generates outliers defined on the reversed interval of the main model. The main model is of the form:
$$X_i(t) = \mu t(1 - t)^m + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu(1 - t)t^m + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model4() generates data from this model:
dtss <- simulation_model4(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model4() to which arguments can be passed are:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
m: the constant $m$ in the main and contamination models.
cov_alpha: the coefficient $\alpha$ in the covariance function.
cov_beta: the coefficient $\beta$ in the covariance function.
cov_nu: the coefficient $\nu$ in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
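The phrase "reversed interval" can be checked numerically: substituting $1 - t$ for $t$ in the main mean function gives the contamination mean function. The base R sketch below compares the two noise-free mean functions on a grid that is symmetric about 0.5; the values of mu and m are illustrative assumptions.

```r
p <- 50
tt <- (0:(p - 1)) / (p - 1)            # grid on [0, 1], symmetric about 0.5
mu <- 30; m <- 3 / 2                   # illustrative values (assumptions)

main_mean    <- mu * tt * (1 - tt)^m   # mean function of the main model
outlier_mean <- mu * (1 - tt) * tt^m   # mean function of the contamination model

# On a symmetric grid, reversing the main mean reproduces the outlier mean
isTRUE(all.equal(outlier_mean, rev(main_mean)))
```

The outlying curves therefore have the same overall range as the main curves but peak on the opposite side of the domain.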
This model generates shape outliers with a different covariance structure from that of the main model. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu t + \tilde{e}_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model5() generates data from this model:
dtss <- simulation_model5(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model5() to which arguments can be passed are:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
cov_alpha: the coefficient $\alpha$ in the covariance function of $e_i(t)$.
cov_beta: the coefficient $\beta$ in the covariance function of $e_i(t)$.
cov_nu: the coefficient $\nu$ in the covariance function of $e_i(t)$.
cov_alpha2: the coefficient $\alpha$ in the covariance function of $\tilde{e}_i(t)$.
cov_beta2: the coefficient $\beta$ in the covariance function of $\tilde{e}_i(t)$.
cov_nu2: the coefficient $\nu$ in the covariance function of $\tilde{e}_i(t)$.
The additional plotting parameters listed for simulation_model1() also apply.
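Because the two models differ only in the error process, a sketch needs just one helper that draws Gaussian process paths for a given set of covariance coefficients. The helper rgp() below is hypothetical and not part of fdaoutlier; it assumes a covariance function of the form $\gamma(s,t) = \alpha\exp(-\beta|s-t|^\nu)$, which is not stated in the text above, and all numeric values are illustrative.

```r
set.seed(5)
n <- 100; p <- 50
tt <- seq(0, 1, length.out = p)        # evaluation grid on [0, 1]

# Hypothetical helper: n zero-mean Gaussian process paths on the grid tt.
# Assumed covariance: gamma(s, t) = alpha * exp(-beta * |s - t|^nu)
rgp <- function(n, tt, alpha, beta, nu) {
  sigma <- alpha * exp(-beta * abs(outer(tt, tt, "-"))^nu)
  matrix(rnorm(n * length(tt)), n) %*% chol(sigma)   # rows are sample paths
}

mu <- 4                                               # illustrative value
e       <- rgp(n, tt, alpha = 1, beta = 1, nu = 1)    # cov_alpha, cov_beta, cov_nu
e_tilde <- rgp(n, tt, alpha = 5, beta = 2, nu = 0.5)  # cov_alpha2, cov_beta2, cov_nu2

mean_mat <- matrix(mu * tt, n, p, byrow = TRUE)
is_out <- rbinom(n, 1, 0.1) == 1                      # outlier_rate = 0.1

x <- mean_mat + e                                     # main model
x[is_out, ] <- (mean_mat + e_tilde)[is_out, ]         # contamination model
```

With these illustrative values the outlying curves share the mean function of the main model but are rougher and more variable.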
This model generates shape outliers that have a different shape for a portion of the domain. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu t + (-1)^u\cdot q + (-1)^{(1-u)}\left(\frac{1}{\sqrt{r\pi}}\right)\exp{(-z(t-v)^w)} + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model6() generates data from this model:
dtss <- simulation_model6(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model6() to which arguments can be passed are:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
q: the constant term $q$ in the contamination model.
kprob: the probability $P(u = 1)$.
a, b: values specifying the interval from which $v$ in the contamination model is drawn.
pi_coeff: the constant $r$ in the contamination model.
exp_pow: the constant $w$ in the contamination model.
exp_coeff: the constant $z$ in the contamination model.
cov_alpha: the coefficient $\alpha$ in the covariance function.
cov_beta: the coefficient $\beta$ in the covariance function.
cov_nu: the coefficient $\nu$ in the covariance function.
The additional plotting parameters listed for simulation_model1() also apply.
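The contamination term is easier to read as a function. The sketch below writes the noise-free part of the contamination model in base R; contam_mean() is a hypothetical helper whose argument names follow the symbols in the equation, its default values are illustrative assumptions, and $v$ is assumed to be uniform on $[a,b]$.

```r
# Hypothetical helper: noise-free part of the contamination model.
# Defaults are illustrative. With a non-integer w, (tt - v)^w is NaN for tt < v.
contam_mean <- function(tt, u, v, mu = 4, q = 1.8, r = 0.02, z = 6, w = 2) {
  bump <- (1 / sqrt(r * pi)) * exp(-z * (tt - v)^w)   # bell-shaped bump at v
  mu * tt + (-1)^u * q + (-1)^(1 - u) * bump
}

set.seed(6)
tt <- seq(0, 1, length.out = 50)   # evaluation grid on [0, 1]
u <- rbinom(1, 1, 0.5)             # kprob = P(u = 1), here 0.5
v <- runif(1, 0.25, 0.75)          # assumed: v ~ Uniform(a, b)

outlier_mean <- contam_mean(tt, u, v)
```

The value of u flips the signs of the constant q and of the bump together, so the bump always points in the direction opposite to the constant shift.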
This model generates pure shape outliers that are periodic. The main model is of the form:
$$X_i(t) = \mu t + e_i(t),$$
with contamination model of the form:
$$X_i(t) = \mu t + k\sin(r\pi(t + \theta)) + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model7() generates data from this model:
dtss <- simulation_model7(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model7() to which arguments can be passed are:
mu: the coefficient $\mu$ in the main and contamination models controlling the mean function.
cov_alpha: the coefficient $\alpha$ in the covariance function of $e_i(t)$.
cov_beta: the coefficient $\beta$ in the covariance function of $e_i(t)$.
cov_nu: the coefficient $\nu$ in the covariance function of $e_i(t)$.
sin_coeff: the coefficient $k$ in the contamination model.
pi_coeff: the coefficient $r$ in the contamination model.
a, b: values specifying the interval from which $\theta$ is to be drawn.
The additional plotting parameters listed for simulation_model1() also apply.
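A short base R sketch of the noise-free mean functions shows how the sine term changes the shape without moving the curve far from the main trend. It assumes $\theta$ is uniform on $[a,b]$, and the numeric values are illustrative rather than the package defaults.

```r
set.seed(7)
p <- 50
tt <- seq(0, 1, length.out = p)    # evaluation grid on [0, 1]

# Illustrative values (assumptions, not necessarily the package defaults)
mu <- 4; k <- 2; r <- 4; a <- 0.25; b <- 0.75

theta <- runif(1, a, b)            # assumed: theta ~ Uniform(a, b)

main_mean    <- mu * tt
outlier_mean <- mu * tt + k * sin(r * pi * (tt + theta))

# The deviation from the main mean oscillates and never exceeds |k|
range(outlier_mean - main_mean)
```

Here sin_coeff (k) sets the amplitude of the oscillation and pi_coeff (r) sets how many oscillations fit in the domain.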
This model generates pure shape outliers that are periodic. The main model is of the form:
$$X_i(t) = k\sin(r\pi t) + e_i(t),$$
with contamination model of the form:
$$X_i(t) = k\sin(r\pi t + v) + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model8() generates data from this model:
dtss <- simulation_model8(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model8() to which arguments can be passed are:
cov_alpha: the coefficient $\alpha$ in the covariance function of $e_i(t)$.
cov_beta: the coefficient $\beta$ in the covariance function of $e_i(t)$.
cov_nu: the coefficient $\nu$ in the covariance function of $e_i(t)$.
sin_coeff: the coefficient $k$ in the main and contamination models.
pi_coeff: the coefficient $r$ in the main and contamination models.
constant: the value of the constant $v$ in the contamination model.
The additional plotting parameters listed for simulation_model1() also apply.
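Since $k\sin(r\pi t + v) = k\sin(r\pi(t + v/(r\pi)))$, the outliers in this model are copies of the main mean function shifted in time by $v/(r\pi)$. The base R sketch below checks this identity numerically on the noise-free mean functions; the values of k, r and v are illustrative assumptions.

```r
p <- 50
tt <- seq(0, 1, length.out = p)    # evaluation grid on [0, 1]
k <- 15; r <- 2; v <- 4            # illustrative values (assumptions)

main_mean    <- k * sin(r * pi * tt)       # mean function of the main model
outlier_mean <- k * sin(r * pi * tt + v)   # mean function of the contamination model

# Adding v inside the sine is the same as shifting time by v / (r * pi)
time_shifted <- k * sin(r * pi * (tt + v / (r * pi)))
isTRUE(all.equal(outlier_mean, time_shifted))
```

Both groups of curves share the same amplitude and frequency, so the outliers differ from the main curves only in phase.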
This model generates periodic functions with outliers of different amplitude. The main model is of the form:
$$X_i(t) = a_{1i}\sin \pi + a_{2i}\cos\pi + e_i(t),$$
with contamination model of the form:
$$X_i(t) = (b_{1i}\sin\pi + b_{2i}\cos\pi)(1-u_i) + (c_{1i}\sin\pi + c_{2i}\cos\pi)u_i + e_i(t),$$
where the constants and random terms are controlled by the parameters described below.
A call to simulation_model9() generates data from this model:
dtss <- simulation_model9(n = 100, p = 50, outlier_rate = .1, seed = 50, plot = FALSE)
Additional parameters of simulation_model9() to which arguments can be passed are:
kprob: the probability $P(u_i = 1)$.
ai: a vector of 2 values containing $a_{1}$ and $a_{2}$, indicating the interval from which $a_{1i}$ and $a_{2i}$ are drawn in the main model.
bi: a vector of 2 values containing $b_{1}$ and $b_{2}$, indicating the interval from which $b_{1i}$ and $b_{2i}$ are drawn in the contamination model.
ci: a vector of 2 values containing $c_{1}$ and $c_{2}$, indicating the interval from which $c_{1i}$ and $c_{2i}$ are drawn in the contamination model.
cov_alpha: the coefficient $\alpha$ in the covariance function of $e_i(t)$.
cov_beta: the coefficient $\beta$ in the covariance function of $e_i(t)$.
cov_nu: the coefficient $\nu$ in the covariance function of $e_i(t)$.
The additional plotting parameters listed for simulation_model1() also apply.