simulation_model9: Convenience function for generating functional data

View source: R/simulation_models.R

simulation_model9 R Documentation

Description

Periodic functions with outliers of different amplitude. The main model is of the form

X_i(t) = a_{1i}\sin \pi + a_{2i}\cos\pi + e_i(t),

with contamination model of the form

X_i(t) = (b_{1i}\sin\pi + b_{2i}\cos\pi)(1-u_i) + (c_{1i}\sin\pi + c_{2i}\cos\pi)u_i + e_i(t),

where t \in [0,1]; the argument of the sine and cosine terms, written \pi in the formulas above, ranges over [0, 2\pi]; a_{1i} and a_{2i} follow a uniform distribution on an interval [a_1, a_2]; b_{1i} and b_{2i} follow a uniform distribution on an interval [b_1, b_2]; c_{1i} and c_{2i} follow a uniform distribution on an interval [c_1, c_2]; u_i follows a Bernoulli distribution; and e_i(t) is a Gaussian process with zero mean and covariance function of the form

\gamma(s,t) = \alpha\exp\{-\beta|t-s|^\nu\}
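
As a rough illustration of the covariance function above, the following base R sketch builds the covariance matrix on an evenly spaced grid and draws zero-mean Gaussian noise curves from it. The helper name exp_cov and the use of a Cholesky factor are assumptions made for this sketch; the package's internal implementation may differ.

```r
# Hypothetical helper: gamma(s, t) = alpha * exp(-beta * |t - s|^nu) on a grid tt
exp_cov <- function(tt, alpha = 1, beta = 1, nu = 1) {
  d <- abs(outer(tt, tt, "-"))   # pairwise distances |t - s|
  alpha * exp(-beta * d^nu)
}

p     <- 50
tt    <- seq(0, 1, length.out = p)  # evaluation grid on [0, 1]
sigma <- exp_cov(tt)                # p by p covariance matrix

# Draw n zero-mean Gaussian noise curves with covariance sigma.
# chol() returns an upper-triangular R with t(R) %*% R == sigma,
# so each row of z %*% R has covariance sigma.
set.seed(1)
n <- 5
z <- matrix(rnorm(n * p), nrow = n, ncol = p)
e <- z %*% chol(sigma)              # n by p matrix, one noise curve per row
dim(e)
```

Larger values of beta make the correlation between neighbouring grid points decay faster, so the noise curves look rougher.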

Please see the simulation models vignette with vignette("simulation_models", package = "fdaoutlier") for more details.

Usage

simulation_model9(
  n = 100,
  p = 50,
  outlier_rate = 0.05,
  kprob = 0.5,
  ai = c(3, 8),
  bi = c(1.5, 2.5),
  ci = c(9, 10.5),
  cov_alpha = 1,
  cov_beta = 1,
  cov_nu = 1,
  deterministic = TRUE,
  seed = NULL,
  plot = F,
  plot_title = "Simulation Model 9",
  title_cex = 1.5,
  show_legend = T,
  ylabel = "",
  xlabel = "gridpoints"
)

Arguments

n

The number of curves to generate. Set to 100 by default.

p

The number of evaluation points of the curves. Curves are usually generated over the interval [0, 1]. Set to 50 by default.

outlier_rate

A value in [0, 1] indicating the proportion of outliers. A value of 0.06 means that about 6% of the observations will be outliers; the exact number depends on whether the parameter deterministic is TRUE. Set to 0.05 by default.

kprob

The probability P(u_i = 1). Set to 0.5 by default.
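
The role of u_i in the contamination model can be sketched in base R as follows. This is only an illustration: the noise term e_i(t) is omitted, the sine and cosine are assumed to be evaluated on an evenly spaced grid over [0, 2\pi], and all variable names (theta, n_out, coef_sin, coef_cos) are hypothetical rather than taken from the package source.

```r
set.seed(2)
p     <- 50
theta <- seq(0, 2 * pi, length.out = p)  # assumed grid for the sine/cosine argument
kprob <- 0.5
bi    <- c(1.5, 2.5)                     # interval for the b-coefficients
ci    <- c(9, 10.5)                      # interval for the c-coefficients

n_out <- 4                               # number of contaminated curves in this sketch
u  <- rbinom(n_out, size = 1, prob = kprob)  # u_i = 1 with probability kprob
b1 <- runif(n_out, bi[1], bi[2])
b2 <- runif(n_out, bi[1], bi[2])
c1 <- runif(n_out, ci[1], ci[2])
c2 <- runif(n_out, ci[1], ci[2])

# Curve i uses the b-coefficients when u_i = 0 and the c-coefficients when u_i = 1
coef_sin <- b1 * (1 - u) + c1 * u
coef_cos <- b2 * (1 - u) + c2 * u

# n_out by p matrix of contaminated curves (noise omitted)
outliers <- coef_sin %o% sin(theta) + coef_cos %o% cos(theta)
dim(outliers)
```

With the default intervals, curves with u_i = 0 have a smaller amplitude than the main model and curves with u_i = 1 have a larger one.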

ai

A vector of two values giving the endpoints [a_1, a_2] of the interval from which a_{1i} and a_{2i} in the main model are drawn. Set to c(3, 8) by default.

bi

A vector of two values giving the endpoints [b_1, b_2] of the interval from which b_{1i} and b_{2i} in the contamination model are drawn. Set to c(1.5, 2.5) by default.

ci

A vector of two values giving the endpoints [c_1, c_2] of the interval from which c_{1i} and c_{2i} in the contamination model are drawn. Set to c(9, 10.5) by default.

cov_alpha

A value indicating the coefficient of the exponential function of the covariance matrix, i.e., the \alpha in the covariance function. Set to 1 by default.

cov_beta

A value indicating the coefficient of the terms inside the exponential function of the covariance matrix, i.e., the \beta in the covariance function. Set to 1 by default.

cov_nu

A value indicating the power to which to raise the terms inside the exponential function of the covariance matrix, i.e., the \nu in the covariance function. Set to 1 by default.

deterministic

A logical value. If TRUE, the function always returns round(n*outlier_rate) outliers, so the number of outliers is constant. If FALSE, the number of outliers is determined using n Bernoulli trials with probability outlier_rate, so the number of outliers returned is random. TRUE by default.
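
The difference between the two settings can be sketched in base R. The variable names below are hypothetical, and the sketch only counts outliers; it does not show how the package chooses which curves are contaminated.

```r
n            <- 100
outlier_rate <- 0.05

# deterministic = TRUE: the number of outliers is fixed
n_fixed <- round(n * outlier_rate)

# deterministic = FALSE: each curve is an outlier with probability outlier_rate,
# so the count varies from one call to the next
set.seed(3)
is_outlier <- rbinom(n, size = 1, prob = outlier_rate) == 1
n_random   <- sum(is_outlier)

n_fixed
n_random
```

In the random case the count can be zero, so downstream code should handle an empty set of outliers.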

seed

A seed to set for reproducibility. NULL by default in which case a seed is not set.

plot

A logical value indicating whether to plot data.

plot_title

Title of the plot if plot is TRUE.

title_cex

Numerical value indicating the size of the plot title relative to the device default. Set to 1.5 by default. Ignored if plot = FALSE.

show_legend

A logical value indicating whether to add a legend to the plot if plot = TRUE.

ylabel

The label of the y-axis. Set to "" by default.

xlabel

The label of the x-axis if plot = TRUE. Set to "gridpoints" by default.

Value

A list containing:

data

a matrix of size n by p containing the simulated data set

true_outliers

a vector of integers indicating the row index of the outliers in the generated data.
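
A short base R sketch of how the returned list might be used to label and split the curves. The list dt below is a made-up stand-in with the same two components, so that the snippet runs without the package; in practice it would come from a call to simulation_model9().

```r
# Stand-in for the list returned by simulation_model9(); the values are made up
dt <- list(
  data          = matrix(0, nrow = 10, ncol = 5),
  true_outliers = c(2L, 7L)
)

# Logical label for every curve: TRUE for outliers, FALSE otherwise
is_outlier <- seq_len(nrow(dt$data)) %in% dt$true_outliers

# Split the curves into the two groups; drop = FALSE keeps the matrix shape
# even when a group has a single row or none at all
outlying_curves <- dt$data[dt$true_outliers, , drop = FALSE]
regular_curves  <- dt$data[!is_outlier, , drop = FALSE]

table(is_outlier)
```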

Examples

dt <- simulation_model9(plot = TRUE)
dim(dt$data)
dt$true_outliers

fdaoutlier documentation built on Oct. 1, 2023, 1:06 a.m.