simulate_cor: A function for generating simulated multivariate data


View source: R/simulate_data.R

Description

Generates cross-correlated multivariate simulated data with n observations and p variates. The data follow a Gaussian distribution with the specified covariance matrix, except at a specified number of locations where the mean changes in a proportion of the variates. The function is useful for generating data to demonstrate and assess multivariate anomaly detection methods such as capa.cc, capa.mv, pass and inspect.
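
The following base-R sketch illustrates the kind of data described above. It is not the package's implementation: the Cholesky-based sampling, the choice of the first k variates as the affected ones, and the convention that a change at location 40 with duration 20 covers rows 41 to 60 are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the capacc implementation.
set.seed(1)
n <- 100; p <- 10
Sigma <- diag(1, p)              # covariance matrix (identity here)
location <- 40; duration <- 20   # assumed: change covers rows 41..60
proportion <- 0.1; vartheta <- 5

# Gaussian noise with covariance Sigma:
# if R = chol(Sigma), then t(R) %*% R = Sigma, so Z %*% R has covariance Sigma.
R <- chol(Sigma)
x <- matrix(rnorm(n * p), n, p) %*% R

# Shift the first k variates equally (assumed "adjacent"-style choice),
# scaled so that the L2 norm of the mean change vector equals vartheta.
k <- max(1, round(proportion * p))
mu <- rep(0, p)
mu[seq_len(k)] <- vartheta / sqrt(k)

rows <- (location + 1):(location + duration)
x[rows, ] <- sweep(x[rows, , drop = FALSE], 2, mu, "+")

dim(x)             # n rows, p columns
sqrt(sum(mu^2))    # L2 size of the mean change
```
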

Usage

simulate_cor(
  n = 100,
  p = 10,
  vartheta = 5,
  shape = 0,
  change_seed = NA,
  Sigma = diag(1, p),
  locations = 40,
  durations = 20,
  proportions = 0.1,
  change_type = "adjacent",
  changing_vars = NA,
  point_locations = NA,
  point_proportions = NA,
  point_mu = NA,
  n_sd_changes = 0
)

Arguments

n

The number of observations. The default is n=100.

p

The number of variates. The default is p=10.

vartheta

The size of the mean change vector in L2 distance. Defaults to 5.

shape

An integer between 0 and 10 specifying the shape of a change. Defaults to 0, which means equally changing components. 5 gives mean components drawn from an i.i.d. Gaussian distribution, while 6 draws changes from the data distribution. See the function generate_change for more details.

change_seed

The random seed used when drawing the mean change. Defaults to NA.

Sigma

The data covariance matrix. The default is the identity matrix.

locations

A vector of locations (or scalar for a single location) where the change in mean occurs. The default is locations=40.

durations

A scalar or vector (the same length as locations) of values indicating the duration for the change in mean. If the durations are all of the same length then a scalar value can be used. The default is durations=20.

proportions

A scalar or vector (the same length as locations) of values in the range (0,1] indicating the proportion of variates at each location that are affected by the change in mean. If the proportions are all the same then a scalar value can be used. The default is proportions=0.1.

change_type

A string specifying which variables are affected. Options include "adjacent", "adjacent_lattice", "scattered", "block_scattered", "custom" and "random". See the function get_affected_dims for more details.

changing_vars

The variables that are anomalous when change_type="custom". Defaults to NA.

point_locations

A vector with locations of point anomalies. Defaults to NA.

point_proportions

A vector of the same length as point_locations specifying the proportion of variables affected by each point anomaly.

point_mu

A vector of the same length as point_locations specifying the mean of all variables affected by each point anomaly.
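
As a rough illustration of how the locations, durations, proportions and vartheta arguments relate, the sketch below recycles scalar arguments to the length of locations and computes the per-component shift for equally changing components (shape = 0). The rounding rule used to turn a proportion into a number of variates is an assumption, not something stated in this documentation.

```r
# Illustrative sketch only; the rounding rule below is an assumption.
locations   <- c(100, 200, 300)
durations   <- 6                      # scalar: same duration at every location
proportions <- c(0.04, 0.06, 0.08)
p <- 200
vartheta <- 2

# Recycle scalars so every location has a duration and a proportion
durations   <- rep_len(durations, length(locations))
proportions <- rep_len(proportions, length(locations))
stopifnot(all(proportions > 0 & proportions <= 1))

# Number of affected variates per location (assumed: plain rounding)
k <- round(proportions * p)           # 8, 12, 16

# With equally changing components, each affected variate shifts by
# vartheta / sqrt(k), so the L2 norm of the change vector is vartheta
per_component <- vartheta / sqrt(k)
sqrt(k * per_component^2)             # equals vartheta at every location
```
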

Value

A matrix with n rows and p columns.

Examples

library(anomaly)
sim.data <- simulate(500, 200, 2, c(100, 200, 300), 6, c(0.04, 0.06, 0.08))
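
The example above calls simulate from the anomaly package rather than simulate_cor itself. A comparable call to simulate_cor, using only arguments listed under Usage, might look like the sketch below. It assumes that simulate_cor is exported from a package named capacc (as the page footer suggests) and that the package is installed. Using vartheta = 2 in place of the third positional argument is an illustrative choice, not an exact equivalent.

```r
# Hedged sketch: assumes the capacc package is installed and exports simulate_cor.
if (requireNamespace("capacc", quietly = TRUE)) {
  sim.data <- capacc::simulate_cor(
    n = 500,
    p = 200,
    vartheta = 2,                        # illustrative value
    locations = c(100, 200, 300),
    durations = 6,                       # scalar, shared by all locations
    proportions = c(0.04, 0.06, 0.08)
  )
  dim(sim.data)  # the Value section documents a matrix with n rows and p columns
}
```
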

Tveten/capacc documentation built on Sept. 29, 2021, 5:31 a.m.