simData: Simulates observations for outlier determination.

Description

Simulates observations from a mixture model based on information on partitions from the leader function.

Usage

simData(leaderInstance, nsim = NULL, model = c("diagonal", "spherical"), seed = NULL)

Arguments

leaderInstance

A single component from a call to leader, giving Leader Algorithm results for one value of the partitioning radius.

nsim

The number of observations to be simulated. If nsim = 0 or leaderInstance$radius == 0, no observations are simulated and only the radius and centroids are returned.
Default: min(# observations, max(# partitions, 1000)).

model

For multivariate data, a vector of character strings indicating the type of Gaussian mixture model covariance to be used in generating the simulated observations (see Details).
For univariate data, the observations are generated from a model in which the variances may vary across components.

seed

An optional integer argument to set.seed for reproducible simulations. By default the current seed will be used. Reproducibility can also be achieved by calling set.seed before calling simData.
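
The default for nsim and the two routes to reproducibility described above can be sketched in base R. Both default_nsim and sim_draws are hypothetical helpers written for this illustration; they are not part of probout, and how simData handles the seed internally is an assumption here.

```r
# Hypothetical helper: the documented default for nsim,
# min(# observations, max(# partitions, 1000)).
default_nsim <- function(n_obs, n_part) min(n_obs, max(n_part, 1000))

# Hypothetical stand-in for a simulator with an optional seed argument.
sim_draws <- function(nsim, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reseed only when a seed is supplied
  rnorm(nsim)
}

nsim <- default_nsim(n_obs = 272, n_part = 40)  # 272: fewer than 1000 observations

# Route 1: pass the seed as an argument.
a <- sim_draws(nsim, seed = 42)
b <- sim_draws(nsim, seed = 42)
identical(a, b)  # TRUE

# Route 2: call set.seed before the simulation.
set.seed(42)
d <- sim_draws(nsim)
identical(a, d)  # TRUE
```

With more partitions than 1000, the same helper gives default_nsim(50000, 2500) = 2500, so the default never falls below the number of partitions unless the data set itself is smaller.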

Details

The following models are available for multivariate data:

"spherical" : spherical, varying volume
"diagonal" : diagonal, varying volume and shape

An ellipsoidal model is also possible, but has not yet been implemented.
If nsim = 0 or leaderInstance$radius == 0, no observations are simulated, and only the radius and partition centroids are returned.
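
To make the difference between the two covariance models concrete, the following is a minimal base R sketch of drawing from a Gaussian mixture with spherical or diagonal covariance. It is not the simData implementation: sim_mixture, its arguments, and the example centroids, weights and variances are all assumptions made for illustration.

```r
# Hypothetical sketch: simulate from a Gaussian mixture around fixed centroids.
#   centers  : k x d matrix of centroids (rows = partitions)
#   weight   : length-k mixing proportions
#   variance : length-k vector (spherical: one variance per component)
#              or d x k matrix (diagonal: one variance per dimension and component)
sim_mixture <- function(nsim, centers, weight, variance, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  d <- ncol(centers)
  # Expand the spherical case to a d x k matrix with equal variances in every dimension.
  if (!is.matrix(variance)) {
    variance <- matrix(variance, nrow = d, ncol = k, byrow = TRUE)
  }
  # Choose a component for each simulated observation.
  index <- sample.int(k, nsim, replace = TRUE, prob = weight)
  # Standard deviations for each observation, one column per dimension.
  sdev <- sqrt(t(variance)[index, , drop = FALSE])
  offset <- matrix(rnorm(nsim * d), nsim, d) * sdev
  list(index = index, offset = offset,
       x = centers[index, , drop = FALSE] + offset)
}

centers <- rbind(c(0, 0), c(10, 10))
weight <- c(0.25, 0.75)

# "spherical": varying volume, one variance per component
sph <- sim_mixture(500, centers, weight, variance = c(1, 4), seed = 1)

# "diagonal": varying volume and shape, one variance per dimension and component
dia <- sim_mixture(500, centers, weight,
                   variance = cbind(c(1, 0.25), c(4, 9)), seed = 1)

dim(sph$x)
table(dia$index)
```

In the spherical case each component spreads equally in every dimension, so only its overall size varies; in the diagonal case each dimension has its own variance, which is why the shape varies as well.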

Value

A list with the following components:

radius

The value of the radius associated with leaderInstance.

location

The vector or matrix of centroids of the partitions. If a matrix, rows correspond to the partitions and columns to the variables.

index

A vector of integer values giving the index of the partition associated with each simulated observation.

offset

A vector of numeric values giving the offset of each simulated observation from its associated centroid.

weight

A vector of numeric values between 0 and 1 giving the proportion of data observations in each partition.

scale

The scale (variance) of the mixture components in a univariate or spherical model. Set to 1 for each component in the diagonal model.

shape

A matrix giving the variances of the mixture components in a diagonal model. The rows correspond to the dimensions of the data, while the columns correspond to the mixture components (partitions).
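
As a small illustration of the weight component, the proportion of observations in each partition can be computed from a vector of partition assignments. The assignments below are made up, and this base R sketch is not taken from the package source.

```r
# Hypothetical partition assignments for 8 observations in 3 partitions.
partition <- c(1, 1, 2, 3, 3, 3, 1, 2)

# Proportion of observations falling in each partition.
weight <- tabulate(partition, nbins = 3) / length(partition)
weight       # 0.375 0.250 0.375
sum(weight)  # 1
```

Each entry lies between 0 and 1 and the entries sum to 1, which is what allows them to serve as mixing proportions.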

References

C. Fraley, Estimating Outlier Probabilities for Large Datasets, 2017.

See Also

leader, partProb

Examples

library(probout)

# radius computed by LWradius from the number of rows and columns of faithful
radius.default <- LWradius(nrow(faithful), ncol(faithful))
lead <- leader(faithful, radius = c(0, radius.default))

# (simulated) data for outlier statistic (no simulation for radius = 0)
sim <- lapply(lead, simData)

# components of simData output
lapply(sim, names)

probout documentation built on Feb. 11, 2022, 5:10 p.m.