generate.row: Generate a row with statistical properties


Description

Generate a row with statistical properties.

Usage

generate.row(dim = 10, subspaces = list(c(3, 4), c(7, 8)),
  margins = list(0.9, 0.9), dependency = "Wall", prop = 0.01,
  proptype = "proportional", discretize = 0)

Arguments

dim

Number of dimensions of the generated vector.

subspaces

List of subspaces that contain a dependency.

margins

List of margins that correspond to each subspace.

dependency

Type of dependency for the subspaces ('mixed' is currently not supported).

prop

Probability that a point belonging to the hidden space of a subspace becomes an outlier.

proptype

Type of the proportion of outliers. With "proportional", the proportion depends on the size of the empty space. With "absolute", the same absolute proportion is used for each subspace.

Details

The row is first drawn from the uniform distribution between 0 and 1 over each dimension. For each subspace in subspaces, if the point belongs to the hidden space of the specified dependency (i.e., for "Wall", the value in each dimension of the subspace is bigger than 1-margin), then it becomes an outlier with probability prop.

If it is an outlier, it stays in the hidden space, and we make sure it is not too close to the border by rescaling the value so that it is at least 10 …

If it is not an outlier, we map the point uniformly to the dependencies (i.e., for "Wall", one of the axes, or the center for a 3-D subspace). The probability of being mapped to one of these elements is determined by their volume: the bigger the volume, the more likely the point is mapped to that element.

If a point is already an outlier in a subspace, it cannot become an outlier in an overlapping subspace. It still has the same probability of becoming an outlier in another, non-overlapping subspace. This happens occasionally (pigeonhole principle).
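The first steps described above (uniform draw, hidden-space test for "Wall", and the outlier decision) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper names in_hidden_space_wall and overlaps are hypothetical, and the rescaling and mapping steps are left out.

```r
# Illustrative sketch only; helper names are hypothetical and this is
# not the code used by generate.row.
set.seed(1)

# TRUE if every coordinate of `point` inside `subspace` exceeds 1 - margin,
# i.e. the point lies in the hidden space of a "Wall" dependency.
in_hidden_space_wall <- function(point, subspace, margin) {
  all(point[subspace] > 1 - margin)
}

# TRUE if two subspaces share at least one dimension.
overlaps <- function(a, b) {
  length(intersect(a, b)) > 0
}

n_dim    <- 10
subspace <- c(3, 4)
margin   <- 0.9
prop     <- 0.01

point  <- runif(n_dim)                           # uniform draw over [0, 1]^n_dim
hidden <- in_hidden_space_wall(point, subspace, margin)
is_outlier <- hidden && (runif(1) < prop)        # outlier with probability prop

# Share of uniformly drawn points expected to fall in the hidden space:
# each coordinate exceeds 1 - margin with probability margin.
expected_hidden <- margin^length(subspace)
```

With margin = 0.9 and a 2-dimensional subspace, about 81% of the uniformly drawn points fall into the hidden space before any mapping takes place, which is why most points have to be moved onto the dependency.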

If the point is not an outlier, it has 0 as its label; otherwise, its label is a string representing the subspaces in which it is an outlier.
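The labelling rule can be sketched as below. The exact string format used by the package is not documented here, so the separators (":" between dimensions, ";" between subspaces) and the helper name make_label are assumptions made purely for illustration.

```r
# Illustrative sketch only; the label format (":" and ";" separators)
# is an assumption, not the documented output of generate.row.
make_label <- function(outlier_subspaces) {
  # No outlying subspace: the label is 0.
  if (length(outlier_subspaces) == 0) {
    return(0)
  }
  # Otherwise, build one string that names every outlying subspace.
  parts <- vapply(
    outlier_subspaces,
    function(s) paste(s, collapse = ":"),
    character(1)
  )
  paste(parts, collapse = ";")
}

make_label(list())                    # not an outlier
make_label(list(c(3, 4)))             # outlier in one subspace
make_label(list(c(3, 4), c(7, 8)))    # outlier in two subspaces
```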

Value

A list with 2 elements, where data contains the generated vector and labels contains the corresponding label.
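A call with the default arguments from the Usage section might look like the sketch below. It assumes the package is installed under the name "streamgenerator" (an assumption based on the repository name) and skips the call otherwise.

```r
# Arguments copied from the Usage section; one margin per subspace.
args <- list(
  dim = 10,
  subspaces = list(c(3, 4), c(7, 8)),
  margins = list(0.9, 0.9),
  dependency = "Wall",
  prop = 0.01,
  proptype = "proportional",
  discretize = 0
)
stopifnot(length(args$subspaces) == length(args$margins))

# Assumption: the package namespace is called "streamgenerator".
if (requireNamespace("streamgenerator", quietly = TRUE)) {
  generate_row <- utils::getFromNamespace("generate.row", "streamgenerator")
  row <- do.call(generate_row, args)
  str(row$data)      # the generated vector
  print(row$labels)  # 0, or a string naming the outlying subspaces
}
```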

Author(s)

Edouard Fouché, edouard.fouche@kit.edu

See Also


edouardfouche/R-streamgenerator documentation built on May 15, 2019, 11:02 p.m.