
View source: R/DSD_Gaussians.R

A data stream generator that produces a data stream with a mixture of static Gaussians.

```
DSD_Gaussians(k=2, d=2, mu, sigma, p, separation=0.2, noise=0, noise_range)
```

| Argument | Description |
|---|---|
| `k` | Determines the number of clusters. |
| `d` | Determines the number of dimensions. |
| `mu` | A matrix of means for each dimension of each cluster. |
| `sigma` | A list of length `k` of covariance matrices. |
| `p` | A vector of probabilities that determines the likelihood of generating a data point from a particular cluster. |
| `separation` | Minimum distance between cluster centers to reduce overlap between clusters (0 to 0.8). |
| `noise` | Noise probability between 0 and 1. Noise is uniformly distributed within the noise range (see `noise_range`). |
| `noise_range` | A matrix with `d` rows and 2 columns. The first column contains the minimum values and the second column contains the maximum values for noise. |
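To make the `noise` and `noise_range` arguments concrete, the following base-R sketch draws uniform noise inside a bounding box. It is only an illustration of the behavior described above, not the package's implementation; the variable names `is_noise` and `noise_points` are made up for this example.

```r
# Illustrative sketch only (not code from the stream package).
set.seed(42)

d <- 2
noise <- 0.2                               # probability that a point is noise
noise_range <- rbind(c(-1, 1), c(-1, 1))   # d rows; columns are min and max

n <- 1000
is_noise <- runif(n) < noise               # TRUE marks a noise point

# Draw each dimension of the noise points uniformly within its own range.
m <- sum(is_noise)
noise_points <- matrix(NA_real_, nrow = m, ncol = d)
for (j in seq_len(d)) {
  noise_points[, j] <- runif(m, min = noise_range[j, 1], max = noise_range[j, 2])
}
```

With `noise = 0.2`, roughly one point in five is flagged as noise, and every noise coordinate stays inside the limits given by the matching row of `noise_range`.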

`DSD_Gaussians` creates a mixture of `k` `d`-dimensional static Gaussians in approximately unit space. The centers `mu` and the covariance matrices `sigma` can be supplied or will be randomly generated. The probability vector `p` defines for each cluster the probability that the next data point will be chosen from it (defaults to equal probability).
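The roles of `mu`, `sigma` and `p` can be illustrated with a minimal base-R sketch of sampling from a Gaussian mixture. This is an assumption-laden illustration, not the code used inside `DSD_Gaussians`: the helper `sample_mixture` is hypothetical, and the Cholesky-based sampling is one standard technique chosen here to avoid extra package dependencies.

```r
# Illustrative sketch only; sample_mixture() is a hypothetical helper,
# not a function from the stream package.
set.seed(1)

mu <- rbind(c(-0.5, -0.5), c(0.5, 0.5))        # k x d matrix of cluster means
sigma <- list(diag(0.01, 2), diag(0.02, 2))    # list of k covariance matrices
p <- c(0.1, 0.9)                               # cluster probabilities

sample_mixture <- function(n, mu, sigma, p) {
  k <- nrow(mu)
  d <- ncol(mu)
  # Pick the cluster of every point according to p.
  cluster <- sample.int(k, n, replace = TRUE, prob = p)
  points <- matrix(NA_real_, nrow = n, ncol = d)
  for (i in seq_len(k)) {
    idx <- which(cluster == i)
    if (length(idx) == 0) next
    # Standard normal draws, one row per point in cluster i.
    z <- matrix(rnorm(length(idx) * d), ncol = d)
    # chol() returns upper-triangular R with t(R) %*% R == sigma[[i]],
    # so the rows of z %*% R have covariance sigma[[i]]; then shift by the mean.
    points[idx, ] <- sweep(z %*% chol(sigma[[i]]), 2, mu[i, ], "+")
  }
  list(points = points, cluster = cluster)
}

res <- sample_mixture(1000, mu, sigma, p)
table(res$cluster)   # about 10% of the points fall in cluster 1
```

Choosing the cluster first and then drawing from that cluster's Gaussian is what lets `p` control the relative densities of the clusters.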

The generation method is similar to the one suggested by Jain and Dubes (1988).

Returns a `DSD_Gaussians` object (subclass of `DSD_R`, `DSD`), which is a list of the defined parameters. The parameters are either passed in to the function or created internally. They include:

| Element | Description |
|---|---|
| `description` | A brief description of the DSD object. |
| `k` | The number of clusters. |
| `d` | The number of dimensions. |
| `mu` | The matrix of means of the dimensions in each cluster. |
| `sigma` | The covariance matrices of the clusters. |
| `p` | The probability vector for the clusters. |
| `noise` | A flag indicating whether noise is generated. |

Michael Hahsler

Jain and Dubes (1988). Algorithms for Clustering Data. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

```
# create data stream with three clusters in 3-dimensional data space
stream1 <- DSD_Gaussians(k=3, d=3)
plot(stream1)

# create data stream with specified cluster positions,
# 20% noise in a given bounding box and
# with different densities (1 to 9 between the two clusters)
stream2 <- DSD_Gaussians(k=2, d=2,
    mu=rbind(c(-.5,-.5), c(.5,.5)),
    noise=0.2, noise_range=rbind(c(-1,1),c(-1,1)),
    p=c(.1,.9))
plot(stream2)
```
