generate.sample4: Sample4 generator of synthetic data

generate.sample4    R Documentation

Sample4 generator of synthetic data

Description

Synthetic generator of multivariate normally distributed data. Each randomly generated data set contains 5 clusters (classes), with n 6000-dimensional examples per class. Every class has 1000 non-noisy features and 5000 noisy features. The distributions underlying classes 1 and 2, and classes 1 and 3, overlap substantially, while classes 4 and 5 are separated.

The non-noisy variables are centered in 0 for the first class (first n examples), in 1 for the second class (second n examples), in -1 for the third class (third n examples), in 5 for the fourth class (fourth n examples), and in -5 for the fifth class (fifth n examples). For all classes, the diagonal elements of the covariance matrix are equal to sigma for the first 1000 variables and to 2*sigma for the last 5000 variables.
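The layout described above can be sketched in base R. This is only an illustration under stated assumptions, not the clusterv source: the function name sketch_sample4 and its n_signal and n_noise parameters are hypothetical, sigma is treated as a standard deviation (as in the Arguments section), and the noisy features are assumed to be centered in 0 for every class, which the description does not state.

```r
# Hypothetical sketch of the described data layout; NOT the clusterv implementation.
sketch_sample4 <- function(n = 2, sigma = 1, n_signal = 1000, n_noise = 5000) {
  # Centers of the non-noisy variables for classes 1..5, as given in the Description.
  centers <- c(0, 1, -1, 5, -5)
  blocks <- lapply(centers, function(mu) {
    # Non-noisy features: one row per variable, one column per example.
    signal <- matrix(rnorm(n_signal * n, mean = mu, sd = sigma),
                     nrow = n_signal, ncol = n)
    # Noisy features: ASSUMED centered in 0, with twice the spread.
    noise <- matrix(rnorm(n_noise * n, mean = 0, sd = 2 * sigma),
                    nrow = n_noise, ncol = n)
    rbind(signal, noise)
  })
  # Classes are placed side by side: first n columns are class 1, and so on.
  do.call(cbind, blocks)
}

set.seed(1)
M <- sketch_sample4(n = 20, sigma = 1)
dim(M)  # 6000 100
```

With centers at 5 and -5 and sigma = 1, the fourth and fifth classes lie far from the other three, while the classes centered in 0, 1 and -1 overlap along each non-noisy feature.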

Usage

generate.sample4(n = 2, sigma = 1)

Arguments

n

number of examples for each class

sigma

standard deviation of the first 1000 variables; the remaining 5000 variables have standard deviation 2*sigma

Value

a real data matrix with 6000 rows (variables) and n*5 columns (examples)
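Because the examples are stored in columns and ordered by class (first n columns for the first class, and so on), a label vector can be rebuilt from n alone. The snippet below is a minimal sketch that uses a small random stand-in matrix so it runs without the package; the names M, labels, class4 and X are illustrative only.

```r
n <- 20

# Stand-in for the generator output: variables in rows, 5 * n examples in columns.
# Only 10 rows are used here to keep the illustration small.
set.seed(42)
M <- matrix(rnorm(10 * 5 * n), nrow = 10, ncol = 5 * n)

# Class label of each column, following the class-by-class column ordering.
labels <- rep(1:5, each = n)
colnames(M) <- paste0("class", labels, "_", rep(seq_len(n), times = 5))

# Select all examples of the fourth class.
class4 <- M[, labels == 4, drop = FALSE]

# Many base R functions (e.g. kmeans, prcomp) expect examples in rows,
# so the matrix is transposed before passing it on.
X <- t(M)
dim(X)  # 100 10
```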

Author(s)

Giorgio Valentini valentini@di.unimi.it

Examples

# Default arguments: a data set with 10 examples (2 per class)
generate.sample4()
# Generation of a data set with 100 6000-dimensional examples (20 per class)
generate.sample4(n = 20, sigma = 1)

clusterv documentation built on June 8, 2025, 10:21 a.m.