rbfDataGen: A data generator based on RBF network
In semiArtificial: Generator of Semi-Artificial Data


View source: R/rbfDataGen.R

Using the given formula and data, the method builds an RBF network and extracts its properties, thereby preparing a data generator that can be used with the newdata.RBFgenerator method to generate semi-artificial data.

rbfDataGen(formula, data, eps=1e-4, minSupport=1, nominal=c("encodeBinary","asInteger"))

`formula`	A formula specifying the response and variables to be modeled.
`data`	A data frame with training data.
`eps`	The smallest probability that the data generator treats as larger than 0.
`minSupport`	The minimal number of instances that must define a Gaussian kernel for the kernel to be copied to the data generator.
`nominal`	How nominal features are treated. The option `"asInteger"` converts factors into integers and treats them as numeric features. The option `"encodeBinary"` converts each nominal attribute into a set of binary features that encode the nominal value; e.g., for a three-valued attribute, three binary attributes are constructed, each encoding the presence of one nominal value with 0 or 1.
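
The two treatments of nominal features can be illustrated in base R. This is only a sketch of the idea, not the package's internal code; the factor `colour` and the variable names are hypothetical.

```r
# A toy nominal attribute with three values (hypothetical data).
colour <- factor(c("red", "green", "blue", "green"),
                 levels = c("red", "green", "blue"))

# "asInteger"-style treatment: each level is replaced by its integer code.
asInt <- as.integer(colour)

# "encodeBinary"-style treatment: one 0/1 column per nominal value.
binary <- sapply(levels(colour), function(lv) as.integer(colour == lv))

print(asInt)
print(binary)
```

With the binary encoding every row contains exactly one 1, whereas the integer encoding imposes an ordering on the levels that the numeric treatment then takes literally.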

The parameter `formula` is used as a mechanism to select features (attributes) and the prediction variable (response, class). Only simple terms can be used; interaction terms are not supported. The simplest way is to specify just the response variable, e.g., `class ~ .`. See the examples below.
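
As a rough illustration of how such a formula selects the response and the features, the following sketch uses base R's `model.frame` on the built-in `iris` data. Whether rbfDataGen processes the formula in exactly this way is an assumption; the names `mfAll` and `mfTwo` are hypothetical.

```r
# "Species ~ ." : Species is the response, all other columns are features.
mfAll <- model.frame(Species ~ ., data = iris)
print(names(mfAll))

# Two simple terms as features; no interaction terms such as a:b.
mfTwo <- model.frame(Species ~ Petal.Length + Petal.Width, data = iris)
print(names(mfTwo))
```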

An RBF network is built using rbfDDA from the RSNNS package. The learned Gaussian kernels are extracted and used in data generation with the newdata.RBFgenerator method.

The created model is returned as a structure of class RBFgenerator, containing the following items:

`noGaussians`	The number of extracted Gaussian kernels.
`centers`	A matrix of Gaussian kernels' centers, with one row for each Gaussian kernel.
`probs`	A vector of kernel probabilities. Probabilities are defined as relative frequencies of training set instances with maximal activation in the given kernel.
`unitClass`	A vector of class values, one for each kernel.
`bias`	A vector of kernels' biases, one for each kernel. The bias is multiplied by the kernel activation to produce output value of given RBF network unit.
`spread`	A matrix of estimated variances for the kernels, one row for each kernel. The j-th value in the i-th row is the variance of the j-th attribute over the training instances with maximal activation in the i-th Gaussian.
`gNoActivated`	A vector containing numbers of training instances with maximal activation in each kernel.
`noAttr`	The number of attributes in training data.
`datNames`	A vector of attributes' names.
`originalNames`	A vector of original attribute names.
`attrClasses`	A vector of attributes' classes (i.e., data types like `numeric` or `factor`).
`attrLevels`	A list of levels for discrete attributes (with class `factor`).
`attrOrdered`	A vector of type logical indicating whether the attribute is `ordered` (only possible for attributes of type `factor`).
`normParameters`	A list of parameters for normalization of attributes to [0,1].
`noCol`	The number of columns in the internally generated data set.
`isDiscrete`	A vector of type logical, each value indicating whether a respective attribute is discrete.
`noAttrGen`	The number of attributes to generate.
`nominal`	The value of parameter `nominal`.
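
To make the roles of `centers`, `spread`, `probs`, `gNoActivated`, and `unitClass` more concrete, here is a minimal sketch of sampling from a mixture of Gaussian kernels. All values are hypothetical and are not produced by rbfDataGen. Drawing each attribute independently from a normal distribution is an assumption made for illustration; the actual newdata.RBFgenerator method may differ, for example in how it handles normalization and discrete attributes.

```r
set.seed(1)

# Hypothetical values for two kernels and two attributes (not package output).
centers      <- matrix(c(0.2, 0.3,
                         0.7, 0.8), nrow = 2, byrow = TRUE)
spread       <- matrix(c(0.010, 0.020,
                         0.015, 0.010), nrow = 2, byrow = TRUE)  # variances
gNoActivated <- c(30, 20)
unitClass    <- c("a", "b")

# Kernel probabilities as relative frequencies of instances per kernel.
probs <- gNoActivated / sum(gNoActivated)

# Draw n instances: choose a kernel, then sample each attribute independently
# around that kernel's center (assumed sampling scheme).
n <- 5
k <- sample(seq_along(probs), size = n, replace = TRUE, prob = probs)
noise <- matrix(rnorm(n * ncol(centers)), nrow = n)
x <- centers[k, , drop = FALSE] + noise * sqrt(spread[k, , drop = FALSE])

generated <- data.frame(x, class = unitClass[k])
print(probs)
print(generated)
```

Each generated row inherits the class value of the kernel it was drawn from, which is how a per-kernel class vector such as `unitClass` could label new instances.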

Marko Robnik-Sikonja

Marko Robnik-Sikonja: Not enough data? Generate it! Technical Report, University of Ljubljana, Faculty of Computer and Information Science, 2014.

Other references are available from http://lkm.fri.uni-lj.si/rmarko/papers/

newdata.RBFgenerator.

# load the package that provides rbfDataGen and newdata
library(semiArtificial)

# use iris data set, split into training and testing, inspect the data
set.seed(12345)
train <- sample(1:nrow(iris),size=nrow(iris)*0.5)
irisTrain <- iris[train,]
irisTest <- iris[-train,]

# inspect properties of the original data
plot(irisTrain, col=irisTrain$Species)
summary(irisTrain)

# create rbf generator
irisGenerator <- rbfDataGen(Species ~ ., irisTrain)

# use the generator to create new data
irisNew <- newdata(irisGenerator, size=200)

# inspect properties of the new data
plot(irisNew, col = irisNew$Species) # plot generated data
summary(irisNew)

# output of summary(irisTrain)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width         Species  
 Min.   :4.400   Min.   :2.200   Min.   :1.000   Min.   :0.10   setosa    :27  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.20   versicolor:23  
 Median :5.700   Median :3.100   Median :4.200   Median :1.30   virginica :25  
 Mean   :5.863   Mean   :3.099   Mean   :3.721   Mean   :1.18                  
 3rd Qu.:6.550   3rd Qu.:3.400   3rd Qu.:5.300   3rd Qu.:1.80                  
 Max.   :7.900   Max.   :4.200   Max.   :6.400   Max.   :2.50                  

# output of summary(irisNew)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width           Species  
 Min.   :4.402   Min.   :2.220   Min.   :1.020   Min.   :0.1038   setosa    :72  
 1st Qu.:5.047   1st Qu.:2.755   1st Qu.:1.545   1st Qu.:0.2897   versicolor:62  
 Median :5.823   Median :3.024   Median :4.214   Median :1.1556   virginica :66  
 Mean   :5.856   Mean   :3.032   Mean   :3.689   Mean   :1.1286                  
 3rd Qu.:6.586   3rd Qu.:3.252   3rd Qu.:5.349   3rd Qu.:1.8957                  
 Max.   :7.716   Max.   :4.192   Max.   :6.396   Max.   :2.4999                  