newdata: Generate semi-artificial data using a generator


View source: R/rbfDataGen.R

Description

Using a generator built with rbfDataGen or treeEnsemble, the method generates size new instances.

Usage

## S3 method for class 'RBFgenerator'
newdata(object, size, var=c("estimated","Silverman"), 
                               classProb=NULL, defaultSpread=0.05, ... )
## S3 method for class 'TreeEnsemble'
newdata(object, fillData=NULL, 
                               size=ifelse(is.null(fillData),1,nrow(fillData)), 
                               onlyPath=FALSE, classProb=NULL, 
                               predictClass=FALSE, ...) 

Arguments

object

An object of class RBFgenerator or TreeEnsemble containing a generator structure as returned by rbfDataGen or treeEnsemble, respectively.

fillData

A dataframe with part of the values already specified. All missing values (i.e. NA values) are filled in by the generator.

size

The number of instances to generate. By default this is one instance or, if fillData is supplied, the number of rows in that dataframe.

var

For the generator of type RBFgenerator the parameter var determines the method of kernel width (variance) estimation. Supported options are "estimated" and "Silverman".

classProb

For classification problems, a vector specifying the desired probability distribution of the class values. The default value classProb=NULL uses the class distribution of the generator's training instances.

defaultSpread

For the generator of type RBFgenerator the parameter is a numeric value replacing zero spread in case var="estimated" is used. The value defaultSpread=NULL keeps zero spread values.

onlyPath

For the generator of type TreeEnsemble and attribute density data in the leaves (densityData="leaf"), the parameter is a boolean variable indicating whether only attributes on the path from the root to the leaf are generated in the leaf. If onlyPath=FALSE, all values are generated in the first randomly chosen leaf of a tree; otherwise only attributes on the path are generated and then the next random tree is selected.

predictClass

For classification problems and the generator of type TreeEnsemble the parameter determines if the class value is set through prediction with the forest (the constructed generator serves as a predictor) or set according to the class value distribution of the selected leaf.

...

Additional parameters passed to density estimation functions kde, logspline, and quantile.
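As a rough illustration of how the RBFgenerator arguments above might be combined, the following sketch requests Silverman kernel widths and a skewed class distribution. It uses only the calls listed under Usage. The ordering of the classProb entries (assumed here to follow levels(iris$Species)) is an assumption, not something this page states.

```r
library(semiArtificial)

set.seed(1)  # generation is random; fix the seed for repeatability
irisRBF <- rbfDataGen(Species ~ ., iris)

# Kernel widths from Silverman's rule instead of the estimated spread
silv <- newdata(irisRBF, size = 150, var = "Silverman")

# Request a skewed class distribution
# (assumption: probabilities follow the order of levels(iris$Species))
skewed <- newdata(irisRBF, size = 300, classProb = c(0.6, 0.3, 0.1))
table(skewed$Species)
```

Tabulating the class column is a quick way to see whether the requested proportions were honored and in which order they were interpreted.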

Details

The function uses the object structure returned by rbfDataGen or treeEnsemble. For an RBFgenerator, the object contains descriptions of the Gaussian kernels that model the original data; these kernels are used to generate the required number of new instances. The kernel width can be set in two ways. With var="estimated", the width is the estimated spread of the training instances that have the maximal activation value for the particular kernel. With var="Silverman", the width is set by the generalization of Silverman's rule of thumb to the multivariate case (unreliable for larger dimensions).
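To make the Silverman option more concrete, here is a base R sketch of a commonly cited multivariate form of Silverman's rule of thumb, h_j = sd_j * (4 / ((d + 2) n))^(1 / (d + 4)), followed by sampling from a single Gaussian kernel with those widths. The helper silverman_bw is hypothetical and written for illustration only; the package's internal computation may differ.

```r
# Hypothetical helper: per-dimension widths from a common multivariate
# form of Silverman's rule, h_j = sd_j * (4 / ((d + 2) * n))^(1 / (d + 4))
silverman_bw <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)  # number of instances
  d <- ncol(x)  # number of dimensions
  apply(x, 2, sd) * (4 / ((d + 2) * n))^(1 / (d + 4))
}

# Widths for the numeric attributes of one iris class
x <- iris[iris$Species == "setosa", 1:4]
h <- silverman_bw(x)

# Draw a few points from one Gaussian kernel centered at the class mean,
# using the widths above as per-attribute standard deviations
set.seed(42)
center <- colMeans(x)
draws <- sapply(seq_along(h), function(j) rnorm(5, mean = center[j], sd = h[j]))
colnames(draws) <- names(h)
```

Because the exponent 1 / (d + 4) shrinks as d grows, the widths decrease ever more slowly with n in higher dimensions, which is one way to read the warning about larger dimensions.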

In case of TreeEnsemble generator no additional parameters are needed, except for the number of generated instances.
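A minimal sketch of TreeEnsemble generation follows: first the plain call with only size, then the optional predictClass and fillData arguments from Usage. The layout of the partial dataframe (same columns as the training data, with NA marking values to fill) is an assumption based on the argument description.

```r
library(semiArtificial)

set.seed(1)
irisEnsemble <- treeEnsemble(Species ~ ., iris, noTrees = 10)

# Minimal call: only the number of instances is given
plain <- newdata(irisEnsemble, size = 20)

# Class value set through prediction with the forest
# instead of the class distribution of the selected leaf
predicted <- newdata(irisEnsemble, size = 20, predictClass = TRUE)

# Partially specified rows: NA entries are left for the generator to fill
# (assumption: fillData has the same columns as the training data)
partial <- iris[c(1, 51, 101), ]
partial[, c("Petal.Length", "Petal.Width")] <- NA
filled <- newdata(irisEnsemble, fillData = partial)  # size defaults to nrow(partial)
```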

Value

The method returns a data.frame object with the required number of instances.
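One way to sanity-check the returned data.frame is to compare per-class attribute means against the training data. This is only a sketch; it assumes the generated columns carry the same names as the originals, as the example output on this page suggests.

```r
library(semiArtificial)

set.seed(1)
gen <- rbfDataGen(Species ~ ., iris)
synth <- newdata(gen, size = 150)

# Inspect the structure of the returned object
str(synth)

# Per-class attribute means of the original and the generated data
origMeans <- aggregate(. ~ Species, data = iris, FUN = mean)
synthMeans <- aggregate(. ~ Species, data = synth, FUN = mean)
origMeans
synthMeans
```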

Author(s)

Marko Robnik-Sikonja

See Also

rbfDataGen, treeEnsemble.

Examples

# inspect properties of the iris data set
plot(iris, col=iris$Species)
summary(iris)

# create RBF generator
irisRBF <- rbfDataGen(Species ~ ., iris)
# create tree ensemble generator
irisEnsemble <- treeEnsemble(Species ~ ., iris, noTrees = 10)


# use the generator to create new data with both generators
irisNewRBF <- newdata(irisRBF, size=150)
irisNewEns <- newdata(irisEnsemble, size=150)

#inspect properties of the new data
plot(irisNewRBF, col = irisNewRBF$Species) #plot generated data
summary(irisNewRBF)
plot(irisNewEns, col = irisNewEns$Species) #plot generated data
summary(irisNewEns)

Example output

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width    
 Min.   :4.355   Min.   :2.023   Min.   :1.007   Min.   :0.1020  
 1st Qu.:5.235   1st Qu.:2.783   1st Qu.:1.636   1st Qu.:0.2972  
 Median :5.699   Median :3.054   Median :4.025   Median :1.2531  
 Mean   :5.902   Mean   :3.179   Mean   :3.736   Mean   :1.1747  
 3rd Qu.:6.501   3rd Qu.:3.703   3rd Qu.:5.166   3rd Qu.:1.8621  
 Max.   :7.881   Max.   :4.343   Max.   :6.736   Max.   :2.4829  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.400   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.700   1st Qu.:1.500   1st Qu.:0.300  
 Median :5.700   Median :3.000   Median :4.200   Median :1.330  
 Mean   :5.832   Mean   :3.038   Mean   :3.621   Mean   :1.196  
 3rd Qu.:6.400   3rd Qu.:3.316   3rd Qu.:4.900   3rd Qu.:1.800  
 Max.   :7.828   Max.   :4.159   Max.   :6.184   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

semiArtificial documentation built on Sept. 24, 2021, 1:07 a.m.