performanceCompare: Evaluate similarity of two data sets based on predictive...
In semiArtificial: Generator of Semi-Artificial Data


View source: R/dataQuality.R

Depending on the type of problem (classification or regression), a classification performance measure (accuracy, AUC, brierScore, etc.) or a regression performance measure (RMSE, MSE, MAE, RMAE, etc.) is computed on the two data sets and used to assess how similar they are.

performanceCompare(data1, data2, formula, model="rf", stat=NULL, ...)

`data1`	A `data.frame` containing the reference data.
`data2`	A `data.frame` with the same number and names of columns as `data1`.
`formula`	A `formula` specifying the response and predictive variables.
`model`	The predictive model used for the performance comparison. The default value "rf" stands for random forest, but any classification or regression model supported by the function `CoreModel` in the CORElearn package can be used.
`stat`	The statistic used as the performance indicator. The default value NULL means that "accuracy" is used for classification and "RMSE" (relative mean squared error) is used for regression. Other statistics computed and returned by `modelEval` from the CORElearn package can also be used, e.g., "AUC" or "brierScore".
`...`	Additional parameters passed to `CoreModel` function.

The function compares the data in `data1` with the data in `data2` by building a model on `data1` and evaluating it on both `data1` and `data2`, and likewise building a model on `data2` and evaluating it on both data sets. The differences between these performances indicate how similar the two data sets are when used in machine learning and data mining. The performance indicator is determined by the parameter `stat`.

The method returns a list of performance indicators computed on both data sets:

`diff.m1`	The difference between the performances of the model built on `data1` when evaluated on `data1` and on `data2`.
`diff.m2`	The difference between the performances of the model built on `data2` when evaluated on `data1` and on `data2`.
`perf.m1d1`	The performance of the model built on `data1`, evaluated on `data1`.
`perf.m1d2`	The performance of the model built on `data1`, evaluated on `data2`.
`perf.m2d1`	The performance of the model built on `data2`, evaluated on `data1`.
`perf.m2d2`	The performance of the model built on `data2`, evaluated on `data2`.
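The cross-evaluation scheme described above can be illustrated with a small base R sketch. It is not the package's implementation: the function `crossPerformance` is hypothetical, it swaps in a linear model (`lm`) for `CoreModel` and plain mean squared error for the statistics computed by `modelEval`, so it only covers regression. The sign convention for the differences (performance on the model's own training data minus performance on the other data set) is also an assumption.

```r
# Hypothetical sketch of the cross-evaluation behind performanceCompare.
# Assumptions: lm() replaces CoreModel(), plain MSE replaces modelEval(),
# and each diff is "own training data minus the other data set".
crossPerformance <- function(data1, data2, formula) {
  response <- all.vars(formula)[1]  # name of the response variable
  # mean squared error of a fitted model on a data set (lower is better)
  mse <- function(fit, data) {
    mean((data[[response]] - predict(fit, newdata = data))^2)
  }
  m1 <- lm(formula, data = data1)  # model built on data1
  m2 <- lm(formula, data = data2)  # model built on data2
  perf.m1d1 <- mse(m1, data1)
  perf.m1d2 <- mse(m1, data2)
  perf.m2d1 <- mse(m2, data1)
  perf.m2d2 <- mse(m2, data2)
  list(diff.m1 = perf.m1d1 - perf.m1d2,
       diff.m2 = perf.m2d2 - perf.m2d1,
       perf.m1d1 = perf.m1d1, perf.m1d2 = perf.m1d2,
       perf.m2d1 = perf.m2d1, perf.m2d2 = perf.m2d2)
}

# usage: treat the two halves of mtcars as the two data sets
res <- crossPerformance(mtcars[1:16, ], mtcars[17:32, ], mpg ~ wt + hp)
print(res$diff.m1)
```

Because MSE is an error measure, each model usually scores better (lower) on its own training data, so differences near zero suggest that the two data sets behave similarly for the chosen model.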

Marko Robnik-Sikonja

newdata.RBFgenerator.

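# load the package (provides rbfDataGen, newdata and performanceCompare)
library(semiArtificial)

# use the iris data set and create an RBF generator
irisGenerator <- rbfDataGen(Species ~ ., iris)

# use the generator to create new data
irisNew <- newdata(irisGenerator, size = 200)

# compare the predictive performance of models on the original and new data
performanceCompare(iris, irisNew, Species ~ .)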