synData: Decision table generator.
In mategarb/R.ROSETTA: R.ROSETTA: an interpretable machine learning framework



Creates a decision table of correlated features.


synData(nFeatures = c(10, 5, 3, 2, 2), rf = c(0.2, 0.2, 0.2, 0.2, 0.2),
        rd = c(0.4, 0.5, 0.6, 0.7, 0.8), nObjects = 120, nOutcome = 2,
        distribution = "uniform", unbalanced = FALSE, pUnbalancedClass = 0.8,
        discrete = FALSE, levels = 4, labels = c("A", "C", "G", "T"),
        binProb = 0.5, seed = 1)

`nFeatures`	A numeric vector of feature proportions. The default is c(10,5,3,2,2).
`rf`	A numeric vector of correlations within each feature set. The default is c(0.2,0.2,0.2,0.2,0.2).
`rd`	A numeric vector of correlations between each feature and the decision. The default is c(0.4,0.5,0.6,0.7,0.8).
`nObjects`	A numeric value giving the number of objects. The default is 120.
`nOutcome`	A numeric value giving the number of decision classes. The default is 2.
`distribution`	A character value naming the distribution. For discrete data, choose between "uniform" and "binomial". For non-discrete data, choose between "uniform" and "normal". The default is "uniform".
`unbalanced`	Logical. Set TRUE to generate unbalanced data. The default is FALSE.
`pUnbalancedClass`	A numeric value giving the proportion of the first class in unbalanced data. The default is 0.8.
`discrete`	Logical. Set TRUE to generate discrete data. The default is FALSE.
`levels`	A numeric value giving the number of discretization levels. The default is 4.
`labels`	A character vector of labels for the discretization levels. The default is c("A","C","G","T").
`binProb`	A numeric value giving the probability for the binomial distribution. The default is 0.5.
`seed`	A numeric value used as the random seed. The default is 1.
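
To make the roles of `rf` and `rd` more concrete, the base-R sketch below builds a single feature set by hand: the features share a common latent factor (giving a within-set correlation near `rf`) and each contains a scaled copy of the decision (giving a feature-decision correlation near `rd`). This is only an illustration of the idea and is not the algorithm used by `synData`; the helper `makeCorrelatedSet`, its arguments, and the construction itself are assumptions made for this example.

```r
# Hypothetical helper, NOT part of R.ROSETTA: one set of normally distributed
# features correlated with each other (about rf) and with a binary decision
# (about rd). The construction requires rd^2 <= rf < 1.
makeCorrelatedSet <- function(n, nFeat, rf, rd, seed = 1) {
  stopifnot(rf >= rd^2, rf < 1)
  set.seed(seed)
  decision <- rep(c(0, 1), length.out = n)   # balanced two-class decision
  dStd <- as.numeric(scale(decision))        # decision with mean 0, sd 1
  shared <- rnorm(n)                         # latent factor shared by the set
  feats <- sapply(seq_len(nFeat), function(i) {
    # unit-variance mix of decision signal, shared factor and private noise
    rd * dStd + sqrt(rf - rd^2) * shared + sqrt(1 - rf) * rnorm(n)
  })
  colnames(feats) <- paste0("F", seq_len(nFeat))
  data.frame(feats, decision = factor(decision))
}

toy <- makeCorrelatedSet(n = 20000, nFeat = 3, rf = 0.2, rd = 0.4)
cor(toy$F1, toy$F2)                    # sample estimate, close to rf
cor(toy$F1, as.numeric(toy$decision))  # sample estimate, close to rd
```

A large `n` is used here only so that the sample correlations settle near the requested values; with 120 objects, as in the `synData` default, the estimates would fluctuate noticeably more.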

output

A data frame with a decision table.

Mateusz Garbulowski

library(R.ROSETTA)

### continuous data ###

## weak correlation
df1 <- synData(nFeatures=c(5,5,3,2,2), rf=c(0.2,0.3,0.2,0.4,0.4), rd=c(0.2,0.3,0.4,0.3,0.4))
out1 <- rosetta(df1)
out1$quality ## accuracy = 60%

## medium correlation
df2 <- synData(nFeatures=c(5,5,3,2,2), rf=c(0.2,0.3,0.2,0.4,0.4), rd=c(0.4,0.4,0.6,0.6,0.7))
out2 <- rosetta(df2)
out2$quality ## accuracy = 75%

## strong correlation
df3 <- synData(nFeatures=c(5,5,3,2,2), rf=c(0.2,0.3,0.2,0.4,0.4), rd=c(0.5,0.7,0.7,0.8,0.8))
out3 <- rosetta(df3)
out3$quality ## accuracy = 90%

### discrete data ###

dfd <- synData(nFeatures=c(5,5,3,2,2), rf=c(0.2,0.3,0.2,0.4,0.4),
               rd=c(0.2,0.3,0.4,0.5,0.6), discrete = TRUE, levels = 3,
               labels = c("low", "medium", "high"))
outd <- rosetta(dfd, discrete = TRUE)
outd$quality ## accuracy = 85%
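
The examples above do not exercise the `unbalanced` and `pUnbalancedClass` arguments. The sketch below shows one way they might be checked, using only arguments listed in the usage. It assumes that the decision is stored in the last column of the returned data frame, which this page does not state; `classProportions` is a hypothetical helper written for this example, not a package function. No expected output is shown because the class proportions actually produced have not been verified here.

```r
# Hypothetical helper, NOT part of R.ROSETTA: proportion of objects per class,
# assuming the decision is the last column of the decision table.
classProportions <- function(df) {
  prop.table(table(df[[ncol(df)]]))
}

# Run the package call only if R.ROSETTA is installed.
if (requireNamespace("R.ROSETTA", quietly = TRUE)) {
  dfu <- R.ROSETTA::synData(unbalanced = TRUE, pUnbalancedClass = 0.8)
  print(classProportions(dfu))  # compare the first class with pUnbalancedClass
}
```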