Data set discretization and formatting

Description

Removes rows containing missing data, and discretizes the data set using Minimum Description Length Partitioning (MDLP).

Usage

data_disc(data, n_train = NULL, missing = "?")

Arguments

data

Data frame, where the last column must be the class variable.

n_train

Number of data frame rows to use as the training set; the remaining rows form the test set. If NULL (the default), all rows are used for training and no test set is created.

missing

Label that denotes missing values in your data frame (default = "?").
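
To make the roles of n_train and missing concrete, here is a minimal base R sketch of the row filtering and train/test split described above. It is not the function's actual implementation: split_sketch and the toy data frame are hypothetical, the MDLP discretization step is omitted entirely, and it assumes that the first n_train rows remaining after incomplete rows are dropped become the training set.

```r
# Hypothetical helper for illustration only; it does NOT discretize anything.
split_sketch <- function(data, n_train = NULL, missing = "?") {
  # A cell counts as missing if it is NA or equals the missing-value label
  is_missing <- function(col) is.na(col) | as.character(col) == missing
  has_missing <- Reduce(`|`, lapply(data, is_missing))

  # Keep only complete rows
  complete <- data[!has_missing, , drop = FALSE]
  p <- ncol(complete)  # the last column is the class variable

  # Assumption: with n_train = NULL every remaining row is used for training
  if (is.null(n_train)) n_train <- nrow(complete)
  train_idx <- seq_len(n_train)
  test_idx <- setdiff(seq_len(nrow(complete)), train_idx)

  list(
    TrainX = complete[train_idx, -p, drop = FALSE],
    TrainY = complete[[p]][train_idx],
    TestX  = complete[test_idx, -p, drop = FALSE],
    TestY  = complete[[p]][test_idx]
  )
}

# Toy data: rows 2 and 3 each contain the missing label "?"
toy <- data.frame(
  x1 = c("1.2", "?", "3.1", "0.7", "2.2"),
  x2 = c("a", "b", "?", "a", "b"),
  class = c("yes", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

parts <- split_sketch(toy, n_train = 2)
all_train <- split_sketch(toy)
```

In this sketch, three complete rows remain after filtering, so n_train = 2 leaves two training rows and one test row, while the default leaves all three rows in the training set and an empty test set.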

Value

A discretized data set with the following components:

TrainX

Matrix containing the training data.

TrainY

Vector containing the class labels for the training data.

TestX

Matrix containing the test data (present only when a test set is created, i.e. when n_train is not NULL).

TestY

Vector containing the class labels for the test data (present only when a test set is created, i.e. when n_train is not NULL).

Examples

data(iris)
iris_disc = data_disc(iris)