dataMining: Data pre-processing utilities

dataMining R Documentation

Data pre-processing utilities

Description

Collection of functions for discretizing, standardizing, converting factors to characters, and other useful methods for pre-processing datasets.

Usage

whichDiscrete(dataset, discreteVariables)

discreteVariables_as.character(dataset, discreteVariables)

standardizeDataset(dataset)

discretizeVariablesEWdis(dataset, numIntervals, factor = FALSE, binary = FALSE)

discreteVariablesStates(namevariables, discreteData)

nstates(DiscreteVariablesStates)

quantileIntervals(X, numIntervals)

scaleData(dataset, scale)

Arguments

dataset

A dataset of class "data.frame". The variables of the dataset can be discrete or continuous.

discreteVariables

A "character" array with the names of the discrete variables.

numIntervals

Number of bins used to discretize the continuous variables.

factor

A boolean value indicating if the variables should be considered as "factor" or as "character". By default it is set to FALSE.

binary

By default it is set to FALSE, indicating that only binary entries are used for continuous variables; a TRUE value means that binary entries are used to discretize the full dataset, taking into account the states of the discrete variables.

namevariables

An array with the names of the variables.

discreteData

A discretized dataset of class "data.frame".

DiscreteVariablesStates

The output of the function discreteVariablesStates.

X

A "numeric" vector with the data values of a continuous variable.

scale

A "numeric" vector (when it refers to a single variable) or a "list" containing the name(s) of the variable(s) and the scale value.

Details

whichDiscrete() selects the position of the discrete variables.

discreteVariables_as.character() transforms the values of the discrete variables into character values.

standardizeDataset() standardizes all the variables in a data set.

discretizeVariablesEWdis() discretizes the continuous variables in a dataset using equal width binning.

discreteVariablesStates() extracts the states of the qualitative variables.

nstates() computes the number of different values of the discrete variables.

quantileIntervals() gets the quantiles of a variable taking into account the number of intervals into which its domain is split.
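The ideas behind equal width binning, quantile-based intervals and standardization can be illustrated with base R alone. The following is a minimal sketch under stated assumptions, not the code used by the package: the variable names (x, ewBreaks, qBreaks, z) are illustrative, and the package functions may treat interval boundaries, labels and output format differently.

```r
# Base-R sketch only; this is NOT the MoTBFs implementation.
set.seed(1)
x <- rexp(100, rate = 1/2)   # a continuous variable
numIntervals <- 3

# Equal width binning: numIntervals bins of identical width over range(x).
ewBreaks <- seq(min(x), max(x), length.out = numIntervals + 1)
ewBins <- cut(x, breaks = ewBreaks, include.lowest = TRUE)
table(ewBins)

# Quantile-based cut points: numIntervals intervals with roughly equal counts.
qBreaks <- quantile(x, probs = seq(0, 1, length.out = numIntervals + 1))
qBins <- cut(x, breaks = qBreaks, include.lowest = TRUE)
table(qBins)

# Standardization: centre to mean 0 and rescale to standard deviation 1.
z <- (x - mean(x)) / sd(x)
```

With skewed data such as the exponential sample above, equal width bins tend to be unevenly populated, while quantile-based intervals hold roughly the same number of observations each.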

Examples

## dataset: 2 continuous variables, 1 discrete variable.
data <- data.frame(X = rnorm(100), Y = rexp(100, 1/2), Z = as.factor(rep(c("s", "a"), 50)))
disVar <- "Z" ## Discrete variable
class(data[,disVar]) ## factor

data <- discreteVariables_as.character(dataset = data, discreteVariables = disVar)
class(data[,disVar]) ## character

whichDiscrete(dataset = data, discreteVariables = "Z")

standData <- standardizeDataset(dataset = data)

disData <- discretizeVariablesEWdis(dataset = data, numIntervals = 3)

l <- discreteVariablesStates(namevariables = names(data), discreteData = disData)

nstates(DiscreteVariablesStates = l)

## Continuous variables
quantileIntervals(X = data[,1], numIntervals = 4)
quantileIntervals(X = data[,2], numIntervals = 10)
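For comparison, locating the discrete variables, converting them from factor to character and counting their states can be reproduced with base R. This is a hedged sketch: df, discreteVars, pos, states and nStates are hypothetical names chosen for illustration, and the structure returned by the package functions may differ.

```r
# Base-R sketch only; `df` and `discreteVars` are illustrative names.
df <- data.frame(X = c(0.1, 0.5, 0.9, 1.3),
                 Z = factor(c("s", "a", "s", "a")))
discreteVars <- "Z"

# Column positions of the discrete variables.
pos <- which(names(df) %in% discreteVars)

# Convert the discrete (factor) columns to character.
df[discreteVars] <- lapply(df[discreteVars], as.character)
class(df$Z)

# Distinct states of each discrete variable, and how many there are.
states <- lapply(df[discreteVars], unique)
nStates <- vapply(states, length, integer(1))
```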

MoTBFs documentation built on April 18, 2022, 5:06 p.m.
