datasetReader: Read Dataset File into Memory.

View source: R/datasetReader.R

Description

Reads the training and testing dataset files, and performs preprocessing and data cleaning if necessary.

Usage

datasetReader(
  directory,
  testDirectory,
  selectedFeats = c(),
  classCol = "class",
  preProcessF = "N",
  featuresToPreProcess = c(),
  nComp = NA,
  missingVal = c("NA", "?", " "),
  missingOpr = 0
)

Arguments

directory

String giving the path to the file containing the training dataset.

testDirectory

String giving the path to the file containing the testing dataset.

selectedFeats

Vector of column numbers of the features to include from the training set; all other columns are ignored. An empty vector includes all features in the dataset file (default = c()).

classCol

String of the name of the class label column in the dataset (default = 'class').

preProcessF

String containing the name of the preprocessing algorithm (default = 'N', meaning no preprocessing):

  • "boxcox" - apply a Box-Cox transform; values must be strictly positive in all features,

  • "yeo-Johnson" - apply a Yeo-Johnson transform, which is like Box-Cox but allows negative values,

  • "zv" - remove attributes with a zero variance (all the same value),

  • "center" - subtract mean from values,

  • "scale" - divide values by standard deviation,

  • "standardize" - perform both centering and scaling,

  • "normalize" - normalize values,

  • "pca" - transform data to the principal components,

  • "ica" - transform data to the independent components.
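As a rough illustration of what the simpler options compute, the following base R sketch applies the "zv", "center", "scale" and "standardize" operations to a small, made-up data frame. It is only a sketch of the arithmetic under the descriptions above; it is not taken from the package source, and datasetReader may implement these steps differently.

```r
# Toy numeric features (hypothetical data, for illustration only).
x <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), k = c(5, 5, 5, 5))

# "zv": drop columns whose variance is zero (here, column k).
nonConstant <- vapply(x, function(col) var(col) > 0, logical(1))
x <- x[, nonConstant, drop = FALSE]

# "center": subtract each column's mean.
centered <- scale(x, center = TRUE, scale = FALSE)

# "scale": divide each column by its standard deviation.
scaled <- scale(x, center = FALSE, scale = vapply(x, sd, numeric(1)))

# "standardize": center and scale in one step.
standardized <- scale(x, center = TRUE, scale = TRUE)
```

After these steps each column of centered has mean zero, and each column of standardized has mean zero and standard deviation one.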

featuresToPreProcess

Vector of column numbers of the features to preprocess. An empty vector means that all features in the dataset file are preprocessed (default = c()). This vector should be a subset of selectedFeats.

nComp

Integer giving the number of components to use when the "pca" or "ica" preprocessor is selected (default = NA).

missingVal

Vector of strings representing the missing values in dataset (default: c('NA', '?', ' ')).

missingOpr

Boolean flag that selects how instances with missing values are handled: 0 (default) deletes them; otherwise imputation is applied using the "MICE" library, which fills in missing values with plausible data values drawn from a distribution specifically designed for each missing data point.
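The default "delete instances" behaviour can be pictured with the base R sketch below. The inline CSV text and the variable names are made up for illustration, and the code is an assumption about the general technique (marking the missingVal strings as NA on read, then dropping incomplete rows), not the package's actual implementation.

```r
# Hypothetical CSV content with two kinds of missing-value markers.
csvText <- "sepal,petal,class
5.1,1.4,setosa
?,1.3,setosa
6.2,NA,virginica
5.9,4.2,versicolor"

# Treat the strings from missingVal as NA while reading.
raw <- read.csv(text = csvText, na.strings = c("NA", "?", " "))

# Keep only the rows that have no missing values.
complete <- raw[complete.cases(raw), ]
```

Here the second and third rows each contain a missing value, so only the first and last rows remain in complete.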

Value

List containing the training set (Train) and the testing set (Test).

Examples

## Not run: 
dataset <- datasetReader('/Datasets/irisTrain.csv', '/Datasets/irisTest.csv')

## End(Not run)

DataSystemsGroupUT/SmartML documentation built on Nov. 24, 2020, 1:31 p.m.