ks.load_datamix: ks.load_datamix

View source: R/ks.load_datamix.R

Description

This function loads the data created in the preparation phase. It requires the output of the 'ks.prepare_split' function to be present in the working directory ('wd'), so the files 'mixed_train.csv', 'mixed_test.csv' and 'mixed_valid.csv' must exist in that directory. For imbalanced data, the function can perform balancing using:

1. ROSE: https://journal.r-project.org/archive/2014/RJ-2014-008/RJ-2014-008.pdf - by default, we generate 10 * the number of cases in the original dataset.
2. SMOTE (default): https://arxiv.org/abs/1106.1813 - by default, we use 'perc.under=100' and 'k=10'.
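To make the SMOTE defaults more concrete, the sketch below works out the class sizes they imply. It follows the way the 'DMwR::SMOTE()' documentation describes 'perc.over' and 'perc.under', and assumes that 'smote_over' is passed through unchanged as 'perc.over' and that 'perc.over' is at least 100. The helper 'smote_sizes' is hypothetical and is not part of miRNAselector.

```r
# Hypothetical helper: class sizes implied by DMwR::SMOTE-style parameters.
# Assumes perc_over >= 100 and that smote_over maps directly to perc.over.
smote_sizes <- function(n_minority, perc_over = 10000, perc_under = 100) {
  # perc.over / 100 synthetic cases are generated per original minority case
  n_synthetic <- as.integer(perc_over / 100) * n_minority
  # perc.under / 100 majority cases are sampled per synthetic case
  n_majority <- as.integer(perc_under / 100 * n_synthetic)
  c(minority = n_minority + n_synthetic, majority = n_majority)
}

# Made-up example: 20 minority cases with the documented defaults
sizes <- smote_sizes(n_minority = 20)
print(sizes)
```

Under these assumptions, the default 'smote_over = 10000' produces 100 synthetic cases per original minority case, which is why this argument largely determines the size of the balanced dataset.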

Usage

ks.load_datamix(
  wd = getwd(),
  smote_over = 10000,
  use_smote_not_rose = T,
  replace_smote = F,
  selected_miRNAs = NULL
)

Arguments

wd

Working directory with files for the loading.

smote_over

Oversampling of the minority class in the SMOTE function (determines the number of cases in the final dataset). See 'perc.over' in the 'DMwR::SMOTE()' function.

use_smote_not_rose

Set to TRUE to use SMOTE instead of ROSE.

replace_smote

For some analyses, we may want to replace the imbalanced training dataset with the balanced dataset. This saves coding time in some functions.

selected_miRNAs

If NULL - take all features starting with "hsa"; if set - a vector of feature names to be selected.
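The selection rule can be illustrated with a small base R sketch. This is not the package's own code: 'select_features' is a hypothetical helper, and the toy data frame and its column names are made up for the example.

```r
# Hypothetical helper illustrating the documented selection rule.
select_features <- function(data, selected_miRNAs = NULL) {
  if (is.null(selected_miRNAs)) {
    # Default: keep every column whose name starts with "hsa"
    selected_miRNAs <- grep("^hsa", colnames(data), value = TRUE)
  }
  # drop = FALSE keeps a data frame even when one column is selected
  data[, selected_miRNAs, drop = FALSE]
}

# Made-up toy data: two miRNA columns plus metadata
toy <- data.frame(
  Class = c("Cancer", "Control"),
  hsa.miR.21.5p = c(5.1, 3.2),
  hsa.let.7a.5p = c(7.4, 7.9),
  Age = c(61, 58)
)

all_mirnas <- colnames(select_features(toy))
one_mirna <- colnames(select_features(toy, "hsa.miR.21.5p"))
print(all_mirnas)
print(one_mirna)
```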

Value

A list of objects in the following order: train, test, valid, train_smoted, trainx, trainx_smoted. ('trainx' contains only the miRNA data, without metadata.)
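A minimal usage sketch follows. It first checks that the three required files exist, then calls the function and unpacks the result by position, following the order given above. It assumes the package is installed under the name 'miRNAselector' and that the function is exported. The names assigned to the list elements are chosen here for readability and may differ from any names the function itself sets.

```r
wd <- getwd()

# Files that the description says must exist in the working directory
required <- c("mixed_train.csv", "mixed_test.csv", "mixed_valid.csv")
absent <- required[!file.exists(file.path(wd, required))]

if (length(absent) == 0 && requireNamespace("miRNAselector", quietly = TRUE)) {
  datamix <- miRNAselector::ks.load_datamix(wd = wd)

  # Label the elements by the documented order (labels are an assumption)
  datamix <- setNames(
    datamix,
    c("train", "test", "valid", "train_smoted", "trainx", "trainx_smoted")
  )
  train <- datamix[["train"]]
  trainx <- datamix[["trainx"]]  # miRNA columns only, no metadata
} else {
  message(
    "Skipping load. Missing files: ",
    if (length(absent) > 0) paste(absent, collapse = ", ") else "none",
    " (or the miRNAselector package is not installed)."
  )
}
```

Checking for the files up front gives a clearer message than a failed read inside the function would.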


kstawiski/miRNAselector documentation built on Oct. 10, 2020, 9:03 a.m.