OSTSC: Over Sampling for Time Series Classification



Description

Oversample a univariate, multi-modal time series sequence of imbalanced classified data.

Usage

OSTSC(sample, label, class, ratio = 1, per = 0.8, r = 1, k = 5,
  m = 15, parallel = TRUE, progBar = TRUE)

Arguments

sample

Univariate sequence data samples

label

Labels corresponding to samples

class

The number of classes to be oversampled, starting from the class with the fewest observations. By default, as many classes as possible are oversampled.

ratio

The oversampling ratio number (>=1) (default = 1)

per

Ratio of weighting between ESPO and ADASYN (default = 0.8)

r

A scalar ratio specifying how far (towards the boundary) the synthetic data are pushed in ESPO (default = 1)

k

Number of nearest neighbours in the k-NN algorithm used by ADASYN (default = 5)

m

Seeds from the positive class in the m-NN algorithm used by ADASYN (default = 15)

parallel

Whether to execute in parallel mode (default = TRUE). (Recommended for datasets with over 30,000 records.)

progBar

Whether to include progress bars (default = TRUE). For the ESPO approach, the bar character is |--------|100%. For the ADASYN approach, the bar character is |========|100%.
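The description of the class argument can be illustrated with a small base R sketch. It assumes that classes are ranked by observation count and that the largest class is never oversampled; pick_classes is a hypothetical helper written for this illustration and is not part of the OSTSC package, whose internal selection logic may differ (for example in how ties are handled).

```r
# Hypothetical helper, not part of OSTSC: list the classes that would be
# oversampled for a given `class` value, under the assumptions stated above.
pick_classes <- function(label, class = NULL) {
  counts <- sort(table(label))                  # ascending by class size
  candidates <- names(counts)[-length(counts)]  # every class except the largest
  if (is.null(class)) {
    class <- length(candidates)                 # assumed default: as many as possible
  }
  head(candidates, class)
}

# Toy labels: "a" has 10 observations, "b" has 3, "c" has 6
toy.label <- c(rep("a", 10), rep("b", 3), rep("c", 6))
one.class <- pick_classes(toy.label, class = 1)
all.classes <- pick_classes(toy.label)
print(one.class)
print(all.classes)
```

With class = 1 only the smallest class is selected; with no value given, every class except the largest is selected.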

Details

This function balances univariate imbalanced time series data using structure-preserving oversampling.
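As a rough illustration of how ratio = 1 and per interact, the sketch below assumes that balancing creates enough synthetic minority samples to match the majority class, and that per is the fraction of those samples generated by ESPO, with the remainder generated by ADASYN. Both points are assumptions made for illustration; synthetic_budget is a hypothetical helper, not an OSTSC function, and the package's own rounding may differ.

```r
# Hypothetical helper, not part of OSTSC: split the number of synthetic
# samples between ESPO and ADASYN for the balanced case (ratio = 1).
synthetic_budget <- function(n_minority, n_majority, per = 0.8) {
  stopifnot(n_minority > 0, n_majority >= n_minority, per >= 0, per <= 1)
  n_new <- n_majority - n_minority   # samples needed to reach balance
  n_espo <- round(n_new * per)       # assumed: share generated by ESPO
  c(total = n_new, espo = n_espo, adasyn = n_new - n_espo)
}

# 50 minority and 250 majority sequences, default per = 0.8
budget <- synthetic_budget(50, 250)
print(budget)
```

Under these assumptions, 200 synthetic sequences are needed, of which 160 would come from ESPO and 40 from ADASYN.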

Value

sample: the time series sequences after oversampling

label: the label corresponding to each row of sample

References

H. Cao, X.-L. Li, Y.-K. Woon and S.-K. Ng, "Integrated Oversampling for Imbalanced Time Series Classification" IEEE Trans. on Knowledge and Data Engineering (TKDE), vol. 25(12), pp. 2809-2822, 2013

H. Cao, V. Y. F. Tan and J. Z. F. Pang, "A Parsimonious Mixture of Gaussian Trees Model for Oversampling in Imbalanced and Multi-Modal Time-Series Classification" IEEE Trans. on Neural Networks and Learning Systems (TNNLS), vol. 25(12), pp. 2226-2239, 2014

H. Cao, X.-L. Li, Y.-K. Woon and S.-K. Ng, "SPO: Structure Preserving Oversampling for Imbalanced Time Series Classification" Proc. IEEE Int. Conf. on Data Mining (ICDM), pp. 1008-1013, 2011

Examples

# A simple example showing the usage of OSTSC. See the vignette for a tutorial
# demonstrating more complex examples.
# load the package and the data
library(OSTSC)
data(Dataset_Synthetic_Control)
# split the data into features and labels
train.label <- Dataset_Synthetic_Control$train.y
train.sample <- Dataset_Synthetic_Control$train.x
# the first dimension of the feature set and labels must be the same
# the second dimension of the feature set is the sequence length
dim(train.sample)
dim(train.label)
# check the imbalance ratio of the data
table(train.label)
# oversample class 1 to the same number of observations as class 0
MyData <- OSTSC(train.sample, train.label, parallel = FALSE)
# store the feature data after oversampling
x <- MyData$sample
# store the label data after oversampling
y <- MyData$label
# check the class balance after oversampling
table(y)
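
The dimension checks in the example above can be wrapped in a small pre-flight helper. The sketch below uses only base R and toy data, so it runs without the package installed; check_inputs is a hypothetical name introduced here and is not provided by OSTSC.

```r
# Hypothetical pre-flight check, not part of OSTSC: confirm that samples and
# labels line up and report the imbalance ratio before oversampling.
check_inputs <- function(sample, label) {
  sample <- as.matrix(sample)
  label <- as.vector(as.matrix(label))
  if (nrow(sample) != length(label)) {
    stop("number of rows in 'sample' must equal the number of labels")
  }
  counts <- table(label)
  list(
    n_sequences = nrow(sample),       # first dimension: one row per sequence
    sequence_length = ncol(sample),   # second dimension: sequence length
    counts = counts,
    imbalance_ratio = max(counts) / min(counts)
  )
}

# Toy data: 6 sequences of length 4, five in class 0 and one in class 1
set.seed(1)
toy.sample <- matrix(rnorm(6 * 4), nrow = 6)
toy.label <- c(0, 0, 0, 0, 0, 1)
info <- check_inputs(toy.sample, toy.label)
print(info$counts)
print(info$imbalance_ratio)
```

Running a check like this before OSTSC gives a clear error message when the rows of the feature set and the labels do not match.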

OSTSC documentation built on May 2, 2019, 5:16 a.m.