OSTSC: Over Sampling for Time Series Classification


Oversample univariate, multi-modal time series sequences from imbalanced classification data.

OSTSC(sample, label, class, ratio = 1, per = 0.8, r = 1, k = 5, m = 15, parallel = TRUE, progBar = TRUE)

`sample`	Univariate sequence data samples
`label`	Labels corresponding to samples
`class`	The number of classes to oversample, starting from the class with the fewest observations. By default, as many classes as possible are oversampled.
`ratio`	The oversampling ratio number (>=1) (default = 1)
`per`	Ratio of weighting between ESPO and ADASYN (default = 0.8)
`r`	A scalar ratio specifying which level (towards the boundary) we shall push the synthetic data in ESPO (default = 1)
`k`	Number of nearest neighbours in k-NN (for ADASYN) algorithm (default = 5)
`m`	Seeds from the positive class in m-NN (for ADASYN) algorithm (default = 15)
`parallel`	Whether to execute in parallel mode (default = TRUE). (Recommended for datasets with over 30,000 records.)
`progBar`	Whether to show progress bars (default = TRUE). For the ESPO approach, the bar characters are \|--------\|100%. For the ADASYN approach, the bar characters are \|========\|100%.
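To make the interplay of `ratio` and `per` concrete, the following base-R sketch estimates how many synthetic sequences a given setting would request. It is not part of OSTSC: `synthetic_budget` is a hypothetical helper, and it assumes that the target size of the oversampled class is `ratio` times the size of the larger class and that `per` is the share of synthetic samples produced by ESPO, with ADASYN producing the remainder. The package's internal rounding may differ.

```r
# Hypothetical helper (not an OSTSC function): estimate the number of
# synthetic sequences requested by a given ratio/per setting.
synthetic_budget <- function(n_minority, n_majority, ratio = 1, per = 0.8) {
  stopifnot(n_minority > 0, n_majority >= n_minority,
            ratio > 0, per >= 0, per <= 1)
  # Assumption: target minority count = ratio * majority count.
  target <- ceiling(ratio * n_majority)
  total_new <- max(target - n_minority, 0)
  # Assumption: `per` is the fraction generated by ESPO; ADASYN gets the rest.
  n_espo <- round(total_new * per)
  n_adasyn <- total_new - n_espo
  c(total = total_new, espo = n_espo, adasyn = n_adasyn)
}

# 20 minority and 120 majority sequences with the default settings
budget <- synthetic_budget(n_minority = 20, n_majority = 120)
print(budget)
```

Under these assumptions, the default `per = 0.8` sends most of the synthetic budget to ESPO and leaves a smaller share for ADASYN.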

This function balances univariate imbalanced time series data using structure-preserving oversampling.

sample: the oversampled time series sequence data

label: the label corresponding to each row of records
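A quick consistency check on the returned list can catch shape mismatches before the data reach a classifier. The sketch below uses base R only. `summarise_balance` is a hypothetical helper, not part of OSTSC, and the `mock` object merely stands in for a real result with the documented `sample` and `label` elements.

```r
# Hypothetical helper (not an OSTSC function): summarise a list shaped like
# the documented return value, with elements `sample` and `label`.
summarise_balance <- function(result) {
  stopifnot(is.list(result), !is.null(result$sample), !is.null(result$label))
  n_rows <- NROW(result$sample)
  n_labels <- NROW(result$label)
  if (n_rows != n_labels) {
    stop("sample has ", n_rows, " rows but label has ", n_labels, " entries")
  }
  # Flatten the labels whether they arrive as a vector, matrix or data frame.
  counts <- table(as.vector(as.matrix(result$label)))
  list(counts = counts, imbalance_ratio = max(counts) / min(counts))
}

# Mock data standing in for OSTSC output: 6 sequences of length 4
mock <- list(sample = matrix(rnorm(6 * 4), nrow = 6),
             label = c(0, 0, 0, 1, 1, 1))
s <- summarise_balance(mock)
print(s$counts)
```

An `imbalance_ratio` of 1 means every class has the same number of observations.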

H. Cao, X.-L. Li, Y.-K. Woon and S.-K. Ng, "Integrated Oversampling for Imbalanced Time Series Classification" IEEE Trans. on Knowledge and Data Engineering (TKDE), vol. 25(12), pp. 2809-2822, 2013

H. Cao, V. Y. F. Tan and J. Z. F. Pang, "A Parsimonious Mixture of Gaussian Trees Model for Oversampling in Imbalanced and Multi-Modal Time-Series Classification" IEEE Trans. on Neural Networks and Learning Systems (TNNLS), vol. 25(12), pp. 2226-2239, 2014

H. Cao, X.-L. Li, Y.-K. Woon and S.-K. Ng, "SPO: Structure Preserving Oversampling for Imbalanced Time Series Classification" Proc. IEEE Int. Conf. on Data Mining (ICDM), pp. 1008-1013, 2011

# This is a simple example to show the usage of OSTSC. See the vignette for a tutorial
# demonstrating more complex examples.
# Example one
# loading data
data(Dataset_Synthetic_Control)
# get split feature and label data 
train.label <- Dataset_Synthetic_Control$train.y
train.sample <- Dataset_Synthetic_Control$train.x
# the first dimension of the feature set and labels must be the same
# the second dimension of the feature set is the sequence length
dim(train.sample)
dim(train.label)
# check the imbalance ratio of the data
table(train.label)
# oversample class 1 to the same number of observations as class 0
MyData <- OSTSC(train.sample, train.label, parallel = FALSE)
# store the feature data after oversampling
x <- MyData$sample
# store the label data after oversampling
y <- MyData$label
# check the imbalance of the data
table(y)
# Example two
# loading data
ecg <- Dataset_ECG()
# get split feature and label data 
train.label <- ecg$train.y
train.sample <- ecg$train.x
# the first dimension of the feature set and labels must be the same
# the second dimension of the feature set is the sequence length
dim(train.sample)
dim(train.label)
# check the imbalance ratio of the data
table(train.label)
# oversample minority class to the same number of observations as majority classes
MyData <- OSTSC(train.sample, train.label, parallel = FALSE)
# store the feature data after oversampling
x <- MyData$sample
# store the label data after oversampling
y <- MyData$label
# check the imbalance of the data
table(y)