learnPattern: Learn Local Auto-Patterns for Time Series Representation and...


Description

learnPattern implements an ensemble of regression trees (based on Breiman and Cutler's original Fortran code) to learn local auto-patterns for time series representation. The ensemble of regression trees learns an autoregressive model; in particular, the ensemble captures local, time-varying autoregressive behavior.
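As a rough illustration of the idea (hypothetical data, not the package's implementation), the sketch below fits a single depth-1 regression split that predicts a target observation from a predictor observation by minimizing SSE, in the spirit of random.split=0:

```r
## Rough sketch only: one SSE-minimizing split, as with random.split=0.
set.seed(1)
x <- matrix(rnorm(20 * 50), nrow = 20)  # 20 series, 50 observations each
pred.vals   <- x[, 10]                  # values from a predictor segment
target.vals <- x[, 30]                  # values from a target segment

best <- list(sse = Inf, split = NA)
for (s in sort(unique(pred.vals))) {
  left  <- target.vals[pred.vals <= s]
  right <- target.vals[pred.vals >  s]
  if (length(right) == 0) next          # skip the degenerate split
  sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
  if (sse < best$sse) best <- list(sse = sse, split = s)
}
best$split  # split value on the predictor segment minimizing target SSE
```

The actual trees recurse on such splits down to maxdepth, over randomly chosen segments.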

Usage

## Default S3 method:
learnPattern(x,
   segment.factor=c(0.05,0.95),
   random.seg=TRUE, target.diff=TRUE, segment.diff=TRUE, 
   random.split=0,
   ntree=200,
   mtry=1,
   replace=FALSE,
   sampsize=if (replace) ceiling(0.632*nrow(x)) else nrow(x),
   maxdepth=6,
   nodesize=5,
   do.trace=FALSE,
   keep.forest=TRUE,
   oob.pred=FALSE,
   keep.errors=FALSE, 
   keep.inbag=FALSE, ...)
## S3 method for class 'learnPattern'
print(x, ...)

Arguments

x

a time series database as a matrix in UCR format: rows are univariate time series, columns are observations. (For the print method, an object of class learnPattern.)

segment.factor

The proportion of the time series length to be used for both predictor and target segments. If random.seg is TRUE (the default), minimum and maximum factors should be provided as an array of length two.

random.seg

TRUE if the segment length is chosen at random between the thresholds defined by segment.factor.

target.diff

Can the target segment be a difference feature?

segment.diff

Can predictor segments be difference features?

random.split

Type of the split. If set to zero (0), splits are generated based on the decrease in SSE of the target segment. A setting of one (1) generates the split value randomly between the minimum and maximum values. A setting of two (2) generates a kd-tree type of split (i.e. the median of the values at each node is chosen as the split).

ntree

Number of trees to grow. A larger number of trees is preferred if computation time is not a concern.

mtry

Number of predictor segments randomly sampled as candidates at each split. Note that it is currently preset to 1.

replace

Should bagging of time series be done with replacement? All training time series are used if FALSE (default).

sampsize

Size(s) of sample to draw with replacement if replace is set to TRUE.

maxdepth

The maximum depth of the trees in the ensemble.

nodesize

Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time).

do.trace

If set to TRUE, give a more verbose output as learnPattern is run. If set to some integer, then running output is printed for every do.trace trees.

keep.forest

If set to FALSE, the forest will not be retained in the output object.

oob.pred

If replace is set to TRUE, out-of-bag (OOB) predictions for the time series observations are returned.

keep.errors

If set to TRUE, the mean square error (MSE) of target prediction over target segments is evaluated for each tree. If oob.pred=TRUE, this information is evaluated on “out-of-bag” samples at each tree.

keep.inbag

Should an n by ntree matrix be returned that keeps track of which samples are “in-bag” in which trees?

...

optional parameters to be passed to the low-level function learnPattern.default.
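Two of the defaults above reduce to simple arithmetic. A hypothetical sketch (only the sampsize formula comes from the usage above; the rounding used for segment lengths is an assumption here):

```r
## Hypothetical sizes, not tied to any particular data set
n.obs <- 150                         # series length
segment.factor <- c(0.05, 0.95)
## with random.seg=TRUE, each tree draws a segment length in this range
## (assuming lengths are rounded up)
seg.range <- ceiling(segment.factor * n.obs)
seg.range                            # 8 to 143 observations

n.series <- 50                       # number of training series
replace <- TRUE
sampsize <- if (replace) ceiling(0.632 * n.series) else n.series
sampsize                             # 32 series drawn per tree when bagging
```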

Value

An object of class learnPattern, which is a list with the following components:

call

the original call to learnPattern.

type

regression

segment.factor

the proportion of the time series length to be used for both predictors and targets.

segment.length

segment length settings used by the trees of the ensemble

nobs

number of observations in a segment

ntree

number of trees grown

maxdepth

maximum depth level for each tree

mtry

number of predictor segments sampled for splitting at each node.

target

starting time of the target segment for each tree.

target.type

type of the target segment; 1 if observed series, 2 if difference series.

forest

a list that contains the entire forest; NULL if keep.forest=FALSE.

oobpredictions

predicted observations based on “out-of-bag” time series; returned if oob.pred=TRUE

ooberrors

Mean square error (MSE) over the trees, evaluated using the predicted observations on “out-of-bag” time series; returned if oob.pred=TRUE.

inbag

an n by ntree matrix that keeps track of which samples are “in-bag” in which trees; returned if keep.inbag=TRUE

errors

Mean square error (MSE) of target prediction over the target segments for each tree. If oob.pred=TRUE, the MSE is reported based on “out-of-bag” samples for each tree.
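The per-tree error above is an ordinary mean squared error; a toy computation with made-up values:

```r
## Made-up target values and predictions for a single tree
target.obs  <- c(0.2, -0.1, 0.4, 0.0)
target.pred <- c(0.1,  0.0, 0.5, 0.1)
mse <- mean((target.pred - target.obs)^2)
mse  # 0.01
```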

Note

OOB predictions may contain missing values (i.e. NA) if a time series is never left out-of-bag during the computations. Even when a series is left out-of-bag, some of its observations (i.e. time frames) may never be selected as part of a target segment; for those observations there will be no OOB predictions.
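The first situation can be simulated directly: under bootstrap sampling, a series may be in-bag in every tree and therefore never receive an OOB prediction. A small sketch with hypothetical sizes:

```r
## Hypothetical sizes; not tied to any particular data set
set.seed(7)
n <- 10       # number of training series
ntree <- 5    # number of trees
## in-bag counts per series for each tree under bootstrap sampling
inbag <- sapply(1:ntree, function(t) tabulate(sample(n, n, replace = TRUE), n))
## series that are in-bag in every tree are never OOB, hence NA predictions
never.oob <- which(rowSums(inbag == 0) == 0)
never.oob
```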

Author(s)

Mustafa Gokce Baydogan baydoganmustafa@gmail.com, based on original Fortran code by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener.

References

Baydogan, M. G. (2013), “Learned Pattern Similarity”, Homepage: http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html.

Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.

See Also

predict.learnPattern, computeSimilarity, tunelearnPattern

Examples

data(GunPoint)
set.seed(71)

## Learn patterns on GunPoint training series with default parameters
ensemble <- learnPattern(GunPoint$trainseries)
print(ensemble)

## Find the similarity between test and training series based on the learned model
similarity <- computeSimilarity(ensemble, GunPoint$testseries, GunPoint$trainseries)

## Find the index of the 1-nearest-neighbor (1NN) training series for each test series
NearestNeighbor <- apply(similarity, 1, which.min)

## Predicted class for each test series
predicted <- GunPoint$trainclass[NearestNeighbor]

## Compute the percentage of accurate predictions
accuracy <- sum(predicted == GunPoint$testclass) / nrow(GunPoint$testseries)
print(100 * accuracy)

## Learn patterns with random split values on GunPoint training series
ensemble <- learnPattern(GunPoint$trainseries, random.split = 1)

## Find the similarity between test and training series and classify test series
similarity <- computeSimilarity(ensemble, GunPoint$testseries, GunPoint$trainseries)
NearestNeighbor <- apply(similarity, 1, which.min)
predicted <- GunPoint$trainclass[NearestNeighbor]
accuracy <- sum(predicted == GunPoint$testclass) / nrow(GunPoint$testseries)
print(100 * accuracy)

## Learn patterns by training each tree on a random subsample
## and classify the test time series
ensemble <- learnPattern(GunPoint$trainseries, replace = TRUE)
similarity <- computeSimilarity(ensemble, GunPoint$testseries, GunPoint$trainseries)
NearestNeighbor <- apply(similarity, 1, which.min)
predicted <- GunPoint$trainclass[NearestNeighbor]
print(predicted)

## Learn patterns and do predictions on OOB time series
ensemble <- learnPattern(GunPoint$trainseries, replace = TRUE,
    target.diff = FALSE, oob.pred = TRUE)
## Plot the first series and its OOB approximation
plot(GunPoint$trainseries[1, ], xlab = 'Time', ylab = 'Observation',
    type = 'l', lty = 1, lwd = 2)
points(1:ncol(GunPoint$trainseries), ensemble$oobpredictions[1, ],
    type = 'l', col = 2, lty = 2, lwd = 2)
legend('topleft', c('Original series', 'Approximation'),
    col = c(1, 2), lty = c(1, 2), lwd = 2)
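The classification steps repeated in the examples above can be collected into a small helper; this is an illustration only, not a function provided by the package:

```r
## Illustration only: 1NN classification accuracy (in percent) from a
## similarity matrix whose rows are test series and columns are
## training series, with smaller values meaning more similar.
classify1NN <- function(similarity, trainclass, testclass) {
  NearestNeighbor <- apply(similarity, 1, which.min)
  predicted <- trainclass[NearestNeighbor]
  100 * sum(predicted == testclass) / length(testclass)
}

## e.g. classify1NN(similarity, GunPoint$trainclass, GunPoint$testclass)
```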

LPStimeSeries documentation built on May 2, 2019, 8:25 a.m.