stream.SHC.behavioral: Statistical Hierarchical Clusterer - reference class,...

Description

The Statistical Hierarchical Clusterer reference class.

Usage

## S4 method for signature 'stream.SHC'
initialize(dimensions,
  aggloType=AgglomerationType$NormalAgglomeration,driftType=DriftType$NormalDrift,
  decaySpeed=10,sharedAgglomerationThreshold=1,recStats=FALSE,sigmaIndex=FALSE,
  sigmaIndexNeighborhood=3,sigmaIndexPrecisionSwitch=TRUE)

Arguments

dimensions

(integer) - The number of dimensions of the data space.

aggloType

(list, AgglomerationType) - Agglomeration type: NormalAgglomeration, AggresiveAgglomeration, RelaxedAgglomeration.

decaySpeed

(integer) - Component decay speed. 0 = no decay; for values > 0, a higher number means slower decay.

driftType

(list, DriftType) - Drift type: NormalDrift, FastDrift, SlowDrift, NoDrift, UltraFastDrift.

sharedAgglomerationThreshold

(integer) - The number of data instances between components that causes them to be agglomerated under the same cluster.

recStats

(logical) - A flag that indicates whether the SH clusterer should return statistics about the number of components and outliers generated while processing values.

sigmaIndex

(logical) - A flag that indicates whether the SH clusterer should use the Sigma-index to speed up the statistical space query.

sigmaIndexNeighborhood

(integer) - A multiplier for the statistical neighborhood used to manage the Sigma-index. This multiplier is used in conjunction with the SH clusterer's statistical threshold theta, which is determined by the aggloType parameter.

sigmaIndexPrecisionSwitch

(logical) - A flag that indicates whether the Sigma-index should favor high precision over speed.

Details

Instantiates an SHC object that represents an instance of the stream.SHC reference class.
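
As an illustration of how the constructor arguments combine, here is a minimal sketch that builds a clusterer with non-default settings. It uses only argument names and enumeration values listed under Usage and Arguments; the particular values (relaxed agglomeration, slow drift, decaySpeed of 20) are arbitrary choices made for illustration, not recommendations. The requireNamespace() guard is an addition so that the snippet does nothing when SHClus is not installed.

```r
# Guarded sketch: the body runs only when the SHClus package is installed.
shc_available <- requireNamespace("SHClus", quietly = TRUE)

if (shc_available) {
  library(SHClus)

  # Two-dimensional clusterer. All argument names come from the Usage
  # section; the values are arbitrary and chosen only for illustration.
  shc <- stream.SHC(2,
                    aggloType = AgglomerationType$RelaxedAgglomeration,
                    driftType = DriftType$SlowDrift,
                    decaySpeed = 20,
                    sigmaIndex = TRUE,
                    sigmaIndexNeighborhood = 3)
}
```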

Methods

process(data)

Initiates clustering for a supplied data frame (dataset). The returned data frame comprises macro- and micro-level details, as well as outlier flags.

getComponentAndOutlierStatistics()

Returns the number of current components and outliers.

getTimes()

Returns the list of processing times recorded during the last clustering.

getNodeCounter()

Returns the number of statistical distance calculations performed when querying for statistical classification. This is meaningful when the sigma-index is used, as a way to compare statistical neighborhood density for the processed dataset.

getComputationCostReduction()

Can be used only with the sigma-index. Returns the ratio of the number of statistical distance calculations performed to the maximal number of statistical distance calculations required by the sequential scan approach. The returned number indicates how many statistical calculations were saved by using the sigma-index for the processed dataset.

getHistogram()

Returns the computation cost reduction histogram when the sigma-index is used.

recheckOutlier(id, ...)

Re-checks the outlier status for the supplied id. Returns TRUE if the supplied id still represents an outlier.

getTrace(id)

Returns a list of current component identifiers that trace back to the supplied component id. Use this method to find the current components that are successors of a predecessor component that once had the supplied identifier. It establishes a direct temporal connection between the supplied component id and the returned list of current components.

clearEigenMPSupport()

Clears the use of OpenMP by the Eigen linear algebra package. Introduced for reproducibility purposes only.
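
The ratio described for getComputationCostReduction() can be illustrated with plain arithmetic. The sketch below does not call the package. All counts are made-up numbers, and the assumption that a sequential scan compares each processed instance against every current component is made here for illustration only; it is not stated in this documentation.

```r
# Hypothetical counts, chosen only for illustration.
n_queries    <- 1000    # data instances classified
n_components <- 50      # components a sequential scan would visit per query (assumed)
performed    <- 12500   # distance calculations actually performed (made-up value)

# Maximal number of calculations under the assumed sequential scan.
sequential_max <- n_queries * n_components

# Ratio of performed to maximal calculations, and the share that was avoided.
ratio <- performed / sequential_max
saved <- 1 - ratio

cat(sprintf("ratio = %.2f, saved = %.2f\n", ratio, saved))
```

The method description mentions both a ratio and the amount of calculation saved, so it is worth comparing the returned value against getNodeCounter() on your own data before interpreting it as one or the other.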

References

[1] Krleža D, Vrdoljak B, and Brčić M, Statistical hierarchical clustering algorithm for outlier detection in evolving data streams, Machine Learning, Sep. 2020

Examples

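library(SHClus)

# Two-dimensional clusterer with no drift, no decay, and the sigma-index enabled
s <- stream.SHC(2, driftType = DriftType$NoDrift, decaySpeed = 0, sigmaIndex = TRUE)
res <- s$process(data.frame(X = c(1, 2, 3, 34, 5, 3, 2, 2, 3, 34, 150),
                            Y = c(3, 4, 2, 1, 6, 7, 4, 5, 6, 3, 150)))
res
s$getComponentAndOutlierStatistics()
# Re-check the outlier status for the id assigned to the 11th instance
s$recheckOutlier(res[11, "component_id"])
# Trace the component assigned to the 3rd instance to its current successors
orig_id <- res[3, "component_id"]
trace_id <- s$getTrace(orig_id)
# getTrace() returns a list, so collapse it into a single string for the message
message(paste("Original id", orig_id, "traced to", paste(unlist(trace_id), collapse = ", ")))
s$getHistogram()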

dkrleza/SHClus documentation built on Feb. 25, 2021, 10:30 p.m.