mspTrain: Multi-State Adaptive-Dynamic Process Training

Description Usage Arguments Details Value See Also Examples

View source: R/mspTrain.R

Description

This function performs Multi-State Adaptive-Dynamic PCA on a data set with time-stamped observations.

Usage

mspTrain(data, labelVector, trainObs, updateFreq = ceiling(0.5 * trainObs),
  Dynamic = TRUE, lagsIncluded = c(0, 1), faultsToTriggerAlarm = 5, ...)

Arguments

data

An xts data matrix

labelVector

Class labels as a logical (two states only) or finite numeric (two or more states) vector or matrix column (not from data frame) with length equal to the number of rows in "data". For data with only one state, this will be a vector of 1s.

trainObs

The number of observations upon which to train the algorithm. This will be split based on class information by a priori class membership proportion.

updateFreq

The algorithm update frequency. Defaults to half the number of training observations.

Dynamic

Specify if the PCA algorithm should include lagged variables. Defaults to TRUE.

lagsIncluded

A vector of lags to include. If Dynamic = TRUE, specify which lags to include. Defaults to c(0, 1), signifying that the Dynamic process observations will include current observations and observations from one time step previous. See "Details" for more information.

faultsToTriggerAlarm

The number of sequential faults needed to trigger an alarm. Defaults to 5.

...

Lazy dots for additional internal arguments

Details

This function is designed to identify and sort out sequences of observations which fall outside normal operating conditions. We assume that the process data are time-dependent in both seasonal and non-stationary effects (which necessitate the Adaptive and Dynamic components, respectively). We further assume that this data is drawn from a multivariate process under multiple mutually exclusive states, implying that the linear dimension reduction projection matrices may be different for each state. Therefore, in summary, this function lags the features to account for correlation between sequential observations, splits the data by classes, and re-estimates projection matrices on a rolling window to account for seasonality. Further, this function uses non-parametric density estimation to calculate the 1 - alpha quantiles of the SPE and Hotelling's T2 statistics from a set of training observations, then flags any observation in the testing data set with process monitoring statistics beyond these calculated critical values. Because of the natural variability inherent in all real data, we do not remove observations simply because they have been flagged as outside normal operating conditions. This function records an alarm only for observations having five flags in a row, as set by the default argument value of "faultsToTriggerAlarm". These alarm-positive observations are then removed from the data set and held in a separate xts matrix for inspection.
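The consecutive-flag alarm rule described above can be sketched in a few lines of base R. This is an illustrative helper, not the package's internal code; the function name flagsToAlarms is hypothetical.

```r
# Sketch of the alarm rule: an observation raises an alarm only once it is
# the faultsToTriggerAlarm-th (or later) flag in an unbroken run of flags.
flagsToAlarms <- function(flags, faultsToTriggerAlarm = 5) {
  run <- 0L
  vapply(flags, function(f) {
    run <<- if (f) run + 1L else 0L   # count the current run of flags
    run >= faultsToTriggerAlarm       # alarm once the run is long enough
  }, logical(1))
}

flags <- c(rep(FALSE, 3), rep(TRUE, 6), FALSE, TRUE)
flagsToAlarms(flags)
# Only the 5th and 6th consecutive flags (observations 8 and 9) alarm;
# the isolated flag at the end does not.
```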

Concerning the lagsIncluded variable: the argument lagsIncluded = c(0,1) will column concatenate the current data with the same data from one discrete time step back. This will necessarily remove the first row of the data matrix, as we will have NA values under the lagged features. The argument lagsIncluded = 0:2 will column concatenate the current observations with the observations from one step previous and the observations from two steps previous, which will necessarily require the removal of the first two rows of the data matrix. To include only certain lags with the current data, specify lagsIncluded = c(0, lag_1, lag_2, ... , lag_K). This will induce NA values in the first max(lag_k) rows, for k = 1, ... , K, and these rows will be removed from consideration. From the lag.xts() function helpfile: "The primary motivation for having methods specific to xts was to make use of faster C-level code within xts. Additionally, it was decided that lag's default behavior should match the common time-series interpretation of that operator — specifically that a value at time ‘t’ should be the value at time ‘t-1’ for a positive lag. This is different than lag.zoo() as well as lag.ts()."
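The column-concatenation behavior above can be illustrated with a small base-R sketch. mspTrain itself builds the lagged matrix via xts::lag.xts(); the lagMatrix helper below is an assumed stand-in to show the mechanics on a plain matrix.

```r
# Column-concatenate the data with lagged copies of itself, then drop the
# leading rows made incomplete (NA) by the largest lag.
lagMatrix <- function(X, lagsIncluded = c(0, 1)) {
  n <- nrow(X)
  lagged <- lapply(lagsIncluded, function(k) {
    # Shift X down by k rows, padding the top with NAs.
    rbind(matrix(NA_real_, nrow = k, ncol = ncol(X)),
          X[seq_len(n - k), , drop = FALSE])
  })
  out <- do.call(cbind, lagged)
  m <- max(lagsIncluded)
  if (m > 0) out <- out[-seq_len(m), , drop = FALSE]
  out
}

X <- matrix(1:6, ncol = 2)   # 3 observations, 2 features
lagMatrix(X, c(0, 1))
# 2 rows x 4 columns: each remaining row holds the current observation
# followed by the observation one step back; the first row is dropped.
```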

Of note when considering performance: the example has 10080 rows on three features alternating between three states, and trains on 20 percent of the observations, while updating every 1008 (10 percent) observations. On a 2016 Macbook Pro with 16 GB of RAM, this example function call takes 15 seconds to run. Increasing the update frequency will decrease computation time, but may increase false alarm rates or decrease flagging accuracy. We recommend that you set the update frequency based on the natural and physical designs of your system. For example, if your system has a multi-state process which switches across one of four states every two hours, then test the update frequency at an eight- or 12-hour level — enough observations to measure two to three full cycles of the switching process. For observations recorded every five minutes, try updateFreq = (60 / 5) * 8 = 96 or (60 / 5) * 12 = 144.
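Written out explicitly, the default update frequency and the five-minute sampling recommendation from above work out as follows (the trainObs value matches the Examples section):

```r
# Default update frequency: half the number of training observations.
trainObs   <- 4320
updateFreq <- ceiling(0.5 * trainObs)   # 2160

# Five-minute sampling, eight- and 12-hour update windows:
obsPerHour <- 60 / 5
obsPerHour * 8    # 96
obsPerHour * 12   # 144
```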

This user-facing function calls the processMonitor() function, and returns the training arguments necessary to call the mspMonitor() and mspWarning() functions.

For more details, see Kazor et al (2016):

http://link.springer.com/article/10.1007/s00477-016-1246-2

Value

A list with the following components:

See Also

Calls: processMonitor. Pipe flow: mspTrain into mspMonitor into mspWarning.

Examples

## Not run: # cut down on R CMD check time

  nrml <- mspProcessData(faults = "NOC")

  mspTrain(data = nrml[, -1],
           labelVector = nrml[, 1],
           trainObs = 4320)


## End(Not run)

mvMonitoring documentation built on Nov. 17, 2017, 6:31 a.m.