knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(MASS)
Statistical process control (SPC) and multivariate statistical process control (MSPC) use statistical technique to evaluate or discover the abnormal issue and gives a signal as soon as possible. They are widely used in the manufacture industry, environmental monitoring and so on. The early SPC methods assume that observations at one time point or different time points are independent over time and the data set follows a parametric distribution (e.g., Gaussian) when the process is in-control. In the literature, it has been well demonstrated that control charts would be unreliable to use if they ignore serial data correlation in the observed data.
Therefore, it is critically important to develop nonparametric SPC or MSPC methods would be reliable to use when process observations are serially correlated and process distribution is nonparametric. The SPCmonitor
package applied a general univariate and multivariate CUSUM chart for online process monitoring, which can accommodate stationary serial data correlation without imposing any parametric form on the IC process distribution.
The package contains 12 functions:
Vcor
- compute the variance of a univariate data set x and the covariance or correlation between x(i) and x(i+b).Vcormatrix
- compute the variance of a univariate data set x and the Covariance Matrix for a univariate data xupdate_mean
- update the mean of a dataset when we have a new data by recursive formulaupdate_cov
- update the variance and covariance $\gamma(q)$ = Cov($X_i$,$X_{i+q}$) of a dataset when we have a new datadecor
- decorrelate a univariate Data Set online_monitor
- monitor the new univariate data set to detect if there is any mean shift occurs, and give a signal as soon as any mean shift occurs.MVcor
- compute the Covariance of a multivariate data set x and the covariance or correlation between $x_i$ and $x_{i+b}$ where x is a multivariate data setMVcormatrix
- compute the variance Matrix of a multivariate data set x and the covariance or correlation Matrix between $x_i$ and $x_{i+b}$Nhalf_power
- compute the negative half power of a square matrix x based on eigenvectors and eigenvalues.decor_multi
- decorrelate a Multivariate Data Set multi_online_monitor
- monitor the new multivariate data set to detect if there is any mean shift occurs, and give a signal as soon as any mean shift occurs.SPCmonitor
was designed to monitor univariate and multivariate processes which can accommodate stationary serial data correlation without imposing any parametric form on the IC process distribution.
library(SPCmonitor)
Vcov computes the variance of a univariate data set x and the covariance or correlation between $X_i$ and $X_{i+q}$ where $X_i$ and $X_{i+q}$ are process observations obtained at times i and i + q where q = 0,1,2.....bmax. The first element in the output $\gamma(0)$ will just be the variance of the dataset. The default value of bmax is 10, which means the output will be [$\gamma(0)$,$\gamma(1)$,.....$\gamma(10)$] where $\gamma(q)$ = Cov($X_i$,$X_{i+q}$), and variance/covariance {$\gamma(s)$,0 ≤ s ≤ bmax} are estimated from the data as follows $$ {\hat{\gamma}}^{\left(0\right)}(s)=\frac{1}{m_0-s}\sum_{i=-m_0+1}^{-s}{(X_{i+s}-}{\hat{\mu}}^{\left(0\right)})(X_i-{\hat{\mu}}^{\left(0\right)}),\ for\ s=0,1,\ldots,b_{max} $$ where ${\hat{\mu}}^{(0)}=\frac{1}{m_0}\sum_{i=-m_0+1}^{0}X_i$
Below is an example of what Vcov( ) do. The source code is:
x <- numeric(200) for (i in 2:200){ x[i] <- x[i-1] + rnorm(1,0,1) } Vcov(x,bmax = 20)
Vcovmatrix computes the variance of a univariate data set x and the covariance matrix for a univariate data x. The output form will be $$ {\hat{\Sigma}}_{i,i}=\left(\begin{matrix}{\hat{\gamma}}^{\left(0\right)}(0)&\cdots&{\hat{\gamma}}^{\left(0\right)}(b)\\vdots&\ddots&\vdots\{\hat{\gamma}}^{\left(0\right)}(b)&\ldots&{\hat{\gamma}}^{\left(0\right)}(0)\\end{matrix}\right) $$
where [$\gamma(0)$,$\gamma(1)$,.....$\gamma(b)$] are computed by Vcov()
Below is an example of what Vcovmatrix( ) do. The source code is:
x <- numeric(200) for (i in 2:200){ x[i] <- 0.5*x[i-1] + rnorm(1,0,1) } Vcov(x,bmax = 9) Vcovmatrix(x,bmax = 9)
The function updates the mean of a dataset when we have a new data by recursive formula.
update_mean()
can be used to update the mean of a large data set without recalculating the whole data set to make compuation easier. Besides, the function can be used to update both univariate and multivariate data set.
The recursive formula is $$ {\hat{\mu}}^{(n)}=\frac{1}{m_0+n}X_n+\frac{m_0+n-1}{m_0+n}{\hat{\mu}}^{(n-1)} $$ Below are examples of what update_mean( ) do for univariate and multivariate data set.
When the dataset is univaraite
#calculate the mean of new whole data set mean(c(1:101)) #update the mean by the recursive formula update_mean(mean(c(1:100)),100,101)
When the dataset is multivaraite
#calculate the mean of new whole data set oldx <- matrix(c(1:9),nrow = 3) old_mean <- rowMeans(oldx) rowMeans(matrix(c(1:12),nrow = 3)) #update the mean by the recursive formula newx <- c(10:12) update_mean(old_mean,ncol(oldx),newx)
The function updates the variance and covariance $\gamma(q)$ = Cov($X_i$,$X_{i+q}$) of a dataset when we have a new data
update_cov()
can be used to update the variance and covariance of a large data set without recalculating the whole data set to make compuation easier by a recursive formula. Besides, the function can be used to update both univariate and multivariate data set.
The recursive formula is $$ {\hat{\gamma}}^{\left(n\right)}\left(s\right)=\frac{1}{m_0+n-s}\left(X_n-{\hat{\mu}}^{\left(n\right)}\right)\left(X_{n-s}-{\hat{\mu}}^{\left(n\right)}\right)+\frac{m_0+n-s-1}{m_0+n-s}{\hat{\gamma}}^{\left(n-1\right)}\left(s\right)+\frac{m_0+n-s-1}{m_0+n-s}{({\hat{\mu}}^{\left(n-1\right)}-{\hat{\mu}}^{\left(n\right)})}^2+\frac{1}{m_0+n-s}\left({\hat{\mu}}^{\left(n-1\right)}-{\hat{\mu}}^{\left(n\right)}\right)\left(2s{\hat{\mu}}^{\left(n-1\right)}-\sum_{i=-s+n}^{n-1}X_i-\sum_{i=1-m_0}^{s-m_0}X_i\right) $$ Below are examples of what update_cov( ) do for univariate and multivariate data set.
When the dataset is univaraite
#calculate the covariance of new whole data set by Vcov() Vcov(c(1:101),bmax = 10) #calculate the covariance by recursive update_cov(c(1:100),mean(c(1:100)),Vcov(c(1:100)),101)
When the dataset is multivaraite
#calculate the covariance of new whole data set by Vcovmatrix() MVcov(matrix(c(1:12),nrow = 3),bmax = 2)$cov_vector oldx <- matrix(c(1:9),nrow = 3) old_mean <- rowMeans(oldx) old_cov <- MVcov(matrix(c(1:9),nrow = 3),bmax = 2)$cov_vector #calculate the covariance by recursive newx <- c(10:12) update_cov(oldx, old_mean, old_cov, newx)
The function decorrelates a Data Set so that the new data will have no correlation or very small correlations
It is assumed that the data observations are covariance stationary and the serial correlation exists only when two observations are within bmax > 0 in their observation indices. More specifically, it is assumed that $\gamma(q)$ = Cov($X_i$,$X_{i+q}$) only depends on q when i changes, and $\gamma(q)$ = 0 when q > bmax, where Xi and Xi+q are two process observations obtained at times i and i + q. In practice, the autocorrelation between $X_i$ and $X_{i+q}$ usually decays when q increases. In such cases, $\gamma(q)$ is small when q is large, and thus a proper value of bmax can be chosen such that $\gamma(q)$ ≈ 0 when q > bmax. The default value of bmax is 10, which indicates that the assumption is $\gamma(q)$ = Cov($X_i$,$X_{i+q}$) = 0 when q is greater than 10.
The new ith decorrelated and standardized observation is defined to be
$$ X_i^\ast=\frac{X_i-{\hat{\mu}}^{\left(0\right)}-{\hat{\sigma}}{i-1}^T{\hat{\Sigma}}{i-1,i-1}^{-1}e_{i-1}}{{\hat{d}}_i} $$
where ${\widehat{d^2}}i={\hat{\gamma}}^{\left(0\right)}\left(0\right)-{\hat{\sigma}}{i-1}^T{\hat{\Sigma}}{i-1,i-1}^{-1}{\hat{\sigma}}{i-1}$ and $e_{i-1}={(X_{i-b}-{\hat{\mu}}^{\left(0\right)},\ldots,X_{i-1}-{\hat{\mu}}^{\left(0\right)})}^T$
Below are examples of what decov( ) do.
set.seed(100) x <- numeric(200) for (i in 2:200){ set.seed(i) x[i] <- x[i-1] + rnorm(1,0,1) } Vcov(x,bmax = 30) newx <- decor(x, bmax = 30) Vcov(newx,bmax = 30)
From the output in this example, we can see that the correlation is large before decorrelation, but after decorrelation, the $\gamma(0)$ ≈ 1, and $\gamma(i)$ ≈ 0 for i > 0
online_monitor()
estimats the correlation structure nonparametrically from an IC dataset, and decorrelating the original process observations. After data decorrelation, a univariate nonparametric CUSUM chart based on data categorization was applied to monitor the new data set to detect if there is any mean shift occurs, and give us a signal as soon as possible.
Traditional statistical process control charts are based on the assumptions that process observations are independent and identically normally distributed when the related process is in-control (IC). online_monitor()
applied a general charting scheme for monitoring serially correlated process observations with short-memory data dependence and unknown process distributions. The method focus on Phase II online monitoring of process observations $X_{1}$,$X_{2}$,...,$X_{n}$, where n ≥ 1 is the current time point during process monitoring. The IC process distribution is assumed to be unknown, and the process observations are serially correlated. The output of the function online_monitor()
is the index of the obsevation in the new data that gives a out of control signal. For example, if the output is 14, the result indicates that at $X_{14}$, the function detects a mean shift.
Below are examples of online_monitor()
can do.
In the example 1, x1 follows the AR(1) model $x_{1i}$ = $x_{1i-1}$ + $\epsilon$ where $x_{1i}$ = 0 and ${ε_n}$ are i.i.d. random errors with the N(0,1) distribution. $x_{1i}$ and $x_{1j}$ are correlated. xx follows same model with x1.
The source code for example 1 is:
set.seed(10) x1 <- numeric(200) for (i in 2:200){ x1[i] <- 0.1*x1[i-1] + rnorm(1,0,1) } xx<- numeric(300) for (i in 2:300){ xx[i] <- 0.1*xx[i-1] + rnorm(1,0,1) } online_monitor(x1, xx, h = 25, k = 0.01, bmax = 20)
In the example 2, x1 follows the AR(1) model $x_{1i}$ = $x_{1i-1}$ + $\epsilon$ where $x_{1i}$ = 0 and ${ε_n}$ are i.i.d. random errors with the N(0,1) distribution. $x_{1i}$ and $x_{1j}$ are correlated. xx follows different model from x1, $x_{2i}$ = $x_{2i-1}$ + $\epsilon$ where $x_{1i}$ = 0 and ${ε_n}$ are i.i.d. random errors with the N(1,1) distribution. Therefore, in this example, we would expect control chart gives a signal as soon as possible.
The source code for example 2 is:
xx<- numeric(300) for (i in 2:300){ xx[i] <- 0.1*xx[i-1] + rnorm(1,1,1) } online_monitor(x1, xx, h = 25, k = 0.01, bmax = 20)
The function MVcov computes the covariance or correlation between $X_i$ and $X_{i+q}$ where $X_i$ and $X_{i+q}$ are multivariate process observations obtained at times i and i + q.
Covariance $\gamma(q)$ = Cov($X_i$,$X_{i+q}$), where X is a p-dimensinal vector and $\gamma(q)$ will be a p by p matrix. The default value of bmax is 10, which means the output will be [$\gamma(0)$,$\gamma(1)$,.....$\gamma(10)$] where $\gamma(q)$ =Cov($X_i$,$X_{i+q}$) a p by p matrix, and $\gamma(q)$ are estimated from the data as follows
$$ {\hat{\gamma}}^{\left(0\right)}(s)=\frac{1}{m_0-s}\sum_{i=-m_0+1}^{-s}{(X_{i+s}-}{\hat{\mu}}^{\left(0\right)})(X_i-{\hat{\mu}}^{\left(0\right)})^{T},\ for\ s=0,1,\ldots,b_{max} $$
set.seed(100) MVcov(matrix(rnorm(900,0,1),nrow = 3),bmax = 4)$cov_vector
MVcovmatrix( )
computes the Covariance Matrix of a multivariate data set x and the covariance or correlation matrix between $x_{i}$ and $x_{i+b}$. The output form will be
$$
{\hat{\Sigma}}_{i,i}=\left(\begin{matrix}{\hat{\gamma}}^{\left(0\right)}(0)&\cdots&{\hat{\gamma}}^{\left(0\right)}(b)\\vdots&\ddots&\vdots\{\hat{\gamma}}^{\left(0\right)}(b)&\ldots&{\hat{\gamma}}^{\left(0\right)}(0)\\end{matrix}\right)
$$
where [$\gamma(0)$,$\gamma(1)$,.....$\gamma(b)$] are computed by MVcov()
Below is an example of what MVcovmatrix( )
do. The source code is:
set.seed(100) MVcovmatrix(matrix(rnorm(900,0,1),nrow = 3),bmax = 4)
The function computes negative half power of a square matrix x based on eigenvectors and eigenvalues.
The Negative Half Matrix power is computed by eigenvectors and eigenvalues. $X^{-1/2}$ = U$Λ^{-1/2}$$U^{}$ where X is a hermitian matrix, 𝑈 is a unitary matrix with columns the eigenvectors of matrix X,$U^{}$ is the conjugate transpose of matrix 𝑈 and Λ is a diagonal matrix with entries the eigenvalues of matrix X. If the negative half power of input matrix does not exist, the result matrix might have NaN.
Below is an example of what Nhalf_power() do. The source code is:
Nhalf_power(matrix(c(1, 0.4, 0.4, 1), nrow = 2))
The function decorrelates a Multivariate Data Set so that the new Multivariate data will have no correlation or very small correlations
It is assumed that the data observations are covariance stationary and the serial correlation exists only when two observations are within bmax > 0 in their observation indices. More specifically, it is assumed that $\gamma(q)$ = Cov($X_i$,$X_{i+q}$) only depends on q when i changes, and $\gamma(q)$ = 0 when q > bmax, where Xi and Xi+q are two process observations obtained at times i and i + q. In practice, the autocorrelation between $X_i$ and $X_{i+q}$ usually decays when q increases. In such cases, $\gamma(q)$ is small when q is large, and thus a proper value of bmax can be chosen such that $\gamma(q)$ ≈ 0 when q > bmax. The default value of bmax is 10, which indicates that the assumption is $\gamma(q)$ = Cov($X_i$,$X_{i+q}$) = 0 when q is greater than 10.
The new ith decorrelated and standardized observation is defined to be
$$ X_i^\ast={{\hat{D}}i^{-0.5}}[{X_i-{\hat{\mu}}^{\left(0\right)}-{\hat{\sigma}}{i-1}^T{\hat{\Sigma}}{i-1,i-1}^{-1}e{i-1}}] $$
where ${\widehat{D}}i={\hat{\gamma}}^{\left(0\right)}\left(0\right)-{\hat{\sigma}}{i-1}^T{\hat{\Sigma}}{i-1,i-1}^{-1}{\hat{\sigma}}{i-1}$ and $e_{i-1}={((X_{i-b}-{\hat{\mu}}^{\left(0\right)})^{T},\ldots,(X_{i-1}-{\hat{\mu}}^{\left(0\right)})^{T}})^T$
Below are examples of what decov_multi( ) do.
sigmass <- matrix(c(10, 3, 3, 2), nrow = 2) set.seed(103) x <- matrix(numeric(400),2) for (i in 2:200){ x[,i] <- x[,(i-1)] + t(mvrnorm(1,rep(0, 2),sigmass)) } MVcov(x)$cov_vector newx <- decor_multi(x, bmax = 10) MVcov(newx)$cov_vector
From the output in this example, we can see that the correlation is large before decorrelation, but after decorrelation, only the diagonal of $\gamma(0)$ ≈ 1, and all elements in $\gamma(i)$ ≈ 0 for i > 0
The function estimats the correlation structure nonparametrically from an IC dataset, and decorrelating the original multivariate process observations. After data decorrelation, a multivariate nonparametric CUSUM chart based on data categorization was applied to monitor the new data set to detect if there is any mean shift occurs, and give us a signal as soon as possible.
Let the sequence of process observations under sequential monitoring be $X_n$=($X_{n1}$,$X_{n2}$,$\ldots$,$X_{np}$$)^T$.One assumption needed by the proposed method is that the serial data correlation in the observed data is stationary when the process is IC, which implies that the covariance matrix $\gamma(s)$ = Cov($X_i$,$X_{i+s}$), for any i, depends only on s in such cases. This assumption is often reasonable because it is believed that the IC process distribution, including the IC serial data correlation, does not change over time in many applications, including the production lines in manufacturing industries. Regarding the serial data correlation of an IC process, it is also assumed that there is an integer w ≥ 1 such that $\gamma(s)$ = 0 when s > w. This assumption implies that the correlation between two observations would disappear if the related observation times are far away, which should be (approximately) true in many applications. The IC process distribution is assumed to be unknown, and the process observations are multivariate and serially correlated. The output of the function multi_online_monitor() is the index of the obsevation in the new data that gives a out of control signal. For example, if the output is 19, the result indicates that at $X_{19}$, the function detects a mean shift.
Below are examples of online_monitor()
can do.
In the example 1, x1 follows the model $x_{1i}$ = $x_{1i-1}$ + $\epsilon$ where $x_{1i}$ = [0...0] and ${ε_n}$ are i.i.d. random errors with the Multinormal distribution. $x_{1i}$ and $x_{1j}$ are correlated. xx follows same model with x1.
The source code for example 1 is:
x <- matrix(rnorm(900,0,1),nrow = 3) sigma <- matrix(c(10, 3, 3, 2), nrow = 2) x <- matrix(numeric(400),2) set.seed(14) for (i in 2:200){ x[,i] <- x[,(i-1)] + t(mvrnorm(1,rep(0, 2),sigmass)) } xx <- matrix(numeric(400),2) for (i in 2:200){ xx[,i] <- xx[,(i-1)] + t(mvrnorm(1,rep(0, 2),sigmass)) } multi_online_monitor(x,xx,h = 30, k = 0.01, bmax = 10)
In the example 2, x1 follows the model $x_{1i}$ = $x_{1i-1}$ + $\epsilon$ where $x_{1i}$ = [0...0] and ${ε_n}$ are i.i.d. random errors with the Multinormal distribution. $x_{1i}$ and $x_{1j}$ are correlated. xx follows different model from x1, $x_{2i}$ = $x_{2i-1}$ + $\epsilon$ where $x_{1i}$ = 0 and ${ε_n}$ are i.i.d. random errors with the Multinormal distribution with mean = [2,....2]. Therefore, in this example, we would expect control chart gives a signal as soon as possible.
The source code for example 2 is:
x <- matrix(rnorm(900,0,1),nrow = 3) sigma <- matrix(c(10, 3, 3, 2), nrow = 2) x <- matrix(numeric(400),2) set.seed(14) for (i in 2:200){ x[,i] <- x[,(i-1)] + t(mvrnorm(1,rep(0, 2),sigma)) } xx <- matrix(numeric(400),2) for (i in 2:200){ xx[,i] <- xx[,(i-1)] + t(mvrnorm(1,rep(2, 2),sigma)) } multi_online_monitor(x,xx,h = 30, k = 0.01, bmax = 10)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.