options(rmarkdown.html_vignette.check_title = FALSE)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ">",
  fig.align = "center"
)
library(car)
library(PlaneGeometry)
library(stats)
library(rotateS21)

The Goal

Automated welding machines are used for the assembly of automobile driveshafts. To make sure that a weld can be considered of good quality, the weld needs to be controlled within specific operating limits on 4 variables. The values of each variable were measured every 5 seconds until 40 observations were collected. The goal is to use this data to assess whether the variables are in control. We want to check using multivariate, bivariate, and univariate tests.

welds <- read.table("../inst/df/T5-9.dat")
names(welds) <- c("Voltage", "Current", "Feed_Speed", "Gas_Flow")
head(welds)

Normality

We want to check the data for normality. The book suggests a log transformation for one of the variables. We can assess normality both before and after the transformation.

Normality of Original Data

The first way to test for normality is to check univariate normality for all of the variables. This can be done using a shapiro wilks test. For this test the null hypothesis is normality, so a significant p-value indicates deviation from normality. The name of each varaible along with its shapiro test statistic and p-value are printed below.

n1 <- apply(welds, 2, FUN=shapiro.test)


for(i in 1:4){
  lab <- names(n1)
  print( paste( lab[i], n1[[i]]$statistic, n1[[i]]$p.value, sep=" "))
}

We can see that the only p-value which is significant at a level of $\alpha=0.05$ is Gas_Flow. This tells us that this variable in not univariate normal. However, we cannot yet say whether the data is multivariate normal.

Next we can check for bivariate normality. We can visually assess bivariate normality by making an ellipse for each pair of variables. To form the ellipse we turn to the covariance matrix. The length of the major axis is the square root of the first eigenvalue of the covariance matrix and the direction is determined by the associated eigenvector. Similarly, the length of the mainor axis is the square root of the second eigenvalue of the covariance matrix and the direction is determined by the associated eigenvector.

gridExtra::grid.arrange(
                          ellipseScatter(welds[,c(1,2)]),
                          ellipseScatter(welds[,c(1,3)]),   
                          ellipseScatter(welds[,c(1,4)]),
                          ellipseScatter(welds[,c(2,3)]),
                          ellipseScatter(welds[,c(2,4)]),
                          ellipseScatter(welds[,c(3,4)]),
                          nrow=2
  )

The bivariate normality does not seem suspicious for any pairs of variables.

Finally, we can assess multivariate normality using a chi-square QQ-plot. We plot the ordered distance squared values for each observation on the y axis. The distance is calculated for each point as

$$ d^2 = (\textbf{x}_j-\bar{\textbf{x}})'\textbf{S}^{-1}(\textbf{x}_j-\bar{\textbf{x}}) $$ Notice that this measurement is a scalar because the vector $(\textbf{x}_j-\bar{\textbf{x}})$ is a $4 \times 1$ vector that comes from centering each of the variables of the observation $j$.

On the x axis we have the chi square quantiles of (j-0.5)/n for each j. The quantiles are calculated with the chi square distribution on p degrees of freedom where p is the number of columns in the data frame. If the data is distributed multivariate normal then we would expect our chi plot to follow a straight line at a 45 degree angle.

chiPlot(welds)

The tail of the chi square plot does not look linear. This indicates a deviation from normality. Because of this, it is a good idea to try to transform the data. As the book suggests, we can use a log transformation on the gas flow variable since it was the only univariate non-normal variable. In the next section we perform the same multivariate normality tests on the transformed data.

Normality of Transformed Data

We can take the log transformation of the gas flow variable.

weldT <- welds
weldT$Gas_Flow <- log(welds$Gas_Flow)

head(weldT)

Now we can perform the same univariate normality checks.

n2 <- apply(weldT, 2, FUN=shapiro.test)


for(i in 1:4){
  lab <- names(n2)
  print( paste( lab[i], n2[[i]]$statistic, n2[[i]]$p.value, sep=" "))
}

As we can see, the test statistic and p-values of the 3 unchanged variables were unaffected by the transformation. The gas flow variable is still not normally distributed even with the new test statistic. This is an indication that a natural log transform may not be the best transformation. Examples of other transforms to try might be square root, box-cox, or arcsine.

Again we can look at the bivariate spread as well with the ellipses.

gridExtra::grid.arrange(
                          ellipseScatter(weldT[,c(1,2)]),
                          ellipseScatter(weldT[,c(1,3)]),   
                          ellipseScatter(weldT[,c(1,4)]),
                          ellipseScatter(weldT[,c(2,3)]),
                          ellipseScatter(weldT[,c(2,4)]),
                          ellipseScatter(weldT[,c(3,4)]),
                          nrow=2
  )

The ellipses do not appear much different than they did with the untransformed data. However, log transformations are non linear, so the correlation between the variables has changed.

Finally, we can assess multivariate normality using the chi qq plot.

chiPlot(weldT)

Again the chi qq plot is not visibly significantly different than the untransformed qq plot. This affirms the conclusion that the natural log transformation was not a good choice of transformations. Because the transformation is not useful, the rest of the analysis can be done with either the transformed or untransformed data.

Tsq Plot

A $T^2$ chart is a multivariate tool to assess whether observations are in control. Similar to the chi square qq plot, the values of the Tsq chart are calculated using the $d^2$ formula.

$$ T^2=d^2 = (\textbf{x}_j-\bar{\textbf{x}})'\textbf{S}^{-1}(\textbf{x}_j-\bar{\textbf{x}}) $$

This value is found for every observation and then plotted against a time series. Because we are plotting the squared distance, the lower control limit is 0. The upper control limit is the chi square quantile of the desired alpha value. Usually alpha is set at .05 or .01 to get 95% of 99% control limits.

$$ UCL = \chi_p^2(\alpha) $$

We can create this chart for our welder data because the welder data has an inherent time series ordering. Recall that observations were recorded every 5 seconds. The function tsq() will create the $T^2$ plot.

We can choose to look at the 99% or 95% upper control limit on the $T^2$ chart.

99%

tsq(weldT, chi=.01)

Here it appears that all observations are in control.

95%

tsq(weldT)

Here we can see that the 31st observation is out of the control limit.

Quality Control Ellipse

We might like to see how the 31st observation looks on a bivariate scatter plot of all different pairs of variables. One way that we can do this is with the quality control ellipse function confEllipse(). Calling confEllipse() will show a quality control ellipse based on the confidence level set in the arguments. The function can also plot a specific point to show whether that point is inside the ellipse or not. We can plot the bivariate distribution and see whether observation 31 falls outside any of the quality control ellipses.

First we must see the values of observation 31 that we need to test.

weldT[31,]

Now we can look at all of the quality control ellipses. The quality control ellipses shown below are at 95%.

gridExtra::grid.arrange(
                          confEllipse(weldT[,c(1,2)], c(21.9, 273), .95),
                          confEllipse(weldT[,c(1,3)], c(21.9, 288.7), .95),   
                          confEllipse(weldT[,c(1,4)], c(21.9, 4.012), .95),
                          confEllipse(weldT[,c(2,3)], c(273, 288.7), .95),
                          confEllipse(weldT[,c(2,4)], c(273, 4.012), .95),
                          confEllipse(weldT[,c(3,4)], c(288.7, 4.012), .95),
                          nrow=2
  )

We can see that observation 31, the observation in red in each graph, is out of control in all of the quality ellipses that plot gas flow. There is one other observation that is out of control in on the voltage axis at the quality control level of 95%. This point might also indicate a problem.

Univariate X-bar Charts

We turn to our final assessment, the univariate x-bar charts. This type of chart is similar to the $T^2$ chart, but it plots only a single variable along a time series. We set the upper and lower control limits at 3 standard deviations away from the mean.

$$ UCL = \bar{x}+3\sigma $$ $$ LCL = \bar{x}-3\sigma $$

The uniControl() function will loop through the columns of a data frame and produce all of the univariate x-bar charts.

uniControl(weldT)

We can see, as expected, that observation 31 was out of control with respect to the gas flow variable. It was not out of control with respect to any other variable. There are no observations that are out of control with respect to any other variables. We can see one observation that has a very low voltage value. It is not quite at the lower control limit, but it is close. This is likely the point outside the quality control ellipse in the bivariate charts that include voltage.

Overall, what we observe in the the univariate charts aligns with what we observe in the multivariate and univariate control observations.

Conclusion

The conclusions that we draw are limited because of the non-normal distribution of the gas-flow variable. The natural log transformation did not sufficiently normalize the individual variable or improve the metrics for multivariate normality.

Comparing the multivariate and univariate quality control charts shows that it is important to consider both. The multivariate $T^2$ chart at the 99% level suggests that all observations are in control. The univariate charts however show that there was a significant deviation in gas-flow at observation 31. It is important to consider both the individual issues with gas-flow as well as the broader consequences of the lapse of control on the product as a whole.



s-huebler/rotateS21 documentation built on Dec. 22, 2021, 8:21 p.m.