source(here::here("inst/book/setup.R"))

General designs

General factorial designs require teams to put together a wealth of knowledge, some of which has already been applied in previous case studies or is referred to in the bibliography and glossary. This comprises things like root cause analysis, linear models and analysis of variance, all coherently articulated in a well thought-out project with clear objectives. The building blocks discussed so far relate to a limited number of input factors and levels and to a single output variable. Model validation and interaction assessment are needed in all cases, and once these are mastered it becomes possible to consider situations with many variables, many outputs and higher-order interactions. These arrangements become extremely powerful and allow handling complex real-life situations such as the design of a new system with dozens of interrelated features, or the optimization of a manufacturing process where the amount of data generated is very large but the testing time and cost are very high. At this point considerations of trial quantity optimization come into play.

In our case studies a run represents a unique combination of the factors and a replicate an independent repetition of a run. This leads to the notion of trials, corresponding to the number of runs multiplied by the number of replicates. For small designs it is possible to calculate the number of runs by simply multiplying the number of levels of each factor. If we have for instance 1 factor with 3 levels and 2 factors with 4 levels then we have $3 \times 4 \times 4 = 48$ runs, which corresponds to the number of distinct combinations. We then multiply by the number of replicates per run to get the number of trials, e.g. $48 \times 3 = 144$ trials.

For a very high number of factors and levels, where this way of calculating may not be practical, the total number of runs is given by $l^k$ where $l$ is the number of levels and $k$ the number of input factors. With this approach a design with 4 factors of 2 levels gives $2^4 = 2 \times 2 \times 2 \times 2 = 16$ runs, and if each run has 5 replicates there are $16 \times 5 = 80$ trials to be executed. If more factors with a different number of levels are added, the total number of runs is calculated by multiplying both groups: $l_{1}^{k_{1}} \times l_{2}^{k_{2}}$. Continuing the previous example, if 3 additional factors with 4 levels each were added, all with 5 replicates, the total would be $2^{4} \times 4^{3} \times 5 = 5120$ trials, which is a very high number in most industrial cases and would require the optimization techniques discussed in later units.
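
These counts are easy to verify directly in the R console. The lines below are only a quick sanity check reproducing the arithmetic above; they are not part of the original case study code.

# runs for 1 factor with 3 levels and 2 factors with 4 levels
runs <- 3 * 4 * 4     # 48 distinct combinations
runs * 3              # 144 trials with 3 replicates per run
# mixed-level design: 4 factors with 2 levels plus 3 factors with 4 levels
2^4 * 4^3 * 5         # 5120 trials with 5 replicates per run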

Factorial design {#fac.design}

Case study: juice bottling plant

In a juice producing plant a new fast dry matter content measurement device from the supplier DRX has recently been put in operation, but the Quality Assurance Head has raised concerns about a bias against the reference method. The quality team set up a DoE to assess several potential causes such as the dry matter content and the juice residue particle size.
library(DoE.base)
juice_doe <- fac.design(
  randomize = FALSE,
  factor.names = list(
    product = c("beetroot", "apple", "carrot"), 
    drymatter_target = c(10, 15, 20),
    part = c(1, 2, 3),
    speed = c(20, 25),
    particle_size = c(250, 300))
)

Although the Calibration study discussed in the MSA unit showed a bias during the acceptance phase, Factory Management has opted to put the device in production. The reduction in measurement time is significant and Supply Chain is putting pressure to increase volumes in a context where online sales are rocketing. The Quality Manager understands all this but he is concerned about some kind of backlash. Although he is not expecting consumer complaints, dry matter levels are loosely related to certain product claims and regulatory limits.

To dig deeper and understand how to minimize this bias, he has asked one of his team members to come up with an experiment design. He would like something that combines all factors mentioned by the team as potential root causes for the bias. After a short brainstorming between the production and quality teams, several potential causes were retained: the dry matter level, the filling speed and the powder particle size. This led to a mid-sized experiment design with three products, three levels of dry matter, two line speed levels and two particle size levels.

When the number of factors and levels is limited it is possible to resort to existing experiment designs and pre-filled Yates tables. In this case the Quality Analyst had been experimenting with R and found a package called {DoE.base} which generates such designs automatically with the function fac.design. The output generated by this function is more than just a tibble: it belongs to a specific class called design and carries additional attributes, just like lm or aov S3 objects. The care given by the package authors becomes visible when using an R generic function such as summary() on such an object and getting a tailor-made output in return, in this case showing the levels of the different factors of our design:

class(juice_doe)
summary(juice_doe)

In the summary() output we can see the plan factors with 3 products, 3 levels of dry matter target, 2 levels for speed and 2 levels for particle size. Using this, the team has simply copied the experiment plan to a spreadsheet to collect the data:

juice_doe %>% 
  write_clip() 

and after a few days the file, completed with the test results, comes back ready for analysis

juice_drymatter %>%
  head() %>%
  kable()

and the only thing the Quality Analyst had to add was an extra column to calculate the bias:

juice_drymatter <- juice_drymatter %>%
  mutate(bias = drymatter_DRX - drymatter_REF)

Main effects plots {#main_effects}

drymatter_TGT_plot <- juice_drymatter %>%
  group_by(drymatter_TGT) %>%
  summarise(bias_m_drymatter = mean(bias)) %>%
  ggplot(aes(x = drymatter_TGT, y = bias_m_drymatter)) +
  geom_point() +
  geom_line() +
  coord_cartesian(
    xlim = c(9,21),
    ylim = c(-1,0), expand = TRUE) +
  labs(
    title = "Juice bottling problem",
    subtitle = "Main effects plots",
    x = "Drymatter TGT [%]",
    y = "Average bias [g]"
  )

As the number of factors and levels of a design increases, more thinking is required to obtain a good visualisation of the data. Main effects plots usually consist of a scatterplot representing the experiment output as a function of one of the inputs. This first plot shows the mean bias as a function of the dry matter target for each of the 3 levels tested. As the DoE has 3 factors, three plots like this are required in total. The Quality Analyst builds the remaining two plots and then groups them all together in a single output with the package {patchwork}. This is done simply by putting + between the plots.

particle_size_plot <- juice_drymatter %>%  
  group_by(particle_size) %>%
  summarise(particle_size_bias_mean = mean(bias)) %>%
  ggplot(aes(x = particle_size, y = particle_size_bias_mean)) +
  geom_point() +
  geom_line() +
  coord_cartesian(
    xlim = c(240,310), 
    ylim = c(-1,0), expand = TRUE) +
  labs(
    x = "Particle Size",
    y = "Average bias [g]"
  )

speed_plot <- juice_drymatter %>%  
  group_by(speed) %>%
  summarise(speed_bias_mean = mean(bias)) %>%
  ggplot(aes(x = speed, y = speed_bias_mean)) +
  geom_point() +
  geom_line() +
  coord_cartesian(
    xlim = c(19, 26),
    ylim = c(-1,0), expand = TRUE) +
  labs(
    x = "Speed",
    y = "Average bias [g]"
  )

drymatter_TGT_plot + particle_size_plot + speed_plot

Main effects plots give important insights into the experiment outcomes, even before any statistical analysis with a linear model and anova. From these three plots the Quality Analyst can already draw first observations for her report.

Interaction plots (custom) {#interaction_plot}

drymatter_TGT_particle_size_plot <- juice_drymatter %>%  
  mutate(particle_size = as_factor(particle_size)) %>%
  group_by(drymatter_TGT, particle_size) %>%
  summarise(drymatter_bias_mean = mean(bias), drymatter_bias_sd = sd(bias)) %>%
  ggplot(aes(x = drymatter_TGT, y = drymatter_bias_mean, color = particle_size, linetype = particle_size)) +
  geom_point(aes(group = particle_size), size = 2) +
  geom_line(aes(group = particle_size, linetype = particle_size)) +
  scale_linetype(guide=FALSE) +
  geom_errorbar(aes(
    ymin = drymatter_bias_mean - 2 * drymatter_bias_sd,
    ymax = drymatter_bias_mean + 2 * drymatter_bias_sd,
    width = .5
  )) +
  scale_color_viridis_d(option = "C", begin = 0.3, end = 0.7, name = "Particle size") +
  coord_cartesian(
    xlim = c(9,21),
    ylim = c(-1,0), expand = TRUE) +
  annotate(geom = "text", x = Inf, y = 0, label = "Error bars are +/- 2xSD", 
    hjust = 1, vjust = -1, colour = "grey30", size = 3, 
    fontface = "italic") +
  labs(
    title = "Juice bottling problem",
    subtitle = "Interaction plots",
    x = "Drymatter target",
    y = "Average bias deviation [g]"
  ) +
  theme(legend.justification=c(1,0), legend.position=c(1,0))

Now, to look deeper, she prepares interaction plots. She wants to understand if factors combine in unexpected ways at certain levels. In designs like this one with 3 factors there are 3 possible two-way interactions (A-B, A-C, B-C), corresponding to the possible pairwise combinations between them. It is important to keep in mind that at least two replicates per run are needed to determine the sum of squares due to error if all possible interactions are to be included in the model. As the plan is a full factorial plan and there are more than 2 replicates, all factor combinations are resolved and can be assessed for their significance. The interaction plots show precisely such combinations, two at a time, against the output. With the first one, Dry matter target - Particle size, ready, she moves on to the next two: Dry matter target - Speed and Speed - Particle size.
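
As an optional check not shown in the original text, the number of observations feeding each combination of the three model factors can be verified by counting rows in the collected data; each combination should appear several times, providing the replication needed to estimate the error term.

# observations available per combination of the three factors in the model
juice_drymatter %>%
  count(drymatter_TGT, speed, particle_size) %>%
  head()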

drymatter_TGT_speed_plot <- juice_drymatter %>%  
  mutate(speed = as_factor(speed)) %>%
  group_by(drymatter_TGT, speed) %>%
  summarise(drymatter_bias_mean = mean(bias), drymatter_bias_sd = sd(bias)) %>%
  ggplot(aes(x = drymatter_TGT, y = drymatter_bias_mean, color = speed)) +
  geom_point(aes(group = speed), size = 2) +
  geom_line(aes(group = speed, linetype = speed)) +
  scale_linetype( guide=FALSE) +
  scale_color_viridis_d(option = "C", begin = 0.3, end = 0.7, name = "Speed") +
  geom_errorbar(aes(
    ymin = drymatter_bias_mean - 2 * drymatter_bias_sd,
    ymax = drymatter_bias_mean + 2 * drymatter_bias_sd,
    width = .5
  )) +
  coord_cartesian(
    xlim = c(9, 21),
    ylim = c(-1,0), expand = TRUE) +
  annotate(geom = "text", x = Inf, y = 0, label = "Error bars are +/- 2xSD", 
    hjust = 1, vjust = -1, colour = "grey30", size = 3, 
    fontface = "italic") +
  labs(
    x = "Dry matter target",
    y = "Average bias deviation [g]"
  ) +
  theme(legend.justification=c(1,0), legend.position=c(1,0))

speed_particle_size_plot <- juice_drymatter %>%  
  mutate(particle_size = as_factor(particle_size)) %>%
  group_by(speed, particle_size) %>%
  summarise(drymatter_bias_mean = mean(bias), drymatter_bias_sd = sd(bias)) %>%
  ggplot(aes(x = speed, y = drymatter_bias_mean, color = particle_size)) +
  geom_point(aes(group = particle_size), size = 2) +
  geom_line(aes(group = particle_size, linetype = particle_size)) +
  scale_linetype(guide=FALSE) +
  scale_color_viridis_d(option = "C", begin = 0.3, end = 0.7, name = "Particle size") +
  geom_errorbar(aes(
    ymin = drymatter_bias_mean - 2 * drymatter_bias_sd,
    ymax = drymatter_bias_mean + 2 * drymatter_bias_sd,
    width = .5
  )) +
  coord_cartesian(
    xlim = c(19, 26),
    ylim = c(-1,0), expand = TRUE) +
  annotate(geom = "text", x = Inf, y = 0, label = "Error bars are +/- 2xSD", 
    hjust = 1, vjust = -1, colour = "grey30", size = 3, 
    fontface = "italic") +
  labs(
    x = "Speed",
    y = "Average bias deviation [g]"
  ) +
  theme(legend.justification=c(1,0), legend.position=c(1,0))

drymatter_TGT_particle_size_plot + drymatter_TGT_speed_plot + speed_particle_size_plot

The approach here goes much beyond the interaction.plot() function from the {stats} package introduced before, and the code to obtain these plots is significantly longer. She has chosen to develop the plots with {ggplot2} because she wanted direct access to all the minor details of the plot construction, such as the line colors, a custom error bar calculation and very specific legend positions. She ends up concluding that there is no interaction between the different factors, as the lines do not intersect, are mostly parallel and the error bars overlap each other.
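
For comparison, the base-R shortcut mentioned above gives a much rougher but much shorter view of the same interaction. The call below is only a sketch for the dry matter target and particle size pair and reproduces none of the custom styling.

# quick-and-dirty interaction plot with base R {stats}
with(
  juice_drymatter,
  interaction.plot(
    x.factor = drymatter_TGT,
    trace.factor = particle_size,
    response = bias,
    fun = mean
  )
)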

Formula expansion {#formula_expansion}

f1 <- Y ~ A * B * C
f2 <- Y ~ A * B + C
expand_formula(f1)
expand_formula(f2)

The short code chunk above shows two formula expansion examples, the first one corresponding to our juice DoE: in a design with factors coded A, B and C, the sources of variation for the anova table of the three-factor fixed effects model are A, B, C, AB, AC, BC and ABC. The second case corresponds to a situation where interactions with C would be discarded. Understanding these syntax details is very important because, as more and more factors are added to models, the number of trials grows to unrealistic quantities. In such situations, preliminary work consisting in the selection of specific interactions enables the creation of a fractional design. For now the juice DoE is still small with 108 trials, so she can move ahead and assess the significance of the different factors using an anova.
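
As a reference, what the * shorthand stands for can also be written out in full with R's + and : operators. The expanded formula below is equivalent to the compact bias ~ drymatter_TGT * speed * particle_size used in the next chunk; it follows standard R formula semantics and is not additional code from the case study.

# fully expanded equivalent of drymatter_TGT * speed * particle_size
bias ~ drymatter_TGT + speed + particle_size +
  drymatter_TGT:speed + drymatter_TGT:particle_size + speed:particle_size +
  drymatter_TGT:speed:particle_size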

Anova with 3rd level interactions {#anova_three}

juice_drymatter_aov <- aov(
  bias ~ drymatter_TGT * speed * particle_size,
  data = juice_drymatter)
summary(juice_drymatter_aov)

Here she simplified things by entering the formula directly in the aov() function without going through the formula() or lm() functions. The previous observations made on the plots are fully confirmed and now supported with statistical evidence: dry matter target and particle size significantly affect the bias (p < 0.05); speed has no effect; none of the interactions is significant. This first round of assessment was very clear and successful, and she can now make bold proposals to the Quality Manager to look deeper into the link between dry matter target and particle size in this bias, most likely through another discussion with the product team and a final DoE with more levels to identify an optimal operating zone for the measurement method.
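
A natural follow-up, suggested by the anova results but not carried out in the original text, would be to refit a reduced model keeping only the significant terms and check that the conclusions hold. The sketch below assumes the same juice_drymatter data and is purely illustrative.

# hypothetical reduced model keeping only the significant factors
juice_drymatter_reduced_aov <- aov(
  bias ~ drymatter_TGT * particle_size,
  data = juice_drymatter)
summary(juice_drymatter_reduced_aov)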


