
View source: R/categorical.igate.R

This function performs an initial Guided Analysis for parameter testing and controlband extraction (iGATE) on a dataset with a categorical target variable and returns the parameters found to be influential.

```
categorical.igate(df, versus = 8, target, best.cat, worst.cat,
                  test = "w", ssv = NULL, outlier_removal_ssv = TRUE)
```

`df` | Data frame to be analysed. |

`versus` | How many Best of the Best and Worst of the Worst observations do we collect? By default, we collect 8 of each. |

`target` | Target variable to be analysed. Must be categorical. Use … |

`best.cat` | The best category. The … |

`worst.cat` | The worst category. The … |

`test` | Statistical hypothesis test used to determine influential process parameters. Choose between the Wilcoxon rank test (`test = "w"`, the default) and the t-test (`test = "t"`). |

`ssv` | A vector of suspected sources of variation. These are the variables in `df` … |

`outlier_removal_ssv` | Logical. Should outlier removal be performed for each `ssv`? Defaults to `TRUE`. |

We collect the Best of the Best and the Worst of the Worst dynamically, depending on the current `ssv`. That means, for each `ssv`, we first remove all observations with missing values for that `ssv` from `df`. Then, from the remaining observations, we randomly select `versus` observations from the best category (“Best of the Best”, BOB for short) and `versus` observations from the worst category (“Worst of the Worst”, WOW for short). By default, we select 8 of each. Next, we compare BOB and WOW using the counting method and the specified hypothesis test. If the distributions of the `ssv` in BOB and WOW are significantly different, the current `ssv` is identified as influential to the `target` variable. An `ssv` is considered influential if the counting method returns a count of at least 6, or the hypothesis test returns a p-value below 0.05, or both.
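The base-R sketch below illustrates the comparison step for a single `ssv`. It assumes that the counting method is a Tukey-style end count (the run of the minimum's group at the low end plus the run of the maximum's group at the high end, provided the two groups differ); that assumption, the simplified tie handling, and the helper names `tukey_end_count()` and `is_influential()` are hypothetical and not taken from the package. The decision rule mirrors the one stated above: a count of at least 6 or a p-value below 0.05.

```r
# Sketch only: assumes the "counting method" is a Tukey-style end count.
# tukey_end_count() and is_influential() are hypothetical helpers, not package API.

tukey_end_count <- function(bob, wow) {
  x   <- c(bob, wow)
  grp <- c(rep("BOB", length(bob)), rep("WOW", length(wow)))
  grp <- grp[order(x)]          # group labels sorted by the ssv value
  n   <- length(grp)
  # The count only applies if one group holds the minimum and the other the maximum
  if (grp[1] == grp[n]) return(0L)
  low <- 1L                     # run of the minimum's group at the low end
  while (grp[low + 1L] == grp[1]) low <- low + 1L
  high <- 1L                    # run of the maximum's group at the high end
  while (grp[n - high] == grp[n]) high <- high + 1L
  low + high                    # ties between groups are not treated specially
}

is_influential <- function(bob, wow, test = "w") {
  count <- tukey_end_count(bob, wow)
  p <- if (test == "w") wilcox.test(bob, wow)$p.value else t.test(bob, wow)$p.value
  # Influential if count >= 6 or p < 0.05 (or both)
  list(count = count, p.value = p, influential = count >= 6 || p < 0.05)
}

# Usage: completely separated BOB and WOW give a count of 16
is_influential(bob = 11:18, wow = 1:8)$influential  # TRUE
```

Computing both statistics and combining them with a logical OR keeps the sketch close to the stated rule, where either criterion on its own is enough to flag an `ssv`.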
For the next `ssv`, we again start with the entire dataset `df`, remove all observations with missing values for that new `ssv`, and then select a new BOB and WOW. In particular, different observations may be selected for each `ssv`. This dynamic selection is necessary because, in an incomplete dataset, selecting the same BOB and WOW for all `ssv` could leave many missing values for particular `ssv`. The hypothesis test would then lose statistical power because it is applied to a smaller sample, or it might fail altogether if the sample becomes too small.
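A minimal base-R sketch of this per-`ssv` selection follows. The helper `select_bob_wow()` is hypothetical: it drops the rows where the current `ssv` is missing and then samples up to `versus` rows from each category. Capping the sample at the category size is an assumption of this sketch, not documented behaviour of the package.

```r
# Sketch only: select_bob_wow() is a hypothetical helper, not part of the package.
select_bob_wow <- function(df, target, ssv, best.cat, worst.cat, versus = 8) {
  # Start from the full data set and drop rows where this ssv is missing
  complete   <- df[!is.na(df[[ssv]]), , drop = FALSE]
  best_rows  <- which(complete[[target]] == best.cat)
  worst_rows <- which(complete[[target]] == worst.cat)
  # Randomly draw up to `versus` rows; sample.int() avoids sample()'s length-1 pitfall
  pick <- function(rows) rows[sample.int(length(rows), min(versus, length(rows)))]
  list(bob = complete[[ssv]][pick(best_rows)],
       wow = complete[[ssv]][pick(worst_rows)])
}

# Usage: a toy data set with missing values in x (rows 1, 2 and 15)
set.seed(1)
toy <- data.frame(y = rep(c("good", "bad", "ok"), each = 10),
                  x = c(1:10, 21:30, 11:20))
toy$x[c(1, 2, 15)] <- NA
sel <- select_bob_wow(toy, target = "y", ssv = "x",
                      best.cat = "good", worst.cat = "bad")
```

Calling the helper once per `ssv` on the untouched `df` reproduces the dynamic behaviour described above: each call removes only the rows missing that particular variable.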

For those `ssv` determined to be significant, control bands are extracted. The rationale is: if the value of an `ssv` lies in the interval [`good_lower_bound`, `good_upper_bound`], the `target` is likely to be good; if it lies in the interval [`bad_lower_bound`, `bad_upper_bound`], the `target` is likely to be bad.
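How the bounds are computed is not stated here. For illustration only, the sketch below assumes that each band spans the observed range of the `ssv` within BOB (good band) or WOW (bad band); `control_bands()` and `in_band()` are hypothetical helpers, not package functions.

```r
# Sketch only: assumes each band is the min/max of the ssv within BOB or WOW.
control_bands <- function(bob, wow) {
  data.frame(good_lower_bound = min(bob), good_upper_bound = max(bob),
             bad_lower_bound  = min(wow), bad_upper_bound  = max(wow))
}

# TRUE where a value lies inside the closed interval [lower, upper]
in_band <- function(x, lower, upper) x >= lower & x <= upper

bands <- control_bands(bob = 11:18, wow = 1:8)
in_band(c(5, 12, 30), bands$good_lower_bound, bands$good_upper_bound)
# FALSE TRUE FALSE
```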

Furthermore, some summary statistics are provided. `na_removed` tells us how many observations have been removed for a particular `ssv`. The `versus` BOB/WOW observations are selected randomly from within the best/worst category, i.e. the BOB/WOW are not uniquely determined. The randomness in the selection is quantified by `ties_best_cat` and `ties_worst_cat`, which give the size of the best and worst category, respectively.
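The summary columns can be sketched as follows. The helper `ssv_summary()` is hypothetical, and counting the category sizes after the missing values for the `ssv` have been removed is an assumption, chosen because BOB and WOW are drawn from those remaining observations.

```r
# Sketch only: ssv_summary() is hypothetical; category sizes are counted
# after dropping rows with a missing ssv (an assumption of this sketch).
ssv_summary <- function(df, target, ssv, best.cat, worst.cat) {
  complete <- df[!is.na(df[[ssv]]), , drop = FALSE]
  c(na_removed     = nrow(df) - nrow(complete),
    ties_best_cat  = sum(complete[[target]] == best.cat,  na.rm = TRUE),
    ties_worst_cat = sum(complete[[target]] == worst.cat, na.rm = TRUE))
}

toy <- data.frame(y = rep(c("good", "bad", "ok"), each = 10),
                  x = c(1:10, 21:30, 11:20))
toy$x[c(1, 2, 15)] <- NA
ssv_summary(toy, "y", "x", best.cat = "good", worst.cat = "bad")
# na_removed = 3, ties_best_cat = 8, ties_worst_cat = 9
```

Under this reading, a `ties_best_cat` equal to `versus` means every observation in the best category is selected, so the BOB is fully determined; larger values leave more room for randomness.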

A data frame with the following columns:

`Causes` | Those `ssv` that have been found to be influential to the `target` variable. |

`Count` | The value returned by the counting method. |

`p.value` | The p-value of the hypothesis test performed, i.e. either the Wilcoxon rank test (if `test = "w"`) or the t-test (if `test = "t"`). |

`good_lower_bound` | The lower bound for this `Cause` for good quality. |

`good_upper_bound` | The upper bound for this `Cause` for good quality. |

`bad_lower_bound` | The lower bound for this `Cause` for bad quality. |

`bad_upper_bound` | The upper bound for this `Cause` for bad quality. |

`na_removed` | How many missing values were in the data set for this `Cause`? |

`ties_best_cat` | How many observations fall into the best category? |

`ties_worst_cat` | How many observations fall into the worst category? |

