View source: R/robust.categorical.igate.R
This function performs a robust initial Guided Analysis for parameter testing and
controlband extraction (iGATE) for a categorical target variable by repeatedly running
categorical.igate and returning only those parameters that are selected more often than a
certain threshold.
robust.categorical.igate(df, versus = 8, target, best.cat, worst.cat,
  test = "w", ssv = NULL, outlier_removal_ssv = TRUE,
  iterations = 50, threshold = 0.5)

df 
Data frame to be analysed. 
versus 
How many Best of the Best and Worst of the Worst do we collect? By default, we will collect 8 of each. 
target 
Target variable to be analysed. Must be categorical.
Use 
best.cat 
The best category. The 
worst.cat 
The worst category. The 
test 
Statistical hypothesis test to be used to determine influential
process parameters. Choose between the Wilcoxon rank test ("w") and the t-test ("t").
ssv 
A vector of suspected sources of variation. These are the variables
in 
outlier_removal_ssv 
Logical. Should outlier removal be performed for each 
iterations 
Integer. How often should categorical.igate be performed? A message about how many iterations
have been performed so far will be printed to the console every 
threshold 
Between 0 and 1. Only parameters that are selected at least floor(iterations*threshold) times are returned.
We collect the Best of the Best and the Worst of the Worst dynamically, depending on the
current ssv. That means, for each ssv we first remove all the observations with missing
values for that ssv from df. Then, based on the remaining observations, we randomly select
versus observations from the best category (“Best of the Best”, short BOB) and versus
observations from the worst category (“Worst of the Worst”, short WOW). By default, we
select 8 of each. Since this selection happens randomly, it is recommended to use
robust.categorical.igate over categorical.igate.
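The per-ssv selection described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: select_bob_wow is a hypothetical helper, and capping the draw at the number of available rows is a choice made here rather than documented behaviour.

```r
# Hypothetical helper (not part of the igate package): draw BOB and WOW for a
# single ssv after dropping the rows where that ssv is missing.
select_bob_wow <- function(df, ssv, target, best.cat, worst.cat, versus = 8) {
  # Step 1: keep only observations with a value for the current ssv.
  complete <- df[!is.na(df[[ssv]]), , drop = FALSE]

  # Step 2: split the remaining observations by target category.
  best  <- complete[as.character(complete[[target]]) == best.cat, , drop = FALSE]
  worst <- complete[as.character(complete[[target]]) == worst.cat, , drop = FALSE]

  # Step 3: randomly draw `versus` rows from each category
  # (assumption: fewer rows are drawn if a category is too small).
  list(
    BOB = best[sample.int(nrow(best), min(versus, nrow(best))), , drop = FALSE],
    WOW = worst[sample.int(nrow(worst), min(versus, nrow(worst))), , drop = FALSE]
  )
}

set.seed(1)  # the draw is random, so fix the seed for reproducibility
picked <- select_bob_wow(mtcars, ssv = "mpg", target = "cyl",
                         best.cat = "8", worst.cat = "4")
nrow(picked$BOB)  # 8 rows, all with cyl == 8
nrow(picked$WOW)  # 8 rows, all with cyl == 4
```

Calling the helper again with a different seed generally returns different rows, which is the reason a single run can be unstable.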
After the selection we compare BOB and WOW using the counting method and the specified
hypothesis test. If the distributions of the ssv in BOB and WOW are significantly
different, the current ssv has been identified as influential to the target variable.
An ssv is considered influential if the counting method returns a count greater than or
equal to 6 and/or the hypothesis test returns a p-value of less than 0.05.
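The decision rule can be illustrated as follows. This page does not define the counting method, so the sketch assumes a Tukey-Duckworth style end count; end_count and is_influential are hypothetical helpers, and the bob and wow values are made up for illustration.

```r
# Assumption: the "counting method" is approximated by a Tukey-Duckworth style
# end count (values of one sample above all of the other, plus values of the
# other sample below all of the first).
end_count <- function(x, y) {
  if (max(x) == max(y) || min(x) == min(y)) return(0L)  # tied extremes
  if (max(x) < max(y)) { tmp <- x; x <- y; y <- tmp }   # x now holds the maximum
  if (min(x) < min(y)) return(0L)                       # x also holds the minimum
  sum(x > max(y)) + sum(y < min(x))
}

# Hypothetical helper applying the rule: count >= 6 and/or p-value < 0.05.
is_influential <- function(bob, wow, test = "w") {
  count <- end_count(bob, wow)
  p <- if (test == "w") {
    suppressWarnings(wilcox.test(bob, wow)$p.value)  # Wilcoxon rank test
  } else {
    t.test(bob, wow)$p.value                         # t-test
  }
  list(count = count, p_value = p, influential = count >= 6 || p < 0.05)
}

# Made-up ssv values for 8 BOB and 8 WOW observations with no overlap.
bob <- c(10.4, 13.3, 14.3, 14.7, 15.0, 15.2, 15.5, 15.8)
wow <- c(21.4, 22.8, 24.4, 26.0, 27.3, 30.4, 32.4, 33.9)
res <- is_influential(bob, wow)
res$count        # 16: the two samples are completely separated
res$influential  # TRUE
```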
For the next ssv we again start with the entire dataset df, remove all the observations
with missing values for that new ssv and then select our new BOB and WOW. In particular,
for each ssv we might select different observations. This dynamic selection is necessary,
because in case of an incomplete data set, if we select the same BOB and WOW for all the
ssv, we might end up with many missing values for particular ssv. In that case the
hypothesis test loses statistical power, because it is used on a smaller sample or worse,
might fail altogether if the sample size gets too small.
For those ssv determined to be significant, control bands are extracted. The rationale is:
If the value for an ssv is in the interval [good_lower_bound, good_upper_bound], the
target is likely to be good. If it is in the interval [bad_lower_bound, bad_upper_bound],
the target is likely to be bad.
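How such control bands might be read can be sketched as below. The page does not say how the bounds are computed, so the sketch only applies given bounds; classify_by_band is a hypothetical helper, the bound values are made up, and the "undetermined" label for values outside both bands (or inside both) is an assumption.

```r
# Hypothetical helper: interpret ssv values against extracted control bands.
classify_by_band <- function(x, good_lower_bound, good_upper_bound,
                             bad_lower_bound, bad_upper_bound) {
  in_good <- x >= good_lower_bound & x <= good_upper_bound
  in_bad  <- x >= bad_lower_bound & x <= bad_upper_bound
  # Values in exactly one band get a label; everything else is left open.
  ifelse(in_good & !in_bad, "likely good",
         ifelse(in_bad & !in_good, "likely bad", "undetermined"))
}

# Made-up bounds: good band [10, 16], bad band [21, 34].
labels <- classify_by_band(c(12, 20, 28),
                           good_lower_bound = 10, good_upper_bound = 16,
                           bad_lower_bound = 21, bad_upper_bound = 34)
labels  # "likely good" "undetermined" "likely bad"
```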
A data frame with the summary statistics for those parameters that were selected
at least floor(iterations*threshold) times:
Causes  Those ssv that have been found to be influential to the target variable. 
median_count  The median value returned by the counting method for this parameter. 
median_p_value  The median p-value of the hypothesis test performed, i.e. of either the
Wilcoxon rank test (if test = "w") or the t-test (if test = "t"). 
median_good_lower_bound  The median lower bound for this Cause for good quality. 
median_good_upper_bound  The median upper bound for this Cause for good quality. 
median_bad_lower_bound  The median lower bound for this Cause for bad quality. 
median_bad_upper_bound  The median upper bound for this Cause for bad quality.
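
The thresholding and median summary can be sketched as follows. This is an illustration only: the runs data frame is made up (one row per iteration in which an ssv was flagged), and the real function may organise its intermediate results differently.

```r
iterations <- 50
threshold  <- 0.5

# Made-up records: "mpg" was flagged in all 50 runs, "qsec" in only 10 of them.
set.seed(42)
runs <- data.frame(
  iteration = c(1:50, seq(1, 50, by = 5)),
  Causes    = c(rep("mpg", 50), rep("qsec", 10)),
  count     = c(sample(10:16, 50, replace = TRUE),
                sample(6:8, 10, replace = TRUE))
)

# Keep only parameters selected at least floor(iterations * threshold) times.
min_selected   <- floor(iterations * threshold)  # 25
times_selected <- table(runs$Causes)
kept <- names(times_selected)[times_selected >= min_selected]

# Summarise the kept parameters by the median of their per-run counts.
summary_df <- aggregate(count ~ Causes,
                        data = runs[runs$Causes %in% kept, ],
                        FUN = median)
names(summary_df)[2] <- "median_count"
summary_df  # a single row for "mpg"; "qsec" falls below the threshold
```

The same pattern would extend to the p-value and the four bound columns by aggregating each of them with median.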

robust.categorical.igate(mtcars, target = "cyl",
  best.cat = "8", worst.cat = "4", iterations = 50, threshold = 0.5)
