| basis | R Documentation | 
Calculate the basis value for a given data set. There are various functions
to calculate the basis values for different distributions.
The basis value is the lower one-sided tolerance bound of a certain
proportion of the population. For more information on tolerance bounds,
see Meeker et al. (2017).
For B-Basis, set the content of the tolerance bound to p=0.90 and
the confidence level to conf=0.95; for A-Basis, set p=0.99 and
conf=0.95. While other tolerance bound contents and confidence levels
may be computed, they are infrequently needed in practice.
These functions also perform some automated diagnostic tests of the data prior to calculating the basis values. These diagnostic tests can be overridden if needed.
basis_normal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_lognormal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_weibull(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_pooled_cv(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)
basis_pooled_sd(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)
basis_hk_ext(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  method = c("optimum-order", "woodward-frawley"),
  override = c()
)
basis_nonpara_large_sample(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)
basis_anova(data = NULL, x, groups, p = 0.9, conf = 0.95, override = c())
| data | a data.frame | 
| x | the variable in the data.frame for which to find the basis value | 
| batch | the variable in the data.frame that contains the batches. | 
| p | the content of the tolerance bound. Should be 0.90 for B-Basis and 0.99 for A-Basis | 
| conf | the confidence level. Should be 0.95 for both A- and B-Basis | 
| override | a list of names of diagnostic tests to override, if desired. Specifying "all" will override all diagnostic tests applicable to the current method. | 
| groups | the variable in the data.frame representing the groups | 
| modcv | a logical value indicating whether the modified CV approach should be used. Only applicable to pooling methods. | 
| method | the method for Hanson–Koopmans nonparametric basis values. Should be "optimum-order" for B-Basis and "woodward-frawley" for A-Basis. | 
data is an optional argument. If data is given, it should be a
data.frame (or similar object). When data is specified, the value of
x is expected to be a variable within data. If data is not specified,
x must be a vector.
When modcv=TRUE is set, which is only applicable to the pooling methods,
the data is first modified according to the modified coefficient of
variation (CV) rules. This modified data is then used both when
calculating the basis values and when performing the diagnostic tests.
The modified CV approach is a way of adding extra variance to datasets
with unexpectedly low variance.
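For reference, the CMH-17-1G modified CV rule maps an observed CV to a larger value: 0.06 when CV < 0.04, CV/2 + 0.04 when 0.04 <= CV < 0.08, and the CV itself otherwise. The sketch below shows only that mapping. The helper mod_cv_sketch() is hypothetical; cmstatr's transform_mod_cv() additionally rescales the observations within groups and batches, which is not reproduced here, and the strength values are made up.

```r
# Hypothetical helper (not cmstatr code): modified CV per the CMH-17-1G
# rule. Input CVs are fractions (0.05), not percentages (5).
mod_cv_sketch <- function(cv) {
  ifelse(cv < 0.04, 0.06,
         ifelse(cv < 0.08, cv / 2 + 0.04, cv))
}

# CV of an illustrative (made-up) sample with very low variability
x <- c(101, 103, 99, 100, 102)
cv <- sd(x) / mean(x)
mod_cv_sketch(cv)   # raised to the 0.06 floor because cv < 0.04
```

The piecewise rule is continuous (0.06 at CV = 0.04 and 0.08 at CV = 0.08), so the adjustment fades out smoothly as the observed CV grows.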
basis_normal calculates the basis value by subtracting k times the
standard deviation from the mean. k is given by the function
k_factor_normal(), which uses the equations in Krishnamoorthy and
Mathew (2008).
basis_normal also performs a diagnostic test for outliers (using
maximum_normed_residual()) and a diagnostic test for normality (using
anderson_darling_normal()).
If the argument batch is given, this function also performs a
diagnostic test for outliers within each batch (using
maximum_normed_residual()) and a diagnostic test for between-batch
variability (using ad_ksample()). The argument batch is only used
for these diagnostic tests.
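The calculation can be sketched in base R, assuming the standard exact one-sided normal tolerance factor k = t'(conf; n - 1, z_p * sqrt(n)) / sqrt(n), where t' is a quantile of the noncentral t distribution and z_p = qnorm(p). The helper names are hypothetical and this is not the code of k_factor_normal(); it only illustrates the mean minus k times sd structure.

```r
# Hypothetical helper: one-sided lower tolerance factor for normal data,
# computed from the noncentral t distribution.
k_normal_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Hypothetical helper: basis value = mean - k * sd
basis_normal_sketch <- function(x, p = 0.90, conf = 0.95) {
  mean(x) - k_normal_sketch(length(x), p, conf) * sd(x)
}

k_normal_sketch(10)                          # B-Basis factor, n = 10
k_normal_sketch(10, p = 0.99, conf = 0.95)   # A-Basis factor, n = 10

x <- c(137.4, 139.2, 141.0, 135.8, 138.9, 140.3)  # illustrative values
basis_normal_sketch(x)
```

The A-Basis factor is larger than the B-Basis factor for the same n, because the bound must sit below a lower population quantile.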
basis_lognormal calculates the basis value in the same way that
basis_normal does, except that the natural logarithm of the data is
taken.
basis_lognormal also performs a diagnostic test for outliers (using
maximum_normed_residual()) and a goodness-of-fit test for the lognormal
distribution (using anderson_darling_lognormal()).
If the argument batch is given, this function also performs a
diagnostic test for outliers within each batch (using
maximum_normed_residual()) and a diagnostic test for between-batch
variability (using ad_ksample()). The argument batch is only used
for these diagnostic tests.
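A minimal base-R sketch of the log-scale calculation follows. It assumes the same noncentral-t tolerance factor as for normal data and assumes the bound is back-transformed with exp() so that it is reported in the original units. The helper names are hypothetical, not cmstatr functions, and the data are illustrative.

```r
# Hypothetical helper: exact one-sided normal tolerance factor
k_normal_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Hypothetical helper: normal-theory bound on log(x), back-transformed
basis_lognormal_sketch <- function(x, p = 0.90, conf = 0.95) {
  y <- log(x)                              # work on the natural-log scale
  k <- k_normal_sketch(length(y), p, conf)
  exp(mean(y) - k * sd(y))                 # return to original units
}

x <- c(137.4, 139.2, 141.0, 135.8, 138.9, 140.3)  # illustrative values
basis_lognormal_sketch(x)
```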
basis_weibull calculates the basis value for data distributed
according to a Weibull distribution. The confidence level for the
content requested is calculated using the conditional method, as
described in Lawless (1982) Section 4.1.2b. This has good agreement
with tables published in CMH-17-1G. Results differ between this function
and STAT-17 by approximately 0.5%.
basis_weibull also performs a diagnostic test for outliers (using
maximum_normed_residual()) and a goodness-of-fit test for the Weibull
distribution (using anderson_darling_weibull()).
If the argument batch is given, this function also performs a
diagnostic test for outliers within each batch (using
maximum_normed_residual()) and a diagnostic test for between-batch
variability (using ad_ksample()). The argument batch is only used
for these diagnostic tests.
basis_hk_ext calculates the basis value using the Extended
Hanson–Koopmans method, as described in CMH-17-1G and Vangel (1994).
For nonparametric data (data not assumed to follow any particular
distribution), this function should be used for samples up to n=28 for
B-Basis and up to n=299 for A-Basis.
This method uses a pair of order statistics to determine the basis value.
CMH-17-1G suggests that for A-Basis, the first and last order statistics
are used: this is called the "woodward-frawley" method in this package,
after the paper in which this approach is described (as referenced
by Vangel (1994)). For B-Basis, another approach is used whereby the
first and j-th order statistics are used to calculate the basis value.
In this approach, the j-th order statistic is selected to minimize
the difference between the tolerance limit (assuming that the order
statistics are equal to the expected values from a standard normal
distribution) and the population quantile for a standard normal
distribution. This approach is described in Vangel (1994). This second
method (for use when calculating B-Basis values) is called
"optimum-order" in this package.
The results of basis_hk_ext have been
verified against example results from the program STAT-17. Agreement is
typically well within 0.2%.
Note that the implementation of hk_ext_z_j_opt changed after cmstatr
version 0.8.0. This function is used internally by basis_hk_ext
when method = "optimum-order". Because of this change, basis values
computed using this method may differ slightly between versions before
and after 0.8.0. However, both implementations seem to be equally
valid. See the included vignette for a discussion of the differences
between the implementation before and after version 0.8.0, as well as
the factors given in CMH-17-1G.
To access this vignette, run: vignette("hk_ext", package = "cmstatr")
basis_hk_ext also performs a diagnostic test for outliers (using
maximum_normed_residual()) and a pair of tests checking that the
sample size and method selected follow the guidance described above.
If the argument batch is given, this function also performs a
diagnostic test for outliers within each batch (using
maximum_normed_residual()) and a diagnostic test for between-batch
variability (using ad_ksample()). The argument batch is only used
for these diagnostic tests.
basis_nonpara_large_sample calculates the basis value using the large
sample method described in CMH-17-1G. This method uses a sum of
binomial probabilities to determine the rank of the order statistic
corresponding to the desired tolerance limit (basis value). Results
of this function have been verified against results of the STAT-17
program.
basis_nonpara_large_sample also performs a diagnostic test for outliers
(using maximum_normed_residual()) and a test that the sample size is
sufficiently large.
If the argument batch is given, this function also performs a
diagnostic test for outliers within each batch (using
maximum_normed_residual()) and a diagnostic test for between-batch
variability (using ad_ksample()). The argument batch is only used
for these diagnostic tests.
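The rank search used by the large sample method can be sketched with base R's pbinom(). The r-th order statistic lies below the (1 - p) population quantile exactly when at least r observations do, and that count follows a Binomial(n, 1 - p) distribution; so rank r achieves the requested confidence when pbinom(r - 1, n, 1 - p) <= 1 - conf, and the sketch takes the largest such r. The helper names are hypothetical and this is not cmstatr's implementation. Under this criterion the smallest sample with a valid rank is n = 29 for B-Basis and n = 299 for A-Basis, in line with the sample-size limits given for basis_hk_ext.

```r
# Hypothetical helper: largest rank r whose order statistic is a lower
# tolerance bound with content p and confidence conf (NA if none exists).
nonpara_rank_sketch <- function(n, p = 0.90, conf = 0.95) {
  r <- 0L
  # Rank r + 1 is acceptable when P(Binomial(n, 1 - p) <= r) <= 1 - conf
  while (r < n && pbinom(r, n, 1 - p) <= 1 - conf) {
    r <- r + 1L
  }
  if (r == 0L) NA_integer_ else r
}

# Hypothetical helper: the basis value is the r-th smallest observation
nonpara_basis_sketch <- function(x, p = 0.90, conf = 0.95) {
  r <- nonpara_rank_sketch(length(x), p, conf)
  if (is.na(r)) NA_real_ else sort(x)[r]
}

nonpara_rank_sketch(29)              # B-Basis: smallest observation
nonpara_rank_sketch(28)              # NA: sample too small
nonpara_rank_sketch(299, p = 0.99)   # A-Basis: smallest observation
```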
basis_anova calculates basis values using the ANOVA method.
x specifies the data (normally strength) and groups indicates the
group corresponding to each observation. This method is described in
CMH-17-1G, but when the ratio of the between-batch mean square to the
within-batch mean square is less than or equal to one, the tolerance
factor is calculated based on pooling the data from all groups. This
approach is recommended by Vangel (1992) and by Krishnamoorthy and
Mathew (2008), and is also implemented by the software CMH17-STATS and
STAT-17.
This function automatically performs a diagnostic test for outliers
within each group (using maximum_normed_residual()) and a test for
between-group variability (using ad_ksample()), as well as checking
that the data contains at least 5 groups.
This function has been verified against the results of the STAT-17 program.
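The pooling decision described for basis_anova can be illustrated by computing the one-way mean squares in base R. The helper anova_ms_sketch() is hypothetical: it shows only the between/within mean-square ratio check, not the full CMH-17-1G ANOVA tolerance factor, which is not reproduced here. The data are illustrative.

```r
# Hypothetical helper: between-group (msb) and within-group (mse) mean
# squares for a one-way layout, plus the "ratio <= 1 -> pool" decision.
anova_ms_sketch <- function(x, groups) {
  groups <- factor(groups)
  k <- nlevels(groups)
  N <- length(x)
  n_i <- as.vector(table(groups))            # observations per group
  m_i <- as.vector(tapply(x, groups, mean))  # group means
  msb <- sum(n_i * (m_i - mean(x))^2) / (k - 1)
  mse <- sum((x - m_i[as.integer(groups)])^2) / (N - k)
  list(msb = msb, mse = mse, pool = msb / mse <= 1)
}

x <- c(1, 2, 3, 11, 12, 13)
g <- rep(c("a", "b"), each = 3)
anova_ms_sketch(x, g)   # large between-group difference: no pooling
```

The same mean squares appear in the "Mean Sq" column of anova(lm(x ~ g)), which is a convenient cross-check.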
basis_pooled_sd calculates basis values by pooling the data from
several groups together. x specifies the data (normally strength)
and groups indicates the group corresponding to each observation.
This method is described in CMH-17-1G and matches the pooling method
implemented in ASAP 2008.
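A minimal sketch of the pooled standard deviation calculation follows, assuming the CMH-17-1G form: a pooled standard deviation with N - r degrees of freedom, and a group-specific factor k_i from the noncentral t distribution with N - r degrees of freedom and noncentrality qnorm(p) * sqrt(n_i). The helper basis_pooled_sd_sketch() is hypothetical, ignores modcv and all diagnostic tests, and is not cmstatr's implementation; the data are illustrative.

```r
# Hypothetical helper: pooled-SD basis value for each group
basis_pooled_sd_sketch <- function(x, groups, p = 0.90, conf = 0.95) {
  groups <- factor(groups)
  N <- length(x)
  r <- nlevels(groups)
  n_i <- as.vector(table(groups))            # observations per group
  m_i <- as.vector(tapply(x, groups, mean))  # group means
  v_i <- as.vector(tapply(x, groups, var))   # group variances
  # Pooled standard deviation with N - r degrees of freedom
  sp <- sqrt(sum((n_i - 1) * v_i) / (N - r))
  # Group-specific one-sided tolerance factors (noncentral t)
  k_i <- vapply(n_i, function(n) {
    qt(conf, df = N - r, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
  }, numeric(1))
  data.frame(group = levels(groups), basis = m_i - k_i * sp)
}

strength <- c(130, 134, 128, 120, 125, 122)  # illustrative values
condition <- rep(c("CTD", "ETW"), each = 3)
basis_pooled_sd_sketch(strength, condition)
```

With a single group the pooled standard deviation reduces to sd(x) with n - 1 degrees of freedom, so the result matches the single-sample normal basis value.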
basis_pooled_cv calculates basis values by pooling the data from
several groups together. x specifies the data (normally strength)
and groups indicates the group corresponding to each observation.
This method is described in CMH-17-1G.
basis_pooled_sd and basis_pooled_cv both automatically perform a number
of diagnostic tests. Using maximum_normed_residual(), they check that
there are no outliers within each group and batch (provided that batch
is specified). They check the between-batch variability using
ad_ksample(). They check that there are no outliers within each group
(pooling all batches) using maximum_normed_residual(). They check for
the normality of the pooled data using anderson_darling_normal().
basis_pooled_sd checks for equality of variance of all data using
levene_test(), and basis_pooled_cv checks for equality of variance
using levene_test() after first normalizing the data with
normalize_group_mean().
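The normalization step used by the pooled CV check can be sketched in base R. This assumes that normalize_group_mean() divides each observation by the mean of its group, and that levene_test() is the median-centred (Brown-Forsythe) form of Levene's test, computed as a one-way ANOVA F test on absolute deviations from group medians. Both helpers below are hypothetical stand-ins, not cmstatr functions, and the data are illustrative.

```r
# Hypothetical helper: divide each observation by its own group mean
normalize_by_group_mean_sketch <- function(x, groups) {
  x / ave(x, groups, FUN = mean)
}

# Hypothetical helper: Levene's test on absolute deviations from group
# medians, evaluated as a one-way ANOVA F test.
levene_median_sketch <- function(x, groups) {
  groups <- factor(groups)
  z <- abs(x - ave(x, groups, FUN = median))
  tab <- anova(lm(z ~ groups))
  c(f = tab[["F value"]][1], p_value = tab[["Pr(>F)"]][1])
}

# Two groups with very different means but the same CV
x <- c(10, 11, 9, 100, 110, 90)
g <- rep(c("A", "B"), each = 3)
nx <- normalize_by_group_mean_sketch(x, g)
levene_median_sketch(nx, g)   # F near 0: normalized variances agree
```

Normalizing first is what lets the pooled CV method treat groups with different means but similar relative scatter as having equal variance.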
The object returned by these functions includes the named vector
diagnostic_results. This contains all of the diagnostic tests
performed. The name of each element of the vector corresponds with the
name of the diagnostic test. The contents of each element will be
"P" if the diagnostic test passed, "F" if the diagnostic test failed,
"O" if the diagnostic test was overridden and NA if the
diagnostic test was skipped (typically because an optional
argument was not supplied).
The objects produced by the diagnostic tests are included in the named
list diagnostic_obj. The name of each element in the list corresponds with
the name of the test. This can be useful when evaluating diagnostic test
failures.
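Because diagnostic_results is a plain named character vector, ordinary indexing is enough to list failed or skipped tests. The values below are made up for illustration; with a real result object the same expressions would be applied to res$diagnostic_results.

```r
# Made-up vector shaped like diagnostic_results:
# "P" = passed, "F" = failed, "O" = overridden, NA = skipped
diag_results <- c(outliers_within_batch = NA,
                  between_batch_variability = "P",
                  outliers = "O",
                  anderson_darling_normal = "F")

names(diag_results)[which(diag_results == "F")]   # failed tests
names(diag_results)[is.na(diag_results)]          # skipped tests
```

which() is used for the failure lookup so that skipped (NA) tests are dropped rather than producing NA names.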
The following list summarizes the diagnostic tests automatically performed by each function.
| basis_normal | outliers_within_batch, between_batch_variability, outliers, anderson_darling_normal | 
| basis_lognormal | outliers_within_batch, between_batch_variability, outliers, anderson_darling_lognormal | 
| basis_weibull | outliers_within_batch, between_batch_variability, outliers, anderson_darling_weibull | 
| basis_pooled_cv | outliers_within_batch, between_group_variability, outliers_within_group, pooled_data_normal, normalized_variance_equal | 
| basis_pooled_sd | outliers_within_batch, between_group_variability, outliers_within_group, pooled_data_normal, pooled_variance_equal | 
| basis_hk_ext | outliers_within_batch, between_batch_variability, outliers, sample_size | 
| basis_nonpara_large_sample | outliers_within_batch, between_batch_variability, outliers, sample_size | 
| basis_anova | outliers_within_group, equality_of_variance, number_of_groups | 
an object of class basis
This object has the following fields:
| call | the expression used to call this function | 
| distribution | the distribution used (normal, etc.) | 
| p | the value of p supplied | 
| conf | the value of conf supplied | 
| modcv | a logical value indicating whether the modified CV approach was used. Only applicable to pooling methods. | 
| data | a copy of the data used in the calculation | 
| groups | a copy of the groups variable. Only used for pooling and ANOVA methods. | 
| batch | a copy of the batch data used for diagnostic tests | 
| modcv_transformed_data | the data after the modified CV transformation | 
| override | a vector of the names of diagnostic tests that were overridden. NULL if none were overridden. | 
| diagnostic_results | a named character vector containing the results of all the diagnostic tests. See the Details section for additional information. | 
| diagnostic_obj | a named list containing the objects produced by the diagnostic tests | 
| diagnostic_failures | a vector containing any diagnostic tests that produced failures | 
| n | the number of observations | 
| r | the number of groups, if a pooling method was used. Otherwise it is NULL. | 
| basis | the basis value computed. This is a number except when pooling methods are used, in which case it is a data.frame. | 
J. F. Lawless, Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, 1982.
“Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.
M. Vangel, “One-Sided Nonparametric Tolerance Limits,” Communications in Statistics - Simulation and Computation, vol. 23, no. 4, pp. 1137–1154, 1994.
K. Krishnamoorthy and T. Mathew, Statistical Tolerance Regions: Theory, Applications, and Computation. Hoboken: John Wiley & Sons, 2008.
W. Meeker, G. Hahn, and L. Escobar, Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition. Hoboken: John Wiley & Sons, 2017.
M. Vangel, “New Methods for One-Sided Tolerance Limits for a One-Way Balanced Random-Effects ANOVA Model,” Technometrics, vol. 34, no. 2, pp. 176–185, 1992.
hk_ext_z_j_opt()
k_factor_normal()
transform_mod_cv()
maximum_normed_residual()
anderson_darling_normal()
anderson_darling_lognormal()
anderson_darling_weibull()
ad_ksample()
normalize_group_mean()
library(cmstatr)  # provides basis_normal(), basis_pooled_sd() and carbon.fabric
library(dplyr)
# A single-point basis value can be calculated as follows
# in this example, three failed diagnostic tests are
# overridden.
res <- carbon.fabric %>%
  filter(test == "FC") %>%
  filter(condition == "RTD") %>%
  basis_normal(strength, batch,
               override = c("outliers",
                            "outliers_within_batch",
                            "anderson_darling_normal"))
print(res)
## Call:
## basis_normal(data = ., x = strength, batch = batch,
##     override = c("outliers", "outliers_within_batch",
##    "anderson_darling_normal"))
##
## Distribution:  Normal 	( n = 18 )
## The following diagnostic tests were overridden:
##     `outliers`,
##     `outliers_within_batch`,
##     `anderson_darling_normal`
## B-Basis:   ( p = 0.9 , conf = 0.95 )
## 76.94656
print(res$diagnostic_obj$between_batch_variability)
## Call:
## ad_ksample(x = x, groups = batch, alpha = 0.025)
##
## N = 18           k = 3
## ADK = 1.73       p-value = 0.52151
## Conclusion: Samples come from the same distribution ( alpha = 0.025 )
# A set of pooled basis values can also be calculated
# using the pooled standard deviation method, as follows.
# In this example, one failed diagnostic test is overridden.
carbon.fabric %>%
  filter(test == "WT") %>%
  basis_pooled_sd(strength, condition, batch,
                  override = c("outliers_within_batch"))
## Call:
## basis_pooled_sd(data = ., x = strength, groups = condition,
##                 batch = batch, override = c("outliers_within_batch"))
##
## Distribution:  Normal - Pooled Standard Deviation 	( n = 54, r = 3 )
## The following diagnostic tests were overridden:
##     `outliers_within_batch`
## B-Basis:   ( p = 0.9 , conf = 0.95 )
## CTD  127.6914
## ETW  125.0698
## RTD  132.1457