# basis: Calculate basis values

In cmstatr: Statistical Methods for Composite Material Data

## Description

Calculate the basis value for a given data set. There are various functions to calculate the basis values for different distributions. The basis value is the lower one-sided tolerance bound of a certain proportion of the population. For more information on tolerance bounds, see Meeker et al. (2017). For B-Basis, set the content of the tolerance bound to p=0.90 and the confidence level to conf=0.95; for A-Basis, set p=0.99 and conf=0.95. Other tolerance bound contents and confidence levels can be computed, but they are infrequently needed in practice.

These functions also perform some automated diagnostic tests of the data prior to calculating the basis values. These diagnostic tests can be overridden if needed.

## Usage

```r
basis_normal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_lognormal(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_weibull(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_pooled_cv(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)

basis_pooled_sd(
  data = NULL,
  x,
  groups,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  modcv = FALSE,
  override = c()
)

basis_hk_ext(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  method = c("optimum-order", "woodward-frawley"),
  override = c()
)

basis_nonpara_large_sample(
  data = NULL,
  x,
  batch = NULL,
  p = 0.9,
  conf = 0.95,
  override = c()
)

basis_anova(data = NULL, x, groups, p = 0.9, conf = 0.95, override = c())
```

## Arguments

| Argument | Description |
|---|---|
| `data` | a data.frame |
| `x` | the variable in the data.frame for which to find the basis value |
| `batch` | the variable in the data.frame that contains the batches |
| `p` | the content of the tolerance bound. Should be 0.90 for B-Basis and 0.99 for A-Basis |
| `conf` | the confidence level. Should be 0.95 for both A- and B-Basis |
| `override` | a list of names of diagnostic tests to override, if desired. Specifying "all" will override all diagnostic tests applicable to the current method. |
| `groups` | the variable in the data.frame representing the groups |
| `modcv` | a logical value indicating whether the modified CV approach should be used. Only applicable to pooling methods. |
| `method` | the method for Hanson–Koopmans nonparametric basis values. Should be "optimum-order" for B-Basis and "woodward-frawley" for A-Basis. |

## Details

`data` is an optional argument. If `data` is given, it should be a `data.frame` (or similar object). When `data` is specified, the value of `x` is expected to be a variable within `data`. If `data` is not specified, `x` must be a vector.
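The two calling conventions can be illustrated with a brief sketch. It assumes that `cmstatr` is installed and reuses the `carbon.fabric` data set from the Examples section. `override = "all"` is used here only to keep the sketch free of diagnostic warnings, and the names `fc_rtd`, `res_df` and `res_vec` are illustrative.

```r
library(cmstatr)

# Same subset of the example data as used in the Examples section
fc_rtd <- subset(carbon.fabric, test == "FC" & condition == "RTD")

# Style 1: `data` is given, so `x` names a variable within the data.frame
res_df <- basis_normal(fc_rtd, strength, override = "all")

# Style 2: `data` is omitted, so `x` must be a vector
res_vec <- basis_normal(x = fc_rtd$strength, override = "all")

# Both styles are expected to produce the same basis value
all.equal(res_df$basis, res_vec$basis)
```

Passing `data` tends to read more naturally in a pipeline, while the vector form is convenient when the observations are not stored in a data.frame.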

When `modcv=TRUE` is set, which is only applicable to the pooling methods, the data is first modified according to the modified coefficient of variation (CV) rules. This modified data is then used when both calculating the basis values and also when performing the diagnostic tests. The modified CV approach is a way of adding extra variance to datasets with unexpectedly low variance.
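As a rough sketch of what the modified CV rules do, the base R function below maps a measured CV to a modified CV. The thresholds are the ones commonly quoted from CMH-17-1G and are stated here as an assumption; the function name `mod_cv_sketch` is hypothetical and this is not the package's own implementation (see `transform_mod_cv()` for that).

```r
# Modified CV rule (thresholds assumed from CMH-17-1G, not taken from cmstatr):
#   CV < 0.04          -> 0.06
#   0.04 <= CV < 0.08  -> CV / 2 + 0.04
#   CV >= 0.08         -> CV (unchanged)
mod_cv_sketch <- function(cv) {
  ifelse(cv < 0.04, 0.06,
         ifelse(cv < 0.08, cv / 2 + 0.04, cv))
}

# A low, a moderate and a high CV
mod_cv_sketch(c(0.02, 0.06, 0.10))  # gives 0.06, 0.07, 0.10
```

Under this rule only data sets with a CV below 0.08 are affected, which is how extra variance is added to data with unexpectedly low scatter.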

`basis_normal` calculates the basis value by subtracting k times the standard deviation from the mean. k is given by the function `k_factor_normal()`. The equations in Krishnamoorthy and Mathew (2008) are used. `basis_normal` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and a diagnostic test for normality (using `anderson_darling_normal()`). If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.
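The calculation can be sketched in base R. The code below uses the standard noncentral t expression for a one-sided normal tolerance factor and then subtracts k times the standard deviation from the mean. Whether `k_factor_normal()` evaluates the factor in exactly this way is an assumption; the names `k_normal_sketch` and `basis_normal_sketch` are hypothetical and the data values are made up for illustration.

```r
# One-sided tolerance factor for a normal sample of size n
# (standard noncentral t expression; assumed, not copied from cmstatr)
k_normal_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Basis value: mean minus k times the sample standard deviation
basis_normal_sketch <- function(x, p = 0.90, conf = 0.95) {
  mean(x) - k_normal_sketch(length(x), p, conf) * sd(x)
}

# Made-up strength values, for illustration only
x <- c(79.0, 81.2, 80.5, 78.8, 82.1, 80.0, 79.6, 81.7, 80.9, 79.3)

k_normal_sketch(length(x))                       # B-Basis factor for n = 10
basis_normal_sketch(x)                           # B-Basis value
basis_normal_sketch(x, p = 0.99, conf = 0.95)    # A-Basis value
```

The factor shrinks as the sample size grows, so larger samples give basis values closer to the population quantile.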

`basis_lognormal` calculates the basis value in the same way that `basis_normal` does, except that the natural logarithm of the data is taken.

`basis_lognormal` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and a diagnostic test for lognormality (using `anderson_darling_lognormal()`). If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.
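A minimal base R sketch of the lognormal case follows. It assumes the usual approach of computing a normal basis value on the log-transformed data and transforming the result back with `exp()`; the function names and the data values are hypothetical, and the tolerance factor uses the standard noncentral t expression rather than the package's own code.

```r
# One-sided normal tolerance factor (standard noncentral t expression; assumed)
k_normal_sketch <- function(n, p = 0.90, conf = 0.95) {
  qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}

# Lognormal basis: normal basis of log(x), transformed back to the original scale
basis_lognormal_sketch <- function(x, p = 0.90, conf = 0.95) {
  lx <- log(x)
  exp(mean(lx) - k_normal_sketch(length(lx), p, conf) * sd(lx))
}

# Made-up strength values, for illustration only
x <- c(79.0, 81.2, 80.5, 78.8, 82.1, 80.0, 79.6, 81.7, 80.9, 79.3)
basis_lognormal_sketch(x)
```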

`basis_weibull` calculates the basis value for data distributed according to a Weibull distribution. The confidence level for the content requested is calculated using the conditional method, as described in Lawless (1982) Section 4.1.2b. This has good agreement with tables published in CMH-17-1G. Results differ between this function and STAT-17 by approximately 0.5%.

`basis_weibull` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and a diagnostic test for goodness of fit to the Weibull distribution (using `anderson_darling_weibull()`). If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.

`basis_hk_ext` calculates the basis value using the Extended Hanson–Koopmans method, as described in CMH-17-1G and Vangel (1994). For nonparametric distributions, this function should be used for samples up to n=28 for B-Basis and up to n=299 for A-Basis. This method uses a pair of order statistics to determine the basis value. CMH-17-1G suggests that for A-Basis, the first and last order statistics are used: this is called the "woodward-frawley" method in this package, after the paper in which this approach is described (as referenced by Vangel (1994)). For B-Basis, another approach is used whereby the first and `j-th` order statistics are used to calculate the basis value. In this approach, the `j-th` order statistic is selected to minimize the difference between the tolerance limit (assuming that the order statistics are equal to the expected values from a standard normal distribution) and the population quantile for a standard normal distribution. This approach is described in Vangel (1994). This second method (for use when calculating B-Basis values) is called "optimum-order" in this package. The results of `basis_hk_ext` have been verified against example results from the program STAT-17. Agreement is typically well within 0.2%.

Note that the implementation of `hk_ext_z_j_opt` changed after `cmstatr` version 0.8.0. This function is used internally by `basis_hk_ext` when `method = "optimum-order"`. As a result, basis values computed using this method may differ slightly from those computed with version 0.8.0 or earlier. However, both implementations seem to be equally valid. See the included vignette for a discussion of the differences between the implementation before and after version 0.8.0, as well as the factors given in CMH-17-1G. To access this vignette, run: `vignette("hk_ext", package = "cmstatr")`

`basis_hk_ext` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and performs a pair of tests that the sample size and method selected follow the guidance described above. If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.
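The choice of `method` can be shown with a short usage sketch. It assumes that `cmstatr` is installed and reuses the `carbon.fabric` subset from the Examples section (n = 18). `override = "all"` is used only to suppress diagnostic warnings in this illustration, and the variable names are hypothetical.

```r
library(cmstatr)

# Same subset as in the Examples section (18 observations)
fc_rtd <- subset(carbon.fabric, test == "FC" & condition == "RTD")

# B-Basis: first and j-th order statistics ("optimum-order")
b_hk <- basis_hk_ext(fc_rtd, strength,
                     p = 0.90, conf = 0.95,
                     method = "optimum-order",
                     override = "all")

# A-Basis: first and last order statistics ("woodward-frawley")
a_hk <- basis_hk_ext(fc_rtd, strength,
                     p = 0.99, conf = 0.95,
                     method = "woodward-frawley",
                     override = "all")

b_hk$basis
a_hk$basis
```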

`basis_nonpara_large_sample` calculates the basis value using the large sample method described in CMH-17-1G. This method uses a sum of binomials to determine the rank of the ordered statistic corresponding with the desired tolerance limit (basis value). Results of this function have been verified against results of the STAT-17 program.
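The rank selection can be sketched in base R using the usual binomial argument: the r-th order statistic is a valid lower tolerance bound when the probability that at least r observations fall below the (1 - p) population quantile is at least `conf`. This is a sketch under that assumption, not the package's implementation, and the name `nonpara_rank_sketch` is hypothetical.

```r
# Largest rank r such that P(Binomial(n, 1 - p) >= r) >= conf.
# Returns NA when the sample is too small for any rank to qualify.
nonpara_rank_sketch <- function(n, p = 0.90, conf = 0.95) {
  r <- seq_len(n)
  # Probability that the r-th order statistic lies below the (1 - p) quantile
  coverage <- 1 - pbinom(r - 1, size = n, prob = 1 - p)
  ok <- which(coverage >= conf)
  if (length(ok) == 0) {
    return(NA_integer_)
  }
  max(ok)
}

nonpara_rank_sketch(28)  # NA: sample too small for B-Basis
nonpara_rank_sketch(29)  # 1
nonpara_rank_sketch(46)  # 2

# The basis value is then the order statistic of that rank, e.g.
# sort(x)[nonpara_rank_sketch(length(x))]
```

The smallest sample size for which a rank exists (29 for B-Basis in this sketch) is consistent with the guidance to use `basis_hk_ext` for B-Basis samples up to n=28.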

`basis_nonpara_large_sample` also performs a diagnostic test for outliers (using `maximum_normed_residual()`) and performs a test that the sample size is sufficiently large. If the argument `batch` is given, this function also performs a diagnostic test for outliers within each batch (using `maximum_normed_residual()`) and a diagnostic test for between batch variability (using `ad_ksample()`). The argument `batch` is only used for these diagnostic tests.

`basis_anova` calculates basis values using the ANOVA method. `x` specifies the data (normally strength) and `groups` indicates the group corresponding to each observation. This method is described in CMH-17-1G, but when the ratio of between-batch mean square to the within-batch mean square is less than or equal to one, the tolerance factor is calculated based on pooling the data from all groups. This approach is recommended by Vangel (1992) and by Krishnamoorthy and Mathew (2008), and is also implemented by the software CMH17-STATS and STAT-17. This function automatically performs a diagnostic test for outliers within each group (using `maximum_normed_residual()`) and a test for between group variability (using `ad_ksample()`) as well as checking that the data contains at least 5 groups. This function has been verified against the results of the STAT-17 program.

`basis_pooled_sd` calculates basis values by pooling the data from several groups together. `x` specifies the data (normally strength) and `groups` indicates the group corresponding to each observation. This method is described in CMH-17-1G and matches the pooling method implemented in ASAP 2008.

`basis_pooled_cv` calculates basis values by pooling the data from several groups together. `x` specifies the data (normally strength) and `groups` indicates the group corresponding to each observation. This method is described in CMH-17-1G.

`basis_pooled_sd` and `basis_pooled_cv` both automatically perform a number of diagnostic tests. Using `maximum_normed_residual()`, they check that there are no outliers within each group and batch (provided that `batch` is specified). They check the between batch variability using `ad_ksample()`. They check that there are no outliers within each group (pooling all batches) using `maximum_normed_residual()`. They check for the normality of the pooled data using `anderson_darling_normal()`. `basis_pooled_sd` checks for equality of variance of all data using `levene_test()`, while `basis_pooled_cv` applies `levene_test()` after first transforming the data using `normalize_group_mean()`.

The object returned by these functions includes the named vector `diagnostic_results`. This contains all of the diagnostic tests performed. The name of each element of the vector corresponds with the name of the diagnostic test. The contents of each element will be "P" if the diagnostic test passed, "F" if the diagnostic test failed, "O" if the diagnostic test was overridden and `NA` if the diagnostic test was skipped (typically because an optional argument was not supplied).
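The diagnostic results can be inspected directly on the returned object, as in the following sketch. It assumes that `cmstatr` is installed and repeats the `basis_normal` call from the Examples section; the variable names `fc_rtd` and `res` are illustrative.

```r
library(cmstatr)

# Same subset and overrides as in the Examples section
fc_rtd <- subset(carbon.fabric, test == "FC" & condition == "RTD")

res <- basis_normal(fc_rtd, strength, batch,
                    override = c("outliers",
                                 "outliers_within_batch",
                                 "anderson_darling_normal"))

# Named vector: "P" (passed), "F" (failed), "O" (overridden) or NA (skipped)
res$diagnostic_results

# Names of the tests that were overridden
names(res$diagnostic_results)[res$diagnostic_results %in% "O"]

# Tests that failed and were not overridden
res$diagnostic_failures
```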

The following list summarizes the diagnostic tests automatically performed by each function.

• `basis_normal`: `outliers_within_batch`, `between_batch_variability`, `outliers`, `anderson_darling_normal`

• `basis_lognormal`: `outliers_within_batch`, `between_batch_variability`, `outliers`, `anderson_darling_lognormal`

• `basis_weibull`: `outliers_within_batch`, `between_batch_variability`, `outliers`, `anderson_darling_weibull`

• `basis_pooled_cv`: `outliers_within_batch`, `between_group_variability`, `outliers_within_group`, `pooled_data_normal`, `normalized_variance_equal`

• `basis_pooled_sd`: `outliers_within_batch`, `between_group_variability`, `outliers_within_group`, `pooled_data_normal`, `pooled_variance_equal`

• `basis_hk_ext`: `outliers_within_batch`, `between_batch_variability`, `outliers`, `sample_size`

• `basis_nonpara_large_sample`: `outliers_within_batch`, `between_batch_variability`, `outliers`, `sample_size`

• `basis_anova`: `outliers_within_group`, `equality_of_variance`, `number_of_groups`

## Value

An object of class `basis`. This object has the following fields:

• `call` the expression used to call this function

• `distribution` the distribution used (normal, etc.)

• `p` the value of p supplied

• `conf` the value of conf supplied

• `modcv` a logical value indicating whether the modified CV approach was used. Only applicable to pooling methods.

• `data` a copy of the data used in the calculation

• `groups` a copy of the groups variable. Only used for pooling and ANOVA methods.

• `batch` a copy of the batch data used for diagnostic tests

• `modcv_transformed_data` the data after the modified CV transformation

• `override` a vector of the names of diagnostic tests that were overridden. `NULL` if none were overridden

• `diagnostic_results` a named character vector containing the results of all the diagnostic tests. See the Details section for additional information

• `diagnostic_failures` a vector containing any diagnostic tests that produced failures

• `n` the number of observations

• `r` the number of groups, if a pooling method was used. Otherwise it is NULL.

• `basis` the basis value computed. This is a number except when pooling methods are used, in which case it is a data.frame.

## References

J. F. Lawless, Statistical Models and Methods for Lifetime Data. New York: John Wiley & Sons, 1982.

“Composite Materials Handbook, Volume 1. Polymer Matrix Composites Guideline for Characterization of Structural Materials,” SAE International, CMH-17-1G, Mar. 2012.

M. Vangel, “One-Sided Nonparametric Tolerance Limits,” Communications in Statistics - Simulation and Computation, vol. 23, no. 4. pp. 1137–1154, 1994.

K. Krishnamoorthy and T. Mathew, Statistical Tolerance Regions: Theory, Applications, and Computation. Hoboken: John Wiley & Sons, 2008.

W. Meeker, G. Hahn, and L. Escobar, Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition. Hoboken: John Wiley & Sons, 2017.

M. Vangel, “New Methods for One-Sided Tolerance Limits for a One-Way Balanced Random-Effects ANOVA Model,” Technometrics, vol. 34, no. 2. Taylor & Francis, pp. 176–185, 1992.

## See Also

`hk_ext_z_j_opt()`

`k_factor_normal()`

`transform_mod_cv()`

`maximum_normed_residual()`

`anderson_darling_normal()`

`anderson_darling_lognormal()`

`anderson_darling_weibull()`

`ad_ksample()`

`normalize_group_mean()`

## Examples

```r
library(cmstatr)
library(dplyr)

# A single-point basis value can be calculated as follows.
# In this example, three failed diagnostic tests are
# overridden.

carbon.fabric %>%
  filter(test == "FC") %>%
  filter(condition == "RTD") %>%
  basis_normal(strength, batch,
               override = c("outliers",
                            "outliers_within_batch",
                            "anderson_darling_normal"))

## Call:
## basis_normal(data = ., x = strength, batch = batch,
##     override = c("outliers", "outliers_within_batch",
##     "anderson_darling_normal"))
##
## Distribution:  Normal ( n = 18 )
## The following diagnostic tests were overridden:
##     `outliers`,
##     `outliers_within_batch`,
##     `anderson_darling_normal`
## B-Basis:   ( p = 0.9 , conf = 0.95 )
## 76.94656

# A set of pooled basis values can also be calculated
# using the pooled standard deviation method, as follows.
# In this example, one failed diagnostic test is overridden.

carbon.fabric %>%
  filter(test == "WT") %>%
  basis_pooled_sd(strength, condition, batch,
                  override = c("outliers_within_batch"))

## Call:
## basis_pooled_sd(data = ., x = strength, groups = condition,
##     batch = batch, override = c("outliers_within_batch"))
##
## Distribution:  Normal - Pooled Standard Deviation ( n = 54, r = 3 )
## The following diagnostic tests were overridden:
##     `outliers_within_batch`
## B-Basis:   ( p = 0.9 , conf = 0.95 )
## CTD  127.6914
## ETW  125.0698
## RTD  132.1457
```

cmstatr documentation built on Sept. 30, 2021, 5:08 p.m.