# conf_dist: Create and Plot _P_-Value Functions, S-Value Functions,...

In pvaluefunctions: Creates and Plots P-Value Functions, S-Value Functions, Confidence Distributions and Confidence Densities

## Description

The function `conf_dist` generates confidence distributions (cdf), confidence densities (pdf), Shannon surprisal (s-value) functions and p-value functions for several commonly used estimates. In addition, counternulls (see Rosenthal et al. 1994), point estimates and the area under the confidence curve (AUCC) are calculated.
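As a rough illustration of these quantities, the following base-R sketch computes a two-sided p-value function, the corresponding s-values and the counternull for a normally distributed estimate. It does not call `conf_dist`; the helper names `p_fun` and `s_fun` are hypothetical, the numbers are borrowed from the `general_z` example in the Examples section, and the counternull formula `2 * est - null` assumes a symmetric reference distribution as in Rosenthal and Rubin (1994).

```r
# Illustrative values (taken from the general_z example on this page)
est  <- -0.13
se   <- 0.224494
null <- 0

# Two-sided p-value function under a normal approximation (hypothetical helper)
p_fun <- function(theta) 2 * pnorm(-abs(est - theta) / se)

# Shannon surprisal (s-value): -log2 of the p-value, measured in bits
s_fun <- function(theta) -log2(p_fun(theta))

# Counternull for a symmetric distribution: the value on the other side of
# the estimate that has the same p-value as the null
counternull <- 2 * est - null

theta <- seq(-1, 1, length.out = 5)
data.frame(theta = theta, p_two_sided = p_fun(theta), s_value = s_fun(theta))
```

The p-value function peaks at 1 at the point estimate, where the s-value is 0 bits.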

## Usage

```r
conf_dist(
  estimate = NULL,
  n = NULL,
  df = NULL,
  stderr = NULL,
  tstat = NULL,
  type = NULL,
  plot_type = c("p_val", "s_val", "cdf", "pdf"),
  n_values = 10000L,
  est_names = NULL,
  conf_level = NULL,
  null_values = NULL,
  trans = "identity",
  alternative = c("two_sided", "one_sided"),
  log_yaxis = FALSE,
  cut_logyaxis = 0.05,
  xlab = NULL,
  xlim = NULL,
  together = FALSE,
  plot_legend = TRUE,
  same_color = FALSE,
  col = "black",
  nrow = NULL,
  ncol = NULL,
  plot_p_limit = (1 - 0.999),
  plot_counternull = FALSE,
  title = NULL,
  ylab = NULL,
  ylab_sec = NULL,
  inverted = FALSE,
  x_scale = c("default", "linear", "logarithm"),
  plot = TRUE
)
```

## Arguments

| Argument | Description |
|---|---|
| `estimate` | Numerical vector containing the estimate(s). |
| `n` | Numerical vector containing the sample size(s). Required for correlations, variances, proportions and differences between proportions. Its length must equal the number of estimates. |
| `df` | Numerical vector containing the degrees of freedom. Required for statistics based on the t-distribution (e.g. linear regression) and t-tests. Its length must equal the number of estimates. |
| `stderr` | Numerical vector containing the standard error(s) of the estimate(s). Required for statistics based on the t-distribution (e.g. linear regression) and the normal distribution (e.g. logistic regression). Its length must equal the number of estimates. |
| `tstat` | Numerical vector containing the t-statistic(s). Required for t-tests (means and mean differences). Its length must equal the number of estimates. |
| `type` | String indicating the type of the estimate. Must be one of the following: `ttest`, `linreg`, `gammareg`, `general_t`, `logreg`, `poisreg`, `coxreg`, `general_z`, `pearson`, `spearman`, `kendall`, `var`, `prop`, `propdiff`. |
| `plot_type` | String indicating the type of plot. Must be one of the following: `cdf` (confidence distribution), `pdf` (confidence density), `p_val` (p-value function, the default), `s_val` (surprisal value function). For differences between independent proportions, only p-value functions and surprisal values are available. |
| `n_values` | (optional) Integer indicating the number of points that are used to generate the graphics. The higher this number, the higher the computation time and resolution. |
| `est_names` | (optional) String vector indicating the names of the estimate(s). Its length must equal the number of estimates. |
| `conf_level` | (optional) Numerical vector indicating the confidence level(s). Must be between 0 and 1. |
| `null_values` | (optional) Numerical vector indicating the null value(s) in the plot on the untransformed (original) scale. For example: the null value for an odds ratio of 1 is 0 on the log-odds scale. If x limits are specified with `xlim`, all null values outside of the specified x limits are ignored for plotting and a message is printed. |
| `trans` | (optional) String indicating the transformation function that will be applied to the estimates and confidence curves. For example: `"exp"` for an exponential transformation of the log-odds in logistic regression. Can be a custom function. |
| `alternative` | String indicating if the confidence level(s) are two-sided or one-sided. Must be one of the following: `two_sided`, `one_sided`. |
| `log_yaxis` | Logical. Indicating if a portion of the y-axis should be displayed on the logarithmic scale. |
| `cut_logyaxis` | Numerical value indicating the threshold below which the y-axis will be displayed logarithmically. Must lie between 0 and 1. |
| `xlab` | (optional) String indicating the label of the x-axis. |
| `xlim` | (optional) Numerical vector of length 2 (x1, x2) indicating the limits of the x-axis. If `trans` is not `identity`, the limits must be given on the untransformed scale. The scale of the x-axis set by `x_scale` does not affect the x limits. For example: if you want to plot p-value functions for odds ratios from logistic regressions, the limits have to be given on the log-odds scale if `trans = "exp"`. Note that x1 > x2 is allowed, but then x2 will be the left limit and x1 the right limit (i.e. the limits are sorted before plotting). Null values (specified in `null_values`) that are outside of the specified limits are ignored and a message is printed. |
| `together` | Logical. Indicating if graphics for multiple estimates should be displayed together or on separate plots. |
| `plot_legend` | Logical. Indicating if a legend should be plotted if multiple curves are plotted together with different colors (i.e. `together = TRUE` and `same_color = FALSE`). |
| `same_color` | Logical. Indicating whether curves that are plotted together (i.e. `together = TRUE`) should all use the same color (`TRUE`) or be distinguished by color (`FALSE`, the default). |
| `col` | String indicating the colour of the curves. Only relevant for single curves, multiple curves not plotted together (i.e. `together = FALSE`) and multiple curves plotted together but with the option `same_color` set to `TRUE`. |
| `nrow` | (optional) Integer greater than 0 indicating the number of rows when `together = FALSE` is specified for multiple estimates. Used in `facet_wrap` in ggplot2. |
| `ncol` | (optional) Integer greater than 0 indicating the number of columns when `together = FALSE` is specified for multiple estimates. Used in `facet_wrap` in ggplot2. |
| `plot_p_limit` | Numerical value indicating the lower limit of the y-axis. Must be greater than 0 for a logarithmic scale (i.e. `log_yaxis = TRUE`). The default is to omit plotting p-values smaller than 1 - 0.999 = 0.001. |
| `plot_counternull` | Logical. Indicating if the counternull should be plotted as a point. Only available for p-value functions and s-value functions. Counternull values that are outside of the plotted functions are not shown. |
| `title` | (optional) String containing a title of the plot. |
| `ylab` | (optional) String indicating the title for the primary (left) y-axis. |
| `ylab_sec` | (optional) String indicating the title for the secondary (right) y-axis. |
| `inverted` | Logical. Indicating the orientation of the y-axis for the P-value function (`p_val`), S-value function (`s_val`) and the confidence distribution (`cdf`). By default (i.e. `inverted = FALSE`) small P-values are plotted at the bottom and large ones at the top so that the cusp of the P-value function is at the top. By setting `inverted = TRUE`, the y-axis is inverted. Ignored for confidence densities. |
| `x_scale` | String indicating the scaling of the x-axis. The default is to scale the x-axis logarithmically if the transformation specified in `trans` is "exp" (exponential) and linearly otherwise. The option `linear` (can be abbreviated) forces a linear scaling and the option `logarithm` (can be abbreviated) forces a logarithmic scaling, regardless of what has been specified in `trans`. |
| `plot` | Logical. Should a plot be created (`TRUE`, the default) or not (`FALSE`). `FALSE` can be useful if users want to create their own plots using the returned data from the function. If `FALSE`, no ggplot2 object is returned. |
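Because `null_values` and `xlim` are given on the untransformed scale, values chosen on the display scale have to be mapped back first. The following is a minimal base-R sketch for `trans = "exp"`; the variable names are illustrative only and the numbers mirror the logistic regression example on this page.

```r
# Desired display on the odds-ratio scale
or_limits <- c(5.2, 0.7)   # order does not matter: limits are sorted before plotting
or_null   <- 1

# Map back to the log-odds scale expected by xlim and null_values
xlim_arg <- log(or_limits)
null_arg <- log(or_null)

sort(xlim_arg)   # left and right limit on the log-odds scale
null_arg         # 0, i.e. an odds ratio of 1

# A null value outside the limits would be ignored for plotting
null_arg >= min(xlim_arg) && null_arg <= max(xlim_arg)
```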

## Details

P-value functions and confidence intervals are calculated based on the t-distribution for t-tests, linear regression coefficients, and gamma regression models (GLM). The normal distribution is used for logistic regression, Poisson regression and Cox regression models. For correlation coefficients, Fisher's z-transformation is applied with the corresponding variances (see Bonett et al. 2000). P-value functions and confidence intervals for variances are constructed using the chi-squared distribution. Finally, Wilson's score interval is used for a single proportion. For differences of proportions, the Wilson score interval with continuity correction is used (Newcombe 1998).
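Two of these constructions can be sketched in base R. The helpers `wilson_ci` and `fisher_ci` below are hypothetical names written for illustration; they implement the textbook Wilson score interval (without continuity correction) and the Fisher z interval for a Pearson correlation with variance 1 / (n - 3). Whether `conf_dist` computes them internally in exactly this form is an assumption, and the Spearman and Kendall variances from Bonett and Wright (2000) are not reproduced here.

```r
# Wilson score interval for one proportion (no continuity correction)
wilson_ci <- function(x, n, conf_level = 0.95) {
  z      <- qnorm(1 - (1 - conf_level) / 2)
  p_hat  <- x / n
  centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half, upper = centre + half)
}

# Fisher z interval for a Pearson correlation, using variance 1 / (n - 3)
fisher_ci <- function(r, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  # Interval on the z scale, transformed back to the correlation scale
  tanh(atanh(r) + c(lower = -1, upper = 1) * z / sqrt(n - 3))
}

wilson_ci(68, 100)
fisher_ci(r = 0.5, n = 50)
```

For a quick cross-check, `prop.test(68, 100, correct = FALSE)$conf.int` and `cor.test()` in base R report intervals built the same way.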

## Value

`conf_dist` returns the following data frames and, if `plot = TRUE` was specified, a ggplot2 plot object:

- `res_frame`: contains parameter values (e.g. mean differences, odds ratios etc.), p-values (one- and two-sided), s-values, confidence distributions and densities, variable names and type of hypothesis.
- `conf_frame`: contains the specified confidence level(s) and the corresponding lower and upper limits as well as the corresponding variable name.
- `counternull_frame`: contains the counternull and the corresponding null values.
- `point_est`: contains the mean, median and mode point estimates.
- `aucc_frame`: contains the estimated AUCC (area under the confidence curve), calculated by trapezoidal integration on the untransformed scale.
- `plot`: a ggplot2 object (only if `plot = TRUE` was specified).
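To make the trapezoidal integration concrete, the sketch below approximates the area under a two-sided p-value function for a normally distributed estimate. It is an illustration in base R, not the package's code: treating the two-sided p-value function as the confidence curve, the grid width of ten standard errors and the variable names are all assumptions.

```r
est <- -0.13
se  <- 0.224494

# Grid of parameter values and the two-sided p-value function evaluated on it
theta <- seq(est - 10 * se, est + 10 * se, length.out = 10000)
p_two <- 2 * pnorm(-abs(theta - est) / se)

# Trapezoidal rule: interval widths times the mean of adjacent heights
aucc <- sum(diff(theta) * (head(p_two, -1) + tail(p_two, -1)) / 2)
aucc

# Closed form for the normal case, for comparison: 4 * se / sqrt(2 * pi)
4 * se / sqrt(2 * pi)
```

In the normal case the area is proportional to the standard error, so a narrower curve (a more precise estimate) gives a smaller AUCC.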

## References

Bender R, Berg G, Zeeb H. Tutorial: using confidence curves in medical research. Biom J. 2005;47(2):237-247.

Berrar D. Confidence curves: an alternative to null hypothesis significance testing for the comparison of classifiers. Mach Learn. 2017;106:911-949.

Bonett DG, Wright TA. Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika. 2000;65(1):23-28.

Infanger D, Schmidt-Trucksäss A. P value functions: An underused method to present research results and to promote quantitative reasoning. Stat Med. 2019;38:4189-4197.

Newcombe RG. Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med. 1998;17:873-890.

Poole C. Confidence intervals exclude nothing. Am J Public Health. 1987;77(4):492-493.

Poole C. Beyond the confidence interval. Am J Public Health. 1987;77(2):195-199.

Rosenthal R, Rubin D. The counternull value of an effect size: a new statistic. Psychological Science. 1994;5(6):329-334.

Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd ed. Philadelphia, PA: Wolters Kluwer; 2008.

Schweder T, Hjort NL. Confidence, likelihood, probability: statistical inference with confidence distributions. New York, NY: Cambridge University Press; 2016.

Sullivan KM, Foster DA. Use of the confidence interval function. Epidemiology. 1990;1(1):39-42.

Xie Mg, Singh K. Confidence distribution, the frequentist distribution estimator of a parameter: A review. Internat Statist Rev. 2013;81(1):3-39.

## Examples

```r
#======================================================================================
# Create a p-value function for an estimate using the normal distribution
#======================================================================================

res <- conf_dist(
  estimate = c(-0.13)
  , stderr = c(0.224494)
  , type = "general_z"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("Parameter value")
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , conf_level = c(0.95)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , xlab = "Var"
  , xlim = c(-1, 1)
  , together = TRUE
  , plot_p_limit = 1 - 0.9999
  , plot_counternull = TRUE
  , title = NULL
  , ylab = NULL
  , ylab_sec = NULL
  , inverted = FALSE
  , x_scale = "default"
  , plot = TRUE
)

#======================================================================================
# P-value function for a single regression coefficient (Agriculture in the model below)
#======================================================================================

mod <- lm(Infant.Mortality ~ Agriculture + Fertility + Examination, data = swiss)

summary(mod)

res <- conf_dist(
  estimate = c(-0.02143)
  , df = c(43)
  , stderr = c(0.02394)
  , type = "linreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , log_yaxis = TRUE
  , cut_logyaxis = 0.05
  , xlab = "Coefficient Agriculture"
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = FALSE
  , title = NULL
  , ylab = NULL
  , ylab_sec = NULL
  , inverted = FALSE
  , x_scale = "default"
  , plot = TRUE
)

#=======================================================================================
# P-value function for an odds ratio (logistic regression), plotted with inverted y-axis
#=======================================================================================

res <- conf_dist(
  estimate = c(0.804037549)
  , stderr = c(0.331819298)
  , type = "logreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("GPA")
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(log(1)) # null value on the log-odds scale
  , trans = "exp"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Odds Ratio (GPA)"
  , xlim = log(c(0.7, 5.2)) # axis limits on the log-odds scale
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , title = NULL
  , ylab = NULL
  , ylab_sec = NULL
  , inverted = TRUE
  , x_scale = "default"
  , plot = TRUE
)

#======================================================================================
# Difference between two independent proportions: Newcombe with continuity correction
#======================================================================================

res <- conf_dist(
  estimate = c(68/100, 98/150)
  , n = c(100, 150)
  , type = "propdiff"
  , plot_type = "p_val"
  , n_values = 1e4L
  , conf_level = c(0.95, 0.90, 0.80)
  , null_values = c(0)
  , trans = "identity"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Difference between proportions"
  , together = FALSE
  , col = "#A52A2A" # Color curve in auburn
  , plot_p_limit = 1 - 0.9999
  , plot_counternull = FALSE
  , title = NULL
  , ylab = NULL
  , ylab_sec = NULL
  , inverted = FALSE
  , x_scale = "default"
  , plot = TRUE
)

#======================================================================================
# Difference between two independent proportions: Agresti & Caffo
#======================================================================================

# First proportion
x1 <- 8
n1 <- 40

# Second proportion
x2 <- 11
n2 <- 30

# Apply the correction
p1hat <- (x1 + 1)/(n1 + 2)
p2hat <- (x2 + 1)/(n2 + 2)

# The original estimator
est0 <- (x1/n1) - (x2/n2)

# The modified estimator and its standard error using the correction
est <- p1hat - p2hat
se <- sqrt(((p1hat*(1 - p1hat))/(n1 + 2)) + ((p2hat*(1 - p2hat))/(n2 + 2)))

res <- conf_dist(
  estimate = c(est)
  , stderr = c(se)
  , type = "general_z"
  , plot_type = "p_val"
  , n_values = 1e4L
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , conf_level = c(0.95, 0.99)
  , null_values = c(0, 0.3)
  , trans = "identity"
  , alternative = "two_sided"
  , xlab = "Difference of proportions"
  , together = FALSE
  , plot_p_limit = 1 - 0.9999
  , plot_counternull = FALSE
  , title = "P-value function for the difference of two independent proportions"
  , ylab = NULL
  , ylab_sec = NULL
  , inverted = FALSE
  , x_scale = "default"
  , plot = TRUE
)

#========================================================================================
# P-value function and confidence distribution for the relative survival effect (1 - HR%)
# Replicating Figure 1 in Bender et al. (2005)
#========================================================================================

# Define the transformation function and its inverse for the relative survival effect
rse_fun <- function(x){
  # x is the log-hazard ratio
  100*(1 - exp(x))
}

rse_fun_inv <- function(x){
  log(1 - (x/100))
}

res <- conf_dist(
  estimate = log(0.72)
  , stderr = 0.187618
  , type = "coxreg"
  , plot_type = "p_val"
  , n_values = 1e4L
  , est_names = c("RSE")
  , conf_level = c(0.95, 0.8, 0.5)
  , null_values = rse_fun_inv(0)
  , trans = "rse_fun"
  , alternative = "two_sided"
  , log_yaxis = FALSE
  , cut_logyaxis = 0.05
  , xlab = "Relative survival effect (1 - HR%)"
  , xlim = rse_fun_inv(c(-30, 60))
  , together = FALSE
  , plot_p_limit = 1 - 0.999
  , plot_counternull = TRUE
  , inverted = TRUE
  , title = "Figure 1 in Bender et al. (2005)"
  , x_scale = "default"
  , plot = TRUE
)
```
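The regression example above enters the estimate, standard error and degrees of freedom by hand. As a small convenience sketch (not taken from the package documentation), the same inputs can be read off the fitted model with base R; the variable names are illustrative only.

```r
# Same model as in the linear regression example
mod <- lm(Infant.Mortality ~ Agriculture + Fertility + Examination, data = swiss)

# Coefficient table with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- coef(summary(mod))

est <- coefs["Agriculture", "Estimate"]    # value for the estimate argument
se  <- coefs["Agriculture", "Std. Error"]  # value for the stderr argument
df  <- df.residual(mod)                    # residual degrees of freedom for df

c(estimate = est, stderr = se, df = df)
```

These values could then be passed to `conf_dist` with `type = "linreg"` instead of typing rounded numbers.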

pvaluefunctions documentation built on Jan. 13, 2021, 6:34 a.m.