knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The quest for certainty is the biggest obstacle to becoming risk savvy. (Gerd Gigerenzer)^[Gigerenzer, G. (2014). Risk savvy: How to make good decisions. New York, NY: Penguin. (p. 21).]
A major challenge in mastering risk literacy is coping with inevitable uncertainty. Fortunately, uncertainty in the form of risk can be expressed in terms of probabilities and thus be measured and calculated or "reckoned" with (Gigerenzer, 2002). Nevertheless, probabilistic information is often difficult to understand, even for experts in risk management and statistics. A smart and effective way to communicate probabilities is by expressing them in terms of frequencies.
The problems addressed by `riskyr` and the scientific discussion surrounding them can be framed in terms of two representational formats: information expressed in frequencies is distinguished from information expressed in probabilities (see the user guide for background and references).

`riskyr` reflects this division by distinguishing between the same two data types and hence provides objects that contain frequencies (specifically, a list called `freq`) and objects that contain probabilities (a list called `prob`). But before we explain their contents, it is important to realize that any such separation is an abstract and artificial one. It may make sense to distinguish frequencies from probabilities for conceptual and educational reasons, but in theory and in reality the two representations are intimately intertwined.
In the following, we will first consider frequencies and probabilities by themselves, but then show how both are related. As a sneak preview, the following prism (or network) plot shows frequencies as its nodes and probabilities as the edges that link the nodes:
```r
library("riskyr")  # load the "riskyr" package

plot_prism(prev = .01, sens = .80, spec = NA, fart = .096,  # 3 essential probabilities
           N = 1000,        # 1 frequency
           area = "no",     # same size for all boxes
           p_lbl = "abb",   # show abbreviated names of probabilities on edges
           title_lbl = "Example")
```
For our purposes, frequencies simply are numbers that can be counted: either 0 or positive integers.^[It seems plausible that the notion of a frequency is simpler than the notion of probability. Nevertheless, confusion is possible and typically causes serious scientific disputes. See Gigerenzer & Hoffrage, 1999, and Hoffrage et al., 2002, for different types of frequencies and the concept of "natural frequencies".]
The following 11 frequencies are distinguished by `riskyr` and contained in `freq`:

| Nr. | Variable | Definition |
|--: |:--- |:--- |
| 1. | `N` | The number of cases (or individuals) in the population. |
| 2. | `cond_true` | The number of cases for which the condition is present (`TRUE`). |
| 3. | `cond_false` | The number of cases for which the condition is absent (`FALSE`). |
| 4. | `dec_pos` | The number of cases for which the decision is positive (`TRUE`). |
| 5. | `dec_neg` | The number of cases for which the decision is negative (`FALSE`). |
| 6. | `dec_cor` | The number of cases for which the decision is correct (correspondence of decision to condition). |
| 7. | `dec_err` | The number of cases for which the decision is erroneous (difference between decision and condition). |
| 8. | `hi` | The number of hits or true positives: condition present (`TRUE`) & decision positive (`TRUE`). |
| 9. | `mi` | The number of misses or false negatives: condition present (`TRUE`) & decision negative (`FALSE`). |
| 10. | `fa` | The number of false alarms or false positives: condition absent (`FALSE`) & decision positive (`TRUE`). |
| 11. | `cr` | The number of correct rejections or true negatives: condition absent (`FALSE`) & decision negative (`FALSE`). |
The frequencies contained in `freq` can be viewed from two perspectives:

- Top-down: From the entire population to different parts or subgroups:
  Whereas `N` specifies the size of the entire population, the other 10 frequencies denote the number of individuals or cases in some subset. For instance, the frequency `dec_pos` denotes individuals for which the decision or diagnosis is positive. As this frequency is contained within the population, its numeric value must range from 0 to `N`.

- Bottom-up: From the 4 essential subgroups to various combinations of them:
  As the 4 frequencies `hi`, `mi`, `fa`, and `cr` are not further split into subgroups, we can think of them as atomic elements or four essential frequencies. All other frequencies in `freq` are sums of various combinations of these four essential frequencies. This implies that the entire network of frequencies and probabilities (shown in the network diagram above) can be reconstructed from these four essential frequencies.
The following relationships hold among the 11 frequencies:
The population size `N` can be split into several subgroups by classifying individuals by 4 different criteria:

a. by condition;
b. by decision;
c. by accuracy (i.e., the correspondence of decisions to conditions);
d. by the actual combination of condition and decision.

Depending on the criterion used, the following relationships hold:

$$
\begin{aligned}
\texttt{N} &= \texttt{cond_true} + \texttt{cond_false} & \textrm{(a)} \\
           &= \texttt{dec_pos} + \texttt{dec_neg} & \textrm{(b)} \\
           &= \texttt{dec_cor} + \texttt{dec_err} & \textrm{(c)} \\
           &= \texttt{hi} + \texttt{mi} + \texttt{fa} + \texttt{cr} & \textrm{(d)}
\end{aligned}
$$
Similarly, each of the subsets resulting from using the splits by condition, by decision, or by accuracy, can also be expressed as a sum of two of the four essential frequencies. This results in three different ways of grouping the four essential frequencies:
(a) by condition (corresponding to the two columns of the confusion matrix):

$$
\begin{aligned}
\texttt{N} &= \texttt{cond_true} + \texttt{cond_false} & \textrm{(a)} \\
           &= (\texttt{hi} + \texttt{mi}) + (\texttt{fa} + \texttt{cr})
\end{aligned}
$$

(b) by decision (corresponding to the two rows of the confusion matrix):

$$
\begin{aligned}
\texttt{N} &= \texttt{dec_pos} + \texttt{dec_neg} & \textrm{(b)} \\
           &= (\texttt{hi} + \texttt{fa}) + (\texttt{mi} + \texttt{cr})
\end{aligned}
$$

(c) by accuracy (or the correspondence of decisions to conditions, corresponding to the two diagonals of the confusion matrix):

$$
\begin{aligned}
\texttt{N} &= \texttt{dec_cor} + \texttt{dec_err} & \textrm{(c)} \\
           &= (\texttt{hi} + \texttt{cr}) + (\texttt{mi} + \texttt{fa})
\end{aligned}
$$
It may be tempting to refer to instances of `dec_cor` and `dec_err` as "true decisions" and "false decisions". However, this would invite conceptual confusion, as "true decisions" actually include `cond_false` cases (`cr`) and "false decisions" actually include `cond_true` cases (`mi`).
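The bottom-up view can be sketched in a few lines of base R. This is only an illustration: the four starting values are hypothetical, and `freq_sketch` is a made-up name that merely mimics the element names of the package's `freq` list.

```r
# Four essential frequencies (hypothetical values, chosen for illustration):
hi <- 4; mi <- 1; fa <- 2; cr <- 3

# The remaining 7 frequencies are sums of the essential ones:
freq_sketch <- list(
  N          = hi + mi + fa + cr,
  cond_true  = hi + mi,   # by condition
  cond_false = fa + cr,
  dec_pos    = hi + fa,   # by decision
  dec_neg    = mi + cr,
  dec_cor    = hi + cr,   # by accuracy
  dec_err    = mi + fa,
  hi = hi, mi = mi, fa = fa, cr = cr
)

# Each split (a), (b), and (c) must add up to the population size N:
with(freq_sketch, stopifnot(
  N == cond_true + cond_false,
  N == dec_pos + dec_neg,
  N == dec_cor + dec_err
))

str(freq_sketch)
```

Starting from the four essential frequencies guarantees that all three splits are consistent with each other, which is why they are treated as the atomic elements.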
The notion of probability is as elusive as it is ubiquitous (see Hájek, 2012, for a solid exposition of its different concepts and interpretations). For our present purposes, probabilities are simply numbers between 0 and 1. These numbers are defined to reflect particular quantities and can be expressed as percentages or as functions of, and ratios between, other numbers (frequencies or probabilities).

`riskyr` distinguishes between 13 probabilities (see `prob` for current values):
| Nr. | Variable | Name | Definition |
|--: |:--- |:--- |:--- |
| 1. | `prev` | prevalence | The probability of the condition being `TRUE`. |
| 2. | `sens` | sensitivity | The conditional probability of a positive decision provided that the condition is `TRUE`. |
| 3. | `mirt` | miss rate | The conditional probability of a negative decision provided that the condition is `TRUE`. |
| 4. | `spec` | specificity | The conditional probability of a negative decision provided that the condition is `FALSE`. |
| 5. | `fart` | false alarm rate | The conditional probability of a positive decision provided that the condition is `FALSE`. |
| 6. | `ppod` | proportion of positive decisions | The proportion (baseline probability or rate) of the decision being positive (but not necessarily `TRUE`). |
| 7. | `PPV` | positive predictive value | The conditional probability of the condition being `TRUE` provided that the decision is positive. |
| 8. | `FDR` | false detection rate | The conditional probability of the condition being `FALSE` provided that the decision is positive. |
| 9. | `NPV` | negative predictive value | The conditional probability of the condition being `FALSE` provided that the decision is negative. |
| 10. | `FOR` | false omission rate | The conditional probability of the condition being `TRUE` provided that the decision is negative. |
| 11. | `acc` | accuracy | The probability of a correct decision (i.e., correspondence of decisions to conditions). |
| 12. | `p_acc_hi` | | The conditional probability of the condition being `TRUE` provided that a decision or prediction is accurate. |
| 13. | `p_err_fa` | | The conditional probability of the condition being `FALSE` provided that a decision or prediction is inaccurate or erroneous. |
Note that the prism diagram (`plot_prism`) shows a total of 18 probabilities: 3 perspectives (`by = "cd"`, `by = "dc"`, and `by = "ac"`) and 6 links denoting probabilities per perspective. However, as some probabilities are the complements of others, we currently do not identify all possible probabilities.
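Note that a typical `riskyr` scenario contains several unconditional probabilities:

- By condition: The prevalence `prev` (1.) only depends on features of the condition.
- By decision: The proportion of positive decisions `ppod` (6.) only depends on features of the decision.
- By accuracy: The accuracy `acc` (11.) depends on `prev` and `ppod`, but unconditionally dissects a population into 2 groups (`dec_cor` vs. `dec_err`).

The other probabilities are conditional probabilities based on 3 perspectives:

- By condition: The conditional probabilities 2. to 5. depend on `prev` and features of the decision.
- By decision: The conditional probabilities 7. to 10. depend on `ppod` and features of the condition.
- By accuracy: Beyond `p_acc_hi` and `p_err_fa` (12. and 13.), conditional probabilities based on `acc` are currently not computed or defined.

The following relationships hold among the conditional probabilities: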
- The sensitivity `sens` and miss rate `mirt` are complements:

$$
\texttt{sens} = 1 - \texttt{mirt}
$$

- The specificity `spec` and false alarm rate `fart` are complements:

$$
\texttt{spec} = 1 - \texttt{fart}
$$

- The positive predictive value `PPV` and false detection rate `FDR` are complements:

$$
\texttt{PPV} = 1 - \texttt{FDR}
$$

- The negative predictive value `NPV` and false omission rate `FOR` are complements:

$$
\texttt{NPV} = 1 - \texttt{FOR}
$$
It is possible to adapt Bayes' formula to define `PPV` and `NPV` in terms of `prev`, `sens`, and `spec`:

$$
\begin{aligned}
\texttt{PPV} &= \frac{\texttt{prev} \cdot \texttt{sens}}{\texttt{prev} \cdot \texttt{sens} + (1 - \texttt{prev}) \cdot (1 - \texttt{spec})} \\
\texttt{NPV} &= \frac{(1 - \texttt{prev}) \cdot \texttt{spec}}{\texttt{prev} \cdot (1 - \texttt{sens}) + (1 - \texttt{prev}) \cdot \texttt{spec}}
\end{aligned}
$$
Although this is how the functions `comp_PPV` and `comp_NPV` compute the desired conditional probabilities, it is difficult to remember and think in these terms. Instead, we recommend thinking about and defining all conditional probabilities in terms of frequencies (see below).
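As a rough illustration, Bayes' formula for the two predictive values can be written in base R, with the false alarm rate expressed as `1 - spec`. The helper names `ppv_sketch` and `npv_sketch` are hypothetical and are not the package's `comp_PPV` and `comp_NPV`; the input values are those of the prism plot shown earlier.

```r
# Bayes' formula for the positive predictive value (hypothetical helper):
ppv_sketch <- function(prev, sens, spec) {
  (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))
}

# Bayes' formula for the negative predictive value (hypothetical helper):
npv_sketch <- function(prev, sens, spec) {
  ((1 - prev) * spec) / (prev * (1 - sens) + (1 - prev) * spec)
}

# Inputs of the introductory example (spec derived from fart = .096):
prev <- .01
sens <- .80
spec <- 1 - .096

ppv <- ppv_sketch(prev, sens, spec)
npv <- npv_sketch(prev, sens, spec)

round(c(PPV = ppv, NPV = npv), 3)
```

Despite the high sensitivity, the low prevalence keeps the positive predictive value small in this example, which is exactly the kind of result that is hard to anticipate from the formula alone.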
The easiest way to think about, define, and compute the probabilities (contained in `prob`) is in terms of frequencies (contained in `freq`):
| Nr. | Variable | Name | Definition | as Frequencies |
|--: |:--- |:--- |:--- |:--- |
| 1. | `prev` | prevalence | The probability of the condition being `TRUE`. | `prev` = `cond_true`/`N` |
| 2. | `sens` | sensitivity | The conditional probability of a positive decision provided that the condition is `TRUE`. | `sens` = `hi`/`cond_true` |
| 3. | `mirt` | miss rate | The conditional probability of a negative decision provided that the condition is `TRUE`. | `mirt` = `mi`/`cond_true` |
| 4. | `spec` | specificity | The conditional probability of a negative decision provided that the condition is `FALSE`. | `spec` = `cr`/`cond_false` |
| 5. | `fart` | false alarm rate | The conditional probability of a positive decision provided that the condition is `FALSE`. | `fart` = `fa`/`cond_false` |
| 6. | `ppod` | proportion of positive decisions | The proportion (baseline probability or rate) of the decision being positive (but not necessarily `TRUE`). | `ppod` = `dec_pos`/`N` |
| 7. | `PPV` | positive predictive value | The conditional probability of the condition being `TRUE` provided that the decision is positive. | `PPV` = `hi`/`dec_pos` |
| 8. | `FDR` | false detection rate | The conditional probability of the condition being `FALSE` provided that the decision is positive. | `FDR` = `fa`/`dec_pos` |
| 9. | `NPV` | negative predictive value | The conditional probability of the condition being `FALSE` provided that the decision is negative. | `NPV` = `cr`/`dec_neg` |
| 10. | `FOR` | false omission rate | The conditional probability of the condition being `TRUE` provided that the decision is negative. | `FOR` = `mi`/`dec_neg` |
| 11. | `acc` | accuracy | The probability of a correct decision (i.e., correspondence of decisions to conditions). | `acc` = `dec_cor`/`N` |
| 12. | `p_acc_hi` | | The conditional probability of the condition being `TRUE` provided that a decision or prediction is accurate. | `p_acc_hi` = `hi`/`dec_cor` |
| 13. | `p_err_fa` | | The conditional probability of the condition being `FALSE` provided that a decision or prediction is inaccurate or erroneous. | `p_err_fa` = `fa`/`dec_err` |
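The frequency ratios for the 13 probabilities can be sketched in base R as follows. The four essential frequencies are hypothetical values, and `prob_sketch` is a made-up name that only mirrors the element names of the package's `prob` list.

```r
# Four essential frequencies (hypothetical values):
hi <- 4; mi <- 1; fa <- 2; cr <- 3

# Derived frequencies:
N          <- hi + mi + fa + cr
cond_true  <- hi + mi; cond_false <- fa + cr
dec_pos    <- hi + fa; dec_neg    <- mi + cr
dec_cor    <- hi + cr; dec_err    <- mi + fa

# All 13 probabilities as ratios of frequencies:
prob_sketch <- c(
  prev = cond_true / N,
  sens = hi / cond_true,  mirt = mi / cond_true,
  spec = cr / cond_false, fart = fa / cond_false,
  ppod = dec_pos / N,
  PPV  = hi / dec_pos,    FDR  = fa / dec_pos,
  NPV  = cr / dec_neg,    FOR  = mi / dec_neg,
  acc  = dec_cor / N,
  p_acc_hi = hi / dec_cor,
  p_err_fa = fa / dec_err
)

# Complementary pairs sum to 1:
stopifnot(
  isTRUE(all.equal(prob_sketch[["sens"]] + prob_sketch[["mirt"]], 1)),
  isTRUE(all.equal(prob_sketch[["spec"]] + prob_sketch[["fart"]], 1)),
  isTRUE(all.equal(prob_sketch[["PPV"]]  + prob_sketch[["FDR"]],  1)),
  isTRUE(all.equal(prob_sketch[["NPV"]]  + prob_sketch[["FOR"]],  1))
)

round(prob_sketch, 3)
```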

Note that the ratios of frequencies are straightforward consequences of the probabilities' definitions:

- The unconditional probabilities (1., 6., and 11.) are proportions of the entire population:
    - `prev` = `cond_true`/`N`
    - `ppod` = `dec_pos`/`N`
    - `acc` = `dec_cor`/`N`

- The conditional probabilities (2. to 5., 7. to 10., and 12. to 13.) can be computed as a proportion of the reference group on which they are conditional. More specifically, if we schematically read each definition as "The conditional probability of $X$ provided that $Y$", then the ratio of the corresponding frequencies is `X & Y`/`Y`. More explicitly,
    - the ratio's numerator is the frequency of the joint occurrence (i.e., both `X & Y`) being the case;
    - the ratio's denominator is the frequency of the condition (i.e., `Y`) being the case.

When computing probabilities from rounded frequencies, their numeric values may deviate from the true underlying probabilities, particularly for small population sizes `N`. (Use the `scale` argument of many `riskyr` plotting functions to control whether probabilities are based on frequencies.)
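The effect of rounding can be illustrated with a small base R sketch. The helper `freq_from_prob` is hypothetical and is not the package's own computation; in particular, its use of R's `round()` (which rounds exact halves to the even digit) is an assumption made only for this illustration.

```r
# Hypothetical helper: derive the 4 essential frequencies from 3 probabilities
# and a population size N, optionally rounding to whole cases.
freq_from_prob <- function(prev, sens, spec, N, do_round = TRUE) {
  r <- if (do_round) round else identity
  cond_true  <- r(prev * N)
  cond_false <- N - cond_true
  hi <- r(sens * cond_true)
  mi <- cond_true - hi
  cr <- r(spec * cond_false)
  fa <- cond_false - cr
  c(hi = hi, mi = mi, fa = fa, cr = cr)
}

# Sensitivity recovered from frequencies: hi / cond_true
sens_from_freq <- function(f) f[["hi"]] / (f[["hi"]] + f[["mi"]])

f10 <- freq_from_prob(prev = .50, sens = .80, spec = .60, N = 10)
f5  <- freq_from_prob(prev = .50, sens = .80, spec = .60, N = 5)

sens_10 <- sens_from_freq(f10)  # based on whole numbers that need no rounding
sens_5  <- sens_from_freq(f5)   # based on rounded frequencies

c(N_10 = sens_10, N_5 = sens_5)
```

With `N = 10` the recovered sensitivity matches the input of .80, whereas with `N = 5` the rounded frequencies no longer reproduce it.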
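The following prism (or network) diagram is based on the following inputs:

- a prevalence of 50% (`prev = .50`);
- a sensitivity of 80% (`sens = .80`);
- a specificity of 60% (`spec = .60`);
- a population size of 10 (`N = 10`);

and illustrates the relationship between frequencies and probabilities: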
```r
plot_prism(prev = .50, sens = .80, spec = .60,  # 3 essential probabilities
           N = 10,          # population frequency
           scale = "f",     # scale by frequency, rather than probability ("p")
           area = "sq",     # boxes as squares, with sizes scaled by current scale
           p_lbl = "num",   # show numeric probability values on edges
           title_lbl = "Probabilities as ratios of frequencies")
```
Verify that the probabilities (shown as numeric values on the edges) match the ratios of the corresponding frequencies (shown in the boxes). What are the names of these probabilities?
What is the frequency of `dec_cor` and `dec_err` cases? Where do these cases appear in the diagram?

The parameter values in the example do not require any rounding of frequencies. Change them (e.g., to `N = 5`) and explore what happens when alternating between `scale = "f"` and `scale = "p"`.
Gigerenzer, G. (2002). Reckoning with risk: Learning to live with uncertainty. London, UK: Penguin.
Gigerenzer, G. (2014). Risk savvy: How to make good decisions. New York, NY: Penguin.
Gigerenzer, G., & Hoffrage, U. (1999). Overcoming difficulties in Bayesian reasoning: A reply to Lewis and Keren (1999) and Mellers and McGraw (1999). Psychological Review, 106, 425–430.
Hájek, A. (2012). Interpretations of probability. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. URL: https://plato.stanford.edu/entries/probability-interpret/ (2012 Archive)
Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002). Representation facilitates reasoning: What natural frequencies are and what they are not. Cognition, 84, 343–352.
Trevethan, R. (2017). Sensitivity, specificity, and predictive values: Foundations, pliabilities, and pitfalls in research and practice. Frontiers in Public Health, 5, 307. doi: 10.3389/fpubh.2017.00307
The following resources and versions are currently available:
| Type: | Version: | URL: |
|:--- |:--- |:--- |
| A. `riskyr` (R package): | Release version | https://CRAN.R-project.org/package=riskyr |
| \ | Development version | https://github.com/hneth/riskyr |
| B. `riskyrApp` (R Shiny code): | Online version | http://riskyr.org |
| \ | Development version | https://github.com/hneth/riskyrApp |
| C. Online documentation: | Release version | https://hneth.github.io/riskyr |
| \ | Development version | https://hneth.github.io/riskyr/dev |
We appreciate your feedback, comments, or questions.
Please report any `riskyr`-related issues at https://github.com/hneth/riskyr/issues.
Email us at contact.riskyr@gmail.com if you want to modify or share this software.
| Nr. | Vignette | Content |
|--: |:--- |:--- |
| A. | User guide | Motivation and general instructions |
| B. | Data formats | Data formats: Frequencies and probabilities |
| C. | Confusion matrix | Confusion matrix and accuracy metrics |
| D. | Functional perspectives | Adopting functional perspectives |
| E. | Quick start primer | Quick start primer |