In erichare/discreteRV: Create and Manipulate Discrete Random Variables

library(MASS)
library(discreteRV)

discreteRV is an R package that allows simple creation and manipluation of discrete random variables using a clean and familiar syntactical interface. This vignette will guide you through the basic operations supported by discreteRV.

Creating discrete random variables

A discrete random variable is defined through the use of the \code{RV} function. \code{RV} accepts a vector of outcomes, a vector of probabilities, and returns an \code{RV} object. The code to create X, a random variable representing the roll of a fair die, is as follows:

(X <- RV(outcomes = 1:6, probs = 1/6))

Probabilities are uniform by default, so the following code is equivalent:

(X <- RV(1:6))

Outcomes can be specified as a range of values, which is useful for distributions in which the outcomes that can occur with non-zero probability are unbounded. To define a Poisson random variable Y, we specify the outcomes as a range and the probabilities as a function:

pois.func <- function(y, lambda) { return(lambda^y * exp(-lambda) / factorial(y)) }

(Y <- RV(outcomes = c(0, Inf), probs = pois.func, lambda = 2))

Several common distributions are natively supported so that the function need not be defined manually. For instance, an equivalent method of defining Y is:

(Y <- RV("poisson", lambda = 2))

The RV function also allows the definition of a random variable in terms of odds. We construct a loaded die in which a roll of one is four times as likely as any other roll as:

(X.loaded <- RV(outcomes = 1:6, odds = c(4, 1, 1, 1, 1, 1)))

Probability Calculations

Basic probability calculations are performed using the $P$ function, such as:

P(X == 2)
P(X < 3)
P(X < 3 | X < 4)

Consider our Poisson random variable Y, and suppose we want to obtain the probability that Y is within a distance $\delta$ of its mean parameter $\lambda = 2$:

delta <- 3
lambda <- 2

P((Y >= lambda - delta) %AND% (Y <= lambda + delta))

Alternatively, we could have also used the slightly more complicated looking expression:

P((Y - lambda)^2 <= delta^2)

We can compute several other distributional quantities, including the expected value and the variance of a random variable:

E(X)
V(X)
E( (X-E(X))^2 )

Joint Distributions

A joint distribution can be defined with the jointRV function, with a list of possible outcomes and probabilities in row-major order:

(AandB <- jointRV(outcomes = list(1:3, 0:2), probs = 1:9 / sum(1:9)))

The individual marginal distributions can be obtained by use of the marginal function:

A <- marginal(AandB, 1)
B <- marginal(AandB, 2)

Although the marginal distributions allow all the same computations of any univariate random variable, they maintain a special property. The joint distribution that produced the marginals is stored as attributes in the object. This allows for several more advanced probability calculations, involving the marginal and conditional distributions:

P(A < B)
P(A == 2 | B > 0)
P(A == 2 | (B == 1) %OR% (B == 2))
independent(A, B)
A | (A > 1)
A | (B == 2)
E(A | (B == 2))

A joint distribution can be defined from a univariate distribution using the iid function (to create a joint distribution of n independent realizations of the specified random variable) of the SofIID function (to create a distribution for the sum of n independent realizations of the random variable):

(X2 <- iid(X, n = 2))
(X3 <- iid(X, n = 3))
(X2 <- SofIID(X, n = 2))
(X20 <- SofIID(X, n = 20, progress = FALSE))

Alternatively, the + and * operators have been overloaded to allow these computations in a cleaner syntax, such as:

RV(1:6) + RV(1:6)

Plotting

Plot and qqnorm methods are defined for random variable objects:

plot(X)
plot(X2)
plot(X20)
qqnorm(X20)
abline()

Simulation

discreteRV also includes a set of functions to simulate trials from a random variable.

(X.sim <- rsim(X, 10))

props(X.sim)
Prop(X.sim == 3)
Prop(X.sim > 3)

\section{Extended example: playing Craps} Craps is a common dice game played in casinos. The game begins with what is called the "Come Out" roll, in which two fair dice are rolled. If a sum of seven or eleven is obtained, the player wins. If a sum of two, three, or twelve is obtained, the player loses. In all other cases, the roll obtained is declared the ``Point" and the player rolls again in an attempt to obtain this same point value. If the player rolls the Point, they win, but if they roll a seven, they lose. Rolls continue until one of these two outcomes is achieved.

discreteRV allows for a seamless analysis and simulation of the probabilities associated with different events in Craps. Let us begin by asking "What is the probability that the game ends after the first roll?" To answer this question we construct two random variables. We note that calling \code{RV(1:6)} returns a random variable for a single roll of a fair die, and then we use the overloaded \code{+} operator to sum over two rolls to obtain the random variable \code{Roll}.

(Roll <- RV(1:6) + RV(1:6))

Recall that the game ends after the first roll if and only if a seven or eleven is obtained (resulting in a win), or a two, three, or twelve is obtained (resulting in a loss). Hence, we calculate the probability that the game ends after the first roll as follows:

P(Roll %in% c(7, 11, 2, 3, 12))

Now suppose we would like to condition on the game having ended after the first roll. Using the conditional probability operator in discreteRV, we can obtain the probabilities of winning and losing given that the game ended after the first roll:

P(Roll %in% c(7, 11) | Roll %in% c(7, 11, 2, 3, 12))
P(Roll %in% c(2, 3, 12) | Roll %in% c(7, 11, 2, 3, 12))

Now, let's turn our attention to calculating the probability of winning a game in two rolls. Recall that we can use the \code{iid} function to generate joint distributions of independent and identically distributed random variables. In this case, we would like to generate the joint distribution for two independent rolls of two dice. Now, we will have $11^2$ possible outcomes, and our job is to determine which outcomes result in a win. We know that any time the first roll is a seven or eleven, we will have won. We also know that if the roll is between four and ten inclusive, then we will get to roll again. To win within two rolls given that we've received a four through ten requires that the second roll match the first. We can enumerate the various possibilities to calculate the probability of winning in two rolls, which is approximately 30%.

TwoRolls <- iid(Roll, 2)

First <- marginal(TwoRolls, 1)
Second <- marginal(TwoRolls, 2)

P(First %in% c(7, 11) %OR% (First %in% 4:10 %AND% (First == Second)))

Finally, suppose we are interested in the empirical probability of winning a game of Craps. Using the simulation functions in discreteRV, we can write a routine to simulate playing Craps. Using the \code{rsim} function, we simulate a single game of Craps by rolling from our random variable \code{Roll}, which represents the sum of two dice. We then perform this simulation 100000 times. The results indicate that the player wins a game of craps approximately 49% of the time.

craps_game <- function(RV) {

    my.roll <- rsim(RV, 1)

    if (my.roll %in% c(7, 11)) { return(1) }
    else if (my.roll %in% c(2, 3, 12)) { return(0) }
    else {
        new.roll <- 0
        while (new.roll != my.roll & new.roll != 7) {
            new.roll <- rsim(RV, 1)
        }

        return(as.numeric(new.roll == my.roll))
    }
}

sim.results <- replicate(100000, craps_game(Roll))
mean(sim.results)