# Multivariate Distributions {#cha-multivariable-distributions}

```r
#    IPSUR: Introduction to Probability and Statistics Using R
#    Copyright (C) 2018  G. Jay Kerns
#
#    Chapter: Multivariate Distributions
#
#    This file is part of IPSUR.
#
#    IPSUR is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    IPSUR is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with IPSUR.  If not, see <http://www.gnu.org/licenses/>.

# This chapter's package dependencies
library(ggplot2)
library(grid)
library(lattice)
library(mvtnorm)
library(prob)
```

We have built up quite a catalogue of distributions, discrete and continuous. They were all univariate, however, meaning that we only considered one random variable at a time. We can imagine nevertheless many random variables associated with a single person: their height, their weight, their wrist circumference (all continuous), or their eye/hair color, shoe size, whether they are right handed, left handed, or ambidextrous (all categorical), and we can even surmise reasonable probability distributions to associate with each of these variables.

But there is a difference: for a single person, these variables are related. For instance, a person's height betrays a lot of information about that person's weight.

The concept we are hinting at is the notion of dependence between random variables. It is the focus of this chapter to study this concept in some detail. Along the way, we will pick up additional models to add to our catalogue. Moreover, we will study certain classes of dependence, and clarify the special case when there is no dependence, namely, independence.

The interested reader who would like to learn more about any of the below mentioned multivariate distributions should take a look at Discrete Multivariate Distributions by Johnson et al [@Johnson1997] or Continuous Multivariate Distributions [@Kotz2000] by Kotz et al.

What do I want them to know?

## Joint and Marginal Probability Distributions {#sec-joint-probability-distributions}

Consider two discrete random variables \(X\) and \(Y\) with PMFs \(f_{X}\) and \(f_{Y}\) that are supported on the sample spaces \(S_{X}\) and \(S_{Y}\), respectively. Let \(S_{X,Y}\) denote the set of all possible observed pairs \((x,y)\), called the joint support set of \(X\) and \(Y\). Then the joint probability mass function of \(X\) and \(Y\) is the function \(f_{X,Y}\) defined by
\begin{equation}
\label{eq-joint-pmf}
f_{X,Y}(x,y)=\mathbb{P}(X=x,\, Y=y),\quad \mbox{for }(x,y)\in S_{X,Y}.
\end{equation}
Every joint PMF satisfies
\begin{equation}
f_{X,Y}(x,y)>0\mbox{ for all }(x,y)\in S_{X,Y},
\end{equation}
and
\begin{equation}
\sum_{(x,y)\in S_{X,Y}}f_{X,Y}(x,y)=1.
\end{equation}
It is customary to extend the function \(f_{X,Y}\) to be defined on all of \(\mathbb{R}^{2}\) by setting \(f_{X,Y}(x,y)=0\) for \((x,y)\not\in S_{X,Y}\).

In the context of this chapter, the PMFs \(f_{X}\) and \(f_{Y}\) are called the marginal PMFs of \(X\) and \(Y\), respectively. If we are given only the joint PMF then we may recover each of the marginal PMFs by using the Theorem of Total Probability (see Equation \eqref{eq-theorem-total-probability}): observe
\begin{eqnarray}
f_{X}(x) & = & \mathbb{P}(X=x),\\
 & = & \sum_{y\in S_{Y}}\mathbb{P}(X=x,\, Y=y),\\
 & = & \sum_{y\in S_{Y}}f_{X,Y}(x,y).
\end{eqnarray}
By interchanging the roles of \(X\) and \(Y\) it is clear that
\begin{equation}
\label{eq-marginal-pmf}
f_{Y}(y)=\sum_{x\in S_{X}}f_{X,Y}(x,y).
\end{equation}
Given the joint PMF we may recover the marginal PMFs, but the converse is not true. Even if we have both marginal distributions they are not sufficient to determine the joint PMF; more information is needed[^multdist-1].

[^multdist-1]: We are not at a total loss, however. There are Fréchet bounds which place limits on how large (and how small) the joint distribution can be at each point.
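Before moving on, here is a minimal R sketch (not from the original text) of the marginal recipe: store a joint PMF as a matrix and sum over the other variable. The toy probabilities below are made up purely for illustration.

```r
# A toy joint PMF for discrete X (rows) and Y (columns), summing to 1
fXY <- matrix(c(0.10, 0.20, 0.10,
                0.20, 0.30, 0.10), nrow = 2, byrow = TRUE,
              dimnames = list(x = 1:2, y = 1:3))
rowSums(fXY)   # marginal PMF of X, obtained by summing over y
colSums(fXY)   # marginal PMF of Y, obtained by summing over x
sum(fXY)       # total probability is 1
```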

Associated with the joint PMF is the joint cumulative distribution function \(F_{X,Y}\) defined by
\[
F_{X,Y}(x,y)=\mathbb{P}(X\leq x,\, Y\leq y),\quad \mbox{for }(x,y)\in\mathbb{R}^{2}.
\]
The bivariate joint CDF is not quite as tractable as the univariate CDFs, but in principle we could calculate it by adding up quantities of the form in Equation \eqref{eq-joint-pmf}. The joint CDF is typically not used in practice due to its inconvenient form; one can usually get by with the joint PMF alone.

We now introduce some examples of bivariate discrete distributions. The first we have seen before, and the second is based on the first.

\bigskip

```{example, label="toss-two-dice-joint-pmf"}
Roll a fair die twice. Let \(X\) be the face shown on the first roll, and let \(Y\) be the face shown on the second roll. We have already seen this example in Chapter \@ref(cha-probability), Example \@ref(ex:toss-a-six-sided-die-twice). For this example, it suffices to define
\[
f_{X,Y}(x,y)=\frac{1}{36},\quad x=1,\ldots,6,\ y=1,\ldots,6.
\]
The marginal PMFs are given by \(f_{X}(x)=1/6\), \(x=1,2,\ldots,6\), and \(f_{Y}(y)=1/6\), \(y=1,2,\ldots,6\), since
\[
f_{X}(x)=\sum_{y=1}^{6}\frac{1}{36}=\frac{1}{6},\quad x=1,\ldots,6,
\]
and the same computation with the letters switched works for \(Y\).
```

In the previous example, and in many other ones, the joint support can
be written as a product set of the support of \(X\) "times" the
support of \(Y\), that is, it may be represented as a cartesian
product set, or rectangle, \(S_{X,Y}=S_{X}\times S_{Y}\), where
\(S_{X} \times S_{Y}= \{ (x,y):\ x\in S_{X},\, y\in S_{Y} \}\). As we
shall see presently in Section \@ref(sec-independent-random-variables), this form
is a necessary condition for \(X\) and \(Y\) to be *independent* (or
alternatively *exchangeable* when \(S_{X}=S_{Y}\)). But please note
that in general it is not required for \(S_{X,Y}\) to be of rectangle
form. We next investigate just such an example.

\bigskip

```{example, label="max-sum-two-dice"}
Let the random experiment again be to roll a
fair die twice, except now let us define the random variables \(U\)
and \(V\) by
\begin{eqnarray*}
U & = & \mbox{the maximum of the two rolls, and }\\
V & = & \mbox{the sum of the two rolls.}
\end{eqnarray*}
We see that the support of \(U\) is \(S_{U}= \{ 1,2,\ldots,6 \}\) and
the support of \(V\) is \(S_{V}= \{ 2,3,\ldots,12 \}\). We may
represent the sample space with a matrix, and for each entry in the
matrix we may calculate the value that \(U\) assumes. The result is in
the left half of Figure \@ref(fig:max-and-sum-two-dice).

We can use the table to calculate the marginal PMF of \(U\), because from Example \@ref(ex:toss-a-six-sided-die-twice) we know that each entry in the matrix has probability \(1/36\) associated with it. For instance, there is only one outcome in the matrix with \(U=1\), namely, the bottom left corner. This single entry has probability \(1/36\), therefore, it must be that \(f_{U}(1)=\mathbb{P}(U=1)=1/36\). Similarly we see that there are three entries in the matrix with \(U=2\), thus \(f_{U}(2)=3/36\). Continuing in this fashion we will find the marginal distribution of \(U\) may be written
\begin{equation}
f_{U}(u)=\frac{2u-1}{36},\quad u=1,\,2,\ldots,6.
\end{equation}
We may do a similar thing for \(V\); see the right half of Figure \@ref(fig:max-and-sum-two-dice). Collecting all of the probability we will find that the marginal PMF of \(V\) is
\begin{equation}
f_{V}(v)=\frac{6-|v-7|}{36},\quad v=2,\,3,\ldots,12.
\end{equation}
```

```r
A <- rolldie(2, makespace = TRUE)
A <- addrv(A, max, name = "U")
A <- addrv(A, sum, name = "V", invars = c("X1", "X2"))
p1 <- ggplot(A, aes(x=X1, y=X2, label=U))
p1 <- p1 + geom_text(size = 6) + xlab("X1 = First roll") + 
      ylab("X2 = Second roll") + labs(title = "U = max(X1,X2)")
p2 <- ggplot(A, aes(x=X1, y=X2, label=V))
p2 <- p2 + geom_text(size = 6) + xlab("X1 = First roll") + 
      ylab("") + labs(title = "V = X1 + X2")
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
print(p1, vp = vplayout(1, 1))
print(p2, vp = vplayout(1, 2))
```

(ref:cap-max-and-sum-two-dice) \small Rolling two dice. The value of U is the maximum of the two rolls, while the value of V is the sum of the two rolls.

We may collapse the two matrices from Figure \@ref(fig:max-and-sum-two-dice) into one big matrix of pairs of values \((u,v)\). The result is shown in Figure \@ref(fig:max-sum-two-dice-joint).

```r
labs <- with(A, paste("(", U, ",", V, ")", sep = ""))
p3 <- ggplot(A, aes(x = X1, y = X2, label = labs))
p3 + geom_text(size = 6) + xlab("First roll") + ylab("Second roll") + 
  labs(title = "Joint distribution of (U,V) pairs")
```

(ref:cap-max-sum-two-dice-joint) \small Rolling two dice. Joint values of U and V are shown as pairs, for each outcome in the sample space.

Again, each of these pairs has probability \(1/36\) associated with it and we are looking at the joint PMF of \((U,V)\), albeit in an unusual form. Many of the pairs are repeated, but some of them are not: \((1,2)\) appears only once, but \((2,3)\) appears twice. We can make more sense out of this by writing a new table with \(U\) on one side and \(V\) along the top. We will accumulate the probability just like we did in Example \@ref(ex:toss-two-dice-joint-pmf). See Table \@ref(tab-max-sum-joint-pmf).


|       | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Total |
|-------|---|---|---|---|---|---|---|---|----|----|----|-------|
| 1 | \(\frac{1}{36}\) | | | | | | | | | | | \(\frac{1}{36}\) |
| 2 | | \(\frac{2}{36}\) | \(\frac{1}{36}\) | | | | | | | | | \(\frac{3}{36}\) |
| 3 | | | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{1}{36}\) | | | | | | | \(\frac{5}{36}\) |
| 4 | | | | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{1}{36}\) | | | | | \(\frac{7}{36}\) |
| 5 | | | | | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{1}{36}\) | | | \(\frac{9}{36}\) |
| 6 | | | | | | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{2}{36}\) | \(\frac{1}{36}\) | \(\frac{11}{36}\) |
| Total | \(\frac{1}{36}\) | \(\frac{2}{36}\) | \(\frac{3}{36}\) | \(\frac{4}{36}\) | \(\frac{5}{36}\) | \(\frac{6}{36}\) | \(\frac{5}{36}\) | \(\frac{4}{36}\) | \(\frac{3}{36}\) | \(\frac{2}{36}\) | \(\frac{1}{36}\) | 1 |

Table: The joint PMF of \((U,V)\). The outcomes of \(U\) are along the left and the outcomes of \(V\) are along the top. Empty entries in the table have zero probability. The row totals (on the right) and column totals (on the bottom) correspond to the marginal distributions of \(U\) and \(V\), respectively.

The joint support of \((U,V)\) is concentrated along the main diagonal; note that the nonzero entries do not form a rectangle. Also notice that if we form row and column totals we are doing exactly the same thing as Equation \eqref{eq-marginal-pmf}, so that the marginal distribution of \(U\) is the list of totals in the right "margin" of Table \@ref(tab-max-sum-joint-pmf), and the marginal distribution of \(V\) is the list of totals in the bottom "margin".

\newpage

Continuing the reasoning for the discrete case, given two continuous random variables \(X\) and \(Y\) there similarly exists[^multdist-2] a function \(f_{X,Y}(x,y)\) associated with \(X\) and \(Y\) called the joint probability density function of \(X\) and \(Y\). Every joint PDF satisfies
\begin{equation}
f_{X,Y}(x,y)\geq0\mbox{ for all }(x,y)\in S_{X,Y},
\end{equation}
and
\begin{equation}
\iintop_{S_{X,Y}}f_{X,Y}(x,y)\,\mathrm{d} x\,\mathrm{d} y=1.
\end{equation}
In the continuous case there is not such a simple interpretation for the joint PDF; however, we do have one for the joint CDF, namely,
\[
F_{X,Y}(x,y)=\mathbb{P}(X\leq x,\, Y\leq y)=\int_{-\infty}^{x}\int_{-\infty}^{y}f_{X,Y}(u,v)\,\mathrm{d} v\,\mathrm{d} u,
\]
for \((x,y)\in\mathbb{R}^{2}\). If \(X\) and \(Y\) have the joint PDF \(f_{X,Y}\), then the marginal density of \(X\) may be recovered by
\begin{equation}
f_{X}(x)=\int_{S_{Y}}f_{X,Y}(x,y)\,\mathrm{d} y,\quad x \in S_{X},
\end{equation}
and the marginal PDF of \(Y\) may be found with
\begin{equation}
f_{Y}(y)=\int_{S_{X}}f_{X,Y}(x,y)\,\mathrm{d} x, \quad y \in S_{Y}.
\end{equation}

[^multdist-2]: Strictly speaking, the joint density function does not necessarily exist. But the joint CDF always exists.

\bigskip

```{example, label="joint-pdf"} Let the joint PDF of ((X,Y)) be given by [ f_{X,Y}(x,y)=\frac{6}{5}\left(x+y^{2}\right),\quad 0 < x < 1,\ 0 < y < 1. ] The marginal PDF of (X) is \begin{eqnarray} f_{X}(x) & = & \int_{0}^{1}\frac{6}{5}\left(x+y^{2}\right)\,\mathrm{d} y,\ & = & \left.\frac{6}{5}\left(xy+\frac{y^{3}}{3}\right)\right|_{y=0}^{1},\ & = & \frac{6}{5}\left(x+\frac{1}{3}\right), \end{eqnarray} for (0 < x < 1), and the marginal PDF of (Y) is \begin{eqnarray} f_{Y}(y) & = & \int_{0}^{1}\frac{6}{5}\left(x+y^{2}\right)\,\mathrm{d} x,\ & = & \left.\frac{6}{5}\left(\frac{x^{2}}{2}+xy^{2}\right)\right|_{x=0}^{1},\ & = & \frac{6}{5}\left(\frac{1}{2}+y^{2}\right), \end{eqnarray} for (0 < y < 1). In this example the joint support set was a rectangle ([0,1]\times[0,1]), but it turns out that (X) and (Y) are not independent. See Section \@ref(sec-independent-random-variables).

### How to do it with R

We will show how to do Example \@ref(ex:max-sum-two-dice) using R;
it is much simpler to do it with R than without. First we
set up the sample space with the `rolldie` function. Next, we add
random variables \(U\) and \(V\) with the `addrv` function. We take a
look at the very top of the data frame (probability space) to make
sure that everything is operating according to plan.

```r
S <- rolldie(2, makespace = TRUE)
S <- addrv(S, FUN = max, invars = c("X1","X2"), name = "U")
S <- addrv(S, FUN = sum, invars = c("X1","X2"), name = "V")
head(S)
```

Yes, the \(U\) and \(V\) columns have been added to the data frame and have been computed correctly. This result would be fine as it is, but the data frame has too many rows: there are repeated pairs \((u,v)\) which show up as repeated rows in the data frame. The goal is to aggregate the rows of \(S\) such that the result has exactly one row for each unique pair \((u,v)\) with positive probability. This sort of thing is exactly the task for which the `marginal` function was designed. We may take a look at the joint distribution of \(U\) and \(V\) (we only show the first few rows of the data frame, but the complete one has 21 rows).

```r
UV <- marginal(S, vars = c("U", "V"))
head(UV)
```

The data frame is difficult to understand. It would be better to have a tabular display like Table \@ref(tab-max-sum-joint-pmf). We can do that with the xtabs function.

```r
xtabs(round(probs,3) ~ U + V, data = UV)
```

Compare these values to the ones shown in Table \@ref(tab-max-sum-joint-pmf). We can repeat the process with marginal to get the univariate marginal distributions of (U) and (V) separately.

```r
marginal(UV, vars = "U")
head(marginal(UV, vars = "V"))
```

Another way to do the same thing is with the rowSums and colSums of the xtabs object. Compare

```r
temp <- xtabs(probs ~ U + V, data = UV)
rowSums(temp)
colSums(temp)
```

You should check that the answers that we have obtained exactly match the same (somewhat laborious) calculations that we completed in Example \@ref(ex:max-sum-two-dice).

## Joint and Marginal Expectation {#sec-joint-and-marginal-expectation}

Given a function \(g\) with arguments \((x,y)\) we would like to know the long-run average behavior of \(g(X,Y)\) and how to mathematically calculate it. Expectation in this context is computed in the pedestrian way: we simply integrate (sum) with respect to the joint probability density (mass) function,
\begin{equation}
\mathbb{E}\, g(X,Y)=\iintop_{S_{X,Y}}g(x,y)\, f_{X,Y}(x,y)\,\mathrm{d} x\,\mathrm{d} y,
\end{equation}
or in the discrete case,
\begin{equation}
\mathbb{E}\, g(X,Y)=\mathop{\sum\sum}\limits_{(x,y)\in S_{X,Y}}g(x,y)\, f_{X,Y}(x,y).
\end{equation}

### Covariance and Correlation

There are two very special cases of joint expectation: the covariance and the correlation. These are measures which help us quantify the dependence between \(X\) and \(Y\).

\bigskip

The *covariance* of \(X\) and \(Y\) is
\begin{equation}
\mbox{Cov}(X,Y)=\mathbb{E}(X-\mathbb{E} X)(Y-\mathbb{E} Y).
\end{equation}

By the way, there is a shortcut formula for covariance which is almost as handy as the shortcut for the variance: \begin{equation} \mbox{Cov}(X,Y)=\mathbb{E}(XY)-(\mathbb{E} X)(\mathbb{E} Y). \end{equation} The proof is left to Exercise \@ref(xca-prove-cov-shortcut).

The Pearson product moment correlation between \(X\) and \(Y\) is the covariance between \(X\) and \(Y\) rescaled to fall in the interval \([-1,1]\). It is formally defined by
\begin{equation}
\mbox{Corr}(X,Y)=\frac{\mbox{Cov}(X,Y)}{\sigma_{X}\sigma_{Y}}.
\end{equation}

The correlation is usually denoted by \(\rho_{X,Y}\), or simply \(\rho\) if the random variables are clear from context. There are some important facts about the correlation coefficient:

1. The range of correlation is \(-1\leq\rho_{X,Y}\leq1\).
2. Equality holds above (\(\rho_{X,Y}=\pm1\)) if and only if \(Y\) is a linear function of \(X\) with probability one.

\bigskip

```{example, label="max-sum-dice-covariance"} We will compute the covariance for the discrete distribution in Example \@ref(ex:max-sum-two-dice). The expected value of (U) is [ \mathbb{E} U=\sum_{u=1}^{6}u\, f_{U}(u)=\sum_{u=1}^{6}u\,\frac{2u-1}{36}=1\left(\frac{1}{36}\right)+2\left(\frac{3}{36}\right)+\cdots+6\left(\frac{11}{36}\right)=\frac{161}{36}, ] and the expected value of (V) is [ \mathbb{E} V=\sum_{v=2}^{12}v\,\frac{6-|7-v|}{36}=2\left(\frac{1}{36}\right)+3\left(\frac{2}{36}\right)+\cdots+12\left(\frac{1}{36}\right)=7, ] and the expected value of (UV) is [ \mathbb{E} UV=\sum_{u=1}^{6}\sum_{v=2}^{12}uv\, f_{U,V}(u,v)=1\cdot2\left(\frac{1}{36}\right)+2\cdot3\left(\frac{2}{36}\right)+\cdots+6\cdot12\left(\frac{1}{36}\right)=\frac{308}{9}. ] Therefore the covariance of ((U,V)) is [ \mbox{Cov}(U,V)=\mathbb{E} UV-\left(\mathbb{E} U\right)\left(\mathbb{E} V\right)=\frac{308}{9}-\frac{161}{36}\cdot7=\frac{35}{12}. ] All we need now are the standard deviations of (U) and (V) to calculate the correlation coefficient (omitted).

We will do a continuous example so that you can see how it works.

\bigskip

```{example}
Let us find the covariance of the variables \((X,Y)\) from Example
\@ref(ex:joint-pdf). The expected value of \(X\) is 
\[ \mathbb{E}
X=\int_{0}^{1}x\cdot\frac{6}{5}\left(x+\frac{1}{3}\right)\mathrm{d}
x=\left.\frac{2}{5}x^{3}+\frac{1}{5}x^{2}\right|_{x=0}^{1}=\frac{3}{5},
\] 
and the expected value of \(Y\) is 
\[ \mathbb{E}
Y=\int_{0}^{1}y\cdot\frac{6}{5}\left(\frac{1}{2}+y^{2}\right)\mathrm{d}
y=\left.\frac{3}{10}y^{2}+\frac{3}{10}y^{4}\right|_{y=0}^{1}=\frac{3}{5}.
\]
Finally, the expected value of \(XY\) is
\begin{eqnarray*}
\mathbb{E} XY & = & \int_{0}^{1}\int_{0}^{1}xy\,\frac{6}{5}\left(x+y^{2}\right)\mathrm{d} x\,\mathrm{d} y,\\
 & = & \int_{0}^{1}\left.\left(\frac{2}{5}x^{3}y+\frac{3}{5}x^{2}y^{3}\right)\right|_{x=0}^{1}\mathrm{d} y,\\
 & = & \int_{0}^{1}\left(\frac{2}{5}y+\frac{3}{5}y^{3}\right)\mathrm{d} y,\\
 & = & \frac{1}{5}+\frac{3}{20},
\end{eqnarray*}
which is \(7/20\). Therefore the covariance of \((X,Y)\) is
\[
\mbox{Cov}(X,Y)=\frac{7}{20}-\left(\frac{3}{5}\right)\left(\frac{3}{5}\right)=-\frac{1}{100}.
\]
```

### How to do it with R

There are not any specific functions in the `prob` package [@prob] designed for multivariate expectation. This is not a problem, though, because it is easy enough to do expectation the long way -- with column operations. We just need to keep the definition in mind. For instance, we may compute the covariance of \((U,V)\) from Example \@ref(ex:max-sum-dice-covariance).

```r
Eu <- sum(S$U*S$probs)
Ev <- sum(S$V*S$probs)
Euv <- sum(S$U*S$V*S$probs)
Euv - Eu * Ev
```

Compare this answer to what we got in Example \@ref(ex:max-sum-dice-covariance).
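We can also finish the correlation calculation that Example \@ref(ex:max-sum-dice-covariance) left omitted. The following sketch (not in the original text) computes the standard deviations the same long way and divides:

```r
# Standard deviations of U and V computed directly from the probability space S,
# then the correlation coefficient Corr(U,V) = Cov(U,V) / (sd(U) sd(V))
sdU <- sqrt(sum(S$U^2 * S$probs) - Eu^2)
sdV <- sqrt(sum(S$V^2 * S$probs) - Ev^2)
(Euv - Eu * Ev) / (sdU * sdV)    # roughly 0.86
```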

To do the continuous case we could use the computer algebra utilities of Yacas and the associated R package Ryacas [@Ryacas]. See Section \@ref(sub-bivariate-transf-r) for another example where the Ryacas package appears.
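Short of symbolic algebra, we can also check the continuous example numerically with base R's `integrate`. This is only a sketch, but it confirms the hand calculation above (\(\mathbb{E} X=\mathbb{E} Y=3/5\), \(\mathbb{E} XY=7/20\), so \(\mbox{Cov}(X,Y)=-1/100\)).

```r
# Numerical covariance for the joint PDF f(x,y) = (6/5)(x + y^2) on the unit square
f <- function(x, y) 6/5 * (x + y^2)
E <- function(g) {   # E g(X,Y) by iterated numerical integration
  inner <- function(x) sapply(x, function(xx)
    integrate(function(y) g(xx, y) * f(xx, y), 0, 1)$value)
  integrate(inner, 0, 1)$value
}
EX <- E(function(x, y) x); EY <- E(function(x, y) y); EXY <- E(function(x, y) x*y)
c(EX = EX, EY = EY, Cov = EXY - EX * EY)   # Cov should be about -0.01
```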

## Conditional Distributions {#sec-conditional-distributions}

If \(x\in S_{X}\) is such that \(f_{X}(x)>0\), then we define the conditional density of \(Y|\, X=x\), denoted \(f_{Y|x}\), by
\begin{equation}
f_{Y|x}(y|x)=\frac{f_{X,Y}(x,y)}{f_{X}(x)},\quad y\in S_{Y}.
\end{equation}
We define \(f_{X|y}\) in a similar fashion.
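For a quick discrete illustration (a sketch using the \((U,V)\) example from earlier, not code from the original text), the conditional PMF of \(V\) given \(U=u\) is just the \(u\)-th row of the joint table rescaled by its row total:

```r
# Conditional PMF of V given U = 6, from the joint distribution UV computed above
joint <- xtabs(probs ~ U + V, data = UV)
joint["6", ] / sum(joint["6", ])   # f_{V|u}(v|6): each entry divided by f_U(6) = 11/36
```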


\bigskip

Let the joint PMF of \(X\) and \(Y\) be given by
\[
f_{X,Y}(x,y) =
\]

### Bayesian Connection

Conditional distributions play a fundamental role in Bayesian probability and statistics. There is a parameter \(\theta\) which is of primary interest, and about which we would like to learn. But rather than observing \(\theta\) directly, we instead observe a random variable \(X\) whose probability distribution depends on \(\theta\). Using the information provided by \(X\), we would like to update the information that we have about \(\theta\).

Our initial beliefs about \(\theta\) are represented by a probability distribution, called the *prior distribution*, denoted by \(\pi\). The PDF \(f_{X|\theta}\) is called the *likelihood function*, also called the likelihood of \(X\) conditional on \(\theta\). Given an observation \(X=x\), we would like to update our beliefs \(\pi\) to a new distribution, called the *posterior distribution* of \(\theta\) given the observation \(X=x\), denoted \(\pi_{\theta|x}\). It may seem a mystery how to obtain \(\pi_{\theta|x}\) based only on the information provided by \(\pi\) and \(f_{X|\theta}\), but it should not be. We have already studied this in Section \@ref(sec-bayes-rule) where it was called Bayes' Rule:
\begin{equation}
\pi(\theta|x)=\frac{\pi(\theta)\, f(x|\theta)}{\int\pi(u)\, f(x|u)\mathrm{d} u}.
\end{equation}
Compare the above expression to Equation \eqref{eq-bayes-rule}.

\bigskip

Suppose the parameter \(\theta\) is the \(\mathbb{P}(\mbox{Heads})\)
for a biased coin. It could be any value from 0 to 1. Perhaps we have
some prior information about this coin, for example, maybe we have
seen this coin before and we have reason to believe that it shows
Heads less than half of the time. Suppose that we represent our
beliefs about \(\theta\) with a
\(\mathsf{beta}(\mathtt{shape1}=1,\,\mathtt{shape2}=3)\) prior
distribution, that is, we assume \[
\theta\sim\pi(\theta)=3(1-\theta)^{2},\quad 0 < \theta < 1.  \] To
learn more about \(\theta\), we will do what is natural: flip the
coin. We will observe a random variable \(X\) which takes the value
\(1\) if the coin shows Heads, and 0 if the coin shows Tails. Under
these circumstances, \(X\) will have a Bernoulli distribution, and in
particular,
\(X|\theta\sim\mathsf{binom}(\mathtt{size}=1,\,\mathtt{prob}=\theta)\):
\[ f_{X|\theta}(x|\theta)=\theta^{x}(1-\theta)^{1-x},\quad x=0,1.  \]
Based on the observation \(X=x\), we will update the prior
distribution to the posterior distribution, and we will do so with
Bayes' Rule: it says
\begin{eqnarray*}
\pi(\theta|x) & \propto & f(x|\theta) \, \pi(\theta),\\
 & = & \theta^{x}(1-\theta)^{1-x}\cdot3(1-\theta)^{2},\\
 & = & 3\,\theta^{x}(1-\theta)^{3-x},\quad 0 < \theta < 1,
\end{eqnarray*}
where the constant of proportionality is given by \[ \int3\,
u^{x}(1-u)^{3-x}\mathrm{d} u=\int3\,
u^{(1+x)-1}(1-u)^{(4-x)-1}\mathrm{d}
u=3\,\frac{\Gamma(1+x)\Gamma(4-x)}{\Gamma[(1+x)+(4-x)]}, \] the
integral being calculated by inspection of the formula for a
\(\mathsf{beta}(\mathtt{shape1}=1+x,\,\mathtt{shape2}=4-x)\)
distribution. That is to say, our posterior distribution is precisely
\[
\theta|x\sim\mathsf{beta}(\mathtt{shape1}=1+x,\,\mathtt{shape2}=4-x).
\] The Bayesian statistician uses the posterior distribution for all
matters concerning inference about \(\theta\).
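To see the update in action, here is a small R sketch (not from the original text) for the coin example: with prior \(\mathsf{beta}(\mathtt{shape1}=1,\,\mathtt{shape2}=3)\), a single flip landing Heads (\(x=1\)) moves us to the \(\mathsf{beta}(\mathtt{shape1}=2,\,\mathtt{shape2}=3)\) posterior.

```r
# Prior beta(1, 3) versus posterior beta(1 + x, 4 - x) after observing x = 1 (Heads)
x <- 1
curve(dbeta(theta, 1, 3), from = 0, to = 1, xname = "theta",
      ylab = "density", lty = 2)                   # prior
curve(dbeta(theta, 1 + x, 4 - x), xname = "theta",
      add = TRUE)                                  # posterior
legend("topright", legend = c("prior", "posterior"), lty = c(2, 1))
```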

\bigskip

```{block, type="remark"} We usually do not restrict ourselves to the observation of only one (X) conditional on (\theta). In fact, it is common to observe an entire sample (X_{1}), (X_{2}),...,(X_{n}) conditional on (\theta) (which itself is often multidimensional). Do not be frightened, however, because the intuition is the same. There is a prior distribution (\pi(\theta)), a likelihood (f(x_{1},x_{2},\ldots,x_{n}|\theta)), and a posterior distribution (\pi(\theta|x_{1},x_{2},\ldots,x_{n})). Bayes' Rule states that the relationship between the three is [ \pi(\theta|x_{1},x_{2},\ldots,x_{n})\propto\pi(\theta)\, f(x_{1},x_{2},\ldots,x_{n}|\theta), ] where the constant of proportionality is (\int\pi(u)\, f(x_{1},x_{2},\ldots,x_{n}|u)\,\mathrm{d} u).

Any good textbook on
Bayesian Statistics will explain these notions in detail; to the
interested reader I recommend Gelman [@Gelman2004] or Lee
[@Lee1997].


## Independent Random Variables {#sec-independent-random-variables}

### Independent Random Variables {#sub-independent-random-variables}


We recall from Chapter \@ref(cha-probability) that the events \(A\) and \(B\) are
said to be independent when
\begin{equation}
\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B).
\end{equation}
If it happens that
\begin{equation}
\mathbb{P}(X=x,Y=y)=\mathbb{P}(X=x)\mathbb{P}(Y=y),\quad \mbox{for every }x\in S_{X},\ y\in S_{Y},
\end{equation}
then we say that \(X\) and \(Y\) are *independent random
variables*. Otherwise, we say that \(X\) and \(Y\) are
*dependent*. Using the PMF notation from above, we see that
independent discrete random variables satisfy
\begin{equation}
f_{X,Y}(x,y)=f_{X}(x)f_{Y}(y)\quad \mbox{for every }x\in S_{X},\ y\in S_{Y}.
\end{equation}
Continuing the reasoning, given two continuous random variables \(X\)
and \(Y\) with joint PDF \(f_{X,Y}\) and respective marginal PDFs
\(f_{X}\) and \(f_{Y}\) that are supported on the sets \(S_{X}\) and
\(S_{Y}\), if it happens that
\begin{equation}
f_{X,Y}(x,y)=f_{X}(x)f_{Y}(y)\quad \mbox{for every }x\in S_{X},\ y\in S_{Y},
\end{equation}
then we say that \(X\) and \(Y\) are independent.

\bigskip

```{example}
In Example \@ref(ex:toss-two-dice-joint-pmf) we considered the random experiment of
rolling a fair die twice. There we found the joint PMF to be 
\[
f_{X,Y}(x,y)=\frac{1}{36},\quad x=1,\ldots,6,\ y=1,\ldots,6,
\] 
and we
found the marginal PMFs \(f_{X}(x)=1/6\), \(x=1,2,\ldots,6\), and
\(f_{Y}(y)=1/6\), \(y=1,2,\ldots,6\). Therefore in this experiment
\(X\) and \(Y\) are independent since for every \(x\) and \(y\) in the
joint support the joint PMF satisfies 
\[
f_{X,Y}(x,y)=\frac{1}{36}=\left(\frac{1}{6}\right)\left(\frac{1}{6}\right)=f_{X}(x)\,
f_{Y}(y). 
\]
```

\bigskip

In Example \@ref(ex:max-sum-two-dice) we considered the same experiment but
different random variables \(U\) and \(V\). We can prove that \(U\)
and \(V\) are not independent if we can find a single pair \((u,v)\)
where the independence equality does not hold. There are many such
pairs. One of them is \((6,12)\): 
\[
f_{U,V}(6,12)=\frac{1}{36}\neq\left(\frac{11}{36}\right)\left(\frac{1}{36}\right)=f_{U}(6)\,
f_{V}(12).  
\]
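We can corroborate this numerically (a sketch, not from the original text) by comparing the joint table of \((U,V)\) to the outer product of its margins; if \(U\) and \(V\) were independent the two tables would agree.

```r
# Check independence numerically: joint PMF versus product of marginal PMFs
joint <- xtabs(probs ~ U + V, data = UV)
indep <- outer(rowSums(joint), colSums(joint))   # the joint PMF under independence
max(abs(joint - indep))                          # far from zero: U and V are dependent
round(joint["6", "12"] - rowSums(joint)["6"] * colSums(joint)["12"], 4)  # the (6,12) cell
```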

Independent random variables are very useful to the mathematician. They have many, many tractable properties. We mention some of the more important ones.

\bigskip

```{proposition, label="indep-implies-prodexpect"} If (X) and (Y) are independent, then for any functions (u) and (v), \begin{equation} \mathbb{E}\left(u(X)v(Y)\right)=\left(\mathbb{E} u(X)\right)\left(\mathbb{E} v(Y)\right). \end{equation}

\bigskip

```{proof}
This is straightforward from the definition.
\begin{eqnarray*}
\mathbb{E}\left(u(X)v(Y)\right) & = & \iint\, u(x)v(y)\, f_{X,Y}(x,y)\,\mathrm{d} x\mathrm{d} y\\
 & = & \iint\, u(x)v(y)\, f_{X}(x)\, f_{Y}(y)\,\mathrm{d} x\mathrm{d} y\\
 & = & \int u(x)\, f_{X}(x)\,\mathrm{d} x\ \int v(y)\, f_{Y}(y)\,\mathrm{d} y
\end{eqnarray*}
and this last quantity is exactly \(\left(\mathbb{E} u(X)\right)\left(\mathbb{E} v(Y)\right)\).
```

Now that we have Proposition \@ref(pro:indep-implies-prodexpect) we mention a corollary that will help us later to quickly identify those random variables which are not independent.

\bigskip

```{corollary, label="cor-indep-implies-uncorr"} If (X) and (Y) are independent, then (\mbox{Cov}(X,Y)=0), and consequently, (\mbox{Corr}(X,Y)=0).

\bigskip

```{proof}
When \(X\) and \(Y\) are independent then \(\mathbb{E} XY=\mathbb{E}
X\,\mathbb{E} Y\). And when the covariance is zero the numerator of
the correlation is 0.
```

\bigskip

```{block, type="remark", label="rem-cov0-not-imply-indep"} Unfortunately, the converse of Corollary \@ref(cor:indep-implies-uncorr) is not true. That is, there are many random variables which are dependent yet their covariance and correlation is zero.

For more details, see Casella and Berger
[@Casella2002]. Proposition \@ref(pro:indep-implies-prodexpect) is
useful to us and we will receive mileage out of it, but there is
another fact which will play an even more important
role. Unfortunately, the proof is beyond the techniques presented
here. The inquisitive reader should consult Casella and Berger
[@Casella2002], Resnick [@Resnick1999], *etc*.

\bigskip

```{block, type="fact", label="fac-indep-then-function-indep"}
If \(X\) and \(Y\) are independent,
then \(u(X)\) and \(v(Y)\) are independent for any functions \(u\) and
\(v\).
```

### Combining Independent Random Variables {#sub-combining-independent-random}

Another important corollary of Proposition \@ref(pro:indep-implies-prodexpect) will allow us to find the distribution of sums of random variables.

\bigskip

If \(X\) and \(Y\) are independent, then the moment generating
function of \(X+Y\) is
\begin{equation}
M_{X+Y}(t)=M_{X}(t)\cdot M_{Y}(t).
\end{equation}

\bigskip

Choose \(u(x)=\mathrm{e}^{x}\) and \(v(y)=\mathrm{e}^{y}\) in
Proposition \@ref(pro:indep-implies-prodexpect), and remember the identity
\(\mathrm{e}^{t(x+y)}=\mathrm{e}^{tx}\,\mathrm{e}^{ty}\).

Let us take a look at some examples of the corollary in action.

\bigskip

Let \(X\sim\mathsf{binom}(\mathtt{size}=n_{1},\,\mathtt{prob}=p)\) and
\(Y\sim\mathsf{binom}(\mathtt{size}=n_{2},\,\mathtt{prob}=p)\) be
independent. Then \(X+Y\) has MGF \[ M_{X+Y}(t)=M_{X}(t)\,
M_{Y}(t)=\left(q+p\mathrm{e}^{t}\right)^{n_{1}}\left(q+p\mathrm{e}^{t}\right)^{n_{2}}=\left(q+p\mathrm{e}^{t}\right)^{n_{1}+n_{2}},
\] which is the MGF of a
\(\mathsf{binom}(\mathtt{size}=n_{1}+n_{2},\,\mathtt{prob}=p)\)
distribution. Therefore,
\(X+Y\sim\mathsf{binom}(\mathtt{size}=n_{1}+n_{2},\,\mathtt{prob}=p)\).
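A simulation sketch (not in the original text) makes this result tangible: summing draws from independent \(\mathsf{binom}(\mathtt{size}=10,\,\mathtt{prob}=0.3)\) and \(\mathsf{binom}(\mathtt{size}=7,\,\mathtt{prob}=0.3)\) variables produces frequencies that track the \(\mathsf{binom}(\mathtt{size}=17,\,\mathtt{prob}=0.3)\) PMF.

```r
# Empirical distribution of X + Y versus the binom(size = 17, prob = 0.3) PMF
set.seed(42)
x <- rbinom(20000, size = 10, prob = 0.3)
y <- rbinom(20000, size = 7,  prob = 0.3)
emp <- prop.table(table(factor(x + y, levels = 0:17)))
round(rbind(empirical   = emp[as.character(0:8)],
            theoretical = dbinom(0:8, size = 17, prob = 0.3)), 3)
```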

\bigskip

Let
\(X\sim\mathsf{norm}(\mathtt{mean}=\mu_{1},\,\mathtt{sd}=\sigma_{1})\)
and
\(Y\sim\mathsf{norm}(\mathtt{mean}=\mu_{2},\,\mathtt{sd}=\sigma_{2})\)
be independent. Then \(X+Y\) has MGF \[ M_{X}(t)\,
M_{Y}(t)=\exp\left\{ \mu_{1}t+t^{2}\sigma_{1}^{2}/2\right\}
\exp\left\{ \mu_{2}t+t^{2}\sigma_{2}^{2}/2\right\} =\exp\left\{
\left(\mu_{1}+\mu_{2}\right)t+t^{2}\left(\sigma_{1}^{2}+\sigma_{2}^{2}\right)/2\right\}
, \] which is the MGF of a
\(\mathsf{norm}\left(\mathtt{mean}=\mu_{1}+\mu_{2},\,\mathtt{sd}=\sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)\)
distribution.
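Again as a sketch (not from the original text), a quick simulation agrees with the normal result: with \(\mu_{1}=1,\ \sigma_{1}=2\) and \(\mu_{2}=3,\ \sigma_{2}=1\), the sum should have mean \(4\) and standard deviation \(\sqrt{5}\approx2.236\).

```r
# Simulated mean and sd of the sum of two independent normals
set.seed(42)
z <- rnorm(20000, mean = 1, sd = 2) + rnorm(20000, mean = 3, sd = 1)
c(mean = mean(z), sd = sd(z), sd.theory = sqrt(2^2 + 1^2))
```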

Even when we cannot use the MGF trick to identify the exact distribution of a linear combination of random variables, we can still say something about its mean and variance.

\bigskip

```{proposition, label="mean-sd-lin-comb-two"} Let (X_{1}) and (X_{2}) be independent with respective population means (\mu_{1}) and (\mu_{2}) and population standard deviations (\sigma_{1}) and (\sigma_{2}). For given constants (a_{1}) and (a_{2}), define (Y=a_{1}X_{1}+a_{2}X_{2}). Then the mean and standard deviation of (Y) are given by the formulas \begin{equation} \mu_{Y}=a_{1}\mu_{1}+a_{2}\mu_{2},\quad \sigma_{Y}=\left(a_{1}^{2}\sigma_{1}^{2}+a_{2}^{2}\sigma_{2}^{2}\right)^{1/2}. \end{equation}

\bigskip

```{proof}
We use Proposition \@ref(pro:expectation-properties) to see 
\[ 
\mathbb{E}
Y=\mathbb{E}\left(a_{1}X_{1}+a_{2}X_{2}\right)=a_{1}\mathbb{E}X_{1}+a_{2}\mathbb{E} X_{2}=a_{1}\mu_{1}+a_{2}\mu_{2}.
\]
For the
standard deviation, we will find the variance and take the square root
at the end. And to calculate the variance we will first compute
\(\mathbb{E} Y^{2}\) with an eye toward using the identity
\(\sigma_{Y}^{2}=\mathbb{E} Y^{2}-\left(\mathbb{E} Y\right)^{2}\) as a
final step.
\[ \mathbb{E}
Y^{2}=\mathbb{E}\left(a_{1}X_{1}+a_{2}X_{2}\right)^{2}=\mathbb{E}\left(a_{1}^{2}X_{1}^{2}+a_{2}^{2}X_{2}^{2}+2a_{1}a_{2}X_{1}X_{2}\right).
\] 
Using linearity of expectation the \(\mathbb{E}\) distributes
through the sum. Now \(\mathbb{E}
X_{i}^{2}=\sigma_{i}^{2}+\mu_{i}^{2}\), for \(i=1\) and 2 and
\(\mathbb{E} X_{1}X_{2}=\mathbb{E} X_{1}\mathbb{E}
X_{2}=\mu_{1}\mu_{2}\) because of independence. Thus
\begin{eqnarray*}
\mathbb{E} Y^{2} & = & a_{1}^{2}(\sigma_{1}^{2}+\mu_{1}^{2})+a_{2}^{2}(\sigma_{2}^{2}+\mu_{2}^{2})+2a_{1}a_{2}\mu_{1}\mu_{2},\\
 & = & a_{1}^{2}\sigma_{1}^{2}+a_{2}^{2}\sigma_{2}^{2}+\left(a_{1}^{2}\mu_{1}^{2}+a_{2}^{2}\mu_{2}^{2}+2a_{1}a_{2}\mu_{1}\mu_{2}\right).
\end{eqnarray*}
But notice that the expression in the parentheses is exactly
\(\left(a_{1}\mu_{1}+a_{2}\mu_{2}\right)^{2}=\left(\mathbb{E}
Y\right)^{2}\), so the proof is complete.
```
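Here is a brief simulation sketch (not part of the original proof) checking Proposition \@ref(pro:mean-sd-lin-comb-two) for one arbitrary choice of constants and distributions.

```r
# Check mu_Y and sigma_Y for Y = 2*X1 - 3*X2 with X1 ~ norm(1, 2) and X2 ~ exp(rate = 1)
set.seed(42)
a1 <- 2; a2 <- -3
x1 <- rnorm(20000, mean = 1, sd = 2)   # mu1 = 1, sigma1 = 2
x2 <- rexp(20000, rate = 1)            # mu2 = 1, sigma2 = 1
y <- a1 * x1 + a2 * x2
c(mean = mean(y), mean.theory = a1 * 1 + a2 * 1,
  sd = sd(y), sd.theory = sqrt(a1^2 * 2^2 + a2^2 * 1^2))
```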

## Exchangeable Random Variables {#sec-exchangeable-random-variables}

Two random variables \(X\) and \(Y\) are said to be exchangeable if their joint CDF is a symmetric function of its arguments:
\begin{equation}
F_{X,Y}(x,y)=F_{X,Y}(y,x),\quad \mbox{for all }(x,y)\in\mathbb{R}^{2}.
\end{equation}
When the joint density \(f\) exists, we may equivalently say that \(X\) and \(Y\) are exchangeable if \(f(x,y)=f(y,x)\) for all \((x,y)\).

Exchangeable random variables exhibit symmetry in the sense that a person may exchange one variable for the other with no substantive changes to their joint random behavior. While independence speaks to a lack of influence between the two variables, exchangeability aims to capture the symmetry between them.

\bigskip

Let \(X\) and \(Y\) have joint PDF
\[
f_{X,Y}(x,y)=(1+\alpha)\lambda^{2}\mathrm{e}^{-\lambda(x+y)}+\alpha(2\lambda)^{2}\mathrm{e}^{-2\lambda(x+y)}-2\alpha\lambda^{2}\left(\mathrm{e}^{-\lambda(2x+y)}+\mathrm{e}^{-\lambda(x+2y)}\right).
\]
It is straightforward and tedious to check that \(\iint f=1\). We may
see immediately that \(f_{X,Y}(x,y)=f_{X,Y}(y,x)\) for all \((x,y)\),
which confirms that \(X\) and \(Y\) are exchangeable. Here, \(\alpha\)
is said to be an association parameter.

The example above is one from the Farlie-Gumbel-Morgenstern family of distributions; see [@Kotz2000].
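As a numerical sketch (not from the original text), we can verify two properties of the density above for particular parameter values, say \(\lambda=1\) and \(\alpha=1/2\): it is symmetric in its arguments, and it integrates to 1 over the positive quadrant (the support implied by the exponential terms).

```r
# Farlie-Gumbel-Morgenstern-type density: check symmetry and total probability
lambda <- 1; alpha <- 1/2
f <- function(x, y) {
  (1 + alpha) * lambda^2 * exp(-lambda * (x + y)) +
    alpha * (2 * lambda)^2 * exp(-2 * lambda * (x + y)) -
    2 * alpha * lambda^2 * (exp(-lambda * (2*x + y)) + exp(-lambda * (x + 2*y)))
}
isTRUE(all.equal(f(1, 2), f(2, 1)))     # exchangeable: f(x,y) = f(y,x)
inner <- function(x) sapply(x, function(xx) integrate(function(y) f(xx, y), 0, Inf)$value)
integrate(inner, 0, Inf)$value          # should be (close to) 1
```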

\bigskip

```{example label="binom-exchangeable"} Suppose (X) and (Y) are IID (\mathsf{binom}(\mathtt{size}=n,\,\mathtt{prob}=p)). Then their joint PMF is \begin{eqnarray} f_{X,Y}(x,y) & = & f_{X}(x)f_{Y}(y)\ & = & {n \choose x}\, p^{x}(1-p)^{n-x}\,{n \choose y}\, p^{y}(1-p)^{n-y},\ & = & {n \choose x}{n \choose y}\, p^{x+y}(1-p)^{2n-(x+y)}, \end{eqnarray} and the value is the same if we exchange (x) and (y). Therefore ((X,Y)) are exchangeable.

Looking at Example \@ref(ex:binom-exchangeable) more closely we see that the
fact that \((X,Y)\) are exchangeable has nothing to do with the
\(\mathsf{binom}(\mathtt{size}=n,\,\mathtt{prob}=p)\) distribution; it
only matters that they are independent (so that the joint PDF factors)
and they are identically distributed (in which case we may swap
letters to no effect). We could just have easily used any other
marginal distribution. We will take this as a proof of the following
proposition.

\bigskip

```{proposition}
If \(X\) and \(Y\) are IID (with common marginal distribution \(F\))
then \(X\) and \(Y\) are exchangeable.
```

Exchangeability thus contains IID as a special case.

## The Bivariate Normal Distribution {#sec-the-bivariate-normal}

The bivariate normal PDF is given by the unwieldy formula
\begin{multline}
f_{X,Y}(x,y)=\frac{1}{2\pi\,\sigma_{X}\sigma_{Y}\sqrt{1-\rho^{2}}}\exp\left\{ -\frac{1}{2(1-\rho^{2})}\left[\left(\frac{x-\mu_{X}}{\sigma_{X}}\right)^{2}\right.\right.\\
\left.\left.-2\rho\left(\frac{x-\mu_{X}}{\sigma_{X}}\right)\left(\frac{y-\mu_{Y}}{\sigma_{Y}}\right)+\left(\frac{y-\mu_{Y}}{\sigma_{Y}}\right)^{2}\right]\right\},
\end{multline}
for \((x,y)\in\mathbb{R}^{2}\). We write \((X,Y)\sim\mathsf{mvnorm}(\mathtt{mean}=\upmu,\,\mathtt{sigma}=\Sigma)\), where
\begin{equation}
\upmu=(\mu_{X},\,\mu_{Y})^{T},\quad \Sigma=\left(
\begin{array}{cc}
\sigma_{X}^{2} & \rho\sigma_{X}\sigma_{Y}\\
\rho\sigma_{X}\sigma_{Y} & \sigma_{Y}^{2}
\end{array}
\right).
\end{equation}
See Appendix \@ref(cha-mathematical-machinery). The vector notation allows for a more compact rendering of the joint PDF:
\begin{equation}
f_{X,Y}(\mathbf{x})=\frac{1}{2\pi\left|\Sigma\right|^{1/2}}\exp\left\{ -\frac{1}{2}\left(\mathbf{x}-\upmu\right)^{\top}\Sigma^{-1}\left(\mathbf{x}-\upmu\right)\right\},
\end{equation}
where in an abuse of notation we have written \(\mathbf{x}\) for \((x,y)\). Note that the formula only holds when \(\rho\neq\pm1\).

\bigskip

```{block, type="remark"} In Remark \@ref(rem-cov0-not-imply-indep) we noted that just because random variables are uncorrelated it does not necessarily mean that they are independent. However, there is an important exception to this rule: the bivariate normal distribution. Indeed, ((X,Y)\sim\mathsf{mvnorm}(\mathtt{mean}=\upmu,\,\mathtt{sigma}=\Sigma)) are independent if and only if (\rho=0).

\bigskip

```{block, type="remark"}
Inspection of the joint PDF shows that if \(\mu_{X}=\mu_{Y}\) and
\(\sigma_{X}=\sigma_{Y}\) then \(X\) and \(Y\) are exchangeable.
```

The bivariate normal MGF is
\begin{equation}
M_{X,Y}(\mathbf{t})=\exp\left(\upmu^{\top}\mathbf{t}+\frac{1}{2}\mathbf{t}^{\top}\Sigma\mathbf{t}\right),
\end{equation}
where \(\mathbf{t}=(t_{1},t_{2})\).

The bivariate normal distribution may be intimidating at first but it turns out to be very tractable compared to other multivariate distributions. An example of this is the following fact about the marginals.

\bigskip

```{block, type="fact"} If ((X,Y)\sim\mathsf{mvnorm}(\mathtt{mean}=\upmu,\,\mathtt{sigma}=\Sigma)) then \begin{equation} X\sim\mathsf{norm}(\mathtt{mean}=\mu_{X},\,\mathtt{sd}=\sigma_{X})\mbox{ and }Y\sim\mathsf{norm}(\mathtt{mean}=\mu_{Y},\,\mathtt{sd}=\sigma_{Y}). \end{equation}

From this we immediately get that \(\mathbb{E} X=\mu_{X}\) and
\(\mbox{Var}(X)=\sigma_{X}^{2}\) (and the same is true for \(Y\) with
the letters switched). And it should be no surprise that the
correlation between \(X\) and \(Y\) is exactly
\(\mbox{Corr}(X,Y)=\rho\).

\bigskip

```{proposition, label="mvnorm-cond-dist"}
The conditional distribution of \(Y|\, X=x\)
is \(\mathsf{norm}(\mathtt{mean} = \mu_{Y|x}, \, \mathtt{sd} =
\sigma_{Y|x})\), where
\begin{equation}
\mu_{Y|x}=\mu_{Y}+\rho\frac{\sigma_{Y}}{\sigma_{X}}\left(x-\mu_{X}\right),\mbox{ and }\sigma_{Y|x}=\sigma_{Y}\sqrt{1-\rho^{2}}.
\end{equation}
```

There are a few things to note about Proposition \@ref(pro:mvnorm-cond-dist) which will be important in Chapter \@ref(cha-simple-linear-regression). First, the conditional mean of \(Y|x\) is linear in \(x\), with slope
\begin{equation}
\label{eq-population-slope-slr}
\rho\,\frac{\sigma_{Y}}{\sigma_{X}}.
\end{equation}
Second, the conditional variance of \(Y|x\) is independent of \(x\).
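A simulation sketch (not from the original text) illustrates both points: with the `mvtnorm` package we draw from a bivariate normal and regress \(y\) on \(x\); the fitted slope should be close to \(\rho\,\sigma_{Y}/\sigma_{X}\) and the residual standard deviation close to \(\sigma_{Y}\sqrt{1-\rho^{2}}\).

```r
# Conditional mean slope and conditional sd for a simulated bivariate normal
library(mvtnorm)
set.seed(42)
muX <- 1; muY <- 2; sX <- 2; sY <- 3; rho <- 0.7
Sigma <- matrix(c(sX^2, rho*sX*sY, rho*sX*sY, sY^2), nrow = 2)
XY <- rmvnorm(20000, mean = c(muX, muY), sigma = Sigma)
fit <- lm(XY[, 2] ~ XY[, 1])
c(slope = unname(coef(fit)[2]), slope.theory = rho * sY / sX,
  resid.sd = sd(residuals(fit)), sd.theory = sY * sqrt(1 - rho^2))
```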

### How to do it with R

The multivariate normal distribution is implemented in both the mvtnorm package [@mvtnorm] and the mnormt package [@mnormt]. We use the mvtnorm package in this book simply because it is a dependency of another package used in the book.

The `mvtnorm` package has functions `dmvnorm` and `rmvnorm` for the PDF and to generate random vectors, respectively. Let us get started with a graph of the bivariate normal PDF. We can make the plot with the following code, where the workhorse is the `persp` function in base R. We chose the standard bivariate normal, \(\mathsf{mvnorm}(\mathtt{mean}=\mathbf{0},\,\mathtt{sigma}=\mathbf{I})\), to display.

```r
x <- y <- seq(from = -3, to = 3, length.out = 30)
f <- function(x,y) dmvnorm(cbind(x,y), mean = c(0,0), sigma = diag(2))
z <- outer(x, y, FUN = f)
persp(x, y, z, theta = -30, phi = 30, ticktype = "detailed")
```

Another way to do this is with the `curve3d` function in the `emdbook` package [@emdbook]. It looks like this:

```r
library("emdbook"); library("mvtnorm") # note: the order matters
mu <- c(0,0); sigma <- diag(2)
f <- function(x,y) dmvnorm(c(x,y), mean = mu, sigma = sigma)
curve3d(f(x,y), from=c(-3,-3), to=c(3,3), theta=-30, phi=30)
```

The `curve3d` code is slightly shorter than that using `persp` and is easier to understand. One must be careful, however: if the `library` calls are swapped then the code will not work, because both packages `emdbook` and `mvtnorm` have a function called `dmvnorm`; one must load them onto the search path in the correct order or R will use the wrong one (the arguments are named differently and the underlying algorithms are different).

(ref:cap-mvnorm-pdf) \small A graph of a bivariate normal PDF.

## Bivariate Transformations of Random Variables {#sec-transformations-multivariate}

We studied in Section \@ref(sec-functions-of-continuous) how to find the PDF of \(Y=g(X)\) given the PDF of \(X\). But now we have two random variables \(X\) and \(Y\), with joint PDF \(f_{X,Y}\), and we would like to consider the joint PDF of two new random variables
\begin{equation}
U=g(X,Y)\quad \mbox{and}\quad V=h(X,Y),
\end{equation}
where \(g\) and \(h\) are two given functions, typically "nice" in the sense of Appendix \@ref(sec-multivariable-calculus).

Suppose that the transformation \((x,y)\longmapsto(u,v)\) is one-to-one. Then an inverse transformation \(x=x(u,v)\) and \(y=y(u,v)\) exists, so let \(\partial(x,y)/\partial(u,v)\) denote the Jacobian of the inverse transformation. Then the joint PDF of \((U,V)\) is given by
\begin{equation}
f_{U,V}(u,v)=f_{X,Y}\left[x(u,v),\, y(u,v)\right]\left|\frac{\partial(x,y)}{\partial(u,v)}\right|,
\end{equation}
or we can rewrite more shortly as
\begin{equation}
\label{eq-biv-trans-pdf-short}
f_{U,V}(u,v)=f_{X,Y}(x,y)\left|\frac{\partial(x,y)}{\partial(u,v)}\right|.
\end{equation}
Take a moment and compare Equation \eqref{eq-biv-trans-pdf-short} to Equation \eqref{eq-univ-trans-pdf-short}. Do you see the connection?

\bigskip

```{block, type="remark"} It is sometimes easier to postpone solving for the inverse transformation (x=x(u,v)) and (y=y(u,v)). Instead, leave the transformation in the form (u=u(x,y)) and (v=v(x,y)) and calculate the Jacobian of the original transformation \begin{equation} \frac{\partial(u,v)}{\partial(x,y)}=\left|\begin{array}{cc} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y}\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y}\end{array}\right|=\frac{\partial u}{\partial x}\frac{\partial v}{\partial y}-\frac{\partial u}{\partial y}\frac{\partial v}{\partial x}. \end{equation} Once this is known, we can get the PDF of ((U,V)) by \begin{equation} f_{U,V}(u,v)=f_{X,Y}(x,y)\left|\frac{1}{\frac{\partial(u,v)}{\partial(x,y)}}\right|. \end{equation} In some cases there will be a cancellation and the work will be lot shorter. Of course, it is not always true that \begin{equation} \label{eq-biv-jacob-recip} \frac{\partial(x,y)}{\partial(u,v)}=\frac{1}{\frac{\partial(u,v)}{\partial(x,y)}}, \end{equation} but for the well-behaved examples that we will see in this book it works just fine... do you see the connection between Equations \eqref{eq-biv-jacob-recip} and \eqref{eq-univ-jacob-recip}?

\bigskip

```{example}
Let
\((X,Y)\sim\mathsf{mvnorm}(\mathtt{mean}=\mathbf{0}_{2\times1},\,\mathtt{sigma}=\mathbf{I}_{2\times2})\)
and consider the transformation
\begin{align*}
U= & \ 3X+4Y,\\
V= & \ 5X+6Y.
\end{align*}
We can solve the system of equations to find the inverse
transformations; they are
\begin{align*}
X= & -3U+2V,\\
Y= & \ \frac{5}{2}U-\frac{3}{2}V,
\end{align*}
in which case the Jacobian of the inverse transformation is
\[ \begin{vmatrix} -3 & 2\\ \frac{5}{2} & -\frac{3}{2} \end{vmatrix} = -3\left(-\frac{3}{2}\right)-2\left(\frac{5}{2}\right) = -\frac{1}{2}.\]
As \((x,y)\) traverses \(\mathbb{R}^{2}\), so too does \((u,v)\). Since the joint PDF of \((X,Y)\) is
\[
f_{X,Y}(x,y)=\frac{1}{2\pi}\exp\left\{ -\frac{1}{2}\left(x^{2}+y^{2}\right)\right\} ,\quad (x,y)\in\mathbb{R}^{2},
\]
we get that the joint PDF of \((U,V)\) is
\begin{equation}
\label{eq-biv-norm-hidden}
f_{U,V}(u,v)=\frac{1}{2\pi}\exp\left\{ -\frac{1}{2}\left[\left(-3u+2v\right)^{2}+\left(\frac{5u-3v}{2}\right)^{2}\right]\right\} \cdot\frac{1}{2},\quad (u,v)\in\mathbb{R}^{2}.
\end{equation}
```

\bigskip

```{block, type="remark"} It may not be obvious, but Equation \eqref{eq-biv-norm-hidden} is the PDF of a (\mathsf{mvnorm}) distribution. For a more general result see Theorem \@ref(thm:mvnorm-dist-matrix-prod).

#### How to do it with R {#sub-bivariate-transf-r}

It is possible to do the computations above in R with the
`Ryacas` package. The package is an interface to the open-source
computer algebra system, "Yacas". The user installs Yacas, then
employs `Ryacas` to submit commands to Yacas, after which the output
is displayed in the R console.

There are not yet any examples of Yacas in this book, but there are
online materials to help the interested reader: see <http://code.google.com/p/ryacas/> to get
started.
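In the meantime, a purely numerical sketch (not from the original text) can corroborate Equation \eqref{eq-biv-norm-hidden}: simulate \((X,Y)\) standard bivariate normal, apply the linear transformation, and compare the sample covariance matrix of \((U,V)\) to \(\mathbf{A}\mathbf{I}\mathbf{A}^{\mathrm{T}}\), where \(\mathbf{A}\) is the matrix of the transformation.

```r
# (U,V) = (3X + 4Y, 5X + 6Y) for (X,Y) standard bivariate normal
library(mvtnorm)
set.seed(42)
A <- matrix(c(3, 4, 5, 6), nrow = 2, byrow = TRUE)
XY <- rmvnorm(20000, mean = c(0, 0), sigma = diag(2))
W <- XY %*% t(A)                   # each row is (u, v) = A %*% (x, y)
round(cov(W), 2)                   # sample covariance matrix of (U, V)
A %*% diag(2) %*% t(A)             # theoretical covariance matrix A Sigma A^T
```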

## Remarks for the Multivariate Case {#sec-remarks-for-the-multivariate}


There is nothing spooky about \(n\geq3\) random variables. We just
have a whole bunch of them: \(X_{1}\), \(X_{2}\),..., \(X_{n}\), which
we can shorten to
\(\mathbf{X}=(X_{1},X_{2},\ldots,X_{n})^{\mathrm{T}}\) to make the
formulas prettier (now may be a good time to check out Appendix
\@ref(sec-linear-algebra)). For \(\mathbf{X}\) supported on the set
\(S_{\mathbf{X}}\), the joint PDF \(f_{\mathbf{X}}\) (if it exists)
satisfies
\begin{equation}
f_{\mathbf{X}}(\mathbf{x})>0,\quad \mbox{for }\mathbf{x}\in S_{\mathbf{X}},
\end{equation}
and
\begin{equation}
\int\!\!\!\int\cdots\int f_{\mathbf{X}}(\mathbf{x})\,\mathrm{d} x_{1}\mathrm{d} x_{2}\cdots\mathrm{d} x_{n}=1,
\end{equation}
or even shorter: \(\int
f_{\mathbf{X}}(\mathbf{x})\,\mathrm{d}\mathbf{x}=1\). The joint CDF
\(F_{\mathbf{X}}\) is defined by
\begin{equation}
F_{\mathbf{X}}(\mathbf{x})=\mathbb{P}(X_{1}\leq x_{1},\, X_{2}\leq x_{2},\ldots,\, X_{n}\leq x_{n}),
\end{equation}
for \(\mathbf{x}\in\mathbb{R}^{n}\). The expectation of a function
\(g(\mathbf{X})\) is defined just as we would imagine:
\begin{equation}
\mathbb{E} g(\mathbf{X})=\int g(\mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})\,\mathrm{d}\mathbf{x}.
\end{equation}
provided the integral exists and is finite. And the moment generating
function in the multivariate case is defined by
\begin{eqnarray} 
M_{\mathbf{X}}(\mathbf{t}) & = & \mathbb{E}\exp\left\{ \mathbf{t}^{\mathrm{T}}\mathbf{X}\right\},
\end{eqnarray}
whenever the integral exists and is finite for all \(\mathbf{t}\) in a
neighborhood of \(\mathbf{0}_{\mathrm{n}\times1}\) (note that
\(\mathbf{t}^{\mathrm{T}}\mathbf{X}\) is shorthand for
\(t_{1}X_{1}+t_{2}X_{2}+\cdots+t_{n}X_{n}\)). The only difference in
any of the above for the discrete case is that integrals are replaced
by sums.

Marginal distributions are obtained by integrating out remaining
variables from the joint distribution. And even if we are given all of
the univariate marginals it is not enough to determine the joint
distribution uniquely.

We say that \(X_{1}\), \(X_{2}\), ..., \(X_{n}\) are *mutually
independent* if their joint PDF factors into the product of the
marginals
\begin{equation}
f_{\mathbf{X}}(\mathbf{x})=f_{X_{1}}(x_{1})\, f_{X_{2}}(x_{2})\,\cdots\, f_{X_{n}}(x_{n}),
\end{equation}
for every \(\mathbf{x}\) in their joint support \(S_{\mathbf{X}}\),
and we say that \(X_{1}\), \(X_{2}\), ..., \(X_{n}\) are
*exchangeable* if their joint PDF (or CDF) is a symmetric function of
its \(n\) arguments, that is, if
\begin{equation}
f_{\mathbf{X}}(\mathbf{x^{\ast}})=f_{\mathbf{X}}(\mathbf{x}),
\end{equation}
for any reordering (permutation) \(\mathbf{x^{\ast}}\) of the elements of \(\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})\) in the joint support.

\bigskip

```{proposition, label="mean-sd-lin-comb"}
Let \(X_{1}\), \(X_{2}\), ..., \(X_{n}\) be
independent with respective population means \(\mu_{1}\), \(\mu_{2}\),
..., \(\mu_{n}\) and standard deviations \(\sigma_{1}\),
\(\sigma_{2}\), ..., \(\sigma_{n}\). For given constants \(a_{1}\),
\(a_{2}\), ...,\(a_{n}\) define \(Y=\sum_{i=1}^{n}a_{i}X_{i}\). Then
the mean and standard deviation of \(Y\) are given by the formulas
\begin{equation}
\mu_{Y}=\sum_{i=1}^{n}a_{i}\mu_{i},\quad \sigma_{Y}=\left(\sum_{i=1}^{n}a_{i}^{2}\sigma_{i}^{2}\right)^{1/2}.
\end{equation}
```

\bigskip

The mean is easy: \[ \mathbb{E}
Y=\mathbb{E}\left(\sum_{i=1}^{n}a_{i}X_{i}\right)=\sum_{i=1}^{n}a_{i}\mathbb{E}
X_{i}=\sum_{i=1}^{n}a_{i}\mu_{i}.  \] The variance is not too
difficult to compute either. As an intermediate step, we calculate
\(\mathbb{E} Y^{2}\).  \[ \mathbb{E}
Y^{2}=\mathbb{E}\left(\sum_{i=1}^{n}a_{i}X_{i}\right)^{2}=\mathbb{E}\left(\sum_{i=1}^{n}a_{i}^{2}X_{i}^{2}+2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}a_{i}a_{j}X_{i}X_{j}\right).
\] Using linearity of expectation the \(\mathbb{E}\) distributes
through the sums. Now \(\mathbb{E}
X_{i}^{2}=\sigma_{i}^{2}+\mu_{i}^{2}\) and \(\mathbb{E}
X_{i}X_{j}=\mathbb{E} X_{i}\mathbb{E} X_{j}=\mu_{i}\mu_{j}\) when
\(i\neq j\) because of independence. Thus
\begin{eqnarray*}
\mathbb{E} Y^{2} & = & \sum_{i=1}^{n}a_{i}^{2}(\sigma_{i}^{2}+\mu_{i}^{2})+2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}a_{i}a_{j}\mu_{i}\mu_{j}\\
 & = & \sum_{i=1}^{n}a_{i}^{2}\sigma_{i}^{2}+\left(\sum_{i=1}^{n}a_{i}^{2}\mu_{i}^{2}+2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}a_{i}a_{j}\mu_{i}\mu_{j}\right)
\end{eqnarray*}
To complete the proof, note that the expression in the parentheses is
exactly \(\left(\mathbb{E} Y\right)^{2}\), and recall the identity
\(\sigma_{Y}^{2}=\mathbb{E} Y^{2}-\left(\mathbb{E} Y\right)^{2}\).

There is a corresponding statement of Fact \@ref(fac-indep-then-function-indep) for the multivariate case. The proof is also omitted here.

```{block, type="fact"} If (\mathbf{X}) and (\mathbf{Y}) are mutually independent random vectors, then (u(\mathbf{X})) and (v(\mathbf{Y})) are independent for any functions (u) and (v).

Bruno de Finetti was a strong proponent of the subjective approach to
probability. He proved an important theorem in 1931 which illuminates
the link between exchangeable random variables and independent random
variables. Here it is in one of its simplest forms.

\bigskip

```{theorem, name="De Finetti's Theorem"}
Let \(X_{1}\), \(X_{2}\), ... be a sequence of
\(\mathsf{binom}(\mathtt{size}=1,\,\mathtt{prob}=p)\) random variables
such that \((X_{1},\ldots,X_{k})\) are exchangeable for every
\(k\). Then there exists a random variable \(\Theta\) with support
\([0,1]\) and PDF \(f_{\Theta}(\theta)\) such that
\begin{equation}
\label{eq-definetti-binary}
\mathbb{P}(X_{1}=x_{1},\ldots,\, X_{k}=x_{k})=\int_{0}^{1}\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\, f_{\Theta}(\theta)\,\mathrm{d}\theta,
\end{equation}
for all \(x_{i}=0,\,1\), \(i=1,\,2,\ldots,k\).
```

To get a handle on the intuitive content of de Finetti's theorem, imagine that we have a bunch of coins in our pocket with each having its own unique value of \(\theta=\mathbb{P}(\mbox{Heads})\). We reach into our pocket and select a coin at random according to some probability -- say, \(f_{\Theta}(\theta)\). We take the randomly selected coin and flip it \(k\) times.

Think carefully: the conditional probability of observing a sequence \(X_{1}=x_{1},\ldots,\, X_{k}=x_{k}\), given a specific coin \(\theta\), would just be \(\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\), because the coin flips are an independent sequence of Bernoulli trials. But the coin is random, so the Theorem of Total Probability says we can get the unconditional probability \(\mathbb{P}(X_{1}=x_{1},\ldots,\, X_{k}=x_{k})\) by adding up terms that look like
\begin{equation}
\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\, f_{\Theta}(\theta),
\end{equation}
where we sum over all possible coins. The right-hand side of Equation \eqref{eq-definetti-binary} is a sophisticated way to denote this process.

Of course, the integral's value does not change if we jumble the \(x_{i}\)'s, so \((X_{1},\ldots,X_{k})\) are clearly exchangeable. The power of de Finetti's Theorem is that every infinite binary exchangeable sequence can be written in the above form.

The connection to subjective probability: our prior information about \(\theta\) corresponds to \(f_{\Theta}(\theta)\), and the likelihood of the sequence \(X_{1}=x_{1},\ldots,\, X_{k}=x_{k}\) (conditional on \(\theta\)) corresponds to \(\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\). Compare Equation \eqref{eq-definetti-binary} to Section \@ref(sec-bayes-rule) and Section \@ref(sec-conditional-distributions).

The multivariate normal distribution immediately generalizes from the bivariate case. If the matrix \(\Sigma\) is nonsingular then the joint PDF of \(\mathbf{X}\sim\mathsf{mvnorm}(\mathtt{mean}=\upmu,\,\mathtt{sigma}=\Sigma)\) is
\begin{equation}
f_{\mathbf{X}}(\mathbf{x})=\frac{1}{(2\pi)^{n/2}\left|\Sigma\right|^{1/2}}\exp\left\{ -\frac{1}{2}\left(\mathbf{x}-\upmu\right)^{\top}\Sigma^{-1}\left(\mathbf{x}-\upmu\right)\right\},
\end{equation}
and the MGF is
\begin{equation}
M_{\mathbf{X}}(\mathbf{t})=\exp\left\{ \upmu^{\top}\mathbf{t}+\frac{1}{2}\mathbf{t}^{\top}\Sigma\mathbf{t}\right\}.
\end{equation}
We will need the following in Chapter \@ref(cha-multiple-linear-regression).

\bigskip

```{theorem, label="mvnorm-dist-matrix-prod"} If (\mathbf{X} \sim \mathsf{mvnorm}(\mathtt{mean} = \upmu, \, \mathtt{sigma} = \Sigma)) and (\mathbf{A}) is any matrix, then the random vector (\mathbf{Y}=\mathbf{AX}) is distributed \begin{equation} \mathbf{Y}\sim\mathsf{mvnorm}(\mathtt{mean}=\mathbf{A}\upmu,\,\mathtt{sigma}=\mathbf{A}\Sigma\mathbf{A}^{\mathrm{T}}). \end{equation}

\bigskip

```{proof}
Look at the MGF of \(\mathbf{Y}\):
\begin{eqnarray*}
M_{\mathbf{Y}}(\mathbf{t}) & = & \mathbb{E}\,\exp\left\{ \mathbf{t}^{\mathrm{T}}(\mathbf{AX})\right\} ,\\
 & = & \mathbb{E}\,\exp\left\{ (\mathbf{A}^{\mathrm{T}}\mathbf{t})^{\mathrm{T}}\mathbf{X}\right\} ,\\
 & = & \exp\left\{ \upmu^{\mathrm{T}}(\mathbf{A}^{\top}\mathbf{t})+\frac{1}{2}(\mathbf{A}^{\mathrm{T}}\mathbf{t})^{\mathrm{T}}\Sigma(\mathbf{A}^{\mathrm{T}}\mathbf{t})\right\} ,\\
 & = & \exp\left\{ \left(\mathbf{A}\upmu\right)^{\mathrm{T}}\mathbf{t}+\frac{1}{2}\mathbf{t}^{\mathrm{T}}\left(\mathbf{A}\Sigma\mathbf{A}^{\mathrm{T}}\right)\mathbf{t}\right\},
\end{eqnarray*}
and the last expression is the MGF of an
\(\mathsf{mvnorm}(\mathtt{mean}=\mathbf{A}\upmu,\,\mathtt{sigma}=\mathbf{A}\Sigma\mathbf{A}^{\mathrm{T}})\)
distribution.
```

## The Multinomial Distribution {#sec-multinomial}

We sample \(n\) times, with replacement, from an urn that contains balls of \(k\) different types. Let \(X_{1}\) denote the number of balls in our sample of type 1, let \(X_{2}\) denote the number of balls of type 2, ..., and let \(X_{k}\) denote the number of balls of type \(k\). Suppose the urn has proportion \(p_{1}\) of balls of type 1, proportion \(p_{2}\) of balls of type 2, ..., and proportion \(p_{k}\) of balls of type \(k\). Then the joint PMF of \((X_{1},\ldots,X_{k})\) is
\begin{eqnarray}
f_{X_{1},\ldots,X_{k}}(x_{1},\ldots,x_{k}) & = & {n \choose x_{1}\, x_{2}\,\cdots\, x_{k}}\, p_{1}^{x_{1}}p_{2}^{x_{2}}\cdots p_{k}^{x_{k}},
\end{eqnarray}
for \((x_{1},\ldots,x_{k})\) in the joint support \(S_{X_{1},\ldots,X_{k}}\). We write
\begin{equation}
(X_{1},\ldots,X_{k})\sim\mathsf{multinom}(\mathtt{size}=n,\,\mathtt{prob}=\mathbf{p}_{k\times1}).
\end{equation}
Several comments are in order. First, the joint support set \(S_{X_{1},\ldots,X_{k}}\) contains all nonnegative integer \(k\)-tuples \((x_{1},\ldots,x_{k})\) such that \(x_{1}+x_{2}+\cdots+x_{k}=n\). A support set like this is called a *simplex*. Second, the proportions \(p_{1}\), \(p_{2}\), ..., \(p_{k}\) satisfy \(p_{i}\geq0\) for all \(i\) and \(p_{1}+p_{2}+\cdots+p_{k}=1\). Finally, the symbol
\begin{equation}
{n \choose x_{1}\, x_{2}\,\cdots\, x_{k}}=\frac{n!}{x_{1}!\, x_{2}!\,\cdots x_{k}!}
\end{equation}
is called a *multinomial coefficient*, which generalizes the notion of a binomial coefficient we saw in Equation \eqref{eq-binomial-coefficient}.

The form and notation we have just described matches the R usage but is not standard among other texts. Most other books use the above for a \(k-1\) dimensional multinomial distribution, because the linear constraint \(x_{1}+x_{2}+\cdots+x_{k}=n\) means that once the values of \(X_{1}\), \(X_{2}\), ..., \(X_{k-1}\) are known, the final value \(X_{k}\) is determined, not random. Another term used for this is a *singular distribution*.

For the most part we will ignore these difficulties, but the careful reader should keep them in mind. There is not much of a difference in practice, except that below we will use a two-dimensional support set for a three-dimensional multinomial distribution. See Figure \@ref(fig:multinom-pmf2).

When \(k=2\), we have \(x_{1}=x\) and \(x_{2}=n-x\), we have \(p_{1}=p\) and \(p_{2}=1-p\), and the multinomial coefficient is literally a binomial coefficient. In the previous notation we have thus shown that the \(\mathsf{multinom}(\mathtt{size}=n,\,\mathtt{prob}=\mathbf{p}_{2\times1})\) distribution is the same as a \(\mathsf{binom}(\mathtt{size}=n,\,\mathtt{prob}=p)\) distribution.
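A one-line check in R (a sketch, not from the original text) makes the \(k=2\) reduction concrete: `dmultinom` on \((x,\,n-x)\) with probabilities \((p,\,1-p)\) returns the same value as `dbinom`.

```r
# The k = 2 multinomial PMF agrees with the binomial PMF
x <- 3; n <- 8; p <- 0.4
c(multinom = dmultinom(c(x, n - x), size = n, prob = c(p, 1 - p)),
  binom    = dbinom(x, size = n, prob = p))
```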

\bigskip

```{example, name="Dinner with Barack Obama"} During the 2008 U.S. presidential primary, Barack Obama offered to have dinner with three randomly selected monetary contributors to his campaign. Imagine the thousands of people in the contributor database. For the sake of argument, Suppose that the database was approximately representative of the U.S. population as a whole, Suppose Barack Obama wants to have http://pewresearch.org/pubs/773/fewer-voters-identify-as-republicans with 36 democrats, 27 republicans, and 37 independents.


\bigskip

```{block, type="remark"}
Here are some facts about the multinomial distribution.

1. The expected value of \((X_{1},\, X_{2},\,\ldots,\, X_{k})\) is
   \(n\mathbf{p}_{k\times1}\).
2. The variance-covariance matrix \(\Sigma\) is symmetric with
   diagonal entries \(\sigma_{i}^{2}=np_{i}(1-p_{i})\),
   \(i=1,\,2,\,\ldots,\, k\) and off-diagonal entries
   \(\mbox{Cov}(X_{i},\, X_{j})=-np_{i}p_{j}\), for \(i\neq j\). The
   correlation between \(X_{i}\) and \(X_{j}\) is therefore
   \(\mbox{Corr}(X_{i},\,
   X_{j})=-\sqrt{p_{i}p_{j}/(1-p_{i})(1-p_{j})}\).
3. The marginal distribution of \((X_{1},\, X_{2},\,\ldots,\,
   X_{k-1})\) is
   \(\mathsf{multinom}(\mathtt{size}=n,\,\mathtt{prob}=\mathbf{p}_{(k-1)\times1})\)
   with
   \begin{equation}
   \mathbf{p}_{(k-1)\times1}=\left(p_{1},\, p_{2},\,\ldots,\, p_{k-2},\, p_{k-1}+p_{k}\right),
   \end{equation}
   and in particular,
   \(X_{i}\sim\mathsf{binom}(\mathtt{size}=n,\,\mathtt{prob}=p_{i})\).
```

### How to do it with R

There is support for the multinomial distribution in base R, namely in the `stats` package [@stats]. The `dmultinom` function represents the PMF and the `rmultinom` function generates random variates.
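Here is a small simulation sketch (not from the original text) that checks the mean and covariance facts stated in the remark above, using `rmultinom` with \(n=6\) and \(\mathbf{p}\propto(36,27,37)\).

```r
# Simulated mean vector and covariance matrix of multinom(size = 6, prob = p) draws
set.seed(42)
p <- c(36, 27, 37) / 100
X <- rmultinom(20000, size = 6, prob = p)   # one column per replication
rowMeans(X)                                 # should be close to 6 * p
round(cov(t(X)), 3)                         # diagonal ~ 6 p_i (1 - p_i), off-diagonal ~ -6 p_i p_j
```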

```r
tmp <- t(xsimplex(3, 6))
p <- apply(tmp, MARGIN = 1, FUN = dmultinom, prob = c(36,27,37))
S <- probspace(tmp, probs = p)
ProbTable <- xtabs(probs ~ X1 + X2, data = S)
round(ProbTable, 3)
```


Another way to do the plot is with the scatterplot3d function in the scatterplot3d package [@scatterplot3d]. It looks like this:

library("scatterplot3d")
X <- t(as.matrix(expand.grid(0:6, 0:6)))
X <- X[, colSums(X) <= 6 ]; X <- rbind(X, 6 - colSums(X))
Z <- round(apply(X, 2, function(x) dmultinom(x, prob = 1:3)), 3)
A <- data.frame(x = X[1, ], y = X[2, ], probability = Z)
scatterplot3d(A, type = "h", lwd = 3, box = FALSE)

The scatterplot3d graph looks better in this example, but the code is more difficult to understand. And with cloud one can easily do conditional plots of the form cloud(z ~ x + y | f), where f is a factor.

```r
cloud(probs ~ X1 + X2, data = S, type = c("p","h"), lwd = 2, 
      pch = 16, cex = 1.5, screen = list(z = 15, x = -70))
```

(ref:cap-multinom-pmf2) \small A plot of a multinomial PMF.

## Exercises

```{block, type="xca",label="xca-prove-cov-shortcut"} Prove that (\mbox{Cov}(X,Y)=\mathbb{E}(XY)-(\mathbb{E} X)(\mathbb{E} Y).)

\bigskip

```{block, type="xca",label="xca-sum-indep-chisq"} 
Suppose
\(X\sim\mathsf{chisq}(\mathtt{df}=p_{1})\) and
\(Y\sim\mathsf{chisq}(\mathtt{df}=p_{2})\) are independent. Find the
distribution of \(X+Y\) (you may want to refer to Equation
\eqref{eq-mgf-chisq}).
```

\bigskip

{block, type="xca",label="xca-diff-indep-norm"} Show that when \(X\) and \(Y\) are independent the MGF of \(X-Y\) is \(M_{X}(t)M_{Y}(-t)\). Use this to find the distribution of \(X-Y\) when \(X\sim\mathsf{norm}(\mathtt{mean}=\mu_{1},\,\mathtt{sd}=\sigma_{1})\) and \(Y\sim\mathsf{norm}(\mathtt{mean}=\mu_{2},\,\mathtt{sd}=\sigma_{2})\) are independent.


