knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.pos = 'center', fig.align = 'center', fig.height = 6, fig.width = 5 )

*Edited by Joseph Stachelek: r format(Sys.time(), '%d %B, %Y')*

\newcommand{\rep}{\mathrm{rep}} \newcommand{\D}{\mathrm{d}} \newcommand{\inds}[1]{\mathrm{1}_{#1}} \newcommand{\Normal}{\mathrm{Normal}}

set.seed(20100129) options(digits=2)

`rv`

is an implementation of a
simulation-based random variable object class for R,
originally introduced in
@kerman_gelman_2007.

`rv`

implements a new class of vectors that
contain a 'hidden dimension' of simulations in each scalar component.
These `rv objects`

can be manipulated much like any numeric vectors,
but the arithmetic operations are performed on the simulations,
and summaries are calculated from the simulation vectors.

`rv`

is convenient for manipulating
posterior simulations obtained from MCMC samplers,
for example using Umacs [@kerman_2006]
or R2WinBUGS [@sturtz_2005]
(the package provides a coercion method to convert
`bugs`

objects to `rv`

objects.)

The paper by
@kerman_gelman_2007
introduces the principles
of the design of random variable objects.
This document is a short overview of some of the
commands provided by the package `rv`

.
At the end of the document there is a short description
of the implementation.

Install the package `rv`

(version 2.1.1 or higher)
using the Package Installer command in R (from the menu),
and load the package using,

```
library(rv)
```

The rv objects (or, "random vectors")
that we manipulate usually come from a Markov chain sampler.
To introduce some commands quickly, we will instead
use some random vectors generated by
*random-vector generating functions*
which sample directly from a given (standard) distribution.

First, we will set the number of simulations we use. We choose 4000 simulations per each scalar component of a random vector:

setnsims(4000)

We will not usually change this value during our session, unless we want to repeat our analysis with more (or fewer) simulations. The default value is 4000, set whenever the package is loaded for the first time in the workspace; therefore this is not strictly a necessary step to do every time we start the package.

To draw a random Gaussian (Normal) vector of length 5 with corresponding means $1,2,3,4,5$ and s.d. 1,

x <- rvnorm(mean=1:5, sd=1)

In effect, the object `x`

now
contains five vectors of length $4000$,
drawn (internally) using `rnorm`

,
but we see `x`

as a *vector of length 5*.

The length of the vector is derived from the length of the
mean vector (and the sd vector), and it is not necessary to
specify a parameter "`n`

".

To summarize the distribution of `x`

by viewing quantiles,
means, and s.d.'s, we only type the name of the object at the console:

x

Similarly we can draw from Poisson
(`rvpois`

) Gamma, (`rvgamma`

),
Binomial (`rvbinom`

):

y <- rvpois(lambda=10)

To extract the means, we use `rvmean`

,
the s.d.'s, we use `rvsd`

, the minimum,
`rvmin`

, the maximum `rvmax`

,
and the quantiles, we use `rvquantile`

.
The componentwise medians are also obtained
by `rvmedian`

:

rvmean(x) rvsd(x) rvquantile(x, c(0.025,0.25,0.5,0.75,0.975)) rvmedian(x) rvmin(y) rvmax(y)

For convenience, there is an alias
`E(...)`

for `rvmean(...)`

which gives the "expectation" of a random vector.

Since the random vectors are all represented
by simulations, the expectation and all other functions
that we compute are just numerical approximations.
Generating a "standard normal random variable"
with `z <- rvnorm(n=1, mean=0, sd=1)`

will not have an expectation exactly zero.
Our main purpose here is to handle simulations,
so the answers will be approximate and
necessarily involve a simulation error.

Since rv objects work just like vectors, we can extract and replace components by using the bracket notation. Here we replace the 3rd and 4th components with random variables having (an approximate) binomial distributions:

x[3:4] <- rvbinom(size = 1, prob = c(0.1, 0.9)) x[3:4]

The "mean" column now shows the estimate of the expectation of the two indicator functions we generated.

To "impute" a random vector in a regular
numeric vector, we can either first turn the constant
vector into an `rv`

object:

y <- as.rv(1:5) y[3:4] <- x[3:4] y

or, use the special function `impute`

that can handle regular vectors and `rv`

objects:

y <- (1:5) impute(y, 3:4) <- x[3:4] y

The non-random components appearing as "constants," or in other words, random variables with point-mass distributions (and therefore having a zero variance).

Standard numerical functions can be applied directly to random vectors. To find a summary of the distribution of the function $1/(1+\exp(-x_1))$, we would write,

1/(1 + exp(-x[1]))

Or of the function of almost anything we like:

2*log(abs(x[2]))

To simulate the order statistics of a random vector `x`

,
we can use `sort(x)`

, `min(x)`

, `max(x)`

.

x <- rvpois(lambda=1:5) x sort(x) min(x) max(x)

Note: the `order`

method is not implemented.

`rv`

objects behave like numerical vectors in R;
thus you can set their dimension attributes to make them appear as
arrays, and also use the matrix multiplication operator.
(Note: ```
\%**\%}
performs the matrix multiplication, ensuring that
non-rv and
```

rv```
objects get properly multiplied.
Using
```

\%*\%` does not work if the

matrix or vector on the left is not an rv object.)

p <- runif(4) # Some prior probabilities. y <- rvbinom(size=1, prob=p) # y is now a rv of length 4. dim(y) <- c(2,2) # Make y into a 2x2 matrix. y y %**% y

The componentwise summary functions
such as `E`

(`rvmean`

) and `rvsd`

return the
summaries with the correct dimension attribute set:

```
E(y)
```

Applying logical operators gives indicators of events.
If `z`

is a standard normal random variable
the indicator of the event ${z>1}$ is given by
the statement `z>1`

:

z <- rvnorm(1) z > 1

We can also use the convenience function
`Pr(...)`

to compute the estimates
of the expectation of these indicators:

Pr(z > 1)

Of course, we can find joint events as well and computer their probabilities similarly. To find the probability that $Z_1>Z_2^2$, where both $Z_1$ and $Z_2$ are independent standard normal, we'd type

z <- rvnorm(2) Pr(z[1] > z[2]^2)

We can even compute probabilities of intersections or unions of events,

Pr(x[1] > x[2] & x[1] > x[4]) Pr(x[1] > x[2] | x[1] > x[4])

We can use random vectors, regular vectors, standard elementary functions, logical operations in any combination as we wish.

Example. Let $z_1,z_2$ be standard normal, and let $y_1=\exp(z_1), y_2=y_1\exp(z_2)$.

Compute the expectation of $x=(y_1-1)\inds{y_1>1}\inds{y_2>1}$ and find the probability $\Pr(x>1)$.

z <- rvnorm(n=2, mean=0, sd=1) y <- exp(z) y[2] <- y[2] * y[1] x <- (y[1]-1) * (y[1]>1) * (y[2]>1) E(x) Pr(x>1)

We can generate posterior simulations
from a classical regression model,
using the standard assumptions for the priors.
For convenience there is a function `posterior`

to do this.

n <- 10 ## Some covariates X <- data.frame(x1=rnorm(n, mean=0), x2=rpois(n, 10) - 10) y.mean <- (1.0 + 2.0 * X$x1 + 3.0 * X$x2) y <- rnorm(n, y.mean, sd=1.5) ## n random numbers D <- cbind(data.frame(y=y), X) ## Regression model fit fit <- lm(y ~ x1 + x2, data=D)

The Bayesian estimates (posterior distributions) are represented by,

Post <- posterior(fit) Post

Continuing the previous example,
we'll resample from the sampling distribution of $y$
using the posterior simulations we got.
We can use the function `rvnorm`

to do this,
since it accepts *random vectors as arguments*.
Rather than think `rvnorm`

to draw normal
random vectors, it rather "samples from the normal model."
The vector will be normal *given* (constant) mean and s.d.,
but if the mean and s.d. are not constants, the resulting
vector will not be normal.

sigma <- Post$sigma betas <- Post$beta M <- model.matrix(fit) y.rep <- rvnorm(mean=M %**% betas, sd=sigma) mlplot(y.rep) # Summarize graphically.

Note also that `sigma`

is also an rv object.

The matrix multiplication statement returns a random vector of length \Sexpr{nrow(M)}:

```
M %**% betas
```

Thus all the uncertainty in the mean estimate $X\beta$ and the residual s.d. estimate $\sigma$ is propagated when the replicated vector $y^\rep$ is generated. In effect, this single line of code thus will in fact draw from the distribution $p({y}^\rep|y)=\int\int \Normal({y}^\rep|\mu,\sigma)p(\mu,\sigma|y)\D\mu\D\sigma.$

For convenience, there is a generic method
`rvpredict`

to generate replications and predictions:

## Replications y.rep <- rvpredict(fit)

We can also generate predictions at some other covariate values:

## Predictions at the mean of the covariates X.pred <- data.frame(x1=mean(X$x1), x2=mean(X$x2)) y.pred <- rvpredict(fit, newdata=X.pred)

We can also perturb (add uncertainty to) the covariate `x1}, then predict again.

X.rep <- X X.rep$x1 <- rnorm(n=n, mean=X.rep$x1, sd=sd(X.rep$x1)) y.pred2 <- rvpredict(fit, newdata=X.rep) y.pred2

Graphical summaries are still in development,
but it is now possible to plot
a scatterplot with a regular vector against a
random vector, showing the
50\% and 95\% *uncertainty intervals*
along with the median, using
```
plot(y,x,...)},
where
```

y`is not random but`

x```
is.
or we can show two random scalars plotted
as a 2-dimensional scatterplot with
```

plot(x[1],x[2],...)}.

To illustrate, let us plot the predicted intervals of the previous example, along with the data points.

Plot
the predictions against `y`

in red color;
then plot the perturbed predictions
with blue color.

## Plot predictions plot(x = y.rep, y = D$y, rvcol="red") points(y.pred2, D$y + 0.33, rvcol="blue")

Note that the function method needs to be called
explicitly to be able to plot constants vs. rv objects.
If the first argument of `plot(x, ...)`

is an
rv object, one can call `plot`

Or, we can show a random vectors as
horizontal intervals using `mlplot`

:

mlplot(y.rep, rvcol="red") mlplot(D$y, add=TRUE, col="blue", pch="x")

A histogram of the simulations of a random scalar ```
x[1]},
can be plotted with
```

rvhist}:

rvhist(rvnorm(mean=0, sd=1)^3, xlim=c(-3, 3), col="red", main="Cubed standard normal")

This code simulates 200 iterations of the
well-known P\'olya's urn problem.
The parameter `x/(n+1)`

for the Bernoulli-variate-generating
function `rvbern(...)`

is random:
we can generate random variables
using random parameters without much trickery;
our code looks therefore more natural.

The model: \begin{eqnarray} \quad X_0 &= 1 \ \quad X_n-X_{n-1}|X_{n-1} &\sim \text{Bernoulli}( X_{n-1}/(n+1) ) \end{eqnarray}

The R code:

x <- 1 for (n in 1:100) { x <- x + rvbern(n=1, prob=x / (n + 1)) }

rvhist(x / (n + 1)) # Histogram

We can see that the distribution is close to uniform, which is the limiting distribution in this case.

To extract the simulation matrix embedded in an rv object, use
`sims`

:

s <- sims(y.rep) dim(s)

It is our convention to have the columns represent the random vector and the rows represent the draws from the joint distribution of the vector.

A matrix or a vector of simulations is converted into an rv object
by `rvsims`

.
Continuing the above example, we'll convert the matrix back to
an rv object.

y <- rvsims(s)

You can verify that
`all(sims(y)==s)`

returns
\Sexpr{all(sims(y)==s)}. % TRUE
Also note that `length(y)`

gives
\Sexpr{length(y)},
since `y`

is "just a vector."

The function `as.rv(x)`

coerces objects
to rv objects. However, this does not mean that
matrices of simulations are turned into rv objects---this is
done with `rvsims`

, as explained above.
`as.rv(rnorm(4000))`

would return a random vector
of length 4000, where each component has zero variance
(and one single simulation). You probably
mean `rvsims(rnorm(4000))`

, but
the correct way to generate this object is
`rvnorm(1)`

.

R2WinBUGS [@sturtz_2005] is an
interface for calling WinBUGS within R,
and obtaining the simulations as an R matrix
(that is embedded in a "`bugs`

" object).
If `bugsobj`

is the bugs object returned by the
`bugs(...)`

function call, then
`as.rv`

will coerce it into a list of random vectors,
split by the parameter names:
`y <- as.rv(bugsobj)`

Umacs facilitates the construction of a Gibbs/Metropolis
sampler in R \citep{Kerman:2006:umacs},
and returns the simulations wrapped in an "`UmacsRun`

"
object. Again, the coercion method `as.rv`

will convert the Umacs object, say `obj`

,
into a list of named random vectors:
`y <- as.rv(obj)`

.

`rv`

is written in "S3" style object-oriented R
rather than using the `methods`

("S4") package.
The main reason was speed, the secondary consideration was
the ease of writing new functions.

The main class is called `rv`

.
Most functions expecting an rv object have
names starting with `rv`

.
For example, `rvnorm`

, `rvmean`

, etc.

The package also features rv-specific methods
extending the basic numeric vector classes,
e.g. `c.rv`

, `plot.rv`

, etc.
However, the method dispatch in R will not
call the `rv`

class method if
the first object in the argument list is not an
`rv`

object;
for example, `c(...)`

will not call `c.rv`

in the following case:
suppose that `x`

is an object of class `rv`

and `k <- 10`

.
Then `c(k, x)`

will not call `c.rv`

.
To ensure the proper result, wrap the first element
in `as.rv`

:

`c(as.rv(k), x)`

will produce a proper random vector.

This program is a work in progress, and it may contain bugs. Many new features will be eventually (and hopefully) added.

For information about random variables in R, please refer to @kerman_gelman_2007. For information about Umacs (Universal Markov chain sampler) please refer to @kerman_2006.

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.