Description Usage Arguments Value Note References Examples

Markov-Chain Monte Carlo method for simulating draws from the
observed-data posterior distribution of underlying cell probabilities
under hierarchical loglinear models. May be used in conjunction with
`imp.cat`

to create proper multiple imputations.

1 |

`s` |
summary list of an incomplete categorical dataset created by the
function |

`margins` |
vector describing the marginal totals to be fitted. A margin is described by the factors not summed over, and margins are separated by zeros. Thus c(1,2,0,2,3,0,1,3) would indicate fitting the (1,2), (2,3), and (1,3) margins in a three-way table, i.e., the model of no three-way association. |

`start` |
starting value of the parameter. The starting value should lie in the
interior of the parameter space for the given loglinear model. If
structural zeros are present, |

`steps` |
number of complete cycles of data augmentation-Bayesian IPF to be performed. |

`prior` |
optional array of hyperparameters specifying a Dirichlet
prior distribution. The default is the Jeffreys prior (all
hyperparameters = .5). If structural zeros are present, a prior
should be supplied with hyperparameters set to |

`showits` |
if |

array of simulated cell probabilities that satisfy the loglinear model. If the algorithm has converged, this will be a draw from the actual posterior distribution of the parameters.

The random number generator seed must be set at least once by the
function `rngseed`

before this function can be used.

The starting value must lie in the interior of the parameter space.
Hence, caution should be used when using a maximum likelihood estimate
(e.g., from `ecm.cat`

) as a starting value. Random zeros in a table
may produce mle's with expected cell counts of zero. This difficulty
can be overcome by using as a starting value a posterior mode
calculated by `ecm.cat`

with prior hyperparameters greater than one.

Schafer (1996) *Analysis of Incomplete Multivariate Data.*
Chapman \& Hall, Chapter 8.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 | ```
#
# Example 1 Based on Schafer's p. 329 and ss. This is a toy version,
# using a much shorter length of chain than required. To
# generate results comparable with those in the book, edit
# the \dontrun{ } line below and comment the previous one.
#
data(belt)
attach(belt.frame)
EB <- ifelse(B1==B2,1,0)
EI <- ifelse(I1==I2,1,0)
belt.frame <- cbind(belt.frame,EB,EI)
colnames(belt.frame)
a <- xtabs(Freq ~ D + S + B2 + I2 + EB + EI,
data=belt.frame)
m <- list(c(1,2,3,4),c(3,4,5,6),c(1,5),
c(1,6),c(2,6))
b <- loglin(a,margin=m) # fits (DSB2I2)B2I2EBEI)(DEB)(DEI)(SEI)
# in Schafer's p. 304
a <- xtabs(Freq ~ D + S + B2 + I2 + B1 + I1,
data=belt.frame)
m <- list(c(1,2,5,6),c(1,2,3,4),c(3,4,5,6),
c(1,3,5),c(1,4,6),c(2,4,6))
b <- loglin(a,margin=m) # fits (DSB1I1)(DSB2I2)(B2I2B1I1)(DB1B2)
# (DI1I2)(SI1I2) in Schafer's p. 329
s <- prelim.cat(x=belt[,-7],counts=belt[,7])
m <- c(1,2,5,6,0,1,2,3,4,0,3,4,5,6,0,1,3,5,0,1,4,6,0,2,4,6)
theta <- ecm.cat(s,margins=m, # excruciantingly slow; needs 2558
maxits=5000) # iterations.
rngseed(1234)
#
# Now ten multiple imputations of the missing variables B2, I2 are
# generated, by running a chain and taking every 2500th observation.
# Prior hyperparameter is set at 0.5 as in Shchafer's p. 329
#
imputations <- vector("list",10)
for (i in 1:10) {
cat("Doing imputation ",i,"\n")
theta <- dabipf(s,m,theta,prior=0.5, # toy chain; for comparison with
steps=25) # results in Schafer's book the next
# statement should be run,
# rather than this one.
## Not run: theta <- dabipf(s,m,theta,prior=0.5,steps=2500)
imputations[[i]] <- imp.cat(s,theta)
}
detach(belt.frame)
#
# Example 2 (reproduces analysis performed in Schafer's p. 327.)
#
# Caveat! I try to reproduce what has been done in that page, but although
# the general appearance of the boxplots generated below is quite similar to
# that of Schafer's Fig. 8.4 (p. 327), the VALUES of the log odds do not
# quite fall in line with those reported by said author. It doesn't look like
# the difference can be traced to decimal vs. natural logs. On the other hand,
# Fig. 8.4 refers to log odds, while the text near the end of page 327 gives
# 1.74 and 1.50 as the means of the *odds* (not log odds). FT, 22.7.2003.
#
#
data(older) # reading data
x <- older[,1:6] # preliminary manipulations
counts <- older[,7]
s <- prelim.cat(x,counts)
colnames(x) # names of columns
rngseed(1234)
m <- c(1,2,3,4,5,0,1,2,3,5,6,0,4,3) # model (ASPMG)(ASPMD)(GD) in
# Schafer's p. 327
# do analysis with different priors
theta <- ecm.cat(s,m,prior=1.5) # Strong pull to uniform table
# for initial estimates
prob1 <- dabipf(s,m,theta,steps=100, # Burn-in period
prior=0.1)
prob2 <- dabipf(s,m,theta,steps=100, # Id. with second prior
prior=1.5)
lodds <- matrix(0,5000,2) # Where to store log odds ratios.
oddsr <- function(x) { # Odds ratio of 2 x 2 table.
o <- (x[1,1]*x[2,2])/
(x[1,2]*x[2,1])
return(o)
}
for(i in 1:5000) { # Now generate 5000 log odds
prob1 <- dabipf(s,m,prob1, prior=0.1)
t1 <- apply(prob1,c(1,2),sum) # Marginal GD table
# Log odds ratio
lodds[i,1] <- log(oddsr(t1))
prob2 <- dabipf(s,m,prob2, prior=1.5) # Id. with second prior
t2 <- apply(prob2,c(1,2),sum)
lodds[i,2] <- log(oddsr(t2))
}
lodds <- as.data.frame(lodds)
colnames(lodds) <- c("0.1","1.5") # Similar to Schafer's Fig. 8.4.
boxplot(lodds,xlab="Prior hyperparameter")
title(main="Log odds ratio generated with DABIPF (5000 draws)")
summary(lodds)
``` |

cat documentation built on May 29, 2017, 9:49 p.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.