Description Usage Arguments Value Examples
spectral
analyzes a rank dataset for order interactions; see examples for details.
1 |
data |
a vector in the frequency format (see examples) |
n |
the number of objects to select from |
k |
the number of objects selected |
levels |
the names of the outcomes, in the proper order |
iter |
iterations in metropolis |
burn |
burn in |
thin |
thinning |
a list containing named elements
effects
: the pure ith order effects as a data frame computed by projecting the data onto the isotypic subspaces. the length of each is the same as the data, choose(n, k).
effectsNorms
: the l2 norms of each effect.
statsMatrices
: the lower order statistics calculating matrices, made with Tmaker.
moves
: the markov moves for moving around each V, computed with markov on the statsMatrices. only the positive moves are given.
samps
: the samples from each space conditioned on each level of statistics. this is the output of metropolis.
obs
: a list of the observed data and its lower level summaries.
exp
: a list of the expected number of samples at each level given the summary statistics at the previous (lower) level. these are computed from the samples from metropolis by (1) summarizing them with Tmaker and then (2) averaging the samples. the expected V2 samples (for example), are determined by taking the samples with the same V1 statistics (in V3, say), summarizing them to V2 with Tmaker, and then averaging every cell across the samples.
fullExp
: this is the result of taking each of the samples with the same lower-order statistics and averaging them. see exp, which is a reduction of fullExp.
residuals
: obs - exp
isotypicBases
: a list of the basis vectors of each isotypic subspace; computed as the eigenvalues of the result of Amaker, and grouped by eigenvalue.
sampsEffects
: the effects determined by each of the samples, projected onto the isotypic subspaces.
sampsEffectsNorms
: the norms of the effects of the samples.
sampsEffectsNormSummary
: a summary of the norms of the effects of the samples.
showStages
: a function that prints out the observed, expected, and residuals of sequentially conditioning on the sample size, first order statistics, second order statistics, and so on.
showFit
: a function that prints out a summary of the fit of the model.
decompose
: a function that takes a vector of the same length of the table given and summarizes it to its lower level statistics.
sampsDecomposed
: every sample decomposed.
statistic
: the pearson's chi-squared (X2), likelihood ratio (G2), Freeman-Tukey (FT), Cressie-Read (CR), and Neyman modified chi-squared (NM) statistics computed using the observed data (obs) and expected data (exp) at each level.
sampsStats
: the statistics computed on each of the samples.
p.value
: the exact p-values of individual tests, accurate to Monte-Carlo error. these are computed as the proportion of samples with statistics equal to or larger than the oberved statistic.
p.value.se
: the standard errors of the p-values computed using the standard asymptotic formula of sqrt(p(1-p)/n). a measure of the Monte-Carlo error.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 | ## Not run:
## voting statistics at different levels
############################################################
# load the cookies dataset:
data(cookie)
cookie$freq
cookie$cookies
# performing the spectral analysis
(out <- spectral(cookie$freq, 6, 3, cookie$cookies))
out$obs # the original observations, and the summary statistics
out$exp # each level is conditional on the previous level's statistics
# (e.g. what you would expect for 1st order effects given sample size)
# these are generated using 10k markov bases based mcmc samples
out$p.value # these are approximate exact test p-values using various
# popular test statistics. the approximations are good to
# monte carlo error
out$p.value.se # these are the standard errors using the sqrt(p*(1-p)/n)
# asymptotic formula, known to have poor performance
# for small/large p; see package binom for better
out$statistic # the individual statistics are also available
# the values are not comprable across Vi levels (the rows)
# as they have asymptotic chi-squared distributions with
# different degrees of freedom
out$fullExp # you can also get the expected number of samples at each scale
# for tables with the same ith order statistics, i = 0, ..., k-1
# these can be seen to (re)construct an expected picture of the
# complete data given each successive collection of statistics
cbind(
obs = cookie$freq,
as.data.frame(lapply(out$fullExp, function(x) round(x[[4]],1)))
)[c(2:4,1)]
# notice that the reconstruction given only the first order statistics
# (the number of individual cookies selected) is quite good
# instead of using the reconstructions from the exp coming from
# the samples, you could reconstruct the summaries of the observed
# data using bump; it's not quite as good :
V0 <- bump(cookie$freq, 6, 3, 3, 0)
V1 <- bump(cookie$freq, 6, 3, 3, 1)
V2 <- bump(cookie$freq, 6, 3, 3, 2)
cbind(
obs = cookie$freq,
round(data.frame(
V0 = bump(V0, 6, 3, 0, 3),
V1 = bump(V1, 6, 3, 1, 3),
V2 = bump(V2, 6, 3, 2, 3)
), 2)
)[c(2:4,1)]
# you can see the model step-by-step with showStages() :
out$showStages()
# notice (1) the significant reduction in the residuals after conditioning
# on the first order statistics and also (2) the powdery noise after
# conditioning on the second order statistics.
# the p-values reflect the same:
# * the residuals from conditioning on the sample size show the first
# order effects are strongly significant (in out$p.value V1 = 0)
# * the residuals from conditioning on the first order effects suggest
# the second order effects might be significant (V2 ~ .04-.13ish)
# * the residuals from conditioning on the second order effects indicate
# the third order effects are entirely insignificant (V3 > .2)
# the isotypic subpaces can be used to determine the pure order effects :
out$isotypicBases # bases of the isotypic subspaces (here 4)
out$effects # pure ith order effects; cookie$freq projected onto the bases
# these are their effects at the data level, so they all have
# the same length as the original dataset: choose(n, k)
zapsmall(rowSums(out$effects)) # the effects sum to the data
# if the Vk effects are 0, then the conclusion is that Vk is perfectly
# predicted with the (k-1)st level statistics. this may lead to the
# conclusion that the l2 norms (say) of the effects might be used to
# gauge the relative strength of effects :
out$effectsNorms # = apply(out$effects, 2, lpnorm)
# the natural (not full-dimensional) residuals can be seen with the summary
out
# or with
out$residuals
# these are the residuals (obs ith level stats) - (exp ith level stats)
# given the (i-1)st statistics
# bump is a useful function :
out$obs
bump(cookie$freq, 6, 3, 3, 0) # the 0 level is the number of voters, not votes
bump(cookie$freq, 6, 3, 3, 1)
bump(cookie$freq, 6, 3, 3, 2)
bump(cookie$freq, 6, 3, 3, 3)
V1 <- out$obs$V1 # = bump(cookie$freq, 6, 3, 3, 1)
bump(V1, 6, 3, 1, 0)
bump(V1, 6, 3, 1, 1)
bump(V1, 6, 3, 1, 2) # cbind(bump(V1, 6, 3, 1, 2), out$exp$V2)
bump(V1, 6, 3, 1, 3) # cbind(bump(V1, 6, 3, 1, 3), out$fullExp$V1[[4]])
# the differences here are between an observation and an expectation
out$obs$V1 - out$exp$V1
out$residuals$V1
out$decompose(out$effects$V1)$V1
out$obs$V2 - out$exp$V2
out$residuals$V2
out$decompose(out$effects$V0)$V2 +
out$decompose(out$effects$V1)$V2 +
out$decompose(out$effects$V2)$V2 -
out$exp$V2
# this is how to reconstruct the observation given the effects
# the cols of out$effects are the Vk order effects reconstructed
# from the lower level effects
out$obs$V0
zapsmall(
out$decompose(out$effects$V0)$V0
)
out$obs$V1
zapsmall(
out$decompose(out$effects$V0)$V1 +
out$decompose(out$effects$V1)$V1
)
out$obs$V2
zapsmall(
out$decompose(out$effects$V0)$V2 +
out$decompose(out$effects$V1)$V2 +
out$decompose(out$effects$V2)$V2
)
out$obs$V3
zapsmall(
out$decompose(out$effects$V0)$V3 +
out$decompose(out$effects$V1)$V3 +
out$decompose(out$effects$V2)$V3 +
out$decompose(out$effects$V3)$V3
)
zapsmall(rowSums(out$effects))
all(cookie$freq == zapsmall(rowSums(out$effects)))
out$effects$V0
out$effects$V0 + out$effects$V1
out$effects$V0 + out$effects$V2
out$effects$V0 + out$effects$V3
str(out$sampsDecomposed)
as.data.frame(lapply(out$sampsDecomposed, function(l) rowMeans(l$V3)))
eff0 <- rowMeans(out$sampsDecomposed$V0$V3)
cbind(eff0, out$effects$V0)
eff1 <- rowMeans(out$sampsDecomposed$V1$V3 - eff0)
cbind(eff1, out$effects$V1)
eff2 <- rowMeans(out$sampsDecomposed$V2$V3 - eff0 - eff1)
cbind(eff2, out$effects$V2)
sum(eff0)
sum(eff1)
sum(eff2)
str(out$sampsEffectsNorms)
data <- out$sampsEffectsNorms$V0$V3
plot(density(data))
curve(dnorm(x, mean(data), sd(data)), col = "red", add = TRUE)
data <- out$sampsEffectsNorms$V0$V2
plot(density(data))
curve(dnorm(x, mean(data), sd(data)), col = "red", add = TRUE)
data <- out$sampsEffectsNorms$V0$V1
plot(density(data))
curve(dnorm(x, mean(data), sd(data)), col = "red", add = TRUE)
data <- out$sampsEffectsNorms$V1$V3
plot(density(data))
curve(dnorm(x, mean(data), sd(data)), col = "red", add = TRUE)
data <- out$sampsEffectsNorms$V1$V2
plot(density(data))
curve(dnorm(x, mean(data), sd(data)), col = "red", add = TRUE)
data <- out$sampsEffectsNorms$V2$V3
plot(density(data))
curve(dnorm(x, mean(data), sd(data)), col = "red", add = TRUE)
## how to convert data into the right format
############################################################
# this essentially just uses some clever indexing tricks
# to reorder the data in the way you want
data <- cookie$raw # an example raw, unordered dataset
levels <- cookie$cookies # the order of the objects you want
levsNndcs <- 1:length(levels)
names(levsNndcs) <- levels
# arrange selections within rows (order of selection doesn't matter)
data <- t(apply(data, 1, function(x) x[order(levsNndcs[x])] ))
# arrange rows (order of selectors doesn't matter)
for(k in ncol(data):1) data <- data[order(levsNndcs[data[,k]]),]
# check that you've done the right thing
all( data == cookie$sorted )
# the data frequency order should match that of subsets:
subsets(levels, 1)
subsets(levels, 2)
sapply(subsets(levels, 2), paste, collapse = ", ")
subsets(levels, 3)
sapply(subsets(levels, 3), paste, collapse = ", ")
names(cookie$freq)
names(cookie$freq) == sapply(subsets(levels, 3), paste, collapse = ", ")
## other examples
############################################################
# rvotes provides uniform samples
n <- 4
k <- 2
raw <- rvotes(250, n, k)
rawTogether <- apply(raw, 1, paste, collapse = " ")
levels <- sapply(subsets(n, k), paste, collapse = " ")
freq <- table( factor(rawTogether, levels = levels) )
(out <- spectral(freq, n, k))
out$p.value
out$showStages()
out$obs
out$exp
n <- 6
k <- 3
raw <- rvotes(250, n, k)
rawTogether <- apply(raw, 1, paste, collapse = " ")
levels <- sapply(subsets(n, k), paste, collapse = " ")
freq <- table( factor(rawTogether, levels = levels) )
(out <- spectral(freq, n, k))
n <- 7
k <- 3
raw <- rvotes(250, n, k)
rawTogether <- apply(raw, 1, paste, collapse = " ")
levels <- sapply(subsets(n, k), paste, collapse = " ")
freq <- table( factor(rawTogether, levels = levels) )
(out <- spectral(freq, n, k))
n <- 8
k <- 3
raw <- rvotes(250, n, k)
rawTogether <- apply(raw, 1, paste, collapse = " ")
levels <- sapply(subsets(n, k), paste, collapse = " ")
freq <- table( factor(rawTogether, levels = levels) )
# out <- spectral(freq, n, k) # breaks
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.