rsm: Generate variates from a softmax distribution.

Description Usage Arguments Details Value Note Author(s) Examples

View source: R/rsm.r

Description

Generate variates from a softmax distribution under Harville or Henery models.

Usage

1
rsm(eta, g = NULL, mu = NULL, gamma = NULL)

Arguments

eta

a vector of the odds. Must be the same length as g if g is given.

g

a vector giving the group indices. If NULL, then we assume only one group is in consideration.

mu

a vector of the probablities. Must be the same length as g if g is given. If mu and eta are both given, we ignore eta and use mu.

gamma

a vector of the gamma parameters for the Henery model. Typically the first element should be 1. Omit or set all values to 1 to recover the Harville model. The last element will be extended if the gamma is not long enough for a given group. Note that gamma is expected to be very short, and is not ‘aligned’ with eta in any way.

Details

Given the η in odds space, and a grouping variable, returns values in one to the number of elements in a group. That is, we have a permutation of 1 through n_g on each group as output.

For a single group, the probability that the ith element of the output is a 1 is equal to

π_{1,i} = \frac{μ_i^{γ_1}}{∑_j μ_j^{γ_1}}.

Once an element has been selected to have output 1, remove it from the set and iterate, but with the next γ elements.

Value

a vector of integers, each a permutation of one through the number of elements in each group.

Note

The output of this function satisfies a kind of order invariance that rhenery does not, at the cost of some computational inefficiency. Namely that for a fixed randseed, and for distinct eta, the output is equivariant with respect to permutation of the vector eta when there is only one group.

Regarding the ‘direction’, we associate higher odds with a smaller outcome. That is, the ith element of the output encodes the place that the ith participant took in the simulated ‘race’; it should be small if the odds for that participant are very high.

Author(s)

Steven E. Pav shabbychef@gmail.com

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
# simple use
set.seed(1234)
g <- ceiling(seq(1,10,by=0.1))
eta <- rnorm(length(g))
y <- rsm(eta,g=g)

# same same:
set.seed(235)
y1 <- rsm(eta,g=g)
set.seed(235)
y2 <- rsm(g=g,mu=smax(eta,g=g))
y1 - y2

# the default model is Harville
set.seed(1212)
y1 <- rsm(eta,g=g)
set.seed(1212)
y2 <- rsm(eta,g=g,gamma=c(1,1,1,1))
y1 - y2

# repeat several times with the cards stack against
# early runners
set.seed(1234)
colMeans(t(replicate(1000,rsm(sort(rnorm(10,sd=1.5)),g=rep(1,10)))))

# illustrate the invariance
mu <- (1:10) / 55
set.seed(1414)
y1 <- rsm(mu=mu,gamma=c(1,1,1))
set.seed(1414)
y2 <- rev(rsm(mu=rev(mu),gamma=c(1,1,1)))
y1 - y2


nfeat <- 5
set.seed(1234)
g <- ceiling(seq(0.1,1000,by=0.1))
X <- matrix(rnorm(length(g) * nfeat),ncol=nfeat)
beta <- rnorm(nfeat)
eta <- X %*% beta
y <- rsm(eta,g=g)

idx <- order(g,y,decreasing=TRUE) - 1
fooey <- harsmlik(g,idx,eta,deleta=X)
set.seed(3493)
dib <- rnorm(length(beta))
xvl <- seq(-0.01,0.01,length.out=301)
rsu <- sapply(xvl,
              function(del) {
                beta1 <- beta + del * dib
                eta1 <- X %*% beta1
                fooey2 <- harsmlik(g,idx,eta1,deleta=X)
                as.numeric(fooey2) - as.numeric(fooey)
              })
drv <- sapply(xvl,
              function(del) {
                beta1 <- beta + del * dib
                eta1 <- X %*% beta1
                fooey2 <- harsmlik(g,idx,eta1,deleta=X)
                sum(attr(fooey2,'gradient') * dib)
              })

if (require('ggplot2') && require('dplyr')) {
  bestx <- xvl[which.max(rsu)]
  ph <- data.frame(x=xvl,lik=rsu,grd=drv) %>%
    ggplot(aes(x=x,y=lik)) + 
    geom_point() + 
    geom_line(aes(y=grd/200)) +
    geom_vline(xintercept=bestx,linetype=2,alpha=0.5) + 
    geom_hline(yintercept=0,linetype=2,alpha=0.5)
  print(ph)
}


if (require('dplyr') && require('knitr')) {
  # expect this to be very small, almost always 1
  set.seed(1234)
  simdraw <- replicate(10000,{
    rsm(eta=c(100,rnorm(7)))[1]
  })

  as.data.frame(table(simdraw)) %>%
    mutate(prob=Freq / sum(Freq)) %>%
    knitr::kable()

  # expect this to be uniform on 2 through 8
  set.seed(1234)
  simdraw <- replicate(10000,{
    rsm(eta=c(100,rnorm(7)))[2]
  })

  as.data.frame(table(simdraw)) %>%
    mutate(prob=Freq / sum(Freq)) %>%
    knitr::kable()
}

ohenery documentation built on Oct. 30, 2019, 9:53 a.m.