ordsample: Drawing a sample of discrete data

Description Usage Arguments Value Author(s) See Also Examples

View source: R/ordsample.R

Description

The function draws a sample from a multivariate discrete variable with correlation matrix Sigma and prescribed marginal distributions marginal

Usage

1
2
ordsample(n, marginal, Sigma, support = list(), Spearman = FALSE,
cormat = "discrete")

Arguments

n

the sample size

marginal

a list of k elements, where k is the number of variables. The i-th element of marginal is the vector of the cumulative probabilities defining the marginal distribution of the i-th component of the multivariate variable. If the i-th component can take k_i values, the i-th element of marginal will contain k_i-1 probabilities (the k_i-th is obviously 1 and shall not be included).

Sigma

the target correlation matrix of the multivariate discrete variable

support

a list of k elements, where k is the number of variables. The i-th element of support is the vector containing the ordered values of the support of the i-th variable. By default, the support of the i-th variable is 1,2,...,k_i

Spearman

if TRUE, the function finds Spearman's correlations (and it is not necessary to provide support), if FALSE (default) Pearson's correlations

cormat

"discrete" if the Sigma in input is the target correlation matrix of the multivariate discrete variable; "continuous" if the Sigma in input is the intermediate correlation matrix of the multivariate standard normal

Value

a n\times k matrix of values drawn from the k-variate discrete r.v. with the desired marginal distributions and correlation matrix

Author(s)

Alessandro Barbiero, Pier Alda Ferrari

See Also

contord, ordcont, corrcheck

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
# Example 1

# draw a sample from a bivariate ordinal variable
# with 4 of categories and asymmetrical marginal distributions
# and correlation coefficient 0.6 (to be checked)
k <- 2
marginal <- list(c(0.1,0.3,0.6), c(0.4,0.7,0.9))
corrcheck(marginal) # check ok
Sigma <- matrix(c(1,0.6,0.6,1),2,2)
# sample size 1000
n <- 1000
# generate a sample of size n
m <- ordsample(n, marginal, Sigma)
head(m)
# sample correlation matrix
cor(m) # compare it with Sigma
# empirical marginal distributions
cumsum(table(m[,1]))/n
cumsum(table(m[,2]))/n # compare them with the two marginal distributions

# Example 1bis

# draw a sample from a bivariate ordinal variable
# with 4 of categories and asymmetrical marginal distributions
# and Spearman correlation coefficient 0.6 (to be checked)
k <- 2
marginal <- list(c(0.1,0.3,0.6), c(0.4,0.7,0.9))
corrcheck(marginal, Spearman=TRUE) # check ok
Sigma <- matrix(c(1,0.6,0.6,1),2,2)
# sample size 1000
n <- 1000
# generate a sample of size n
m <- ordsample(n, marginal, Sigma, Spearman=TRUE)
head(m)
# sample correlation matrix
cor(rank(m[,1]),rank(m[,2])) # compare it with Sigma
# empirical marginal distributions
cumsum(table(m[,1]))/n
cumsum(table(m[,2]))/n # compare them with the two marginal distributions

# Example 1ter

# draw a sample from a bivariate random variable
# with binomial marginal distributions (n=3, p=1/3 and n=4, p=2/3)
# and Pearson correlation coefficient 0.6 (to be checked)
k <- 2
marginal <- list(pbinom(0:2, 3, 1/3),pbinom(0:3, 4, 2/3))
marginal
corrcheck(marginal, support=list(0:3, 0:4)) # check ok
Sigma <- matrix(c(1,0.6,0.6,1),2,2)
# sample size 1000
n <- 1000
# generate a sample of size n
m <- ordsample(n, marginal, Sigma, support=list(0:3,0:4))
head(m)
# sample correlation matrix
cor(m) # compare it with Sigma
# empirical marginal distributions
cumsum(table(m[,1]))/n
cumsum(table(m[,2]))/n # compare them with the two marginal distributions

# Example 2

# draw a sample from a 4-dimensional ordinal variable
# with different number of categories and uniform marginal distributions
# and different correlation coefficients
k <- 4
marginal <- list(0.5, c(1/3,2/3), c(1/4,2/4,3/4), c(1/5,2/5,3/5,4/5))
corrcheck(marginal)
# select a feasible correlation matrix
Sigma <- matrix(c(1,0.5,0.4,0.3,0.5,1,0.5,0.4,0.4,0.5,1,0.5,0.3,0.4,0.5,1),
4, 4, byrow=TRUE)
Sigma
# sample size 100
n <- 100
# generate a sample of size n
set.seed(1)
m <- ordsample(n, marginal, Sigma)
# sample correlation matrix
cor(m) # compare it with Sigma
# empirical marginal distribution
cumsum(table(m[,4]))/n # compare it with the fourth marginal
head(m)
# or equivalently...
set.seed(1)
res <- ordcont(marginal, Sigma)
res[[1]] # the intermediate correlation matrix of the multivariate normal
m <- ordsample(n, marginal, res[[1]], cormat="continuous")
head(m)

# Example 3
# simulation of two correlated Poisson r.v.
# modification to GenOrd sampling function for Poisson distribution
ordsamplep<-function (n, lambda, Sigma) 
{
   k <- length(lambda)
   valori <- mvrnorm(n, rep(0, k), Sigma)
   for (i in 1:k)
   {
   valori[, i] <- qpois(pnorm(valori[,i]), lambda[i])
   }
   return(valori)
}
# number of variables
k <- 2
# Poisson parameters
lambda <- c(2, 5)
# correlation matrix
Sigma <- matrix(0.25, 2, 2)
diag(Sigma) <- 1
# sample size
n <- 10000
# preliminar stage: support TRUNCATION
# required for recovering the correlation matrix
# of the standard bivariate normal
# truncation error
epsilon <- 0.0001
# corresponding maximum value
kmax <- qpois(1-epsilon, lambda)
# truncated marginals
l <- list()
for(i in 1:k)
{
l[[i]] <- 0:kmax[i]
}
marg <- list()
for(i in 1:k)
{
marg[[i]] <- dpois(0:kmax[i],lambda[i])
marg[[i]][kmax[i]+1] <- 1-sum(marg[[i]][1:(kmax[i])])
}
cm <- list()
for(i in 1:k)
{
cm[[i]] <- cumsum(marg[[i]])
cm[[i]] <- cm[[i]][-(kmax[i]+1)]
}
# check feasibility of correlation matrix
RB <- corrcheck(cm, support=l)
RL <- RB[[1]]
RU <- RB[[2]]
Sigma <= RU & Sigma >= RL # OK
res <- ordcont(cm, Sigma, support=l)
res[[1]]
Sigma <- res[[1]]
# draw the sample
m <- ordsamplep(n, lambda, Sigma)
# sample correlation matrix
cor(m)
head(m)

# Example 4
# simulation of 4 correlated binary and Poisson r.v.'s (2+2)
# modification to GenOrd sampling function
ordsamplep <- function (n, marginal, lambda, Sigma) 
{
k <- length(lambda)
valori <- mvrnorm(n, rep(0, k), Sigma)
for(i in 1:k)
{
if(lambda[i]==0)
{
	valori[, i] <- as.integer(cut(valori[, i], breaks = c(min(valori[,i]) - 1,
               qnorm(marginal[[i]]), max(valori[, i]) + 1)))
valori[, i] <- support[[i]][valori[, i]]
}
else
{
	valori[, i] <- qpois(pnorm(valori[,i]), lambda[i])
}
	}
return(valori)
}
# number of variables
k <- 4
# Poisson parameters (only 3rd and 4th are Poisson)
lambda <- c(0, 0, 2, 5)
# 1st and 2nd are Bernoulli with p=0.5
marginal <- list()
marginal[[1]] <- .5
marginal[[2]] <- .5
marginal[[3]] <- 0
marginal[[4]] <- 0
# support
support <- list()
support[[1]] <- 0:1
support[[2]] <- 0:1
# correlation matrix
Sigma <- matrix(0.25, k, k)
diag(Sigma) <- 1
# sample size
n <- 10000
# preliminar stage: support TRUNCATION
# required for recovering the correlation matrix
# of the standard bivariate normal
# truncation error
epsilon <- 0.0001
# corresponding maximum value
kmax <- qpois(1-epsilon, lambda)
# truncated marginals
for(i in 3:4)
{
support[[i]] <- 0:kmax[i]
}
marg <- list()
for(i in 3:4)
{
marg[[i]] <- dpois(0:kmax[i],lambda[i])
marg[[i]][kmax[i]+1] <- 1-sum(marg[[i]][1:(kmax[i])])
}
for(i in 3:4)
{
marginal[[i]] <- cumsum(marg[[i]])
marginal[[i]] <- marginal[[i]][-(kmax[i]+1)]
}
# check feasibility of correlation matrix
RB <- corrcheck(marginal, support=support)
RL <- RB[[1]]
RU <- RB[[2]]
Sigma <= RU & Sigma >= RL # OK
# compute correlation matrix of the 4-variate standard normal
res <- ordcont(marginal, Sigma, support=support)
res[[1]]
Sigma <- res[[1]]
# draw the sample
m <- ordsamplep(n, marginal, lambda, Sigma)
# sample correlation matrix
cor(m)
head(m)

Example output

Loading required package: mvtnorm
Loading required package: Matrix
Loading required package: MASS
[[1]]
2 x 2 Matrix of class "dsyMatrix"
     [,1] [,2]
[1,]    1   -1
[2,]   -1    1

[[2]]
2 x 2 Matrix of class "dsyMatrix"
     [,1] [,2]
[1,]  1.0  0.8
[2,]  0.8  1.0

     [,1] [,2]
[1,]    4    3
[2,]    4    3
[3,]    4    2
[4,]    3    1
[5,]    2    1
[6,]    3    2
          [,1]      [,2]
[1,] 1.0000000 0.6069379
[2,] 0.6069379 1.0000000
    1     2     3     4 
0.106 0.299 0.596 1.000 
    1     2     3     4 
0.377 0.689 0.898 1.000 
[[1]]
2 x 2 Matrix of class "dsyMatrix"
     [,1] [,2]
[1,]    1   -1
[2,]   -1    1

[[2]]
2 x 2 Matrix of class "dsyMatrix"
          [,1]      [,2]
[1,] 1.0000000 0.8733333
[2,] 0.8733333 1.0000000

     [,1] [,2]
[1,]    3    3
[2,]    1    1
[3,]    2    1
[4,]    3    3
[5,]    3    1
[6,]    2    1
[1] 0.5921783
    1     2     3     4 
0.106 0.294 0.605 1.000 
    1     2     3     4 
0.403 0.713 0.904 1.000 
[[1]]
[1] 0.2962963 0.7407407 0.9629630

[[2]]
[1] 0.01234568 0.11111111 0.40740741 0.80246914

[[1]]
2 x 2 Matrix of class "dsyMatrix"
           [,1]       [,2]
[1,]  1.0000000 -0.8660254
[2,] -0.8660254  1.0000000

[[2]]
2 x 2 Matrix of class "dsyMatrix"
          [,1]      [,2]
[1,] 1.0000000 0.8499879
[2,] 0.8499879 1.0000000

     [,1] [,2]
[1,]    0    2
[2,]    2    3
[3,]    0    2
[4,]    2    3
[5,]    1    2
[6,]    0    3
          [,1]      [,2]
[1,] 1.0000000 0.5986331
[2,] 0.5986331 1.0000000
    0     1     2     3 
0.310 0.739 0.957 1.000 
    0     1     2     3     4 
0.010 0.118 0.419 0.811 1.000 
[[1]]
4 x 4 Matrix of class "dsyMatrix"
           [,1]       [,2]       [,3]       [,4]
[1,]  1.0000000 -0.8164966 -0.8944272 -0.8485281
[2,] -0.8164966  1.0000000 -0.9128709 -0.9237604
[3,] -0.8944272 -0.9128709  1.0000000 -0.9486833
[4,] -0.8485281 -0.9237604 -0.9486833  1.0000000

[[2]]
4 x 4 Matrix of class "dsyMatrix"
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.8164966 0.8944272 0.8485281
[2,] 0.8164966 1.0000000 0.9128709 0.9237604
[3,] 0.8944272 0.9128709 1.0000000 0.9486833
[4,] 0.8485281 0.9237604 0.9486833 1.0000000

     [,1] [,2] [,3] [,4]
[1,]  1.0  0.5  0.4  0.3
[2,]  0.5  1.0  0.5  0.4
[3,]  0.4  0.5  1.0  0.5
[4,]  0.3  0.4  0.5  1.0
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.4373221 0.2979056 0.1453077
[2,] 0.4373221 1.0000000 0.4456337 0.3782520
[3,] 0.2979056 0.4456337 1.0000000 0.3892966
[4,] 0.1453077 0.3782520 0.3892966 1.0000000
   1    2    3    4    5 
0.22 0.45 0.62 0.81 1.00 
     [,1] [,2] [,3] [,4]
[1,]    1    3    3    5
[2,]    2    1    1    4
[3,]    1    3    2    5
[4,]    1    1    1    1
[5,]    1    2    4    2
[6,]    2    3    1    4
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.6576809 0.5230593 0.3920468
[2,] 0.6576809 1.0000000 0.5853667 0.4669768
[3,] 0.5230593 0.5853667 1.0000000 0.5587005
[4,] 0.3920468 0.4669768 0.5587005 1.0000000
     [,1] [,2] [,3] [,4]
[1,]    1    3    3    5
[2,]    2    1    1    4
[3,]    1    3    2    5
[4,]    1    1    1    1
[5,]    1    2    4    2
[6,]    2    3    1    4
2 x 2 Matrix of class "lgeMatrix"
     [,1] [,2]
[1,] TRUE TRUE
[2,] TRUE TRUE
          [,1]      [,2]
[1,] 1.0000000 0.2628508
[2,] 0.2628508 1.0000000
          [,1]      [,2]
[1,] 1.0000000 0.2639675
[2,] 0.2639675 1.0000000
     [,1] [,2]
[1,]    4    6
[2,]    5    8
[3,]    1    4
[4,]    2    3
[5,]    1    4
[6,]    2    4
4 x 4 Matrix of class "lgeMatrix"
     [,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.3826825 0.3259881 0.3178154
[2,] 0.3826825 1.0000000 0.3259881 0.3178154
[3,] 0.3259881 0.3259881 1.0000000 0.2628508
[4,] 0.3178154 0.3178154 0.2628508 1.0000000
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.2500256 0.2384720 0.2540702
[2,] 0.2500256 1.0000000 0.2294206 0.2500565
[3,] 0.2384720 0.2294206 1.0000000 0.2626674
[4,] 0.2540702 0.2500565 0.2626674 1.0000000
     [,1] [,2] [,3] [,4]
[1,]    1    1    4    8
[2,]    1    1    1    1
[3,]    0    0    0    2
[4,]    0    0    2    9
[5,]    0    1    3    6
[6,]    1    0    2    6

GenOrd documentation built on May 2, 2019, 8:16 a.m.