# ordsample: Drawing a sample of discrete data In GenOrd: Simulation of Discrete Random Variables with Given Correlation Matrix and Marginal Distributions

## Description

The function draws a sample from a multivariate discrete variable with correlation matrix Sigma and prescribed marginal distributions marginal

## Usage

 1 2 ordsample(n, marginal, Sigma, support = list(), Spearman = FALSE, cormat = "discrete")

## Arguments

 n the sample size marginal a list of k elements, where k is the number of variables. The i-th element of marginal is the vector of the cumulative probabilities defining the marginal distribution of the i-th component of the multivariate variable. If the i-th component can take k_i values, the i-th element of marginal will contain k_i-1 probabilities (the k_i-th is obviously 1 and shall not be included). Sigma the target correlation matrix of the multivariate discrete variable support a list of k elements, where k is the number of variables. The i-th element of support is the vector containing the ordered values of the support of the i-th variable. By default, the support of the i-th variable is 1,2,...,k_i Spearman if TRUE, the function finds Spearman's correlations (and it is not necessary to provide support), if FALSE (default) Pearson's correlations cormat "discrete" if the Sigma in input is the target correlation matrix of the multivariate discrete variable; "continuous" if the Sigma in input is the intermediate correlation matrix of the multivariate standard normal

## Value

a n\times k matrix of values drawn from the k-variate discrete r.v. with the desired marginal distributions and correlation matrix

## Author(s)

Alessandro Barbiero, Pier Alda Ferrari

## Examples

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 # Example 1 # draw a sample from a bivariate ordinal variable # with 4 of categories and asymmetrical marginal distributions # and correlation coefficient 0.6 (to be checked) k <- 2 marginal <- list(c(0.1,0.3,0.6), c(0.4,0.7,0.9)) corrcheck(marginal) # check ok Sigma <- matrix(c(1,0.6,0.6,1),2,2) # sample size 1000 n <- 1000 # generate a sample of size n m <- ordsample(n, marginal, Sigma) head(m) # sample correlation matrix cor(m) # compare it with Sigma # empirical marginal distributions cumsum(table(m[,1]))/n cumsum(table(m[,2]))/n # compare them with the two marginal distributions # Example 1bis # draw a sample from a bivariate ordinal variable # with 4 of categories and asymmetrical marginal distributions # and Spearman correlation coefficient 0.6 (to be checked) k <- 2 marginal <- list(c(0.1,0.3,0.6), c(0.4,0.7,0.9)) corrcheck(marginal, Spearman=TRUE) # check ok Sigma <- matrix(c(1,0.6,0.6,1),2,2) # sample size 1000 n <- 1000 # generate a sample of size n m <- ordsample(n, marginal, Sigma, Spearman=TRUE) head(m) # sample correlation matrix cor(rank(m[,1]),rank(m[,2])) # compare it with Sigma # empirical marginal distributions cumsum(table(m[,1]))/n cumsum(table(m[,2]))/n # compare them with the two marginal distributions # Example 1ter # draw a sample from a bivariate random variable # with binomial marginal distributions (n=3, p=1/3 and n=4, p=2/3) # and Pearson correlation coefficient 0.6 (to be checked) k <- 2 marginal <- list(pbinom(0:2, 3, 1/3),pbinom(0:3, 4, 2/3)) marginal corrcheck(marginal, support=list(0:3, 0:4)) # check ok Sigma <- matrix(c(1,0.6,0.6,1),2,2) # sample size 1000 n <- 1000 # generate a sample of size n m <- ordsample(n, marginal, Sigma, support=list(0:3,0:4)) head(m) # sample correlation matrix cor(m) # compare it with Sigma # empirical marginal distributions cumsum(table(m[,1]))/n cumsum(table(m[,2]))/n # compare them with the two marginal distributions # Example 2 # draw a sample from a 4-dimensional ordinal variable # with different number of categories and uniform marginal distributions # and different correlation coefficients k <- 4 marginal <- list(0.5, c(1/3,2/3), c(1/4,2/4,3/4), c(1/5,2/5,3/5,4/5)) corrcheck(marginal) # select a feasible correlation matrix Sigma <- matrix(c(1,0.5,0.4,0.3,0.5,1,0.5,0.4,0.4,0.5,1,0.5,0.3,0.4,0.5,1), 4, 4, byrow=TRUE) Sigma # sample size 100 n <- 100 # generate a sample of size n set.seed(1) m <- ordsample(n, marginal, Sigma) # sample correlation matrix cor(m) # compare it with Sigma # empirical marginal distribution cumsum(table(m[,4]))/n # compare it with the fourth marginal head(m) # or equivalently... set.seed(1) res <- ordcont(marginal, Sigma) res[[1]] # the intermediate correlation matrix of the multivariate normal m <- ordsample(n, marginal, res[[1]], cormat="continuous") head(m) # Example 3 # simulation of two correlated Poisson r.v. # modification to GenOrd sampling function for Poisson distribution ordsamplep<-function (n, lambda, Sigma) { k <- length(lambda) valori <- mvrnorm(n, rep(0, k), Sigma) for (i in 1:k) { valori[, i] <- qpois(pnorm(valori[,i]), lambda[i]) } return(valori) } # number of variables k <- 2 # Poisson parameters lambda <- c(2, 5) # correlation matrix Sigma <- matrix(0.25, 2, 2) diag(Sigma) <- 1 # sample size n <- 10000 # preliminar stage: support TRUNCATION # required for recovering the correlation matrix # of the standard bivariate normal # truncation error epsilon <- 0.0001 # corresponding maximum value kmax <- qpois(1-epsilon, lambda) # truncated marginals l <- list() for(i in 1:k) { l[[i]] <- 0:kmax[i] } marg <- list() for(i in 1:k) { marg[[i]] <- dpois(0:kmax[i],lambda[i]) marg[[i]][kmax[i]+1] <- 1-sum(marg[[i]][1:(kmax[i])]) } cm <- list() for(i in 1:k) { cm[[i]] <- cumsum(marg[[i]]) cm[[i]] <- cm[[i]][-(kmax[i]+1)] } # check feasibility of correlation matrix RB <- corrcheck(cm, support=l) RL <- RB[[1]] RU <- RB[[2]] Sigma <= RU & Sigma >= RL # OK res <- ordcont(cm, Sigma, support=l) res[[1]] Sigma <- res[[1]] # draw the sample m <- ordsamplep(n, lambda, Sigma) # sample correlation matrix cor(m) head(m) # Example 4 # simulation of 4 correlated binary and Poisson r.v.'s (2+2) # modification to GenOrd sampling function ordsamplep <- function (n, marginal, lambda, Sigma) { k <- length(lambda) valori <- mvrnorm(n, rep(0, k), Sigma) for(i in 1:k) { if(lambda[i]==0) { valori[, i] <- as.integer(cut(valori[, i], breaks = c(min(valori[,i]) - 1, qnorm(marginal[[i]]), max(valori[, i]) + 1))) valori[, i] <- support[[i]][valori[, i]] } else { valori[, i] <- qpois(pnorm(valori[,i]), lambda[i]) } } return(valori) } # number of variables k <- 4 # Poisson parameters (only 3rd and 4th are Poisson) lambda <- c(0, 0, 2, 5) # 1st and 2nd are Bernoulli with p=0.5 marginal <- list() marginal[[1]] <- .5 marginal[[2]] <- .5 marginal[[3]] <- 0 marginal[[4]] <- 0 # support support <- list() support[[1]] <- 0:1 support[[2]] <- 0:1 # correlation matrix Sigma <- matrix(0.25, k, k) diag(Sigma) <- 1 # sample size n <- 10000 # preliminar stage: support TRUNCATION # required for recovering the correlation matrix # of the standard bivariate normal # truncation error epsilon <- 0.0001 # corresponding maximum value kmax <- qpois(1-epsilon, lambda) # truncated marginals for(i in 3:4) { support[[i]] <- 0:kmax[i] } marg <- list() for(i in 3:4) { marg[[i]] <- dpois(0:kmax[i],lambda[i]) marg[[i]][kmax[i]+1] <- 1-sum(marg[[i]][1:(kmax[i])]) } for(i in 3:4) { marginal[[i]] <- cumsum(marg[[i]]) marginal[[i]] <- marginal[[i]][-(kmax[i]+1)] } # check feasibility of correlation matrix RB <- corrcheck(marginal, support=support) RL <- RB[[1]] RU <- RB[[2]] Sigma <= RU & Sigma >= RL # OK # compute correlation matrix of the 4-variate standard normal res <- ordcont(marginal, Sigma, support=support) res[[1]] Sigma <- res[[1]] # draw the sample m <- ordsamplep(n, marginal, lambda, Sigma) # sample correlation matrix cor(m) head(m)

### Example output

[[1]]
2 x 2 Matrix of class "dsyMatrix"
[,1] [,2]
[1,]    1   -1
[2,]   -1    1

[[2]]
2 x 2 Matrix of class "dsyMatrix"
[,1] [,2]
[1,]  1.0  0.8
[2,]  0.8  1.0

[,1] [,2]
[1,]    4    3
[2,]    4    3
[3,]    4    2
[4,]    3    1
[5,]    2    1
[6,]    3    2
[,1]      [,2]
[1,] 1.0000000 0.6069379
[2,] 0.6069379 1.0000000
1     2     3     4
0.106 0.299 0.596 1.000
1     2     3     4
0.377 0.689 0.898 1.000
[[1]]
2 x 2 Matrix of class "dsyMatrix"
[,1] [,2]
[1,]    1   -1
[2,]   -1    1

[[2]]
2 x 2 Matrix of class "dsyMatrix"
[,1]      [,2]
[1,] 1.0000000 0.8733333
[2,] 0.8733333 1.0000000

[,1] [,2]
[1,]    3    3
[2,]    1    1
[3,]    2    1
[4,]    3    3
[5,]    3    1
[6,]    2    1
[1] 0.5921783
1     2     3     4
0.106 0.294 0.605 1.000
1     2     3     4
0.403 0.713 0.904 1.000
[[1]]
[1] 0.2962963 0.7407407 0.9629630

[[2]]
[1] 0.01234568 0.11111111 0.40740741 0.80246914

[[1]]
2 x 2 Matrix of class "dsyMatrix"
[,1]       [,2]
[1,]  1.0000000 -0.8660254
[2,] -0.8660254  1.0000000

[[2]]
2 x 2 Matrix of class "dsyMatrix"
[,1]      [,2]
[1,] 1.0000000 0.8499879
[2,] 0.8499879 1.0000000

[,1] [,2]
[1,]    0    2
[2,]    2    3
[3,]    0    2
[4,]    2    3
[5,]    1    2
[6,]    0    3
[,1]      [,2]
[1,] 1.0000000 0.5986331
[2,] 0.5986331 1.0000000
0     1     2     3
0.310 0.739 0.957 1.000
0     1     2     3     4
0.010 0.118 0.419 0.811 1.000
[[1]]
4 x 4 Matrix of class "dsyMatrix"
[,1]       [,2]       [,3]       [,4]
[1,]  1.0000000 -0.8164966 -0.8944272 -0.8485281
[2,] -0.8164966  1.0000000 -0.9128709 -0.9237604
[3,] -0.8944272 -0.9128709  1.0000000 -0.9486833
[4,] -0.8485281 -0.9237604 -0.9486833  1.0000000

[[2]]
4 x 4 Matrix of class "dsyMatrix"
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.8164966 0.8944272 0.8485281
[2,] 0.8164966 1.0000000 0.9128709 0.9237604
[3,] 0.8944272 0.9128709 1.0000000 0.9486833
[4,] 0.8485281 0.9237604 0.9486833 1.0000000

[,1] [,2] [,3] [,4]
[1,]  1.0  0.5  0.4  0.3
[2,]  0.5  1.0  0.5  0.4
[3,]  0.4  0.5  1.0  0.5
[4,]  0.3  0.4  0.5  1.0
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.4373221 0.2979056 0.1453077
[2,] 0.4373221 1.0000000 0.4456337 0.3782520
[3,] 0.2979056 0.4456337 1.0000000 0.3892966
[4,] 0.1453077 0.3782520 0.3892966 1.0000000
1    2    3    4    5
0.22 0.45 0.62 0.81 1.00
[,1] [,2] [,3] [,4]
[1,]    1    3    3    5
[2,]    2    1    1    4
[3,]    1    3    2    5
[4,]    1    1    1    1
[5,]    1    2    4    2
[6,]    2    3    1    4
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.6576809 0.5230593 0.3920468
[2,] 0.6576809 1.0000000 0.5853667 0.4669768
[3,] 0.5230593 0.5853667 1.0000000 0.5587005
[4,] 0.3920468 0.4669768 0.5587005 1.0000000
[,1] [,2] [,3] [,4]
[1,]    1    3    3    5
[2,]    2    1    1    4
[3,]    1    3    2    5
[4,]    1    1    1    1
[5,]    1    2    4    2
[6,]    2    3    1    4
2 x 2 Matrix of class "lgeMatrix"
[,1] [,2]
[1,] TRUE TRUE
[2,] TRUE TRUE
[,1]      [,2]
[1,] 1.0000000 0.2628508
[2,] 0.2628508 1.0000000
[,1]      [,2]
[1,] 1.0000000 0.2639675
[2,] 0.2639675 1.0000000
[,1] [,2]
[1,]    4    6
[2,]    5    8
[3,]    1    4
[4,]    2    3
[5,]    1    4
[6,]    2    4
4 x 4 Matrix of class "lgeMatrix"
[,1] [,2] [,3] [,4]
[1,] TRUE TRUE TRUE TRUE
[2,] TRUE TRUE TRUE TRUE
[3,] TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.3826825 0.3259881 0.3178154
[2,] 0.3826825 1.0000000 0.3259881 0.3178154
[3,] 0.3259881 0.3259881 1.0000000 0.2628508
[4,] 0.3178154 0.3178154 0.2628508 1.0000000
[,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.2500256 0.2384720 0.2540702
[2,] 0.2500256 1.0000000 0.2294206 0.2500565
[3,] 0.2384720 0.2294206 1.0000000 0.2626674
[4,] 0.2540702 0.2500565 0.2626674 1.0000000
[,1] [,2] [,3] [,4]
[1,]    1    1    4    8
[2,]    1    1    1    1
[3,]    0    0    0    2
[4,]    0    0    2    9
[5,]    0    1    3    6
[6,]    1    0    2    6

GenOrd documentation built on May 2, 2019, 8:16 a.m.