genMultinomialData | R Documentation |
Generate two sets of multinomially distributed vectors using rmultinom. Useful for hypothesis testing simulations. Three different experiments with different probability vectors (of length k) are available, in addition to a user-specified probability vector p:
Experiment 1: p_{1i} = \frac{1/i^{\alpha}}{\sum_{j=1}^{k} 1/j^{\alpha}}. When the null_hyp parameter is FALSE, the probability vector for the 2nd group is generated by switching the positions of the 1st and m-th entries.
Experiment 2: p_{1i} = 1/k. When the null_hyp parameter is FALSE, p_{2i} = 0 for i \in \{1, \dots, b\} and p_{2,b+1} = \sum_{i=1}^{b+1} p_{1i} = (b+1)/k.
Experiment 3: p_{1i} = 1/k. When the null_hyp parameter is FALSE, p_{2i} = 0 for i \in \{1, \dots, b\} and p_{2i} = 1/(k - b) for i > b.
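The three pairs of probability vectors described above can be sketched in base R as follows. This is an illustrative reconstruction from the formulas, not the package's own code. It assumes that b corresponds to the numzero argument, and that in Experiment 2 the entries after position b+1 keep their null value 1/k (the text does not state this, but it is what makes the vector sum to 1). All variable names are hypothetical.

```r
# Illustrative parameters, matching the documented defaults.
k <- 2000
alpha <- 0.45
m <- 1000
b <- 50  # assumption: b plays the role of the numzero argument

# Experiment 1: power-law weights, normalised to sum to 1.
w <- 1 / (1:k)^alpha
p1_exp1 <- w / sum(w)
# Alternative: swap the 1st and m-th entries.
p2_exp1 <- p1_exp1
p2_exp1[c(1, m)] <- p1_exp1[c(m, 1)]

# Experiment 2: uniform under the null.
p1_exp2 <- rep(1 / k, k)
# Alternative: first b entries set to 0, their mass moved to entry b + 1.
# Assumption: entries after b + 1 stay at 1/k.
p2_exp2 <- p1_exp2
p2_exp2[1:b] <- 0
p2_exp2[b + 1] <- (b + 1) / k

# Experiment 3: uniform under the null.
p1_exp3 <- rep(1 / k, k)
# Alternative: first b entries set to 0, the rest uniform on k - b categories.
p2_exp3 <- c(rep(0, b), rep(1 / (k - b), k - b))

# Each vector should sum to 1 (up to floating-point error).
sapply(list(p1_exp1, p2_exp1, p2_exp2, p2_exp3), sum)
```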
genMultinomialData(
  null_hyp = TRUE,
  p = NULL,
  k = 2000,
  n = c(8000, 8000),
  sample_size = 30,
  expID = 1,
  alpha = 0.45,
  m = 1000,
  numzero = 50,
  ...
)
null_hyp | logical; if TRUE, generate data using the same distribution. Default value is TRUE. |
p | An optional 2 by k matrix specifying the probabilities of the k categories for each of the two groups. Each row of |
k | integer representing dimension (number of categories). Default 2000. |
n | Vector of length 2 specifying the parameter of each multinomial distribution used to define the total number of objects that are put into k bins in the typical multinomial experiment. |
sample_size | integer specifying the number of random vectors to generate for each of the two groups. |
expID | Experiment number 1-3. Default is 1. |
alpha | Number between 0 and 1. Used for experiment 1. Default is 0.45. |
m | integer between 2 and k. Used in experiment 1 for the alternative hypothesis. Default is 1000. |
numzero | integer between 1 and k-1. Used in experiments 2 and 3 for the alternative hypothesis. Default is 50. |
... | Additional parameters. |
A list containing two matrices, each having dimension sample_size by k.
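To make the shape of the return value concrete, here is a minimal sketch that produces a list of the same form directly with rmultinom. It is not the function's actual implementation; the small values of k, n and sample_size and the variable names are chosen only for illustration. Note that rmultinom returns one draw per column, so the result is transposed to get one random vector per row.

```r
set.seed(1)

# Small illustrative settings (much smaller than the documented defaults).
k <- 5
sample_size <- 3
n <- c(100, 100)

# A 2 by k probability matrix, one row per group (uniform here).
p <- rbind(rep(1 / k, k), rep(1 / k, k))

# For each group, draw sample_size multinomial vectors of total count n[g].
# rmultinom gives a k by sample_size matrix, so transpose it.
X <- lapply(1:2, function(g) {
  t(rmultinom(sample_size, size = n[g], prob = p[g, ]))
})

# Both matrices are sample_size by k.
lapply(X, dim)

# Every row of the first matrix sums to n[1].
rowSums(X[[1]])
```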
# Generate data when the null hypothesis is FALSE:
X <- genMultinomialData(FALSE)
# Dimension of the two generated datasets:
lapply(X, dim)
# Proportion of entries less than 5 in the first dataset:
sum(X[[1]] < 5) / (nrow(X[[1]]) * ncol(X[[1]]))