genMultinomialData: Generate multinomial data

View source: R/multinomial.R

genMultinomialData    R Documentation

Generate multinomial data

Description

Generate two sets of multinomially distributed vectors using rmultinom. Useful for hypothesis-testing simulations. Three built-in experiments, each with its own probability vectors (of length k), are available in addition to a user-specified probability matrix p:

  • Experiment 1: p_{1i} = \frac{1/i^{\alpha}}{\sum_{j=1}^{k} 1/j^{\alpha}}. When the null_hyp parameter is FALSE, the probability vector for the 2nd group is obtained by swapping the 1st and m-th entries of the 1st group's vector.

  • Experiment 2: p_{1i} = 1/k. When the null_hyp parameter is FALSE, p_{2i} = 0 for i \in \{1, \dots, b\} and p_{2,b+1} = \sum_{i=1}^{b+1} p_{1i} = (b+1)/k. Here b is the number of zeroed entries (it appears to correspond to the numzero argument).

  • Experiment 3: p_{1i} = 1/k. When the null_hyp parameter is FALSE, p_{2i} = 0 for i \in \{1, \dots, b\} and p_{2i} = 1/(k - b) for i > b.
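
The three constructions above can be sketched in base R as follows. This illustrates the formulas only and is not the package source: the helper name make_probs is hypothetical, b is taken to play the role of numzero, and in Experiment 2 the entries after position b + 1 are assumed to stay at 1/k (the description does not say so explicitly, but that choice makes the row sum to 1).

```r
# Hypothetical helper: builds a 2 x k probability matrix following the
# formulas in the Description. Not the hddtest implementation.
make_probs <- function(expID, k, alpha = 0.45, m = 1000, b = 50) {
  if (expID == 1) {
    p1 <- 1 / (1:k)^alpha
    p1 <- p1 / sum(p1)            # normalise so the entries sum to 1
    p2 <- p1
    p2[c(1, m)] <- p1[c(m, 1)]    # swap the 1st and m-th entries
  } else if (expID == 2) {
    p1 <- rep(1 / k, k)
    p2 <- p1
    p2[seq_len(b)] <- 0           # first b categories get probability 0
    p2[b + 1] <- (b + 1) / k      # their mass moves to category b + 1
    # Assumption: entries after b + 1 keep probability 1/k.
  } else if (expID == 3) {
    p1 <- rep(1 / k, k)
    p2 <- c(rep(0, b), rep(1 / (k - b), k - b))
  } else {
    stop("expID must be 1, 2, or 3")
  }
  rbind(p1, p2)                   # row 1: group 1, row 2: group 2 (alternative)
}

p <- make_probs(expID = 2, k = 2000, b = 50)
rowSums(p)  # both rows should sum to 1, up to floating-point error
```

Returning both rows as one matrix mirrors the 2 by k layout that the p argument expects.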

Usage

genMultinomialData(
  null_hyp = TRUE,
  p = NULL,
  k = 2000,
  n = c(8000, 8000),
  sample_size = 30,
  expID = 1,
  alpha = 0.45,
  m = 1000,
  numzero = 50,
  ...
)

Arguments

null_hyp

logical; if TRUE, generate the data for both groups from the same distribution. Default value is TRUE.

p

An optional 2 by k matrix specifying the probabilities of the k categories for each of the two groups. Each row of p must sum to 1. If defined, all remaining parameters in the function definition are ignored. Default value is NULL.
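
As an illustration of a valid p, the sketch below builds a 2 by k matrix for a small k and checks the row-sum requirement. The probabilities are arbitrary, and the final call is left commented out because it assumes the hddtest package is installed.

```r
k <- 5
# Arbitrary example probabilities: one row per group, one column per category.
p <- rbind(
  c(0.2, 0.2, 0.2, 0.2, 0.2),   # group 1: uniform
  c(0.4, 0.3, 0.1, 0.1, 0.1)    # group 2: skewed toward the first categories
)

# Each row must hold valid probabilities and sum to 1.
stopifnot(
  all(dim(p) == c(2, k)),
  all(p >= 0),
  isTRUE(all.equal(rowSums(p), c(1, 1)))
)

# Assuming hddtest is installed:
# X <- hddtest::genMultinomialData(p = p)
```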

k

integer representing dimension (number of categories). Default 2000.

n

Vector of length 2 giving, for each group, the total number of objects placed into the k bins (the size parameter of the multinomial distribution). Default is c(8000, 8000).

sample_size

integer specifying the number of random vectors to generate for each of the two groups. Default is 30.

expID

Experiment number (1, 2, or 3). Default is 1.

alpha

Number between 0 and 1. Used for experiment 1. Default is 0.45.

m

integer between 2 and k. Used in experiment 1 for the alternative hypothesis. Default is 1000.

numzero

integer between 1 and k-1. Used in experiments 2 and 3 for the alternative hypothesis. Default is 50.

...

Additional parameters.

Value

A list containing two matrices, one per group, each of dimension sample_size by k.
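
The shape of the return value can be reproduced with rmultinom directly. The sketch below is an assumption about how such output could be assembled, not the package's code: rmultinom returns one draw per column, so each result is transposed to get sample_size rows and k columns. The small values of k, n, and sample_size are chosen only to keep the output readable.

```r
set.seed(1)                      # reproducible draws
k <- 6
n <- c(100, 100)                 # total count per draw, one value per group
sample_size <- 4
p <- rbind(rep(1 / k, k), rep(1 / k, k))   # null hypothesis: identical rows

# rmultinom(n, size, prob) returns a k x n matrix; transpose to sample_size x k.
X <- lapply(1:2, function(g) {
  t(rmultinom(sample_size, size = n[g], prob = p[g, ]))
})

lapply(X, dim)       # each element is sample_size by k
rowSums(X[[1]])      # every row sums to n[1]
```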

Examples

# Generate data under the alternative hypothesis (null_hyp = FALSE):
X <- genMultinomialData(FALSE)

# Dimensions of the two generated datasets:
lapply(X, dim)

# Proportion of entries less than 5 in the first dataset:
sum(X[[1]] < 5) / (nrow(X[[1]]) * ncol(X[[1]]))


AmandaRP/hddtest documentation built on March 18, 2023, 5:53 p.m.