genMultinomialData: Generate multinomial data
In AmandaRP/hddtest: Hypothesis Testing for High Dimensional Discrete Data

genMultinomialData

R Documentation

Generate multinomial data

Description

Generate two sets of multinomially distributed vectors using rmultinom, for use in hypothesis-testing simulations. In addition to a user-specified probability matrix p, three built-in experiments are available, each defining a probability vector of length k for each of the two groups:

Experiment 1: p_{1i} = \frac{1/i^{\alpha}}{\sum_{j=1}^{k} 1/j^{\alpha}}. When the null_hyp parameter is FALSE, the probability vector for the second group is obtained by swapping the 1st and m-th entries of the first group's vector.
Experiment 2: p_{1i} = 1/k. When the null_hyp parameter is FALSE, p_{2i} = 0 for i \in \{1, \dots, b\} and p_{2,b+1} = \sum_{i=1}^{b+1} p_{1i} = (b+1)/k.
Experiment 3: p_{1i} = 1/k. When the null_hyp parameter is FALSE, p_{2i} = 0 for i \in \{1, \dots, b\} and p_{2i} = 1/(k - b) for i > b.

In experiments 2 and 3, b appears to correspond to the numzero argument.
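
The three experiments above can be sketched in base R as follows. This is an illustration only, not the package's implementation: make_probs is a hypothetical helper, its argument b stands in for the number of zeroed entries, and in experiment 2 it assumes that the entries after position b + 1 stay at 1/k (the description does not say so explicitly, but it is what makes the vector sum to 1).

```r
# Hypothetical helper (not part of hddtest): build the group-1 and group-2
# probability vectors for the alternative hypothesis of each experiment.
make_probs <- function(expID, k = 2000, alpha = 0.45, m = 1000, b = 50) {
  if (expID == 1) {
    p1 <- 1 / (1:k)^alpha
    p1 <- p1 / sum(p1)          # normalize so the entries sum to 1
    p2 <- p1
    p2[c(1, m)] <- p1[c(m, 1)]  # swap the 1st and m-th entries
  } else if (expID == 2) {
    p1 <- rep(1 / k, k)
    p2 <- p1                    # assumption: entries beyond b + 1 stay at 1/k
    p2[seq_len(b)] <- 0         # zero out the first b entries
    p2[b + 1] <- (b + 1) / k    # move their mass onto entry b + 1
  } else if (expID == 3) {
    p1 <- rep(1 / k, k)
    p2 <- c(rep(0, b), rep(1 / (k - b), k - b))
  } else {
    stop("expID must be 1, 2, or 3")
  }
  rbind(p1, p2)                 # 2 x k matrix, one row per group
}

probs <- lapply(1:3, make_probs)
sapply(probs, rowSums)          # every row should sum to 1 (up to rounding)
```

Under the null hypothesis, both rows would simply equal the first row.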

Usage

genMultinomialData(
  null_hyp = TRUE,
  p = NULL,
  k = 2000,
  n = c(8000, 8000),
  sample_size = 30,
  expID = 1,
  alpha = 0.45,
  m = 1000,
  numzero = 50,
  ...
)

Arguments

`null_hyp`	logical; if TRUE, generate data using the same distribution. Default value is TRUE.
`p`	An optional 2 by k matrix specifying the probabilities of the k categories for each of the two groups. Each row of `p` must sum to 1. If defined, all remaining parameters in the function definition are ignored. Default value is NULL.
`k`	integer representing dimension (number of categories). Default 2000.
`n`	Vector of length 2 giving the size parameter of each group's multinomial distribution, i.e. the total number of objects placed into the k bins in each draw. Default is c(8000, 8000).
`sample_size`	integer specifying the number of random vectors to generate for each of the two groups. Default is 30.
`expID`	Experiment number (1, 2, or 3). Default is 1.
`alpha`	Number between 0 and 1. Used for experiment 1. Default is 0.45.
`m`	integer between 2 and k. Used in experiment 1 for the alternative hypothesis. Default is 1000.
`numzero`	integer between 1 and k-1. Used in experiments 2 and 3 for the alternative hypothesis. Default is 50.
`...`	Additional parameters.
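
Because a user-supplied `p` must be a 2 by k matrix whose rows each sum to 1, it may help to check a candidate matrix before passing it in. The sketch below uses only base R; is_valid_p is a hypothetical helper and not part of hddtest, and the tolerance is an arbitrary choice.

```r
# Hypothetical check (not part of hddtest): TRUE if p is a 2 x k numeric
# matrix of non-negative values whose rows each sum to 1 within tol.
is_valid_p <- function(p, tol = 1e-8) {
  is.matrix(p) && is.numeric(p) && !anyNA(p) && nrow(p) == 2 &&
    all(p >= 0) && all(abs(rowSums(p) - 1) < tol)
}

k <- 4
p <- rbind(rep(1 / k, k),          # group 1: uniform over 4 categories
           c(0.4, 0.3, 0.2, 0.1))  # group 2: skewed

is_valid_p(p)      # rows sum to 1
is_valid_p(p * 2)  # rows sum to 2, so this fails the check
```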

Value

A list containing two matrices, one per group, each of dimension sample_size by k.
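
To make the shape of the return value concrete, here is a small base-R sketch of how two such matrices could be drawn with rmultinom. It is an assumption about the general approach, not the package's actual code, and the small values of k, n, and sample_size are chosen only for readability.

```r
set.seed(1)                                # reproducibility of the sketch
k <- 5
n <- c(100, 100)
sample_size <- 3
p <- rbind(rep(1 / k, k), rep(1 / k, k))   # null hypothesis: same distribution

# rmultinom() returns a k x sample_size matrix (one draw per column),
# so transpose it to get one draw per row.
X <- lapply(1:2, function(g) {
  t(rmultinom(sample_size, size = n[g], prob = p[g, ]))
})

lapply(X, dim)      # each element is sample_size x k
lapply(X, rowSums)  # each row sums to the corresponding entry of n
```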

Examples

# Generate data when the null hypothesis is FALSE:
X <- genMultinomialData(null_hyp = FALSE)

# Dimensions of the two generated datasets:
lapply(X, dim)

# Proportion of entries less than 5 in the first dataset:
mean(X[[1]] < 5)

AmandaRP/hddtest documentation built on March 18, 2023, 5:53 p.m.