Prepare the Data Structure for Exact NB Test for Two-Group Comparison
Description
Create the NBP data structure, (optionally) normalize the counts, and thin the counts to make the effective library sizes equal.
Usage
prepare.nbp(counts, grp.ids, lib.sizes = colSums(counts),
  norm.factors = NULL, thinning = TRUE, print.level = 1)

Arguments
counts 
an n by r matrix of RNA-Seq read counts with rows corresponding to genes (exons, gene isoforms, etc.) and columns corresponding to libraries (independent biological samples). 
grp.ids 
an r vector of treatment group identifiers (can be a vector of integers, chars or strings). 
lib.sizes 
library sizes, an r vector of numbers. By default, library sizes are estimated by column sums. 
norm.factors 
normalization factors, an r vector of numbers. If NULL (default), no normalization is applied. 
thinning 
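a boolean variable (i.e., logical). If TRUE (default), the counts are thinned (randomly downsampled) to make the effective library sizes approximately equal. 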
print.level 
a number, controls the amount of messages printed: 0 for suppressing all messages, 1 (default) for basic progress messages, and 2 to 5 for increasingly more detailed messages. 
Details
Normalization
We take gene expression to be indicated by the relative frequency of RNA-Seq reads mapped to a gene, relative to library sizes (column sums of the count matrix). Since the relative frequencies sum to 1 in each library (one column of the count matrix), the increased relative frequencies of truly overexpressed genes in a column must be accompanied by decreased relative frequencies of other genes, even when those other genes are not truly differentially expressed. Robinson and Oshlack (2010) presented examples where this problem is noticeable.
A simple fix is to compute the relative frequencies relative to effective library sizes, that is, library sizes multiplied by normalization factors. Many authors (Robinson and Oshlack (2010), Anders and Huber (2010)) propose to estimate the normalization factors based on the assumption that most genes are NOT differentially expressed.
By default, prepare.nbp does not estimate the normalization factors, but can incorporate user-specified normalization factors through the argument norm.factors.
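The effect described above can be illustrated with a small base R sketch. The toy counts and the hand-picked normalization factors below are assumptions made for illustration only; they are not output of prepare.nbp or of any estimation method.

```r
# Toy data (hypothetical): 10 genes, 2 libraries.
# Only gene1 is truly overexpressed in lib2; all other counts are unchanged.
n.genes <- 10
counts <- cbind(lib1 = rep(100, n.genes),
                lib2 = c(350, rep(100, n.genes - 1)))
rownames(counts) <- paste0("gene", seq_len(n.genes))

# Library sizes estimated by column sums, as in the default of prepare.nbp
lib.sizes <- colSums(counts)

# Relative frequencies with respect to raw library sizes:
# gene2 appears to decrease in lib2 although its count did not change
rel.freq.raw <- sweep(counts, 2, lib.sizes, "/")
rel.freq.raw["gene2", ]

# Hand-picked normalization factors (an assumption, not an estimate)
norm.factors <- c(1, 0.8)
eff.lib.sizes <- lib.sizes * norm.factors

# Relative frequencies with respect to effective library sizes:
# gene2 now has (approximately) the same relative frequency in both libraries
rel.freq.eff <- sweep(counts, 2, eff.lib.sizes, "/")
rel.freq.eff["gene2", ]
```

In practice the normalization factors would come from an estimation method such as those cited above and would then be passed to prepare.nbp through norm.factors.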
Library Size Adjustment
The exact test requires that the effective library sizes (column sums of the count matrix multiplied by normalization factors) are approximately equal. By default, prepare.nbp will thin (downsample) the counts to make the effective library sizes equal. Thinning may lose statistical efficiency, but is unlikely to introduce bias.
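One way to picture thinning is binomial downsampling, sketched below in base R. This is a minimal illustration under the assumption that each read is kept independently with a library-specific probability; it is not taken from the package source and may differ from what prepare.nbp actually does. All data are simulated.

```r
set.seed(1)  # thinning is random; fix the seed so the sketch is reproducible

# Toy data (hypothetical): 200 genes, 4 libraries with unequal depth
n.genes <- 200
depth <- c(1, 1.5, 2, 3)
counts <- sapply(depth, function(d) rpois(n.genes, lambda = 50 * d))

lib.sizes <- colSums(counts)
norm.factors <- rep(1, ncol(counts))      # no normalization in this sketch
eff.lib.sizes <- lib.sizes * norm.factors

# Keep each read in library j with probability
# min(eff.lib.sizes) / eff.lib.sizes[j]; the smallest library is kept as is
keep.prob <- min(eff.lib.sizes) / eff.lib.sizes
pseudo.counts <- sapply(seq_len(ncol(counts)), function(j) {
  rbinom(n.genes, size = counts[, j], prob = keep.prob[j])
})

# Effective library sizes after thinning are approximately (not exactly) equal
pseudo.lib.sizes <- colSums(pseudo.counts) * norm.factors
rbind(eff.lib.sizes, pseudo.lib.sizes)
```

Because reads are only ever discarded, the deeper libraries lose information, which is the efficiency cost mentioned above.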
Value
A list containing the following components:
counts 
the count matrix, same as input. 
lib.sizes 
column sums of the count matrix. 
grp.ids 
a vector of identifiers of treatment groups, same as input. 
eff.lib.sizes 
effective library sizes, lib.sizes multiplied by the normalization factors. 
pseudo.counts 
count matrix after thinning. 
pseudo.lib.sizes 
effective library sizes of pseudo counts, i.e., column sums of the pseudo count matrix multiplied by the normalization factors. 
Note
Due to thinning (random downsampling of counts), two identical calls to prepare.nbp may yield slightly different results. A random number seed can be used to make the results reproducible.
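A minimal sketch of fixing the seed with base R's set.seed before each call is given below. It assumes the NBPSeq package is installed (the code is skipped otherwise), and the toy counts, group labels and seed values are made up for illustration.

```r
# Toy data (hypothetical): 10 genes, 4 libraries in 2 treatment groups
set.seed(999)
counts <- matrix(rnbinom(40, mu = 100, size = 2), nrow = 10)
grp.ids <- c(1, 1, 2, 2)

# Run only if NBPSeq is available (an assumption about the local setup)
if (requireNamespace("NBPSeq", quietly = TRUE)) {
  # Resetting the same seed before each call should reproduce the thinning
  set.seed(2024)
  obj1 <- NBPSeq::prepare.nbp(counts, grp.ids, print.level = 0)
  set.seed(2024)
  obj2 <- NBPSeq::prepare.nbp(counts, grp.ids, print.level = 0)

  same <- identical(obj1$pseudo.counts, obj2$pseudo.counts)
  print(same)
}
```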
See Also
nbp.test
Examples
## See the example for exact.nb.test
