run_abs_simulation: Run a copy number simulation
In warrenmcg/absSimSeq: RNA-Seq Simulation with Absolute Counts

This function runs a simulation that explicitly defines copy numbers, which can be used to test whether compositional changes lead to consistent or misleading results, depending on the analysis performed. After an experiment is simulated using the copy numbers, those numbers are converted into expected reads to be used for a polyester simulation.
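The compositional effect described above can be illustrated with a toy example in base R. This is only a sketch and is not code from absSimSeq: the transcript names and copy numbers are made up, and the conversion to expected reads ignores transcript length. Only one transcript changes in absolute terms, yet the relative data suggest that every transcript changed.

```r
# Toy copy numbers for four transcripts (hypothetical values, not package data).
control   <- c(tx1 = 100, tx2 = 200, tx3 = 300, tx4 = 400)
treatment <- control
treatment["tx4"] <- control["tx4"] * 4   # only tx4 truly changes (4-fold up)

# True fold changes on the absolute (copy number) scale.
abs_fc <- treatment / control

# Sequencing observes proportions, so convert copy numbers to relative abundances.
rel_control   <- control / sum(control)
rel_treatment <- treatment / sum(treatment)
rel_fc <- rel_treatment / rel_control

# Expected reads at a fixed library size (20 million, as in the default
# `mean_lib_size`); transcript length is ignored in this simplified sketch.
lib_size <- 20e6
expected_reads <- cbind(control = rel_control, treatment = rel_treatment) * lib_size

# Unchanged transcripts appear to decrease, and tx4 appears less than 4-fold up.
print(round(rbind(absolute = abs_fc, relative = rel_fc), 3))
```

Comparing the two rows of the printed matrix shows how an analysis of relative abundances alone can disagree with the absolute ground truth.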

run_abs_simulation(sleuth, fasta_file, sample_index = "mean",
  outdir = ".", num_reps = c(10, 10), denom = NULL, seed = 1,
  num_runs = 1, gc_bias = NULL, de_probs = 0.1, de_type = "normal",
  de_levels = c(1.25, 2, 4), dir_probs = 0.5, mean_lib_size = 20 *
  10^6, single_value = TRUE, polyester_sim = FALSE,
  control_condition = NULL, num_cores = 1, include_spikeins = TRUE,
  spikein_mix = "Mix1", spikein_percent = 0.02)

`sleuth`	a sleuth object, or a character string naming an R-Data file that contains a sleuth object saved using 'sleuth_save'. This object contains results from a real experiment.
`fasta_file`	a multiFASTA file with the transcripts to be used in the simulation (required for polyester)
`sample_index`	which sample from the real dataset should be used as the starting point for the simulation? You may use a number or string, as long as it is a valid column index for the dataset. If "mean" (the default) is given, the mean of the control samples will be used.
`outdir`	where should the simulated reads be written to?
`num_reps`	the number of samples in each condition. Only two conditions are currently supported, so this must be of length 2.
`denom`	the name(s) of transcript(s) that will be used as the denominator for showing how the data will behave after ALR transformation. The default is `NULL`, which indicates that this function will choose the first feature that is simulated to not change as the denominator.
`seed`	the random seed to be used for reproducibility
`num_runs`	the number of simulations to run
`gc_bias`	integer vector of length `sum(num_reps)` giving the GC bias to be used by polyester. Values must be between 0 and 7. See ?polyester::simulate_experiment under "gcbias" in the Details section for more information. The default is `NULL`, which means that all samples will be set to 0 (i.e. no bias).
`de_probs`	vector of the same length as `num_runs`, with numbers between 0 and 1 describing the probability of differential expression for each simulation
`de_type`	either "discrete" or "normal" (the default), to indicate using discrete levels of differential expression or using a truncated normal for a continuum of differential expression. The levels of discrete DE, or the parameters for the truncated normal, are determined by `de_levels`.
`de_levels`	if `de_type` is "discrete", this is a vector with one or more numbers > 1 to indicate the levels of differential expression (e.g. 1.5 for a 50% increase). If `de_type` is "normal", this is a vector of length 3 specifying the following parameters for the `rtruncnorm` function: a (the min of the truncated normal; it should be > 1), mean, and sd. When the direction is down, the inverse of these levels will be used.
`dir_probs`	vector of the same length as `num_runs`, with numbers between 0 and 1 describing the probability of differential expression being increased, given a transcript that is changing.
`mean_lib_size`	the average number of reads per library to be simulated (default is 20 million reads). Variability in the exact library size per sample will be introduced using a normal distribution with a coefficient of variation of 5%.
`single_value`	if `TRUE`, sizes are calculated for the whole experiment using DESeq2 estimateDispersions; otherwise, sizes are interpolated using the dispersion function from DESeq2 using the mean counts for each condition.
`polyester_sim`	should polyester be run? (defaults to `FALSE` to save time when you are only interested in the ground truth)
`control_condition`	what factor level should be used to define the control condition? This is used to select control samples to estimate dispersions for a null distribution, i.e. the variance of estimated counts in an experiment without an expectation of differential expression. The default, `NULL`, uses all of the samples in the provided sleuth object. Note that if this is specified, DESeq2 will estimate dispersions using an intercept-only model (~1), whereas if it is left `NULL`, the full formula from the sleuth object will be used (obj$full_formula).
`num_cores`	the number of cores to be used to run parallel simulations. The default is to use just one.
`include_spikeins`	if `TRUE`, will add spike-ins to the simulated experiment.
`spikein_mix`	character specifying which mix to use; only accepts "Mix1" or "Mix2". If a different mix is desired for each condition, specify a character vector containing a mix for each condition. The default is "Mix1".
`spikein_percent`	what percent of the total copy numbers in the control condition should be spike-in controls? The default is 2%.
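How `de_probs`, `de_type`, `de_levels` and `dir_probs` interact can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the variable names are hypothetical, discrete levels are assumed to be drawn with equal probability, and the truncated normal is drawn by inverse-CDF sampling as a stand-in for `rtruncnorm`.

```r
set.seed(1)

n_tx      <- 1000              # hypothetical number of transcripts
de_prob   <- 0.1               # one entry of `de_probs`
dir_prob  <- 0.5               # one entry of `dir_probs`
de_levels <- c(1.25, 2, 4)     # defaults from the usage section

# Choose which transcripts are differentially expressed.
is_de <- runif(n_tx) < de_prob
n_de  <- sum(is_de)

# de_type = "discrete": each DE transcript gets one of the listed levels
# (equal probability per level is an assumption of this sketch).
discrete_fc <- sample(de_levels, n_de, replace = TRUE)

# de_type = "normal": de_levels is read as c(a, mean, sd) of a normal
# truncated below at `a`; sampled here via the inverse CDF.
a <- de_levels[1]; mu <- de_levels[2]; s <- de_levels[3]
u <- runif(n_de, min = pnorm(a, mu, s), max = 1)
normal_fc <- qnorm(u, mu, s)

# Direction: up with probability dir_prob; otherwise use the inverse level.
is_up <- runif(n_de) < dir_prob
fold_change <- rep(1, n_tx)
fold_change[is_de] <- ifelse(is_up, normal_fc, 1 / normal_fc)

summary(fold_change[is_de])
```

Transcripts that are not selected keep a fold change of 1, while changing transcripts are either at least `a` or at most `1 / a`.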

Invisibly returns a list with three members:

results: a list of lists, one entry for each simulation. Each simulation's result has the following entries:
- all of the entries returned by generate_abs_changes
- sizes: the size parameter for each transcript
- expected_reads: an N x 2 matrix with the expected number of fragments for each transcript in each condition
- adjusted_consistent_changes: consistency comparing copy numbers to the relative data after normalization using the DESeq procedure
- adjusted_fold_changes: the fold changes perceived after normalization using the DESeq procedure
alr_results: a list of lists, one entry for each simulation. Each simulation's alr_data list contains the results from calculate_rel_consistency
params: a list of the parameters used for this simulation
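Because the list is returned invisibly, the result must be assigned to be inspected. The following is a minimal sketch of a call and of access to the documented members; `summarize_abs_simulation` is a hypothetical helper, and its two arguments are placeholder paths that you would supply yourself. The block only defines the helper, so nothing is simulated until it is called with real files.

```r
# Hypothetical helper (not part of absSimSeq): runs one simulation without
# polyester and collects the documented members of the return value.
summarize_abs_simulation <- function(sleuth_file, fasta_file) {
  # Assign the result, since run_abs_simulation() returns invisibly.
  sim <- absSimSeq::run_abs_simulation(
    sleuth        = sleuth_file,   # R-Data file written by 'sleuth_save'
    fasta_file    = fasta_file,    # multiFASTA file of the transcripts
    num_reps      = c(10, 10),
    seed          = 1,
    num_runs      = 1,
    de_probs      = 0.1,
    de_type       = "normal",
    de_levels     = c(1.25, 2, 4),
    polyester_sim = FALSE          # ground truth only, no reads written
  )

  first_run <- sim$results[[1]]    # one entry per simulation
  list(
    expected_reads = head(first_run$expected_reads),  # N x 2 matrix
    sizes          = head(first_run$sizes),           # size parameters
    alr            = sim$alr_results[[1]],            # ALR consistency results
    params         = sim$params                       # parameters used
  )
}
```

Setting `polyester_sim = FALSE` keeps the call to the ground-truth calculation, which is usually enough to check the simulated changes before any reads are generated.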