downSample: Down sample the observations in a mixture
In CNPBayes: Bayesian mixture models for copy number polymorphisms


For large datasets (several thousand subjects), the computational burden for fitting Bayesian mixture models can be high. Downsampling can reduce the computational burden with little effect on inference. This function draws a random sample with replacement. Batches with few observations are combined with larger batches that have a similar median log R ratio.
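As a rough illustration of sampling with replacement, the following base R sketch draws a fixed number of rows from a larger data frame. It is not the code used inside `downSample`. The simulated data and the object names `idx` and `sub` are assumptions made for this example.

```r
set.seed(1)

## Hypothetical data: 5000 median log R ratios spread over three batches
dat <- data.frame(medians = rnorm(5000),
                  batches = sample(c("a", "b", "c"), 5000, replace = TRUE),
                  stringsAsFactors = FALSE)

## Never request more observations than are available
size <- min(1000, nrow(dat))

## Draw row indices with replacement; the same row may be selected twice
idx <- sample(seq_len(nrow(dat)), size = size, replace = TRUE)
sub <- dat[idx, ]

nrow(sub)
```

Because the draw is with replacement, some observations can appear more than once in `sub` while others are left out.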

downSample(dat, size = 1000, min.batchsize = 75)

`dat`	data.frame with required columns medians, batches, and plate
`size`	the number of observations to sample with replacement
`min.batchsize`	the smallest number of observations allowed in a batch. Batches smaller than this size will be combined with other batches
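One way to picture the effect of `min.batchsize` is the base R sketch below, which reassigns each batch smaller than the threshold to the larger batch with the closest median. It is a simplified illustration under assumed data, not the rule implemented in `downSample`. The column `batch_merged` and the helper objects are hypothetical names.

```r
set.seed(2)

## Hypothetical data: two large batches and one small batch ("C")
## whose median is close to that of batch "B"
dat <- data.frame(
  medians = c(rnorm(200, 0, 0.1), rnorm(150, 0.5, 0.1), rnorm(20, 0.45, 0.1)),
  batches = rep(c("A", "B", "C"), times = c(200, 150, 20)),
  stringsAsFactors = FALSE
)
min.batchsize <- 75

## Size and median of each batch, as named vectors
by.batch  <- split(dat$medians, dat$batches)
batch.n   <- lengths(by.batch)
batch.med <- vapply(by.batch, median, numeric(1))

small <- names(batch.n)[batch.n < min.batchsize]
large <- setdiff(names(batch.n), small)

## Relabel each small batch with the large batch whose median is nearest
## (assumes at least one batch meets the size threshold)
merged <- dat$batches
for (b in small) {
  nearest <- large[which.min(abs(batch.med[large] - batch.med[[b]]))]
  merged[dat$batches == b] <- nearest
}
dat$batch_merged <- merged

table(dat$batch_merged)
```

Keeping both the original and the merged labels in the same data frame makes it possible to map results back to the original batches later.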

A tibble of the downsampled data (medians), the original batches, and the updated batches after downsampling

## TODO: this is more complicated than it needs to be
library(dplyr)
mb <- MultiBatchModelExample
mapping <- tibble(plate=letters[1:10],
                  batch_orig=sample(c("1", "2", "3"), 10, replace=TRUE))
full.data <- tibble(medians=y(mb),
                    batch_orig=as.character(batch(mb))) %>%
  left_join(mapping, by="batch_orig")
partial.data <- downSample(full.data, 200)
## map the original batches to the batches after down-sampling
mapping <- partial.data %>%
  select(c(plate, batch_index)) %>%
  group_by(plate) %>%
  summarize(batch_index=unique(batch_index))
mp <- McmcParams(iter=50, burnin=100)
## Not run: 
    ## fit the model to the down-sampled data created above
    mb2 <- MultiBatchModel2(dat=partial.data$medians,
                            batches=partial.data$batch_index, mp=mp)
    mb2 <- posteriorSimulation(mb2)
    if(FALSE) ggMixture(mb2)
    full.dat2 <- full.data %>%
      left_join(mapping, by="plate")
    ## compute probabilities for the full dataset
    mb.up <- upSample2(full.dat2, mb2)
    if(FALSE) ggMixture(mb2)

## End(Not run)