downSample: Down sample the observations in a mixture

Description

For large datasets (several thousand subjects), fitting Bayesian mixture models is computationally expensive. Downsampling reduces this burden with little effect on inference. This function draws a random sample of the observations with replacement. Batches with few observations are combined with larger batches that have a similar median log R ratio.
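
The two steps described above (sampling with replacement, then merging small batches) can be sketched in base R. This is only an illustration under stated assumptions, not the CNPBayes implementation: the helper name sketch_downsample, the output columns batch_orig and batch_new, and the rule "merge the smallest batch into the batch with the closest median" are all hypothetical choices made for this example.

```r
## Illustrative sketch only; NOT the CNPBayes implementation.
## sketch_downsample() and its output column names are hypothetical.
sketch_downsample <- function(medians, batches, size = 1000, min.batchsize = 75) {
  batches <- as.character(batches)
  ## Draw row indices with replacement, capped at the number of observations
  size <- min(size, length(medians))
  ix <- sample(seq_along(medians), size, replace = TRUE)
  dat <- data.frame(medians = medians[ix], batch_orig = batches[ix],
                    stringsAsFactors = FALSE)
  dat$batch_new <- dat$batch_orig
  repeat {
    n <- table(dat$batch_new)
    ## Stop when every batch is large enough or only one batch remains
    if (length(n) < 2 || min(n) >= min.batchsize) break
    med <- vapply(split(dat$medians, dat$batch_new), median, numeric(1))
    small <- names(n)[which.min(n)]
    others <- setdiff(names(med), small)
    ## Assumption: "similar" means the batch whose median is closest
    nearest <- others[which.min(abs(med[others] - med[[small]]))]
    dat$batch_new[dat$batch_new == small] <- nearest
  }
  dat
}

## Simulated medians: two large batches and one small batch
set.seed(1)
medians <- c(rnorm(500, 0), rnorm(480, 0.1), rnorm(20, 0.12))
batches <- rep(c("a", "b", "c"), c(500, 480, 20))
ds <- sketch_downsample(medians, batches, size = 200, min.batchsize = 25)
## Cross-tabulate original batches against batches after merging
table(ds$batch_orig, ds$batch_new)
```

Merging one batch per iteration guarantees the loop terminates, since the number of batches strictly decreases. The cross-tabulation shows which original batches were absorbed into which merged batch.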

Usage

downSample(dat, size = 1000, min.batchsize = 75)

Arguments

dat

data.frame with required columns medians, batches, and plate

size

the number of observations to sample with replacement

min.batchsize

the smallest number of observations allowed in a batch. Batches smaller than this size will be combined with other batches

Value

A tibble of the downsampled data (medians), the original batches, and the updated batches after downsampling

Examples

## TODO: this is more complicated than it needs to be
library(dplyr)
mb <- MultiBatchModelExample
mapping <- tibble(plate=letters[1:10],
                  batch_orig=sample(c("1", "2", "3"), 10, replace=TRUE))
full.data <- tibble(medians=y(mb),
                    batch_orig=as.character(batch(mb))) %>%
  left_join(mapping, by="batch_orig")
partial.data <- downSample(full.data, 200)
## map the original batches to the batches after down-sampling
mapping <- partial.data %>%
  select(c(plate, batch_index)) %>%
  group_by(plate) %>%
  summarize(batch_index=unique(batch_index))
mp <- McmcParams(iter=50, burnin=100)
## Not run: 
    ## fit the model to the down-sampled data
    mb2 <- MultiBatchModel2(dat=partial.data$medians,
                            batches=partial.data$batch_index, mp=mp)
    mb2 <- posteriorSimulation(mb2)
    if(FALSE) ggMixture(mb2)
    full.dat2 <- full.data %>%
      left_join(mapping, by="plate")
    ## compute probabilities for the full dataset
    mb.up <- upSample2(full.dat2, mb2)
    if(FALSE) ggMixture(mb2)

## End(Not run)

CNPBayes documentation built on May 6, 2019, 4:06 a.m.