In csgillespie/voltagefit: Voltage Curve Fitting

suppressPackageStartupMessages(library(ggplot2))
theme_set(theme_bw())

Installing the package

The package lives on github and can be installed using

devtools::install_github("csgillespie/voltagefit", build_vignettes = TRUE)

and loaded in the usual way

library("voltagefit")

To enable parallelisation, run

library("doParallel")
## This sets the number of cores to use.
no_of_cores = max(parallel::detectCores() - 1, 1)
registerDoParallel(no_of_cores)

If this step is ommitted, then the package will only use a single core when fitting.

A single wafer

The package comes with a large number of example data sets. To avoid polluting the global name space, we're going to load them into their own environment, e

e = new.env()
data("voltage", package = "voltagefit", envir = e)
ls(envir = e)

The wafer object is just a data frame

head(get("wafer5210", envir = e), 3)

The key columns are wafer_id, id, ID and VG.

To fit the model to a single wafer, use the fit_wafer() function

wafer_5210 = fit_wafer(get("wafer5210", envir = e))
saveRDS(wafer_5210, file = "wafer_5210.Rds")

wafer_5210 = readRDS(file = "wafer_5210.Rds")

The object wafer_5210 is a data frame, with r ncol(wafer_5210) columns and r nrow(wafer_5210) rows. The columns are:

id: a unique idenfier for the wafer (the wafer_id);
cost: the error term from the curve fitting process;
direction: Forward or Backward
X1, ...: Parameters relating to the model.

We can plot the model fit via

plot_fit(wafer_5210)

Multiple wafers

First load in example data. The data corresponds

l = lapply(ls(envir = e), 
           function(i) fit_wafer(get(i, envir = e)))
all_wafers = Reduce(rbind, l)
saveRDS(all_wafers, file="all_wafers.Rds")

all_wafers = readRDS(file="all_wafers.Rds")

You can also view the parameters

plot(all_wafers)
hist(all_wafers)

Design matrix

For each voltage curve, we have fitted a logistic model, obtained parameter estimates and estimated how well the curve fits. This next stage attempts to estimate week and treatment effects. Using the data in all_wafers, we first create a design matrix descriping the experimental set-up. The design matrix should have three columns

wafer = unique(all_wafers$id)
week = c(rep(1:2, each=8), 1)
treatment = c(rep(1, 9), rep(2,8))
(design = data.frame(wafer = wafer, week = week, treatment = treatment))

We then fit a MANOVA model, using the parameter values as the response

fit_man = fit_manova(all_wafers, design)
(params = get_params(fit_man))

MANOVA is just a multivariate version of the ANOVA table. As such, we can obtain a $p$-value for week differences using

summary(fit_man$forward$man_w)

However, your sample size will often be two small to get p-values.

Mean and plausible curves

We can extract average curves for each treatment/week combination

means = mean(fit_man)

and plot them

library("ggplot2")
ggplot(means) + 
  geom_line(aes(VG, ID, colour=type)) + scale_y_log10() + 
  facet_grid(~direction)

Similarly, we can simulate plausible values

## Generate a sample of parameters for the underlying curve
unders = sample(fit_man, n = 20)
ss = unders$samples
ggplot(ss) + 
  geom_line(aes(VG, ID, group=sim), alpha=0.1) + scale_y_log10() + 
  facet_grid(type~direction) +
  geom_line(data=means, aes(VG, ID), colour="steelblue")

We can also graphically compare curves

#treatment2 - baseline
curve_d = curve_diff_mean(fit_man, type=c("baseline", "treatment2"), direction="Forward")
ggplot(curve_d, aes(VG, diff)) + 
  geom_line()

Similarly, with uncertainity

#treatment2 - baseline
curve_d = curve_diff_sample(fit_man, type=c("baseline", "treatment2"), n = 40,  
                          direction = "Forward")
ggplot(curve_d, aes(VG, diff)) + 
  geom_line(aes(group = sim), alpha=0.05) + 
  stat_smooth()