knitr::opts_chunk$set(echo = TRUE) setwd("~/Project Pipeline") source("General/Complete Pipeline 7.11.18.R") source("Binomial/Binom 7.11.18.R") source("Poisson/Pois 7.11.18.R") source("Negative Binomial/NegBinom 7.11.18.R") source("Beta Binomial/BetaBinom 7.11.18.R")
We are always looking for ways to make data manipulation, analysis, and diagnostics easier and more efficient. The tools included in this Tidyverse Pipeline aim to allow users to analyze data in a simple and model-driven fashion. The entensive use of R's Tidyverse (a combination of packages - including dplyr, purrr, and ggplot - that allow for data manipulation and analysis in a simple and continuous fashion) makes model building and data analysis simple and neat.
The pipeline follows the logical structure of analysis: building a model, fitting the model, and then doing any accessory work (including simulations, graphing, diagnostics, etc).
We will be using a simulated dataset, dt_sim
as an example.
# simulate the data mig_type <- c("Episodic","Chronic") month <- 0:6 trt <- c("A","B") # put into the proper structure dt_sim <- as_tibble(expand.grid(mig_type,month,trt) %>% `colnames<-`(c("mig_type","month","trt"))) %>% mutate(y=map(mig_type,function(x) {rzinegbin(100,size=30,munb=8,pstr0=0.05)} )) %>% unnest() %>% rename(mig = mig_type) dt_sim
The make_distribution()
functions are used to make the initial models. These functions are intended to help setup your future analysis and are to be used to define the predcitors in your model - the outcome will be assigned in a later step.
Supported models include:
These functions take in a list of variable names and return a tibble with the given distribution type and the model equation (a linear combination of the given preditors).
Note: The model equation returned will be of type symbol or language (not character). This is so they can be easily used later when fitting the model.
For example, to make a binomial model with trt
as your predictor:
binom <- make_binom(trt) binom; binom$model_eq
You can also provide multiple predictors to be included in your model.
pois <- make_pois(trt, month) pois; pois$model_eq
Additionally, to make an intercept only model, provide the argument 1
.
negbinom <- make_negbinom(1) negbinom; negbinom$model_eq
Since the return of these make_distribution
functions are a tibble, they can be bound together to create a list of potential models.
mod_tbl <- bind_rows(make_binom(trt), make_pois(trt, month), make_negbinom(1)) mod_tbl; mod_tbl$model_eq
Once a model has been setup using the make_distribution
functions, the model can then be fit to a dataset.
Using the function fit_model()
, specify exactly how you would like your model to be fit. This includes the arguments:
make_distribution
functions)model_options()
functionFor example, to fit an intercept only Poisson model in a frequentist framework:
mod_1 <- make_pois(1) %>% fit_model(frequentist, dt_sim, y) mod_1; mod_1$model_results
If you then want to add a grouping by the treatment received:
mod_2 <- make_pois(1) %>% fit_model(frequentist, dt_sim, y, trt) mod_2; mod_2$model_results
If you want to fit an intercept only model and a model with the treatment as the predictor for each type of migrane:
mod_3 <- bind_rows(make_pois(1), make_pois(trt)) %>% fit_model(frequentist, dt_sim, y, mig) mod_3
For both the binomial distribution and the betabinomial, the outcome can either be a list of 0/1 factors or it can be a vector of success/failures. If your outcome is 0/1 then you can fit the models as shown above. If your outcome is a vector of successes, then you should provide the outcome as usual but use the opt
argument to specify the maximum number of trials.
For example, in our dataset of migranes, assuming that every month has 30 days the maximum number of trials is 30. So, to fit a binomail distribution we would use the argument max
within the model_options function:
mod_4 <- make_binom(1) %>% fit_model(frequentist, dt_sim, y, opt = model_options(max = 30)) mod_4; mod_4$model_results
Each option is specific to the distribution being fit. Since no max
option is needed to fit a Poisson model, you can give it without any effect.
For example, to fit both a Poisson model and a Binomial model at the same time, both grouped by type of migrane:
mod_5 <- bind_rows(make_pois(trt), make_binom(trt)) %>% fit_model(frequentist, dt_sim, y, mig, model_options(max = 30)) mod_5; mod_5$model_results
By comparison, You could do this using base R in the following way:
dt_sim_A <- dt_sim[which(dt_sim$mig == "Episodic"),] dt_sim_B <- dt_sim[which(dt_sim$mig == "Chronic"),] glm(y ~ trt, family = poisson, data = dt_sim_A) glm(y ~ trt, family = poisson, data = dt_sim_B) glm(cbind(y, 30 - y) ~ trt, family = binomial, data = dt_sim_A) glm(cbind(y, 30 - y) ~ trt, family = binomial, data = dt_sim_B)
After the models have been fit, there is often a variety of things that people want to do with them.
There have been some supported functions that will work directly with the pipeline.
The function do_simulation()
provides a framework to simulate values from every model once they have been fit. The function takes in a tibble produced from fit_model()
and simulates from the given models. Some additional arguments that can provided include the number of simulations (nsim) and a seed number (seed). do_simulation()
returns a dataset in long format in which each row is one simulated number from the model.
make_pois(1) %>% fit_model(frequentist, dt_sim, y) %>% do_simulation()
The function do_prediction()
provides a framework to get predictions from every modely once they have been fit. The function takes in a tibble produced from fit_model()
and uses them to make the predictions.
Note: Currently, you cannot give values for the model to predict at. The predcitions occur at the mean value of the predictors.
Note: Currently, you cannot use predict when models are stacked. You can only feed in one fit model at a time.
make_binom(1) %>% fit_model(frequentist, dt_sim, y, opt = model_options(max = 30)) %>% do_prediction()
library(coastr) ### data read in dataPath <- "~/Temp" adtte <- import_isilon_data(dataPath,"adtte.sas7bdat") adsl <- import_isilon_data(dataPath,"adsl.sas7bdat") Analysis <- left_join(adtte,adsl,by=c("USUBJID"="USUBJID")) %>% filter(SAFFL.x=="Y") Analysis$PBACFL <- ifelse(Analysis$PBACFL == "Y", 1, ifelse(Analysis$PBACFL == "N", 0, 99))
make_binom(1) %>% set_priors(int = dnorm(0, .01)) %>% fit_model(bayesian, Analysis, PBACFL) bayes_models <- bind_rows(make_binom(1), make_binom(AVAL)) %>% set_priors(int = dnorm(0, .01), AVAL = dnorm(0, .05)) bayes_models; bayes_models$priors bind_rows(make_binom(1), make_binom(AVAL)) %>% set_priors(int = dnorm(0, .01), AVAL = dnorm(0, .05)) %>% fit_model(bayesian, Analysis, PBACFL)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.