Adding Analyses to a Plan

+----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | [Broad technical terms]{.underline} | | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Object | Description | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | argset | A named list containing a set of arguments. | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | analysis | These are the fundamental units that are scheduled in plnr: | | | | | | - 1 argset | | | - 1 (action) function that takes two arguments | | | 1. data (named list) | | | 2. argset (named list) | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan | This is the overarching "scheduler": | | | | | | - 1 data pull | | | - 1 list of analyses | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | [Different types of plans]{.underline} | | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan Type | Description | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Single-function plan | Same action function applied multiple times with different argsets applied to the same datasets. | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Multi-function plan | Different action functions applied to the same datasets. | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | [Plan Examples]{.underline} | | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Plan Type | Example | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Single-function plan | Multiple strata (e.g. locations, age groups) that you need to apply the same function to to (e.g. outbreak detection, trend detection, graphing). | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Single-function plan | Multiple variables (e.g. multiple outcomes, multiple exposures) that you need to apply the same statistical methods to (e.g. regression models, correlation plots). | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Multi-function plan | Creating the output for a report (e.g. multiple different tables and graphs). | +----------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Single-function plan

This approach is generally used when you:

When we apply the same function multiple times, it is preferable to add the argsets first, and then apply the analysis function just before running the analyses.

Multiple strata

In this example, we loop through multiple geographical locations and apply a graphing function to the data from each of these geographical locations.

library(ggplot2)
library(data.table)
library(magrittr)

# We begin by defining a new plan
p <- plnr::Plan$new()

# Data function
data_fn <- function(){
  return(plnr::nor_covid19_cases_by_time_location)
}

# We add sources of data
# We can add data directly
p$add_data(
  name = "covid19_cases",
  fn_name = "data_fn"
)

p$get_data()

location_codes <- p$get_data()$covid19_cases$location_code %>%
  unique() %>% 
  print()

p$add_argset_from_list(
  plnr::expand_list(
    location_code = location_codes,
    granularity_time = "isoweek"
  )
)
# Examine the argsets that are available
p$get_argsets_as_dt()

# We can then add a simple analysis that returns a figure:

# To do this, we first need to create an action function
# (takes two arguments -- data and argset)
action_fn <- function(data, argset){
  if(plnr::is_run_directly()){
    data <- p$get_data()
    argset <- p$get_argset(1)
  }
  pd <- data$covid19_cases[
    location_code == argset$location_code &
    granularity_time == argset$granularity_time
  ]

  q <- ggplot(pd, aes(x=date, y=covid19_cases_testdate_n))
  q <- q + geom_line()
  q <- q + labs(title = argset$location_code)
  q
}

p$apply_action_fn_to_all_argsets(fn_name = "action_fn")

p$run_one(1)
q <- p$run_all()
q[[1]]
q[[2]]

Multiple variables

In this example, we loop through multiple variable combinations (1. raw numbers of Covid-19 cases vs Covid-19 cases per 100 000 population, and 2. aggregating over isoweek vs day) and apply a graphing function to the data according to each of these variable combinations.

library(ggplot2)
library(data.table)
library(magrittr)

# We begin by defining a new plan
p <- plnr::Plan$new()

# Data function
data_fn <- function(){
  return(plnr::nor_covid19_cases_by_time_location[location_code=="nation_nor"])
}

# We add sources of data
# We can add data directly
p$add_data(
  name = "covid19_cases",
  fn_name = "data_fn"
)

p$get_data()

p$add_argset_from_list(
  plnr::expand_list(
    variable = c("covid19_cases_testdate_n", "covid19_cases_testdate_pr100000"),
    granularity_time = c("isoweek","day")
  )
)
# Examine the argsets that are available
p$get_argsets_as_dt()

# We can then add a simple analysis that returns a figure:

# To do this, we first need to create an action function
# (takes two arguments -- data and argset)
action_fn <- function(data, argset){
  if(plnr::is_run_directly()){
    data <- p$get_data()
    argset <- p$get_argset(1)
  }
  pd <- data$covid19_cases[
    granularity_time == argset$granularity_time
  ]

  q <- ggplot(pd, aes_string(x="date", y=argset$variable))
  q <- q + geom_line()
  q <- q + labs(title = argset$granularity_time)
  q
}

p$apply_action_fn_to_all_argsets(fn_name = "action_fn")

p$run_one(1)
p$run_one(2)
p$run_one(3)
p$run_one(4)

Multi-function plan

This approach is generally used when you are creating the output for a report, and you need multiple different tables and graphs.

library(ggplot2)
library(data.table)
library(magrittr)

# We begin by defining a new plan
p <- plnr::Plan$new()

# Data function
data_fn <- function(){
  return(plnr::nor_covid19_cases_by_time_location)
}

# We add sources of data
# We can add data directly
p$add_data(
  name = "covid19_cases",
  fn_name = "data_fn"
)

p$get_data()

# Completely unique function for figure 1
p$add_analysis(
  name = "figure_1",
  fn_name = "figure_1"
)

figure_1 <- function(data, argset){
  if(plnr::is_run_directly()){
    data <- p$get_data()
    argset <- p$get_argset("figure_1")
  }
  pd <- data$covid19_cases[
    granularity_time == "isoweek"
  ]

  q <- ggplot(pd, aes_string(x="date", y="covid19_cases_testdate_pr100000"))
  q <- q + geom_line()
  q <- q + facet_wrap(~location_code)
  q <- q + labs(title = "Weekly covid-19 cases per 100 000 population")
  q
}

# Reusing a function for figures 2 and 3
p$add_analysis(
  name = "figure_2",
  fn_name = "plot_epicurve_by_location",
  location_code = "nation_nor"
)

# Reusing a function for figures 2 and 3
p$add_analysis(
  name = "figure_3",
  fn_name = "plot_epicurve_by_location",
  location_code = "county_nor03"
)

plot_epicurve_by_location <- function(data, argset){
  if(plnr::is_run_directly()){
    data <- p$get_data()
    argset <- p$get_argset("figure_2")
    argset <- p$get_argset("figure_3")
  }
  pd <- data$covid19_cases[
    granularity_time == "isoweek" & 
    location_code == argset$location_code
  ]

  q <- ggplot(pd, aes_string(x="date", y="covid19_cases_testdate_n"))
  q <- q + geom_line()
  q <- q + labs(title = argset$location_code)
  q
}

p$run_one("figure_1")
p$run_one("figure_2")
p$run_one("figure_3")


Try the plnr package in your browser

Any scripts or data that you put into this service are public.

plnr documentation built on Nov. 23, 2022, 5:06 p.m.