R/prepare_diagram.R

Defines functions prepare_diagram

Documented in prepare_diagram

#' Create data frames for plotting from model elements.
#'
#' @description
#' This function takes as input a model consisting of variables/compartments
#' and flows, and creates a list of data frames with label and position
#' information for plotting a flow diagram.
#' The resulting object is used as an input to \code{\link{make_diagram}},
#' which creates a **ggplot2** based diagram. The function attempts to make
#' decent decisions regarding the placement of variables (boxes),
#' flows (arrows), and labels. However, complex models with complex diagrams
#' will likely need user modification. This is documented in the vignettes.
#'
#' @param model_list A list of model elements. This list is required and
#' must contain these two elements:
#' \itemize{
#' \item `variables`: A character vector specifying the names of all variables.
#' \item `flows`: A list that is the same length as `variables`. Each sub-list
#'     element is a character vector of any length specifying the flows into
#'     and out of the variable. Note that **flowdiagramr** assumes that the
#'     order of `flows` matches that of the `variables` vector.
#' \item IMPORTANT: All `variables` entries must start with an upper case letter,
#'     followed by any combination of letters and numbers
#'     (e.g., S, Si, or Aml2). All parameters contained in `flows` must start
#'     with a lower case letter followed by any combination of letters and
#'     numbers (e.g., b, bBmax, kS, p21S). All variables and parameters MUST
#'     be separated by math notation (e.g., +, -, *, /). Most math functions
#'     (e.g., `sin`, `cos`) are currently not supported.
#' \item See examples and details below and vignettes.
#' }
#'
#' @param model_settings A list of optional settings to adjust layout. The
#'     following elements are supported. If not provided, they default to a
#'     single row and all sizes of 1.
#' \itemize{
#' \item `varlocations`: A matrix containing all `model_list$variables` entries
#'     in specific locations on a grid. See examples.
#' \item `varbox_x_size`: Either a scalar or a vector that changes the default
#'     width of variable boxes. For example, `varbox_x_size = 1.5` makes each
#'     box 1.5 units in width. If a scalar, the value is used for all variables.
#'     If a vector, the length must correspond to number of variables,
#'     and the provided values are applied to the variables in the order
#'     provided in `model_list$vars`.
#' \item `varbox_y_size`: Same as `varbox_x_size` but for the height of
#'     the boxes.
#' \item `varspace_x_size`:  Either a scalar or a vector that changes the
#'     spacing between variable boxes in the x/horizontal dimension. If
#'     `varspace_x_size` is a scalar, all spaces between boxes in the x
#'     direction will be the same. For example, `varspace_x_size = 1.5` puts
#'     1.5 units of space in the x direction between boxes. If you provide a
#'     vector, it needs to be of dimension one less than the number of columns
#'     in `varlocations`. Spacing starts at the left, thus the first number is
#'     the spacing between the first column and second column, etc. Spacing
#'     is measured between the right edge of one box to the left edge of
#'     the next box.
#' \item `varspace_y_size`:  Same as `varspace_y_size` but for the vertical
#'     dimension. If you provide a vector, it needs to be of dimension one
#'     less than the number of rows in `varlocations`. Spacing starts at the
#'     bottom, thus the first number is the spacing between the lowest and
#'     second lowest row, etc. Spacing is measured between the bottom edge
#'     of one box to the top edge og the next box.
#' \item See examples and details below and vignettes.
#' }
#'
#' @return A list of two data frames containing all necessary information
#'         for the model variables/boxes and flows/arrows to be plotted
#'         by the \code{\link{make_diagram}} function.
#'         The data frames are:
#' \itemize{
#'   \item `variables`: A data frame containing information for all variables.
#'   The data frame contains these columns:
#'   \itemize{
#'     \item `id`: A numeric id for each variable.
#'     \item `name`: The name of the variable as provided in the model
#'         specification.
#'     \item `xmin`: Left edge location of variable box.
#'     \item `xmax`: Right edge location of variable  box.
#'     \item `ymin`: Lower edge of location variable box.
#'     \item `ymax`: Upper edge of location variable  box.
#'     \item `xlabel`: Horizontal position (midpoint) of label.
#'     \item `ylabel`: Vertical position (midpoint) of label.
#'     \item `label_text`: Text that will appear as the label of the box. Can
#'         be different from `name`.
#'     \item `outline_color`: The outline color of variable boxes.
#'     \item `fill_color`: The fill color of the variable boxes.
#'     \item `label_color`: The color of the box labels for each variable.
#'     \item `label_size`: Text size for variable labels.
#'   }
#'
#'   \item `flows`: A data frame containing information for all flows.
#'   The data frame contains these columns:
#'   \itemize{
#'     \item `id`: A numeric id for each flow.
#'     \item `name`: The name of the flow. Typically a mathematical expression.
#'         If a main flow with an interaction, this name is for id purposes
#'         only because the `label_text` will be the actual label displayed
#'         in the diagram. Thus, the name might be duplicated in other rows.
#'     \item `type`: Type of flow. One of main, interaction, or external.
#'     \item `from`: The variable from which the arrow originate. That is, the
#'         variable donating the flow.
#'     \item `to`: The variable to which the arrow will point. That is, the
#'         variable receiving the flow.
#'     \item `xstart`: The starting horizontal position of the arrow.
#'     \item `xend`: The ending horizontal position of the arrow.
#'     \item `ystart`: The starting vertical position of the arrow.
#'     \item `yend`: The ending vertical position of the arrow.
#'     \item `xlabel`: Horizontal position (midpoint) of label.
#'     \item `ylabel`: Vertical position (midpoint) of label.
#'     \item `curvature`: The amount of curvature applied to arrow.
#'         Higher numbers indicate more curvature; 0 = straight line.
#'     \item `label_text`: The label that will appear in the diagram. This is a
#'         duplicate of `name` so that user can update `label_text` as desired
#'         but retain the original math for reference.
#'     \item `line_color`: The color of the flow arrow line.
#'     \item `line_size`: The size (width) of the flow arrow line.
#'     \item `line_type`: The linetype of the flow arrow line.
#'     \item `label_color`: Color of `label_text`.
#'     \item `label_size`: The text size of `label_text`.
#'     \item `arrow_size`: The size of the arrow point on the flow line.
#'     \item `show_arrow`: Logical for whether to plot the flow arrow line
#'         (TRUE) or not (FALSE).
#'   }
#' }
#' @details
#'    `variables` needs to be specified as a vector of model variables,
#'     e.g., `variables <- c("Pred","Prey")`. `flows` need to be specified as a
#'    list, with each list entry containing the flows/processes for each
#'    variable in the order in which the variables appear. Flows need to be
#'    named according to `VARIABLENAME_flows`.
#'
#'    Example:
#'
#'    \code{flows <- list(Pred_flows = c(r*Pred, -k1*Pred*Prey),
#'    Prey_flows = c(g*Prey, -k2*Pred*Prey))}. Each flow, i.e. each entry in
#'    the flow vector, needs to be a valid mathematical expression made up of
#'    variables and parameters. The rules are as described above.
#'    As an example, the following includes a parameter *b* and two variables,
#'    *S* and *I*: `b*S*I`. The following includes a parameter *s* and two
#'    variables, *Bg* and *I2*: `Bg*s*I2`. See more examples below and in
#'    the vignettes.
#'
#'    The variables and flows data frames returned in the output list from this
#'    function contain a few columns that are provided to make it easier for
#'    the user to make changes to the data frames manually, but are not used
#'    by the package to make the diagram itself. In the `variables` data frame,
#'    `id` and `name` are unique identifiers that are not used by the package
#'    to make the diagram -- changing these will have no impact on the final
#'    diagram. In the `flows` data frame, `id`, `name`, `from`, and `to` are
#'    identifiers provided to make it easier for the user to understand each
#'    row of the data frame. Changing these columns will have no impact on the
#'    final diagram. All other columns contain information that impacts the
#'    drawn diagram itself. Users can update them -- and may want to in many
#'    cases -- but any updates to values in the remaining columns will be seen
#'    in the diagram itself. See the description of the output data frames below.
#'
#' @examples
#' # basic model specification
#' variables <- c("S","I","R")
#' flows <- list(S_flows = c("-b*S*I"),
#'               I_flows = c("b*S*I","-g*I"),
#'               R_flows = c("g*I"))
#' mymodel <- list(variables = variables, flows = flows)
#' diag_list <- prepare_diagram(model_list = mymodel)
#' mydiag <- make_diagram(diag_list)
#'
#'
#'
#' # adding optional specifications
#' varlocations <-  matrix(data = c("S", "", "R",
#'                                  "", "I", "" ),
#'                         nrow = 2, ncol = 3, byrow = TRUE)
#' mysettings <- list(varlocations = varlocations)
#' diag_list <- prepare_diagram(model_list = mymodel, model_settings = mysettings)
#' mydiag <- make_diagram(diag_list)
#'
#'
#'
#' # use of model_settings to change sizes and spacing, including vectorization
#' variables <- c("S","I","R")
#' flows <- list(S_flows = c("-b*S*I"),
#'               I_flows = c("b*S*I","-g*I"),
#'               R_flows = c("g*I"))
#' mymodel <- list(variables = variables, flows = flows)
#'
#' var_locs <- matrix(c("S", "", "R", "", "I", ""), byrow = TRUE, nrow = 2)
#' mysettings = list(
#'   varlocations = var_locs,
#'   varbox_x_size = c(1,2,1),
#'   varbox_y_size = c(0.5,0.5,2),
#'   varspace_x_size = 2,
#'   varspace_y_size = 1)
#'
#' diag_list <- prepare_diagram(model_list = mymodel,
#'                              model_settings = mysettings)
#' make_diagram(diag_list)
#'
#'
#' # another simple model for pathogen (prey) and immune response (predator)
#' variables = c("Pat","Imm")
#' flows     = list(Pat_flows = c("g*Pat*(1-Pat/pmax)", "-dP*Pat", "-k*Pat*Imm"),
#'                  Imm_flows = c("r*Pat*Imm", "-dI*Imm"))
#' mymodel = list(variables, flows)
#' diag_list <- prepare_diagram(mymodel)
#' mydiag <- make_diagram(diag_list)
#'
#'
#'
#' # manually switch to vertical layout
#' varlocations <-  matrix(data = c("Pat", "Imm"),
#'                         nrow = 2, byrow = TRUE)
#' mysettings <- list(varlocations = varlocations)
#' diag_list <- prepare_diagram(mymodel,mysettings)
#' mydiag <- make_diagram(diag_list)
#'
#' @export


# this function calls the following helper functions
# add_default_aes()
# add_locations()
# add_plus_signs()
# check_model_list()
# check_model_settings()
# fix_arrow_pos()
# get_vars_pars()
# set_curvature()
# set_feedback_curvature()
# set_node_to_na()
# update_tofroms()



prepare_diagram <- function(model_list,
                            model_settings = list(
                              varlocations = NULL,
                              varbox_x_size = NULL,
                              varbox_y_size = NULL,
                              varspace_x_size = NULL,
                              varspace_y_size = NULL)
                            )
{

  #############################################################################
  ## CONDUCT PRELIMINARY CHECKS ###############################################
  #############################################################################
  #############################################
  #############################################
  # Code block that does various checks and processing of input
  # This code block uses these helper functions:
  # check_model_list()
  # check_model_settings()
  #############################################
  #############################################

  #check to make sure model_list is provided
  #and is a properly specified model
  if (is.null(model_list))
  {
    stop('Argument model_list is required.')
  }
  checkmsg <- check_model_list(model_list)
  if(!is.null(checkmsg))
  {
    stop(checkmsg)
  }

  #in case the user didn't explicitly name the variable and flow elements, we do it here
  names(model_list) <- c("variables","flows")

  # check all user-provided model_settings to make sure entries are what
  # they should be
  if (!is.null(model_settings))
  {
    checkmsg <- check_model_settings(model_list, model_settings)
    if(!is.null(checkmsg))
    {
      stop(checkmsg)
    }
  }

  ######################################################################
  # Set model_settings components that are not user-provided
  # also, vectorize all entries box/space size entries
  ######################################################################
  # For each model_settings component, if user didn't set it,
  # we set a default here
  # default for varlocations is a matrix with a single row
  # If user did not provide values for sizing/spacing,
  # we set vectors of length nvars and nvars-1 for box and space sizing
  # each with the default value of 1
  # note that we assign it to model_settings.
  # these updated settings will also be returned as part of the list of values
  # this function returns

  #first, if varlocations matrix is not provided, make a single-row matrix
  if (is.null(model_settings$varlocations)) {
    model_settings$varlocations = matrix(model_list$variables,nrow=1)
  }

  # determine number of variables, rows and columns
  nvars = length(model_list$variables)
  nrows = nrow(model_settings$varlocations)  # varlocations is a matrix
  ncols = ncol(model_settings$varlocations)  # varlocations is a matrix

  # if user didn't provide a value, we use default of 1
  # as many box size numbers as there are boxes/variables
  if (is.null(model_settings$varbox_x_size)) {
    model_settings$varbox_x_size = rep(1, nvars)
  }
  if (is.null(model_settings$varbox_y_size)) {
    model_settings$varbox_y_size = rep(1, nvars)
  }
  # one more row/column less for spacing than is in the matrix
  if (is.null(model_settings$varspace_x_size)) {
    model_settings$varspace_x_size = rep(1, ncols - 1)
  }
  if (is.null(model_settings$varspace_y_size)) {
    model_settings$varspace_y_size = rep(1, nrows - 1)
  }

  # If user provided a single number for box and space size, we turn it into vectors here
  # this way we can consistently operate on vectors of the right size everywhere
  if (length(model_settings$varbox_x_size) == 1) {
    model_settings$varbox_x_size = rep(model_settings$varbox_x_size, nvars)
  }
  if (length(model_settings$varbox_y_size) == 1) {
    model_settings$varbox_y_size = rep(model_settings$varbox_y_size, nvars)
  }

  if (length(model_settings$varspace_x_size) == 1) {
    model_settings$varspace_x_size = rep(model_settings$varspace_x_size, ncols - 1)
  }
  if (length(model_settings$varspace_y_size) == 1) {
    model_settings$varspace_y_size = rep(model_settings$varspace_y_size, nrows - 1)
  }

  #############################################
  #############################################
  # At this stage, all input checking and processing should be done
  #############################################
  #############################################


  ############################################################################
  ## EXTRACT INFORMATION FROM USER-PROVIDED LISTS ############################
  ############################################################################
  #############################################
  #############################################
  # Code block that does some processing
  # to make rest of code more concise
  #############################################
  #############################################

  # This pulls out all list elements in model_settings and assigns them
  # to individual variables with their respective names
  # this is done for convenience so we don't have to keep calling
  # model_settings$varlocations and can just call varlocations, etc
  varlocations <- model_settings$varlocations
  varbox_x_size <- model_settings$varbox_x_size
  varbox_y_size <- model_settings$varbox_y_size
  varspace_x_size <- model_settings$varspace_x_size
  varspace_y_size <- model_settings$varspace_y_size


  #assign to variables outside of model_list
  #basically same as above for model_settings
  variable_names <- model_list$variables  # vector of names
  flows_list <- model_list$flows  # a list flows for each variable


  #############################################
  #############################################
  # End code block that extracts lists
  #############################################
  #############################################





  ############################################################################
  ## PROCESS VARIABLES #######################################################
  ############################################################################
  #############################################
  #############################################
  # Code block that goes through all variables and
  # creates the variables data frame
  # This code block uses these helper functions:
  # add_locations()
  #############################################
  #############################################

  #number of variables/compartments in model
  nvars <- length(variable_names)

  # Create a data frame for all variables
  variables <- data.frame(
    id = 1:nvars,  # numeric id for nodes
    name = variable_names  # names for labels
  )

  # Add location information for each variable and add to data frame
  # See comments within function for details
  # this function only adds location information to real/named variables
  # provided by the user.
  variables <- add_locations(
    variables,
    varlocations,
    varbox_x_size,
    varbox_y_size,
    varspace_x_size,
    varspace_y_size
  )

  #############################################
  #############################################
  # End code block that processes variables
  # At this stage, the variable data frame is complete and done
  #############################################
  #############################################





  ############################################################################
  ## PROCESS FLOWS ###########################################################
  ############################################################################
  #############################################
  #############################################
  # Code block that starts processing flows
  # This code block uses these helper functions:
  # add_plus_signs()
  # get_vars_pars()
  #############################################
  #############################################

  # if a flow element is a single empty character string, then no flow is
  # generated and those flows can be removed here
  # if a variable in the flows_list has no flows, then the result is a
  # variable box with no flows in or out.
  # first_elements <- sapply(flows_list, "[[", 1)  # first element of each sublist
  # nonempty_flows <- which(first_elements != "")  # find non-blank elements
  # flows_list <- flows_list[nonempty_flows]  # keep the non-blank list elements

  #add implicit + signs to make explicit before additional parsing
  # implicit + signs are added to any flow without an explicit "-" in front
  flows_list <- add_plus_signs(flows_list)

  #turns flow list into matrix, adding NA
  #(from modelbuilder code base)
  #variables are along rows and flows along columns.
  flowmat <- t(sapply(flows_list, `length<-`, max(lengths(flows_list))))

  # if there are just two variables and a single flow between them,
  # the flowmat is oriented incorrectly (nodes across columns). this
  # can be diagnosed by checking to see if flowmat has rownames.
  # if not, the matrix needs to be transposed.
  if(is.null(rownames(flowmat))) {
    flowmat <- t(flowmat)
  }

  #strip leading +/- from flows and replace with no space, this way we have
  #character vectors of the flows without leading math, which we need for
  #processing labels
  flowmatred <- sub("\\+|-","",flowmat)

  #extract only the + or - signs from flows so we know the direction
  signmat <- gsub("(\\+|-).*","\\1",flowmat)


  ############################################################
  #Loop over all variables, for each variable, loop over flows
  ############################################################
  #create a flows data frame for storing the flow information
  flows <- data.frame()  # empty until binded to during first iteration of loop

  #start loop over variables (rows in the flowmatred matrix)
  for(i in 1:nrow(flowmatred))
  {
    varflowsfull <- flowmat[i, ] #all flows with sign for current variable
    varflows <- flowmatred[i, ] #all flows for current variable, without sign
    varflowsigns <- signmat[i, ] #signs of flows for current variable

    #remove NA entries because these only show up to match the
    #matrix dimensions needed given the variable with the largest
    #number of flows in/out.
    varflows <- varflows[!is.na(varflows)]

    #start loop over all the flows in/out of the current variable (node)
    for(j in 1:length(varflows)) {
      currentflowfull <- varflowsfull[j]
      currentflow <- varflows[j]
      currentsign <- varflowsigns[j]

      # Find the variables for which the current flow appears, i.e., what
      # other rows of the matrix does it show up in. It may not show up in
      # any other rows, and this logic is evaluated later on.
      connectvars <- unname(which(flowmatred == currentflow, arr.ind = TRUE)[,1])

      # Extract the variable names in the flow expression
      varspars <- unique(get_vars_pars(currentflowfull))

      # Extract first letter of each character string. As evaluated below, if
      # the first letter is UPPERCASE, then it is a variable. Otherwise, is
      # is a parameter.
      varfirsts <- substr(varspars, start = 1, stop = 1)

      # varfirsts is now a vector of the variables AND parameters that
      # are in the flow math
      # extract any strings that start with an upper case letter
      # (state variable) and are present in the current flow. So, if P1 and P2
      # are in this flow they both will be found.
      varvec <- varspars[which(varfirsts %in% LETTERS)]  #variables are UPPERCASE

      #extract the numeric ids for the variables in this flow
      varsids <- variables[which(variables$name %in% varvec), "id"]


      ####
      ## This first chunk further processes the connectvars vector
      ## information. After this, the flows diagram can be created effectively.
      ## For clarity, this chunk is kept separate from the creation of the
      ## flows data frame below; thus, one may notice redundant IF/THEN
      ## statements. The connectvars vector is the backbone that defines
      ## which variables each flow connects/interacts with.
      ####
      # add a connecting variable if the expression contains only one
      # variable, is only in one row of the flow matrix, and the row in
      # which it occurs does not correspond with the variable in the expression.
      # this is rare, but can occur in predator-prey style models.
      # the multiple condition IF statement checks:
      #  1. That there is one, and only one, variable in the expression
      #  2. The expression occurs in one, and only one, row of the flow matrix
      #  3. That the variable in the expression is not the variable row in which
      #     the expression occurs in the flow matrix.
      # This ends up being an internal feedback loop mediated by another
      # variable's state.
      # Note that these must be nested.
      if(length(varsids) == 1) {
        if(length(unique(connectvars)) == 1) {
          if(!(unique(connectvars) %in% varsids)) {
            # create a flag for adding interaction, this is used below
            connectvars <- unique(c(connectvars, varsids))
            flag <- TRUE
          }
        }
      }

      # Assign connecting variables for inflows (+ flows).
      # This block is just to update the connectvars vector. the flows
      # data frame is created using this information below in a separate
      # if/then block
      if(currentsign == "+") {
        # If the flow does not show up in any other rows (length(connectvars) == 1)
        # and there are no variables in the flow math, then the only connecting
        # variable is the current (i) variable
        if(length(connectvars) == 1 & length(varvec) == 0) {
          connectvars <- i
        }

        # If the flow does not show up in any other rows (length(connectvars) == 1)
        # and there is at least one variable in the flow math, then the
        # connecting variable(s) will either be the current variable once
        # (indicating an inflow like births) or the current variable twice
        # (indicating a feedback flow)
        if(length(connectvars) == 1 & length(varvec) >= 1){

          # if the current (i) variable does not show up in the flow math
          # then the connecting variable is just the current variable once,
          # indicating a independent inflow from out of the system (e.g., birth)
          if(!variables$name[i] %in% varvec) {
            connectvars <- i
          }

          # is the the current (i) variables shows up in the flow math, then
          # the connecting variables are the current variable twice, indicating
          # a feedback loop
          if(variables$name[i] %in% varvec) {
            connectvars <- c(i, i)
          }
        }

        # If there are more than one unique connecting variables, then
        # the connecting variables are simply those defined above by
        # searching the matrix of flows and/or the variables in the expression
        if(length(connectvars) > 1) {
          connectvars <- connectvars
        }
      } #end function block for inflows


      ####
      ## This chunk uses information about the sign of the flow and the
      ## connecting variables (connectvars) to generate a flows data frame
      ## with columns for: from, to, label, interation, out_interaction,
      ## and direct_interaction.
      ####
      # If current sign is negative, it is an outflow and goes either to the
      # connectvar that is not equal to the current variable id (indexed by i)
      # or it goes to NA (this happens when there is an unspecified death
      # compartment, for example). in the code below, the 'cn' object is used
      # to store the "to" variable id. it is either NA or one of the ids in the
      # 'connectvars' vector.
      if(currentsign == "-") {
        if(length(connectvars) == 1) {
          cn <- NA  #placeholder for unspecified compartment (deaths, typically)
        } else {
          #flow goes to the variable that is not i because variable i is the
          #donating variable.
          cn <- connectvars[connectvars!=i]
        }

        # Create a data frame with all the necessary segment information
        tmp <- data.frame(from = i,
                          to = cn,
                          name = currentflow,
                          interaction = FALSE,
                          out_interaction = FALSE,
                          direct_interaction = FALSE)

        # Bind to edge data frame for flows
        flows <- dplyr::bind_rows(flows, tmp)
      } #end function block for outflows

      # If the current sign is positive AND the flow only shows up in
      # one row of the flow matrix, then this is an inflow external to the
      # system or as a function of the current variable itself.
      if(currentsign == "+" & length(connectvars) == 1) {
        # These are typically births/imports
        if(connectvars == i) {
          tmp <- data.frame(from = NA,
                            to = i,
                            name = currentflow,
                            interaction = FALSE,
                            out_interaction = FALSE,
                            direct_interaction = FALSE)
          flows <- dplyr::bind_rows(flows, tmp)
        }
      }

      # If the current sign is positive and the length of connecting variables
      # is equal to two, then it is:
      #   a feedback loop (1 unique connecting variable)
      #   a physical flow between two unique variables
      #   an interaction flow between to unique variables
      if(currentsign == "+" & length(connectvars) == 2) {
        # These are feedbacks of somekind
        if(length(unique(connectvars)) == 1) {
          tmp <- data.frame(from = i,
                            to = i,
                            name = currentflow,
                            interaction = FALSE,
                            out_interaction = FALSE,
                            direct_interaction = FALSE)
        } else {
          # These are physical flows between two variables
          tmp <- data.frame(from = connectvars[connectvars!=i],
                            to = i,
                            name = currentflow,
                            interaction = FALSE,
                            out_interaction = FALSE,
                            direct_interaction = FALSE)

          # update interaction flag if flag exists
          if(exists("flag")) {
            # the flag for interactions was set above
            tmp$direct_interaction <- TRUE

            # remove flag from the environment
            rm(flag)
          }
        }
        flows <- dplyr::bind_rows(flows, tmp)
      }

      # add an interaction flag if two variables are in the flow
      if(length(varvec) > 1) {
        if(length(unique(connectvars)) > 1) {
          # this means that the flow connects two variables and both
          # are present in the flow math
          flows[nrow(flows), "interaction"] <- TRUE
        } else {
          # this means that the flow comes from or goes to somewhere out
          # of the system, and only 1 variable is included in the
          # flow math. this is designated as an "out_interaction"
          flows[nrow(flows), "out_interaction"] <- TRUE
        }
      }
    }  #end loop over all flows for a given variable
  }  #end loop over all variables

  # At this point, we can remove any rows where the name is blank. These
  # are only included above so that proper indexing through flowmat is
  # done.
  flows <- subset(flows, name != "")

  ############################################################
  ############################################################
  # finished creating all flows for the flows dataframe
  # not all information is present or correct yet
  # code block below further update the flows DF
  # At this stage, the flows dataframe has the following columns:
  # from, to, name, interaction, out_interaction, direct_interaction
  ############################################################
  ############################################################


  ############################################################
  # some cleanup of flow data frame
  # things that the code above didn't do quite right
  ############################################################

  # Keep only distinct rows; duplication occurs because one variable's
  # inflow can be another variable's outflow, but we only want these once
  # in the data frame for edges (segments/arrows/flows).
  flows <- unique(flows)

  # keep original name for all flows. this gets overwritten when the interaction
  # flow is added. but we want to retain this for later for the user and to
  # match up interaction flows.
  flows$orig_name <- flows$name

  # Parse the meaning of duplicate labels. Usually this is a complex mix
  # of a direct, physical flows and interactions from several other
  # state variables. We assume that the "main" flow among the "auxilliary"
  # duplicate flows is the one that traverses left-to-right (e.g., 1 to 2)
  # with the smallest gap and has no interaction flags.
  dups <- as.matrix(table(flows$name))  # tally the occurences of each flow
  dupids <- rownames(dups)[which(dups[,1] > 1)]  # grab the one with >1 occurence
  if(length(dupids) > 0) {
    flowdups <- subset(flows, name %in% dupids)  # take a subset of the edge data frame
    flows <- subset(flows, !(name %in% dupids))  # restrict flows to non-duplicate flows
    flowdups <- subset(flowdups, sign(to-from) == 1)  # keep left-to-right flows
    flowdups <- subset(flowdups, interaction == FALSE &
                         out_interaction == FALSE &
                         direct_interaction == FALSE)  # drop interactions
    if(nrow(flowdups) == 0) {
      stop(paste0("There are duplicate flows across variables that failed to\n",
                  "parse easily. Check to make sure all inflows have no sign or '+' signs\n",
                  "and all outflows have '-' signs."))
    }
    diffs <- with(flowdups, to - from)  # calc difference between nodes
    mainid <- which(diffs == min(diffs))  # keep the minimum node diff as main flow
    maindup <- flowdups[mainid, ]  # extract just the main flow for physical flow
    intdup <- flowdups[mainid, ]  # extract again for interaction flow, which is parsed later on
    intdup$interaction <- TRUE  # set interaction flag to TRUE
    flows <- rbind(flows, maindup, intdup)
  }

  # Duplicate rows with out_interaction == TRUE to assign the interaction
  # flag and then remove the out_interaction flag. This is done to
  # achieve appropriate labeling. We want the physical flow to have no label
  # and for the interaction arrow to carry the label.
  repdf <- subset(flows, out_interaction == TRUE)
  if(nrow(repdf) != 0) {  # avoids errors if no rows
    repdf$interaction <- TRUE  # set this to TRUE for linetypes
    repdf$out_interaction <- NULL  # remove this now
    flows[which(flows$out_interaction == TRUE), "name"] <- ""  # take away the name for the physical flow
    flows$out_interaction <- NULL  # remove this now
    flows <- rbind(flows, repdf)  # tack them together
  }

  # remove out_interaction completely now that interaction is
  # appropriately flagged with correct labeling
  flows$out_interaction <- NULL

  # set up columns needed to define interactions and subset out for interaction
  # settings separate from other flows. these are all merged back together
  # after interactions are given appropriate settings.
  flows$linkfrom <- NA  #empty column for interaction flows, but needed for binding
  flows$linkto <- NA  #empty column for interaction flows, but needed for binding
  flows$generator <- FALSE  #empty column for interaction/generation flows, but needed for binding
  ints <- subset(flows, interaction == TRUE)
  flows <- subset(flows, interaction == FALSE)

  # If there are interactions, then duplicate them and reassign the to/from
  # columns such that we have two segments for each interaction flagged
  # row: (1) the physical flow with from/to for donating and receiving
  # varables and (2) an interaction flow with an NA for the to column
  # and from is the non-donating variable in the flow math. A new "link"
  # column is added to identify which variable is linking the interaction
  # (the link is the "from" variable in the physical flow).
  if(nrow(ints) > 0) {  # avoids errors if no interactions

    # Here we check for "mediation" flows: flows that should go directly from
    # from one variable to another, but are not physical flows. For example,
    # infected cells produce new virus, but it is not a flow of cells, per se.
    # Mediation flows can be identified as interaction flows where
    # is.na(from) == TRUE. This is because the flow only shows up as a positive
    # in one row of the flow matrix AND it contains variables beyond the one
    # whose math it shows up in. The from variable is set to the first variable
    # encountered, and then the typical logic after this sets the other
    # variable encountered as the linking variable.
    for(i in 1:nrow(ints)) {
      if(is.na(ints[i,"from"])) {
        # extract variable ids in the flowmath
        v <- get_vars_pars(ints[i, "name"])  #strips away math, leaving just letters
        vf <- substr(v, start = 1, stop = 1)  #get first letters
        v <- v[which(vf %in% LETTERS)]  #subset to upper case VARIABLES
        ids <- variables[variables$name %in% v, "id"]  #extract the relevant numeric ids
        ints[i, "from"] <- ids[1]  # set from variable to first var encountered
        ints[i, "generator"] <- TRUE

        # if the mediation flow is added, we can remove this from the main
        # flows because it will look like an external flow with from == NA.
        fid <- which(flows$orig_name == ints[i, "name"])
        if(length(fid) > 0) {
          flows <- flows[-fid, ]
        }
      }
    }

    intflows <- ints  # duplicate
    intflows$name <- ""  # strip the name from the physical flow
    intflows <- unique(intflows)  # just keep unique flows
    intflows$interaction <- FALSE  # reset interaction to false b/c a main flow now


    # Redefine the from, to, and link columns for the interaction
    # arrows. "to" is NA until updated to meet at the center
    # of the physical flow arrow.
    for(i in 1:nrow(ints)) {
      tmp <- ints[i, ]
      v <- get_vars_pars(tmp$name)  #strips away math, leaving just letters
      vf <- substr(v, start = 1, stop = 1)  #get first letters
      v <- v[which(vf %in% LETTERS)]  #subset to upper case VARIABLES
      ids <- variables[variables$name %in% v, "id"]  #extract the relevant numeric ids

      if(is.na(ints[i, "to"])){
        # If the receiving node is NA, then this is an interaction
        # with a feedback flow, meaning the "link node" is also NA.
        ints[i, "linkfrom"] <- NA
        ints[i, "linkto"] <- NA
      } else if(ints[i, "to"] == ints[i, "from"]) {
          # If the to and from nodes are the same, this is a feedback
          # flow that does not require a link, so NAs.
          ints[i, "linkfrom"] <- NA
          ints[i, "linkto"] <- NA
      } else {
        # In all other cases, the "link from" node will be the current
        # "from" node and the "link to" node will be the current "to" node.
        ints[i, "linkfrom"] <- tmp$from
        ints[i, "linkto"] <- tmp$to
      }

      # Redefine the "from" node as the other node in this interaction.
      # PACKAGE CURRENTLY CANNOT HANDLE MULTIPLE INTERACTIONS IN
      # A SINGLE FLOW -- CAN ONLY HAVE TWO VARIABLES PRESENT
      ints[i, "from"] <- ids[which(ids != tmp$from)]
      ints[i, "to"] <- NA  # set NA for "to" node for all interactions
    }

    # Recombine the edge data frame
    flows <- dplyr::bind_rows(flows, ints, intflows)
  }

  # Keep only distinct rows.
  flows <- unique(flows)

  #########################################
  #########################################
  # At this stage, the flows dataframe has the following columns:
  # from, to, name, interaction, direct_interaction, linkfrom, linkto
  #########################################
  #########################################





  #############################################################################
  ## ADD SPATIAL INFORMATION TO FLOWS #########################################
  #############################################################################
  #########################################
  # This next large chunk assigns spatial information to all flows.
  # Spatial information includes: xmin, xmax, xlabel, ymin, ymax, ylabel.
  #########################################
  #########################################

  ####
  ## Direct, physical flows
  ####
  # These are simple flows from one variable to another, identified by
  # flows that have real numbers in the from and to columns, and are also
  # not links
  simple_flows <- subset(flows, !is.na(from) & !is.na(to) & is.na(linkfrom))

  if(nrow(simple_flows) > 0) { # only execute if these exist
    # add columns for to-be added information
    simple_flows$xmin <- NA_real_
    simple_flows$xmax <- NA_real_
    simple_flows$ymin <- NA_real_
    simple_flows$ymax <- NA_real_

    # Loop over the simple flows and identify the relative positions of the
    # to and from variables. This is necessary because start and end points
    # of arrows will change if the alignment of the nodes is horizontal
    # or vertical
    for(i in 1:nrow(simple_flows)) {
      tmp <- simple_flows[i,]
      from_node <- subset(variables, id == tmp$from)
      to_node <- subset(variables, id == tmp$to)

      # if the start and end variables are in the same row (y = y) AND
      # the start and end variables are in different columns (x != x), then
      # we set the y values for start and end to the mean of the y start
      # variable box (the middle) and the xmin location is the max x of
      # the left-most (starting) box and the min x of the right-most (ending) box
      if(from_node$ymin == to_node$ymin & from_node$xmin != to_node$xmin) {
        if(from_node$xmin > to_node$xmin) {
          # flow from right to left if from-node to right of to-node
          simple_flows[i, "xmin"] <- from_node$xmin # left edge
          simple_flows[i, "xmax"] <- to_node$xmax  # right edge
        } else {  # otherwise flow left to right
          simple_flows[i, "xmin"] <- from_node$xmax # right edge
          simple_flows[i, "xmax"] <- to_node$xmin  # left edge
        }
        simple_flows[i, "ymin"] <- mean(c(from_node$ymin, from_node$ymax)) # middle
        simple_flows[i, "ymax"] <- mean(c(to_node$ymin, to_node$ymax)) # middle
      }

      # if the start variable is above the end variable (y1 > y2) AND
      # the start and end variables are in the same column (x ranges overalap),
      # then we set the ymin of the arrow the bottom of the originating box and
      # the ymax of the arrow to the top of the terminating box. the x location
      # for start and end is set to the middle of the box (mean of top and bottom)

      # overlap test for x ranges to apply to all vertical alignments
      overlap_test <- from_node$xmax >= to_node$xmin & from_node$xmin <= to_node$xmax
      if(from_node$ymin > to_node$ymin & overlap_test) {
        simple_flows[i, "xmin"] <- mean(c(from_node$xmin, from_node$xmax)) # middle
        simple_flows[i, "xmax"] <- mean(c(to_node$xmin, to_node$xmax)) # middle
        simple_flows[i, "ymin"] <- from_node$ymin # bottom
        simple_flows[i, "ymax"] <- to_node$ymax # top
      }

      # if the start variable is below the end variable (y1 < y2) AND
      # the start and end variables are in the same column (x = x), then
      # we set the ymin of the arrow the top of the originating box and
      # the ymax of the arrow to the bottom of the terminating box. the x location
      # for start and end is set to the middle of the box (mean of top and bottom)
      if(from_node$ymin < to_node$ymin & overlap_test) {
        simple_flows[i, "xmin"] <- mean(c(from_node$xmin, from_node$xmax)) # middle
        simple_flows[i, "xmax"] <- mean(c(to_node$xmin, to_node$xmax)) # middle
        simple_flows[i, "ymin"] <- from_node$ymax # top
        simple_flows[i, "ymax"] <- to_node$ymin # bottom
      }

      # if the start variable is above the ending variable (y1 > y2) AND
      # the start variable is to the left of the ending variable (x1 < x2), then
      # the flow start is set to the right-middle of the originating box and
      # the flow end is set to the left-middle of the terminating box. this
      # creates an angled flow arrow pointing down and to the right.
      if(from_node$ymin > to_node$ymin & !overlap_test & from_node$xmin < to_node$xmin) {
        simple_flows[i, "xmin"] <- from_node$xmax # right edge
        simple_flows[i, "xmax"] <- to_node$xmin  # left edge
        simple_flows[i, "ymin"] <- mean(c(from_node$ymin, from_node$ymax)) # middle
        simple_flows[i, "ymax"] <- mean(c(to_node$ymin, to_node$ymax)) # middle
      }

      # if the start variable is above the ending variable (y1 > y2) AND
      # the start variable is to the right of the ending variable (x1 > x2), then
      # the flow start is set to the left-middle of the originating box and
      # the flow end is set to the right-middle of the terminating box. this
      # creates an angled flow arrow pointing down and to the left.
      if(from_node$ymin > to_node$ymin & !overlap_test & from_node$xmin > to_node$xmin) {
        simple_flows[i, "xmin"] <- from_node$xmin # left edge
        simple_flows[i, "xmax"] <- to_node$xmax  # right edge
        simple_flows[i, "ymin"] <- mean(c(from_node$ymin, from_node$ymax)) # middle
        simple_flows[i, "ymax"] <- mean(c(to_node$ymin, to_node$ymax)) # middle
      }

      # if the start variable is below the ending variable (y1 < y2) AND
      # the start variable is to the left of the ending variable (x1 < x2), then
      # the flow start is set to the right-middle of the originating box and
      # the flow end is set to the left-middle of the terminating box. this
      # creates an angled flow arrow pointing up and to the right.
      if(from_node$ymin < to_node$ymin & !overlap_test & from_node$xmin < to_node$xmin) {
        simple_flows[i, "xmin"] <- from_node$xmax # right edge
        simple_flows[i, "xmax"] <- to_node$xmin # left edge
        simple_flows[i, "ymin"] <- mean(c(from_node$ymin, from_node$ymax)) # middle
        simple_flows[i, "ymax"] <- mean(c(to_node$ymin, to_node$ymax)) # middle
      }

      # if the start variable is below the ending variable (y1 < y2) AND
      # the start variable is to the right of the ending variable (x1 > x2), then
      # the flow start is set to the left-middle of the originating box and
      # the flow end is set to the right-middle of the terminating box. this
      # creates an angled flow arrow pointing up and to the left.
      if(from_node$ymin < to_node$ymin & !overlap_test & from_node$xmin > to_node$xmin) {
        simple_flows[i, "xmin"] <- from_node$xmin # left edge
        simple_flows[i, "xmax"] <- to_node$xmax # right edge
        simple_flows[i, "ymin"] <- mean(c(from_node$ymin, from_node$ymax)) # middle
        simple_flows[i, "ymax"] <- mean(c(to_node$ymin, to_node$ymax)) # middle
      }

      # if the flow starts and ends in the same place, this is a feedback
      # flow that needs minor offsets in the x direction.
      if(from_node$xmin == to_node$xmin & from_node$ymin == to_node$ymin) {
        middle <- mean(c(from_node$xmin, from_node$xmax))
        simple_flows[i, "xmin"] <- middle - 0.25  # minor offset to the left for start
        simple_flows[i, "xmax"] <- middle + 0.25  # minor offset to the right for end
        simple_flows[i, "ymin"] <- from_node$ymax  # top
        simple_flows[i, "ymax"] <- to_node$ymax  # top
      }
    } # end loop over simple, physical flows
  } # end direct physical flows if/then for existence


  ####
  ## In flows
  ####
  # These flows only have a to id and from is NA
  in_flows <- subset(flows, is.na(from) & !is.na(to) & is.na(linkfrom))

  if(nrow(in_flows) > 0) { # only exectute if these exist
    # The xlabel, ymax locations define the top/middle of the node, which
    # is xmax/ymax for in-flows
    in_flows <- merge(in_flows, variables[,c("xlabel","ymax", "id")],
                      by.x = "to",
                      by.y = "id")
    in_flows$xmax <- in_flows$xlabel
    in_flows$xlabel <- NULL  # remove the column

    # ymin is the y starting point of the arrow, defined as the end point (ymax) + 0.5
    in_flows$ymin <- in_flows$ymax + 0.25

    # xmin is the x starting point of the arrow, defined as left-edge of the node
    left_edges <- variables[,c("id", "xmin")]
    in_flows <- merge(in_flows, left_edges, by.x = "to", by.y = "id")
  } # end in flows if/then for existence


  ####
  ## Out flows
  ####
  # These flows only have a from id, to is NA, linkfrom is NA, and no interaction
  out_flows <- subset(flows, !is.na(from) & is.na(to) &
                              is.na(linkfrom) & interaction == FALSE)

  if(nrow(out_flows) > 0) { # only execute if these exist
    # The xlabel, ymin locations define the bottom/middle of the node, which
    # is xmin/ymin for out-flows
    out_flows <- merge(out_flows, variables[,c("xlabel","ymin", "id")],
                       by.x = "from",
                       by.y = "id")
    out_flows$xmin <- out_flows$xlabel
    out_flows$xlabel <- NULL  # remove the column

    # ymax is the y end point of the arrow, defined as the start point (ymin) - 0.5
    out_flows$ymax <- out_flows$ymin - 0.25

    # xmax is the x end point of the arrow, defined as right-edge of the node
    right_edges <- variables[,c("id", "xmax")]
    out_flows <- merge(out_flows, right_edges, by.x = "from", by.y = "id")
  } # end out flows if/then for existence


  ####
  ## Interaction flows
  ####
  # These are flows where the interaction column is TRUE and both linkfrom
  # and linkto have values
  int_flows <- subset(flows, interaction == TRUE & !is.na(linkfrom) & !is.na(linkto))

  if(nrow(int_flows) > 0) { # only execute if these exist
    # add columns for to-be added information
    int_flows$xmin <- NA_real_
    int_flows$xmax <- NA_real_
    int_flows$ymin <- NA_real_
    int_flows$ymax <- NA_real_

    # the end point is the middle of the simple flow going from "linkfrom" to "linkto"
    # loop over the out_flows to get correct matching of the simple flows
    for(i in 1:nrow(int_flows)) {
      int_tmp <- int_flows[i,]  # get one row to work with
      simple_to <- int_tmp$linkto # to variable id
      simple_from <- int_tmp$linkfrom # from variable id

      # get the direct flow arrow for end positions
      simple_tmp <- subset(simple_flows, to == simple_to & from == simple_from)
      this_ymax <- mean(c(simple_tmp$ymin, simple_tmp$ymax)) # middle of direct flow
      this_xmax <- mean(c(simple_tmp$xmin, simple_tmp$xmax)) # middle of direct flow

      # get the from variable for start positions
      var_tmp <- subset(variables, id == int_tmp$from)
      this_ymin <- var_tmp$ymax # top of box
      this_xmin <- mean(c(var_tmp$xmin, var_tmp$xmax)) # middle of box

      # replace NAs
      int_flows[i, "xmin"] <- this_xmin
      int_flows[i, "xmax"] <- this_xmax
      int_flows[i, "ymin"] <- this_ymin
      int_flows[i, "ymax"] <- this_ymax

      # remove from memory to avoid overwriting potential
      rm(this_xmin, this_xmax, this_ymax,this_ymin)
    } # end interaction variable loop
  } # end interaction variable if/then for existence


  ####
  ## External interaction flows
  ####
  # These are interaction arrows that go from a state variable (node) to
  # another arrow that is either an outflow, inflow, or feedback flow.
  # Regular interactions for physical flows between variables are already
  # handled above. These external interaction flows are special, and we
  # treat them as such.
  # External flows are identified as having the interaction as TRUE and
  # the linkto is NA.
  ext_flows <- subset(flows, interaction == TRUE & is.na(linkto))
  # We also need to know the locations of all the other flows, so create
  # a temporary flows dataframe here
  other_flows <- dplyr::bind_rows(simple_flows, in_flows, out_flows, int_flows)
  # update vertical edges to avoid overlaps
  other_flows <- fix_arrow_pos(other_flows)
  # Now loop through the ext_flows for spatial processing, if there is at least 1 row
  if(nrow(ext_flows) > 0) {
    # add columns for to-be added information
    ext_flows$xmin <- NA_real_
    ext_flows$xmax <- NA_real_
    ext_flows$ymin <- NA_real_
    ext_flows$ymax <- NA_real_
    for(i in 1:nrow(ext_flows)) {
      tmp <- ext_flows[i,]

      # use the flow math to determine if this is associated with an
      # outflow or inflow
      direction <- signmat[which(flowmatred == tmp$name)]

      if(direction == "-") {
        # if an outflow (direction == "-"), then this is associated with a
        # row in the other_flows data frame where the from location
        # is different than the from location in the tmp data frame AND
        # the name is empty
        to_flow <- NULL  # null out to avoid errors
        to_flow <- subset(other_flows, orig_name == tmp$orig_name)
        # this can sometimes produce a data frame with an NA row because of
        # an NA in the fields used above in the logical constraint, that
        # row is dropped here
        drops <- which(is.na(to_flow$from) & is.na(to_flow$to))
        if(length(drops) > 0){
          to_flow <- to_flow[-drops, ]
        }


        # And it is associated with the variable in the from element
        from_node <- NULL  # null this out to avoid errors
        from_node <- variables[variables$id == tmp$from, ]

        # For these complex interactions, we assume a horizontal flow
        # arrangment, user must update if more complex
        if(from_node$xlabel > mean(c(to_flow$xmin, to_flow$xmax))) {
          # this implies and arrow going from right to left
          tmp$xmin <- from_node$xmin # left edge
          tmp$xmax <- mean(c(to_flow$xmin, to_flow$xmax)) # middle
          tmp$ymin <- from_node$ylabel # middle
          tmp$ymax <- mean(c(to_flow$ymin, to_flow$ymax)) # middle
        } else { # assume left to right
          tmp$xmin <- from_node$xmax # right edge
          tmp$xmax <- mean(c(to_flow$xmin, to_flow$xmax)) # middle
          tmp$ymin <- from_node$ylabel # middle
          tmp$ymax <- mean(c(to_flow$ymin, to_flow$ymax)) # middle
        } # end left-right if/then
      } # end direction "-" if

      if(direction == "+") {
        # if the flow is a "+", then this associated with either a feedback
        # flow or an external flow into the system
        # first find the to_flow, which will be the flow with a to variable
        # that is not the current from variable and the name is empty
        to_flow <- NULL  # null out to avoid errors
        to_flow <- other_flows[other_flows$orig_name == tmp$name, ]
        # this can sometimes produce a data frame with an NA row because of
        # an NA in the fields used above in the logical constraint, that
        # row is dropped here
        drops <- which(is.na(to_flow$from) & is.na(to_flow$to))
        if(length(drops) > 0){
          to_flow <- to_flow[-drops, ]
        }

        # And it is associated with the variable in the from element
        from_node <- NULL  # null this out to avoid errors
        from_node <- variables[variables$id == tmp$from, ]

        # For these complex interactions, we assume a horizontal flow
        # arrangment, user must update if more complex
        if(from_node$xlabel > mean(c(to_flow$xmax, to_flow$xmax))) {
          # this implies and arrow going from right to left
          tmp$xmin <- from_node$xmin # left edge
          tmp$xmax <- mean(c(to_flow$xmin, to_flow$xmax)) # middle
          tmp$ymin <- from_node$ylabel # middle
          tmp$ymax <- mean(c(to_flow$ymin, to_flow$ymax)) # middle
        } else { # assume left to right
          tmp$xmin <- from_node$xmax # right edge
          tmp$xmax <- mean(c(to_flow$xmin, to_flow$xmax)) # middle
          tmp$ymin <- from_node$ylabel # middle
          tmp$ymax <- mean(c(to_flow$ymin, to_flow$ymax)) # middle
        } # end left-right if/then

        # Last, the "+" flows can be either an external entry or a feedback.
        # External entries will have no "from", feedbacks will. If the flow
        # is pointing to an external entry, then all location information is
        # fine. If it is pointing to a feedback loop, we need to update it
        # slightly to hit the curve. Manual adjustments will be required by the
        # user, still.
        if(!is.na(tmp$from)) {
          tmp$ymax <- tmp$ymax + 0.5  # this hits the top of the curve, generally
        }

      } # end direction "+"
      loc_cols <- c("xmin", "xmax", "ymin", "ymax")
      ext_flows[i, loc_cols] <- tmp[ , loc_cols]
    } # end external flow loop
  } # end external flow if


  ####
  ## Combine flows back together
  ####
  flows <- NULL # set original df to null to avoid/identify any potential errors
  flows <- dplyr::bind_rows(other_flows, ext_flows)


  ####
  ## Add label locations for all flows
  ####
  # label locations are mid points, which are means of the start and end positions
  flows$xlabel <- with(flows, (xmax + xmin) / 2)
  flows$ylabel <- with(flows, (ymax + ymin) / 2)

  ####
  ## Update 'generator' column for single interaction flows
  ####
  # here we do one final update to the generator column to move some flows
  # that are conviently identified as 'direct_interactions' to be identified
  # as 'generator' instead. these are identified by flows where:
  #    direct_interaction == TRUE  AND
  #    is.na(linkto) == TRUE  AND
  #    is.na(linkfrom) == TRUE  AND
  #    is.na(to) == FALSE  AND
  #    is.na(from) == FALSE
  gen_ids <- which(flows$direct_interaction == TRUE &
                     is.na(flows$linkto) == TRUE &
                     is.na(flows$linkfrom) == TRUE &
                     is.na(flows$from) == FALSE &
                     is.na(flows$to) == FALSE)
  if(length(gen_ids) > 0) {
    flows[gen_ids, "direct_interaction"] <- FALSE
    flows[gen_ids, "generator"] <- TRUE
  }

  # flows cannot be both an interaction and a generator, default back to
  # interaction because that is the most common meaning
  gen_int_ids <- which(flows$generator == TRUE & flows$interaction == TRUE)
  if(length(gen_int_ids) > 0) {
    flows[gen_int_ids, "generator"] <- FALSE
  }


  # set default curvature of all flows, this also applies label updates
  # to curved arrows, so we do this before making minor adjustments below
  flows <- set_curvature(variables, flows)

  # apply a minor offset to move the label away from the line
  # this is done flow by flow to determine if it is vertical or horizontal
  for(i in 1:nrow(flows)) {
    tmp <- flows[i, ]
    # processing for direct flows
    if(tmp$interaction == FALSE & is.na(tmp$from) == FALSE & is.na(tmp$to) == FALSE) {
      if(tmp$xmin == tmp$xmax) { # vertical
        flows[i, "xlabel"] <- flows[i, "xlabel"] - 0.25  # move to left
      } else { # horizontal
        flows[i, "ylabel"] <- flows[i, "ylabel"] + 0.1  # move up
      }
    } else if((is.na(tmp$from) | is.na(tmp$to)) &
              tmp$interaction == FALSE) { # processing for in/out flows

      #place label at beginning of arrow
      if(is.na(tmp$from)) {
        flows[i, "xlabel"] <- flows[i, "xmin"] - 0.1
        flows[i, "ylabel"] <- flows[i, "ymin"] + 0.1

      }
      #move label at end of arrow
      if(is.na(tmp$to)) {
        flows[i, "xlabel"] <- flows[i, "xmax"] + 0.1
        flows[i, "ylabel"] <- flows[i, "ymax"] - 0.1
      }
    }

    # processing for interactions
    if(tmp$interaction == TRUE) {
      if(tmp$xmax == tmp$xmin) {  # vertical
        flows[i, "xlabel"] <- flows[i, "xlabel"] + 0.02
      } else { # horizontal
        flows[i, "ylabel"] <- flows[i, "ylabel"] + 0.2
      }
    }
  }

  # add a diff column so we can identify flows that traverse more than
  # one variable. these will be updated to have curvature that goes over
  # or under the nodes it is bypassing. works best with just a couple. if there
  # is lots of traversing, then manual intervention will be required by the user
  flows$diff <- with(flows, abs(to-from))

  # set curvature of feedback loops. this is pretty different from the
  # "regular" curvature settings, so we made a separate function for this
  # operation.
  flows <- set_feedback_curvature(flows)

  # set to/from columns in flows to NA if value is not in node dataframe
  flows <- set_node_to_na(flows, variables)

  # remove rows with no location information
  flows <- flows[!is.na(flows$xmin) &
                   !is.na(flows$xmax) &
                   !is.na(flows$ymin) &
                   !is.na(flows$ymax), ]

  # convert direct interaction to flag to regular interaction flag,
  # now only relevant for plotting
  # get row ids for the "direct interactions"
  ids <- which(flows$interaction == FALSE & flows$direct_interaction == TRUE)
  # set interaction to TRUE since this now is just for plotting aesthetics
  flows[ids, "interaction"] <- TRUE
  # remove the direct_interaction column because all processing is complete
  flows$direct_interaction <- NULL  # delete the flagging column

  # update all to and froms such that each is the variable label
  # until now, the to/from in flows has just been numeric. these
  # need to be character strings for the variable labels for plotting
  flows <- update_tofroms(flows, variables)


  # update interaction column to be type column, one of
  # main, interaction, or external. this is needed for plotting
  flows$type <- "main"  # intialize the column as all "main" flows
  # set the interaction flows according to the interaction flag
  flows$type <- ifelse(flows$interaction == TRUE, "interaction", flows$type)
  # external flows are not interactions and either the to or from id NA
  flows$type <- ifelse(flows$interaction == FALSE & (is.na(flows$to) | is.na(flows$from)),
                       "external", flows$type)
  # generators are id'd in the generator column
  flows$type <- ifelse(flows$generator == TRUE, "generator", flows$type)
  flows$interaction <- NULL  # remove the interaction column
  flows$generator <- NULL  # remove the generator column

  #sort flows by type, main/external/interaction
  flows <- dplyr::bind_rows(
    flows[flows$type == "main", ],
    flows[flows$type == "generator", ],
    flows[flows$type == "external", ],
    flows[flows$type == "interaction", ]
  )

  #add a row id so it's easier for users to know which row to alter
  flows$id = 1:nrow(flows)

  # add a unique id for users when interacting with update_diagram()
  # for variables, this is just the name columne
  # for flows, the unique id is the first letter of the "type" and
  # then the collapsed original name, separated by "_"
  part1 <- substr(flows$type, 1, 1)
  paste_it_too <- function(x) {  # little helper function for lapply
    ch <- get_vars_pars(x)
    paste0(ch, collapse = "")
  }
  part2 <- unlist(lapply(flows$orig_name, paste_it_too))
  # overwrite orig_name, gets changed to name in add_default_aes after
  # label column is created
  flows$orig_name <- paste0(part1, "_", part2)

  # update flows column ordering
  flows <- flows[, c("id",
                     "orig_name",
                     "name",
                     "type",
                     "from",
                     "to",
                     "xmin",
                     "xmax",
                     "ymin",
                     "ymax",
                     "xlabel",
                     "ylabel",
                     "curvature")]

  # update flows x/y min/max column names to be start/end for clarity
  cols_to_change <- which(colnames(flows) %in% c("xmin","xmax","ymin","ymax"))
  colnames(flows)[cols_to_change] <- c("xstart","xend","ystart","yend")

  # update variables column ordering
  variables <- variables[ , c("id",
                              "name",
                              "xmin",
                              "xmax",
                              "ymin",
                              "ymax",
                              "xlabel",
                              "ylabel")]

  # one final check for duplicated arrows within types that need to have names
  # combined so that only one arrow is plotted. if this is not done, then
  # duplicate arrows are plotted on top of one another. the diagram looks
  # correct, but the heavy arrow is not pleasing.
  new_flows <- data.frame()
  # find flows within type that overlap and combine down to one line
  for(do_type in unique(flows$type)) {
    tmp <- subset(flows, type == do_type)
    new_tmp <- tmp
    compare <- tmp[ , c("xstart", "xend", "ystart", "yend")]
    compare$id <- with(compare, paste0(xstart, xend, ystart, yend))
    # loop over compare to find which rows are duplicated with which
    for(i in 1:nrow(compare)) {
      test <- compare[i, "id"]
      targ <- compare[, "id"]
      ids <- which(targ == test)
      if(length(ids) > 0) {
        newname <- paste(tmp[ids, "orig_name"], collapse = "_")
        new_tmp[i, "orig_name"] <- newname
      }
    }
    new_flows <- dplyr::bind_rows(new_flows, new_tmp)
  }
  new_flows$id <- NA_real_
  flows <- unique(new_flows)
  flows$id <- 1:nrow(flows)

  # remove row names, those are confusing
  rownames(flows) <- NULL
  rownames(variables) <- NULL

  # add default aesthetics and unique ids
  dflist <- add_default_aes(variables, flows)

  return(dflist)
}
andreashandel/modeldiagram documentation built on July 31, 2023, 10:05 a.m.