R/ImputeMissingVisits.R

Defines functions MarkMissing

Documented in MarkMissing

#' Code Empty Visit Values as "Missing" as Appropriate
#' 
#' @description Given a complete timeline of potential subject visits per study
#'    protocol, mark the expected visits that a subject did not attend as
#'    "Missing".
#'
#' @param timeline_df A data frame with columns \code{who}, \code{when},
#'    \code{visit} and \code{randomized}. This data frame records on which days
#'    the subjects visited the clinic (\code{visit}) and indicates when the
#'    subjects were randomized to Phase I of their respective studies (the
#'    \code{randomized} column). This data set must contain exactly one record
#'    per subject per day, with enough rows to cover all potential visits over
#'    the protocol length of the study.
#' @param windowWidth How many days are expected between clinic visits? Defaults
#'    to 7, representing weekly clinic visits.
#' @param daysGrace How many days late are subjects allowed to be for their
#'    weekly visit? Defaults to 0. Under this default behavior with weekly
#'    visits, a subject who visits the clinic on days 8 and 14 instead of days
#'    7 and 14 will have a missing visit imputed for day 7.
#'
#' @return A copy of \code{timeline_df} with the column \code{visitYM} added.
#'    This column is a character copy of the \code{visit} column in which the
#'    days a subject should have attended the clinic but did not are set to
#'    "Missing".
#'
#' @details Most definitions of opioid use disorder treatment success or failure
#'    partially depend on a tally of the number of missed clinic visits. For
#'    example, a definition of early treatment failure could be "3 or more UDS
#'    positive for non-study opioids or missing visits within the first 28 days
#'    of randomization". Given a table of subject visits by day over the entire
#'    protocol timeline, this function will estimate when each subject missed a
#'    clinic visit (unfortunately, missed visits can often be improperly 
#'    recorded in the patient logs; if such information is complete, using this
#'    function is unnecessary).
#'    
#'    This estimation is conducted as follows: (1) for each subject, a regular
#'    grid of days is spread from the randomization day (or day 0 for subjects
#'    who were never randomized) to the last day of the timeline, in steps of
#'    \code{windowWidth}; (2) for each day in this grid, we check the following
#'    \code{windowWidth} plus \code{daysGrace} days for a visit, and if there
#'    is none we mark the day \code{windowWidth} days after the grid day as
#'    "Missing"; (3) finally, we combine these subject-specific data tables.
#'
#' @importFrom magrittr `%>%`
#' @importFrom purrr map
#' @importFrom dplyr arrange bind_rows filter mutate n pull slice
#' @export
#'
#' @examples
#'    # TO DO
MarkMissing <- function(timeline_df, windowWidth = 7, daysGrace = 0) {
  
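  # Declare the column names used in the dplyr calls below so that R CMD check
  #   does not flag them as undefined global variables
  who <- when <- visit <- randomized <- NULL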
  data_ls <- split(timeline_df, timeline_df$who)
  
  res_ls <- map(
    .x = data_ls,
    .f = ~{ 
      
      df2 <- 
        .x %>% 
        mutate(visitYM = as.character(visit))
      
      
      ###  Create Date Sequence  ###
      # These are the dates that the patient **should have** completed a UDS
      randomized_lgl <- any( !is.na(df2$randomized) )
      if (!randomized_lgl) {
        
        # The for() loop below only marks the day at the end of each window
        #   (day + windowWidth), so it never marks day 0. Some subjects had a
        #   baseline visit on day 0 but were never randomized; for the rest,
        #   mark day 0 as missing here.
        treatStart_int <- 0L
        hasBaseline_lgl <- any( !is.na(df2$visitYM[df2$when == 0]) )
        if (!hasBaseline_lgl) {
          df2[df2$when == 0, "visitYM"] <- "Missing"
        }
        
      } else {
        
        treatStart_int <- 
          df2 %>% 
          filter(randomized) %>% 
          pull(when)
        
      }
      
      treatmentEnd_int <- 
        df2 %>% 
        arrange(when) %>% 
        slice(n()) %>% 
        pull(when) 
      screening_int <- seq(
        from = treatStart_int,
        to = treatmentEnd_int,
        by = windowWidth
      )
      
      
      ###  Loop Over Date Sequence  ###
      # TO DO: when a grace period exists, we need to confirm that visits are
      #   not counted twice (for example, a person with a visit only on day 9
      #   should not have that visit counted for weeks 1 and 2 if we use a 2
      #   day grace period).
      for (day in screening_int) {
        
        # Is every day in (day, day + windowWidth + daysGrace] without a visit?
        missing_logi <- 
          df2 %>% 
          filter(when > day) %>% 
          filter(when <= day + windowWidth + daysGrace) %>% 
          pull(visit) %>% 
          is.na() %>% 
          all()
        
        if (missing_logi) {
          
          df2[df2$when == day + windowWidth, "visitYM"] <- "Missing"
          
        }
        
      }
      
      df2 
      
    }
  )
  
  bind_rows(res_ls)
  
}
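
The windowing rule described in the details section can be tried out on a single subject with base R alone. The block below is a minimal sketch of that rule, not the package's implementation: the helper `mark_missing_one()`, its `start` argument, and the toy timeline are all hypothetical names and data introduced for illustration. It reproduces the documented `daysGrace` example, in which a subject visits on days 8 and 14.

```r
# Hypothetical helper: a base-R sketch of the windowing rule for ONE subject.
# It mirrors the documented steps but is not the package's implementation.
mark_missing_one <- function(when, visit, start = 0L,
                             windowWidth = 7, daysGrace = 0) {

  visitYM <- as.character(visit)

  # (1) Regular grid of days, from `start` to the last day of the timeline
  grid <- seq(from = start, to = max(when), by = windowWidth)

  for (day in grid) {

    # (2) Look for a recorded visit in (day, day + windowWidth + daysGrace]
    in_window <- when > day & when <= day + windowWidth + daysGrace

    if (all(is.na(visit[in_window]))) {
      # No visit found: flag the day at the end of the un-extended window
      visitYM[when == day + windowWidth] <- "Missing"
    }

  }

  visitYM

}

# Toy timeline (assumed data): days 0 to 14, visits recorded on days 8 and 14
when  <- 0:14
visit <- rep(NA_character_, length(when))
visit[when %in% c(8, 14)] <- "Visit"

strict  <- mark_missing_one(when, visit, daysGrace = 0)
lenient <- mark_missing_one(when, visit, daysGrace = 1)

strict[when == 7]   # "Missing": nothing was recorded on days 1 to 7
lenient[when == 7]  # NA: the day-8 visit falls inside the 1-day grace period
```

With `daysGrace = 1`, the day-8 visit satisfies both the first window (days 1 to 8) and the second (days 8 to 15), which is the double-counting caveat raised in the TO DO comment inside the loop.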

public.ctn0094extra documentation built on Nov. 22, 2023, 5:07 p.m.