R/flex_index.R

Defines functions flex_index

Documented in flex_index

# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

#' @title Compute a Flexibility Index based on the Hourly Collaboration Query
#'
#' @description
#' `r lifecycle::badge('experimental')`
#'
#' Pass an Hourly Collaboration query and compute a Flexibility Index for the
#' entire population. The Flexibility Index is a quantitative measure of the
#' freedom for employees to work at a time of their choice.
#'
#' @details
#' The **Flexibility Index** is a metric that has been developed to quantify and
#' measure flexibility using behavioural data from Viva Insights. Flexibility
#' here refers to the freedom of employees to adopt a working arrangement of
#' their own choice, and more specifically refers to **time** flexibility
#' (_whenever_ I want) as opposed to **geographical** flexibility (_wherever_ I
#' want).
#'
#' The **Flexibility Index** is a score between 0 and 1, and is calculated based
#' on three component measures:
#'
#' - `ChangeHours`: this represents the freedom to define work start and end
#' time. Teams that embrace flexibility allow members to start and end their
#' workday at different times.
#'
#' - `TakeBreaks`: this represents the freedom define one's own schedule. In
#' teams that embrace flexibility, some members will choose to organize / split
#' their day in different ways (e.g. take a long lunch-break, disconnect in the
#' afternoon and reconnect in the evening, etc.).
#'
#' - `ControlHours`: this represents the freedom to switch off. Members who
#' choose alternative arrangements should be able to maintain a workload that is
#' broadly equivalent to those that follow standard arrangements.
#'
#' The **Flexibility Index** returns with one single score for each person-week,
#' plus the **three** sub-component binary variables (`TakeBreaks`,
#' `ChangeHours`, `ControlHours`). At the person-week level, each score can only
#' have the values 0, 0.33, 0.66, and 1. The Flexibility Index should only be
#' interpreted as a **group** of person-weeks, e.g. the average Flexibility
#' Index of a team of 6 over time, where the possible values would range from 0
#' to 1.
#'
#' @section Context:
#'   The central feature of flexible working arrangements is
#'   that it is the employee rather the employer who chooses the working
#'   arrangement. _Observed flexibility_ serves as a proxy to assess whether a
#'   flexible working arrangement are in place. The Flexibility Index is an
#'   attempt to create such a proxy for quantifying and measuring flexibility,
#'   using behavioural data from Viva Insights.
#'
#' @section Recurring disconnection time:
#'   The key component of `TakeBreaks` in the Flexibility Index is best
#'   interpreted as 'recurring disconnection time'. This denotes an hourly block
#'   where there is consistently no activity occurring throughout the week. Note
#'   that this applies a stricter criterion compared to the common definition of
#'   a break, which is simply a time interval where no active work is being
#'   done, and thus the more specific terminology 'recurring disconnection time'
#'   is preferred.
#'
#' @param data Hourly Collaboration query to be passed through as data frame.
#'
#' @param hrvar A string specifying the HR attribute to cut the data by.
#'   Defaults to NULL. This only affects the function when "table" is returned.
#'
#' @param signals Character vector to specify which collaboration metrics to
#'   use:
#'   - a combination of signals, such as `c("email", "IM")` (default)
#'   - `"email"` for emails only
#'   - `"IM"` for Teams messages only
#'   - `"unscheduled_calls"` for Unscheduled Calls only
#'   - `"meetings"` for Meetings only
#'
#' @param active_threshold A numeric value specifying the minimum number of
#'   signals to be greater than in order to qualify as _active_. Defaults to 0.
#'
#' @param start_hour A character vector specifying starting hours, e.g. `"0900"`
#'
#' @param end_hour A character vector specifying end hours, e.g. `"1700"`
#'
#' @param return String specifying what to return. This must be one of the
#'   following strings:
#'   - `"plot"`
#'   - `"data"`
#'   - `"table"`
#'
#' See `Value` for more information.
#'
#' @param plot_method Character string for determining which plot to return.
#'   - `"sample"` plots a sample of ten working pattern
#'   - `"common"` plots the ten most common working patterns
#'   - `"time"` plots the Flexibility Index for the group over time
#'
#' @param mode String specifying aggregation method for plot. Only applicable
#'   when `return = "plot"`. Valid options include:
#'   - `"binary"`: convert hourly activity into binary blocks. In the plot, each
#'   block would display as solid.
#'   - `"prop"`: calculate proportion of signals in each hour over total signals
#'   across 24 hours, then average across all work weeks. In the plot, each
#'   block would display as a heatmap.
#'
#' @return
#' A different output is returned depending on the value passed to the `return`
#' argument:
#'   - `"plot"`: 'ggplot' object. A random of ten working patterns are displayed,
#'   with diagnostic data and the Flexibility Index shown on the plot.
#'   - `"data"`: data frame. The original input data appended with the
#'   Flexibility Index and the component scores. Can be used with
#'   `plot_flex_index()` to recreate visuals found in `flex_index()`.
#'   - `"table"`: data frame. A summary table for the metric.
#'
#'
#' @import dplyr
#' @importFrom data.table ":=" "%like%" "%between%"
#'
#' @examples
#' \donttest{
#' # Create a sample small dataset
#' orgs <- c("Customer Service", "Financial Planning", "Biz Dev")
#' em_data <- em_data[em_data$Organization %in% orgs, ]
#'
#' # Examples of how to test the plotting options individually
#' # Sample of 10 work patterns
#' em_data %>%
#'   flex_index(return = "plot", plot_method = "sample")
#'
#' # 10 most common work patterns
#' em_data %>%
#'   flex_index(return = "plot", plot_method = "common")
#'
#' # Plot Flexibility Index over time
#' em_data %>%
#'   flex_index(return = "plot", plot_method = "time")
#'
#' # Return a summary table with the computed Flexibility Index
#' em_data %>%
#'   flex_index(hrvar = "Organization", return = "table")
#' }
#'
#' @section Returning the raw data:
#' The raw data containing the computed Flexibility Index can be returned with
#' the following:
#' ```
#' em_data %>%
#'   flex_index(return = "data")
#' ```
#'
#' @family Working Patterns
#'
#' @export
flex_index <- function(data,
                       hrvar = NULL,
                       signals = c("email", "IM"),
                       active_threshold = 0,
                       start_hour = "0900",
                       end_hour = "1700",
                       return = "plot",
                       plot_method = "common",
                       mode = "binary"){

  ## Bindings for variables
  TakeBreaks <- NULL
  ChangeHours <- NULL
  ControlHours <- NULL
  FlexibilityIndex <- NULL
  Signals_Break_hours <- NULL

  ## Make sure data.table knows we know we're using it
  .datatable.aware = TRUE

  ## Save original
  start_hour_o <- start_hour
  end_hour_o <- end_hour

  ## Coerce to numeric, remove trailing zeros
  start_hour <- as.numeric(gsub(pattern = "00$", replacement = "", x = start_hour))
  end_hour <- as.numeric(gsub(pattern = "00$", replacement = "", x = end_hour))
  norm_span <- end_hour - start_hour

  ## convert to data.table
  data2 <-
    data %>%
    dplyr::mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
    data.table::as.data.table() %>%
    data.table::copy()

  ## Text replacement only for allowed values
  if(any(signals %in% c("email", "IM", "unscheduled_calls", "meetings"))){

    signal_set <- gsub(pattern = "email", replacement = "Emails_sent", x = signals) # case-sensitive
    signal_set <- gsub(pattern = "IM", replacement = "IMs_sent", x = signal_set)
    signal_set <- gsub(pattern = "unscheduled_calls", replacement = "Unscheduled_calls", x = signal_set)
    signal_set <- gsub(pattern = "meetings", replacement = "Meetings", x = signal_set)

  } else {

    stop("Invalid input for `signals`.")

  }

  ## Create 24 summed `Signals_sent` columns
  signal_cols <- purrr::map(0:23, ~combine_signals(data, hr = ., signals = signal_set))
  signal_cols <- bind_cols(signal_cols)

  ## Use names for matching
  input_var <- names(signal_cols)

  ## Signals sent by Person and Date
  signals_df <-
    data2 %>%
    .[, c("PersonId", "Date")] %>%
    cbind(signal_cols)

  ## Save original `signals_df` before manipulating ------------------------
  ## Rename `Signals_sent` columns to prevent conflict
  signals_df_o <- signals_df %>%
    purrr::set_names(
      nm = gsub(x = names(.),
                replacement = "_ori_",
                pattern = "_sent_")
    )

  ## Signal label
  sig_label <- ifelse(length(signal_set) > 1, "Signals_sent", signal_set)

  ## Create binary variable 0 or 1
  num_cols <- names(which(sapply(signals_df, is.numeric))) # Get numeric columns

  ## Create Signals Total and binary values
  signals_df <-
    signals_df %>%
    data.table::as.data.table() %>%
    # active_threshold: minimum signals to qualify as active
    .[, (num_cols) := lapply(.SD, function(x) ifelse(x > active_threshold, 1, 0)), .SDcols = num_cols] %>%
    .[, ("Signals_Total") := apply(.SD, 1, sum), .SDcols = input_var]

  ## Classify PersonId-Signal data by time of day

  WpA_classify <-
    signals_df %>%
    tidyr::gather(!!sym(sig_label), sent, -PersonId, -Date, -Signals_Total) %>%
    data.table::as.data.table()

  WpA_classify[, StartEnd := gsub(pattern = "[^[:digit:]]", replacement = "", x = get(sig_label))]
  WpA_classify[, Start := as.numeric(substr(StartEnd, start = 1, stop = 2))]
  WpA_classify[, End := as.numeric(substr(StartEnd, start = 3, stop = 4))]
  WpA_classify[, Before_start := Start < (start_hour)] # Earlier than start hour
  WpA_classify[, After_end := End > (end_hour)] # Later than  start hour
  WpA_classify[, Within_hours := (Start >= start_hour & End <= end_hour)]
  WpA_classify[, HourType := NA_character_]
  WpA_classify[After_end == TRUE, HourType := "After_end"]
  WpA_classify[Before_start == TRUE, HourType := "Before_start"]
  WpA_classify[Within_hours == TRUE, HourType := "Within_hours"]

  WpA_classify <-
    WpA_classify[, c("PersonId", "Date", "Signals_Total", "HourType", "sent")] %>%
    .[, .(sent = sum(sent)), by = c("PersonId", "Date", "Signals_Total", "HourType")] %>%
    tidyr::spread(HourType, sent) %>%
    left_join(WpA_classify %>%   ## Calculate first and last activity for day_span
                filter(sent>0) %>%
                group_by(PersonId,Date) %>%
                summarise(First_signal = min(Start),
                          Last_signal = max(End),
                          .groups = "drop_last"),
              by = c("PersonId","Date")) %>%
    mutate(Day_Span = Last_signal-First_signal,
           Signals_Break_hours = Day_Span-Signals_Total) %>%
    select(-Signals_Total)

  ## hrvar treatment
  if(is.null(hrvar)){

    hr_dt <- data2[, c("PersonId", "Date")]
    hr_dt <- hr_dt[, Total := "Total"]
    hrvar <- "Total"
    hr_dt <- as.data.frame(hr_dt)

  } else {

    temp_str <- c("PersonId", "Date", hrvar)

    # hr_dt <- data2[, ..temp_str] # double dot prefix

    hr_dt <-
      data2 %>%
      as.data.frame() %>%
      select(temp_str)
  }


  ## Bind calculated columns with original Signals df
  calculated_data <-
    WpA_classify %>%
    left_join(signals_df, by = c("PersonId","Date")) %>%
    left_join(hr_dt, by = c("PersonId","Date")) %>%
    left_join(signals_df_o, by = c("PersonId","Date")) %>%
    filter(Signals_Total >= 3) %>% # At least 3 signals required

    ## Additional calculations for Flexibility Index
    mutate(TakeBreaks = (Signals_Break_hours > 0),
           ChangeHours = (First_signal != start_hour),
           ControlHours = (Day_Span <= norm_span)) %>%
    mutate(FlexibilityIndex = select(., TakeBreaks, ChangeHours, ControlHours) %>%
             apply(1, mean))

  ## Plot return
  sig_label_ <- paste0(sig_label, "_")

  ## Summary Table for Return
  ## Applies groups
  returnTable <-
    calculated_data %>%
    group_by(!!sym(hrvar)) %>%
    summarise_at(vars(TakeBreaks, ChangeHours, ControlHours),
                 ~mean(.), .groups = "drop_last") %>%
    mutate(FlexibilityIndex =
             select(., TakeBreaks, ChangeHours, ControlHours) %>%
             apply(1, mean))

  ## Main plot

  if(return == "plot"){

    plot_flex_index(data = calculated_data,
                    sig_label = sig_label_,
                    start_hour = start_hour,
                    end_hour = end_hour,
                    method = plot_method,
                    mode = mode)

  } else if(return == "data"){

    calculated_data

  } else if(return == "table"){

    returnTable

  } else {

    stop("Check input for `return`.")

  }
}

Try the wpa package in your browser

Any scripts or data that you put into this service are public.

wpa documentation built on Aug. 21, 2023, 5:11 p.m.