In jmiahjones/lunch.time: Analyze Survey Data related to Eating Trends

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = "..")

Introduction

Work-life balance is a topic that receives increasing attention in today's economy. Many firms have experienced the disappointment that occurs when bright, talented recruits find themselves saddled with too-burdensome responsibilities. Perhaps they were overzealous or overambitious in the new role, or perhaps they were not clear on the necessities of the position. Either way, the zombie-walk of a burned-out employee is unmistakable. In my own professional past, I have had to overcome the detrimental effects of burnout on my work and personal life.

It is difficult to identify -- or indeed quantify -- the slow accumulation of burnout on an individual's psyche. Presupposing merely a tolerance of the Behaviorist's worldview, we may use behavioral surrogates: observable habits that we deem to be among the deleterious effects of this mainly cognitive phenomenon.

One possible surrogate for the burnout phenomenon is lunch-eating habits. Personal experience has yielded the insight that increasing workloads may lead to unreflective, asocial, or even worse -- unobserved -- lunch breaks in high-burnout-risk individuals. Unfortunately, the worst of these was the case during my burnout period, during which I would frequently sprint to the end of a 10-hour day only noticing during the trek down the stairs that lunch was completely skipped.

In an effort to monitor my own level of burnout, I am recording my lunch-eating habits on a Google Sheet during the work-week. This data will serve two purposes. The first is the inherent improvement in these habits according to the adage (often attributed to 20th-century management consultant Peter Drucker), "That which is measured is managed." The second, somewhat opposite, purpose is the after-the-fact analysis of these habits, so that I might gain insight into the forces that drive my lapses in appropriate perspective.

If you have similar goals, or merely wish to see what this kind of personal tracking is all about, feel free to use the functions in my package. Below I give a demonstration of this tracking experiment.

Loading the Data

First, we're going to need to load dplyr and ggplot2 in order to perform our analysis. magrittr is installed along with dplyr and gives access to the useful %>% operator.

suppressPackageStartupMessages(library(dplyr))
library(magrittr)
library(ggplot2)

If the Sheet is not publicly visible, we can use the load_sheet function to load the worksheet from Google Sheets and format the columns appropriately. Otherwise we can access it directly from the web.

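public <- TRUE
if (public) {
  # Published-to-web URL of the survey workbook, split across lines for width.
  ws_site <- paste0(
    "https://docs.google.com/spreadsheets/d/e/2PACX-1vTcycMCOswocd",
    "1sceCjwEQXlN-2_zpxsUOK5T0uTZTv12TJrFktxUDPJXU1NAAOCy4V9F_",
    "1kE07Gryw/pub?output=xlsx"
  )
  worksheet <- googlesheets::gs_url(ws_site, lookup = FALSE)
  # set_columntypes() receives the data through the pipe, so it takes no
  # extra argument here (df does not exist yet at this point).
  df <- googlesheets::gs_read(worksheet, ws = 1, col_names = TRUE) %>%
    lunch.time::set_columntypes()
} else {
  df <- lunch.time::load_sheet("SelfLunchSurvey", 1)
}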

In this example, we are using the saved version of the worksheet available in data/.

load("./data/lunch_survey.rda")
df <- lunch_survey %>% 
  lunch.time::set_columntypes()

Analysis

Percentage Successful Lunch-Eating

sample.prop <- df %>%
  lunch.time::percent_hungry(quo(Lunch_Eaten)) %>% 
  round(1)

First, we look at the percentage of days during which lunch was eaten. As of this date, that percentage is `r sample.prop`%.
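
For readers who want to see the arithmetic without the package, the percentage can be approximated in base R. This is only a sketch: it assumes that Lunch_Eaten holds "Yes"/"No" responses and that the reported figure is the share of "Yes" among recorded days, which may not match how percent_hungry() treats missing values. The toy vector is made up for illustration.

```r
# Hypothetical responses; the real column comes from the survey sheet.
lunch_eaten <- c("Yes", "Yes", "No", "Yes", NA, "Yes")

# Share of recorded days on which lunch was eaten, as a percentage.
# Assumption: missing responses are excluded from the denominator.
pct_eaten <- round(100 * mean(lunch_eaten == "Yes", na.rm = TRUE), 1)
pct_eaten
```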

Summarizing the Adherence

When summarizing the data that was collected, it is important to analyze adherence. For example, days on which no information was recorded may have been more likely to be unsuccessful lunch-eating days, in which case the percentage above would be too optimistic.

To examine the overall adherence, we will identify the first and last dates recorded in the data set, and look for all of the workdays in between these two dates that contain no data.

missing_days <- df %>% 
  pull(Timestamp) %>% 
  lunch.time::no_adherence()

In this case there are `r length(missing_days)` workdays with no data recorded.
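
The procedure described above (span the first and last recorded dates, keep the workdays, and report those with no entry) can be sketched in base R. This is not the implementation of no_adherence(), whose internals are not shown here; the dates are hypothetical, and holidays are not accounted for.

```r
# Hypothetical recorded dates (stand-ins for the Timestamp column).
recorded <- as.Date(c("2018-03-05", "2018-03-06", "2018-03-08", "2018-03-13"))

# Every calendar day between the first and last record.
all_days <- seq(min(recorded), max(recorded), by = "day")

# Keep Monday to Friday only; "%u" gives the ISO weekday, 1 = Monday.
workdays <- all_days[as.integer(format(all_days, "%u")) <= 5]

# Workdays with no entry at all.
missing_days_sketch <- workdays[!(workdays %in% recorded)]
missing_days_sketch
```

Indexing with %in% rather than calling setdiff() keeps the result as a Date vector.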

Descriptives about Lunch

Before running any further analyses, we should look at the information recorded about lunch itself. Location is an important variable, since it may indicate how separated lunch time is from work. Eating at one's desk is not undesirable in itself; together with other factors, however, it may provide evidence of an unhealthy work ethic. The data suggest that a large share of lunches are spent at my desk. The NA value indicates a day on which lunch was not eaten.

df %>% lunch.time::lunch_pie("Full_Location", "Distribution of Location")

Turning to duration, we see that most lunches were relatively short. The modal lunch length was less than 15 minutes. This suggests that most lunches were not focused on interaction with others.

df %>% lunch.time::lunch_pie("Length", "Distribution of Lunch Duration")

The time of lunch seems to cluster around 1pm. The bulk of lunch times fall between 12pm and 2pm. The NA category represents days without lunch.

df %>% lunch.time::lunch_pie("Time_of_Lunch", "Distribution of Lunch Time")

The Work-Life Index

We may create an index for each outcome that rates the "distance from burnout" of each lunch occurrence. Using a few intuitive principles and a heuristic construction, this index may serve as a useful quantity for rating each lunch.

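# Number of duration categories, used to scale the Length component.
duration_len <- levels(df$Length) %>% length()
df <- df %>% 
  mutate(WLI = 
           # Up to 1/3 for eating lunch at all.
           (dplyr::if_else(Lunch_Eaten == "Yes", true=1/3, false=0, missing=0)) +
           # Up to 1/3 for eating somewhere other than the desk.
           (dplyr::if_else(Full_Location == "Desk", true=0, false=1/3, missing=0)) +
           # Up to 1/3 for duration: level position scaled by the number of levels.
           (dplyr::if_else((!is.na(Length)), 
                           true=(as.numeric(Length) / (duration_len * 3)), 
                           false=0)) +
           # Up to 1/3 for eating around midday.
           (dplyr::if_else((!is.na(Time_of_Lunch)), 
                           true=((Time_of_Lunch %in% c("11am","12pm", "1pm")) / 3), 
                           false=0))
         ) %>% 
  # Keep only the calendar date of each timestamp for plotting.
  mutate(Date = 
            as.Date(
              strptime(Timestamp, format="%Y-%m-%d")
            )
  )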

df %>% 
  ggplot(aes(x=Date, y=WLI)) +
  geom_point(col="navy") +
  scale_y_continuous(limits = c(0, NA)) +
  geom_hline(yintercept = 0, col="firebrick") +
  annotate("text", x=min(df$Date), y=0.09, 
           label="No Lunch", col="firebrick", hjust=0) +
  labs(title="Work-Life Index Over Time", 
       subtitle="(Higher is Better)", y="Work-Life Index") +
  theme_bw()
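
To make the construction concrete, the sketch below scores three individual lunches with the same four components used in the index. Each component contributes at most 1/3, so the index runs from 0 (no lunch) to 4/3. The duration labels and the "Cafe" location are hypothetical stand-ins, and wli_one() is an illustrative helper, not a function from the package.

```r
# Hypothetical ordered duration categories; the real levels come from df$Length.
length_levels <- c("<15 min", "15-30 min", "30-60 min", ">60 min")

# Score a single lunch using the same four components as the index.
wli_one <- function(eaten, location, len, time) {
  eaten_pts <- if (!is.na(eaten) && eaten == "Yes") 1 / 3 else 0
  place_pts <- if (!is.na(location) && location != "Desk") 1 / 3 else 0
  # Position of the duration category, scaled so the longest level earns 1/3.
  len_pts <- if (is.na(len)) 0 else match(len, length_levels) / (length(length_levels) * 3)
  # Midday lunches earn the full 1/3; a missing time earns nothing.
  time_pts <- (time %in% c("11am", "12pm", "1pm")) / 3
  eaten_pts + place_pts + len_pts + time_pts
}

wli_one("Yes", "Cafe", "15-30 min", "12pm")  # midday lunch away from the desk
wli_one("Yes", "Desk", "<15 min", "3pm")     # quick, late desk lunch
wli_one("No", NA, NA, NA)                    # skipped lunch
```

With four duration levels, these three lunches score 7/6, 5/12, and 0 respectively.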

Now we have a quantification that we might accept as a decent measure of work-life balance -- at least, insofar as it relates to lunch habits. This enables us to move one step further in our analysis and consider how this balance fluctuates over time. With this measure, a plethora of investigative topics opens up. For example, consider the following questions:

How "good" is our work-life balance in general?
How does it compare to some other time period?
Does it change based on day of the week?

These questions now have natural statistical interpretations, and we may avail ourselves of well-defined techniques in pursuit of answers.
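
As one illustration, the day-of-week question could be approached by averaging the WLI within each weekday. The sketch below uses made-up dates and WLI values in place of df$Date and df$WLI; it is one possible starting point, not an analysis carried out in this document.

```r
# Hypothetical WLI values; in the analysis these would be df$Date and df$WLI.
toy <- data.frame(
  Date = as.Date("2018-03-05") + c(0, 1, 2, 3, 4, 7, 8, 9, 10, 11),
  WLI  = c(0.4, 0.8, 1.0, 0.9, 0.0, 0.5, 0.7, 1.1, 1.0, 0.2)
)

# ISO weekday number (1 = Monday ... 5 = Friday), independent of locale.
toy$Weekday <- as.integer(format(toy$Date, "%u"))

# Mean WLI per weekday.
weekday_means <- tapply(toy$WLI, toy$Weekday, mean)
weekday_means
```

A formal comparison could then follow, for instance a one-way ANOVA with aov(WLI ~ factor(Weekday), data = toy), provided enough days have been recorded.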

Some Considerations of the WLI Model

While the Work-Life Index (WLI) above has provided some flexibility in our treatment of the subject, let us take a moment to lay out some assumptions and considerations for the model.

Admittedly, Work-Life Balance (WLB) is a complex idea to analyze. There are many different factors that contribute to a comprehensive qualitative assessment. For example, we could think of WLB as comprising the strength of familial bonds, social involvement, personal endeavors, spiritual reflection, as well as professional fulfillment. These rather abstract goals are laden with cultural expectations -- so much so that a researcher might cringe at the attempt. Equally cringeworthy is the task of proving that a directly derived quantitative assessment is not in fact measuring the cultural acceptability of some individual's way of life. One person's ideal balance may be unambitious to another, or vice-versa.

Few researchers make it their goal to deem entire cultures "poor" based on some personalized qualitative-turned-quantitative assessment. We hope to avoid this in the analysis of the lunch data. We have derived a quantitative measure which we believe may find rather wide acceptability -- or at least very narrow displeasure. The judgement is further removed by assessing a single individual according to their individually-created index.

This is why no functions are provided in this package for creating such an index. We believe that these considerations should be made explicitly by each researcher to avoid the effects mentioned above.

By considering lunch habits as a microcosm of WLB factors, the goal is to trade the previous complexities for the measurable habits of lunch. In this microcosm we obtain an incomplete, yet well-defined idea of WLB. Thus in the statistical sense, we may consider the lunch WLI to be a sample from a larger Work-Life Balance population, which includes some of the more intractable topics as well as those captured in the WLI.

Lunch is intrinsically entangled with the opposing natures of professional and non-professional endeavors. For example, lunch can be an inherently social event, or a familial one, or be used as an opportunity for personal betterment, etc. Therefore we view this data as a closely-related, though not necessarily unbiased, sample from a larger population of quantifications of WLB.

In the procedures that follow, then, tests of population statistics may be interpreted both as tests of this underlying hypothetical WLB population and as tests of the lunch WLI population.

Goodness of Balance

A simple hypothesis test can be used to assess the goodness of Work-Life (WL) balance. One approach is to split the WLI into two regions separated by a threshold: values below the threshold may be classified broadly as "poor" and those above it as "good". A one-sided t-test then asks whether the mean WLI lies above the threshold.

threshold <- 0.5
test <- t.test(df$WLI, mu = threshold, alternative="greater")
p <- test$p.value; p

If the p-value above falls below the chosen significance level (0.05, say), we may conclude that the mean WLI was better than our threshold. We can provide a confidence interval for the mean of the WLI using the t_ci() function.

df %>% 
  dplyr::pull(WLI) %>% 
  lunch.time::t_ci(alpha=0.05, rounding_places=2)
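
For reference, a two-sided t-based confidence interval for a mean can be computed by hand in base R. The sketch below assumes that t_ci() reports the usual interval (mean plus or minus a t quantile times the standard error), which is not confirmed here; the WLI values are hypothetical.

```r
# Hypothetical WLI values standing in for df$WLI.
wli <- c(0.4, 0.8, 1.0, 0.9, 0.0, 0.5, 0.7, 1.1, 1.0, 0.2)
alpha <- 0.05

n <- length(wli)
# Half-width: t quantile times the standard error of the mean.
half_width <- qt(1 - alpha / 2, df = n - 1) * sd(wli) / sqrt(n)
ci <- round(mean(wli) + c(-1, 1) * half_width, 2)
ci
```

The same interval is available from t.test(wli, conf.level = 1 - alpha)$conf.int, which offers a quick cross-check.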

Appendix: Interpreting the WLI

To aid in interpreting each WLI number, we include the code below. This uses all the possible outcomes present in the data and enumerates the different possible measures for the WLI.

lunch_eaten <- df$Lunch_Eaten %>% unique()
location <- df$Full_Location %>% unique()
# Named durations rather than length to avoid masking base::length().
durations <- df$Length %>% unique()
times <- df$Time_of_Lunch %>% unique()

possibilities <- expand.grid(Lunch_Eaten=lunch_eaten, 
                             Full_Location=location, 
                             Length=durations, 
                             Time_of_Lunch=times)

possibilities %>%   
  mutate(WLI = 
           (dplyr::if_else(Lunch_Eaten == "Yes", true=1/3, false=0, missing=0)) +
           (dplyr::if_else(Full_Location == "Desk", true=0, false=1/3, missing=0)) +
           (dplyr::if_else((!is.na(Length)), 
                           true=(as.numeric(Length) / (duration_len * 3)), 
                           false=0)) +
           (dplyr::if_else((!is.na(Time_of_Lunch)), 
                           true=((Time_of_Lunch %in% c("11am","12pm", "1pm")) / 3), 
                           false=0))
         ) %>% 
  pull(WLI) %>% 
  unique() %>% 
  sort() %>% 
  knitr::kable(col.names='WLI Values')