This package contains several functions that I constantly find myself rewriting while doing data analysis. It is still very much under development.

Get season for date-time object

This is a very simple but useful function. Often I like to break time-series data up by season to create separate models or plots. The season function allows this to be accomplished easily. For example,

require(lubridate)
require(dplyr)
require(myhelpr)

x <- dmy("3/2/2015")
season(x, label = FALSE)

x <- tibble(
  ts = dmy_hm(c("1/5/2015 12:00", "1/7/2015 15:00", "1/11/2016 2:00"))
  )
x %>% 
  mutate(Season = season(ts))

The label parameter specifies whether to return a numeric value (1-4 where Summer = 1) or a character string label. This parameter defaults to TRUE.
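
To make the mapping concrete, here is a minimal base R sketch of how such a function could work. It is not the package's actual implementation: `season_sketch` is a hypothetical name, and the sketch assumes Southern Hemisphere meteorological seasons (Summer = December to February) with the remaining seasons numbered in calendar order after Summer = 1.

```r
# Hypothetical stand-in for season(); NOT the package's actual code.
# Assumption: Summer = Dec-Feb, Autumn = Mar-May,
#             Winter = Jun-Aug, Spring = Sep-Nov.
season_sketch <- function(x, label = TRUE) {
  month_num <- as.integer(format(x, "%m"))
  # Map December to 0 so that Dec, Jan, Feb fall in the first block of three
  season_num <- (month_num %% 12) %/% 3 + 1
  if (!label) return(season_num)
  c("Summer", "Autumn", "Winter", "Spring")[season_num]
}

dates <- as.Date(c("2015-02-03", "2015-05-01", "2015-07-01", "2016-11-01"))
season_sketch(dates)                 # character labels
season_sketch(dates, label = FALSE)  # numeric codes, Summer = 1
```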

Financial year

Often we want to calculate the financial year of a date-time object. The fye and fyb functions calculate the financial year ending and financial year beginning values, respectively. For example,

x <- tibble(ts = dmy("1/1/2010") + months(0:11))
x %>% 
  mutate(fye = fye(ts),
         fyb = fyb(ts))
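
The arithmetic behind these two values can be sketched in base R as follows. This is an illustration only: `fye_sketch` and `fyb_sketch` are hypothetical names, and the sketch assumes a financial year running from 1 July to 30 June, which may not match what the package does.

```r
# Hypothetical stand-ins for fye() and fyb(); NOT the package's actual code.
# Assumption: the financial year runs from 1 July to 30 June.
fye_sketch <- function(x) {
  year_num  <- as.integer(format(x, "%Y"))
  month_num <- as.integer(format(x, "%m"))
  # From July onwards the financial year ends in the next calendar year
  year_num + as.integer(month_num >= 7)
}

# The financial year always begins one calendar year before it ends
fyb_sketch <- function(x) fye_sketch(x) - 1L

dates <- as.Date(c("2010-01-01", "2010-06-01", "2010-07-01", "2010-12-01"))
data.frame(ts = dates, fye = fye_sketch(dates), fyb = fyb_sketch(dates))
```

Under that assumption, a date in December is assigned the following calendar year as its financial year ending.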

This can be particularly useful when grouping data frames by year and season. An easy mistake to make is to group on season and calendar year: December is then grouped with January and February of the same year, rather than with January and February of the following year, which is what we want for a contiguous summer. The code below shows a simple way to avoid this issue:

# Create data frame with clear trend and some noise
x <- tibble(ts = dmy("1/1/2010") + months(0:35),
            value = 1:36 + rnorm(36, sd = 0.2))

# Incorrect analysis
x_bad_summary <- x %>% 
  mutate(Season = season(ts),
         Year = year(ts)) %>% 
  group_by(Year, Season) %>% 
  summarise(mean_val = mean(value))
with(x_bad_summary, plot(mean_val))

# Fixed analysis
x_good_summary <- x %>% 
  mutate(Season = season(ts),
         Year = ifelse(Season == "Summer",
                       fye(ts),
                       year(ts))) %>% 
  group_by(Year, Season) %>% 
  summarise(mean_val = mean(value))
with(x_good_summary, plot(mean_val))

In the first plot, January and February have been grouped with the December of the same calendar year, which belongs to the following summer; this causes an unexpected spike for each summer. The second plot corrects this by setting the Year of all summer values to the financial year ending, which groups each summer correctly.
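
The size of the spike is easy to check by hand. The base R sketch below repeats the comparison on a noise-free trend (value = 1 to 36), so the group means are exact. It assumes summer is December to February and moves December into the following year directly, rather than calling the package functions.

```r
# Monthly series from Jan 2010 to Dec 2012 with a pure trend and no noise
ts_month <- seq(as.Date("2010-01-01"), by = "month", length.out = 36)
value <- 1:36

month_num <- as.integer(format(ts_month, "%m"))
year_num  <- as.integer(format(ts_month, "%Y"))
is_summer <- month_num %in% c(12, 1, 2)  # assumed summer months

# Calendar-year grouping: Jan, Feb and Dec of 2010 share one group
bad_year  <- year_num
# Season-aware grouping: December joins the following year's summer
good_year <- ifelse(month_num == 12, year_num + 1L, year_num)

bad_summer  <- tapply(value[is_summer], bad_year[is_summer], mean)
good_summer <- tapply(value[is_summer], good_year[is_summer], mean)

bad_summer   # 2010 averages the values 1, 2 and 12
good_summer  # 2011 averages the values 12, 13 and 14
```

With calendar-year grouping, the 2010 summer mean is 5, which is higher than the 2010 autumn mean of 4 (values 3, 4, 5) even though the trend is strictly increasing. With season-aware grouping, the first complete summer averages 13, in line with the trend.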

A new function season_year has been added to streamline the above code,

x_season_year <- x %>% 
  mutate(Season = season(ts),
         Year = season_year(ts)) %>% 
  group_by(Year, Season) %>% 
  summarise(mean_val = mean(value))
with(x_season_year, plot(mean_val))

which gives the same result. This is likely to be a bit slower because the season is effectively calculated twice, but I think the convenience and clarity are worth it.
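
If the repeated calculation ever matters, one possible alternative is to extract the month once and derive both columns from it. The sketch below is only an illustration of that idea: `season_and_year_sketch` is a hypothetical helper that is not part of the package, and it again assumes that summer runs from December to February.

```r
# Hypothetical helper; NOT part of myhelpr.
# Extracts month and year once and derives both Season and Year from them.
# Assumption: Summer = Dec-Feb, so December counts towards the next year.
season_and_year_sketch <- function(x) {
  month_num <- as.integer(format(x, "%m"))
  year_num  <- as.integer(format(x, "%Y"))
  labels <- c("Summer", "Autumn", "Winter", "Spring")
  data.frame(
    Season = labels[(month_num %% 12) %/% 3 + 1],
    Year   = year_num + as.integer(month_num == 12),
    stringsAsFactors = FALSE
  )
}

out <- season_and_year_sketch(
  as.Date(c("2010-12-15", "2011-01-15", "2011-03-15"))
)
out
```

The trade-off is a less composable interface: a function that returns two columns is harder to use inside mutate than two single-purpose functions.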

TODO



camroach87/myhelpr documentation built on May 13, 2019, 11:03 a.m.