knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning =  FALSE,
  message = FALSE
)

Recurring calendar events are very common in everyday life. When a set of these events recur in a pattern, we call it a 'schedule'. While some schedules are simple, others can complicated and thus difficult to work with. This is especially true when the pattern of events is irregular. The objective of gs is to provide a way to make working with schedules simpler and easier.

To start with, we will import gs along with the magrittr package, which will be useful later on.

library(gs)
library(magrittr)

gs builds atop the lubridate package and is designed for use alongside it. For this reason, it is imported automatically when gs is imported.

Creating a schedule

In this section we will create a schedule of events occurring every year on New Year's Day (January 1st). This simple example which teach you the basics of schedules so that you can build-up to more involved examples later on.

Ask yourself, what is unique about New Year's Day (from a calendar perspective)? It occurs on the first day of the month, but so do many other days. It occurs in January, but so do many other days. You would be right to say that New Year's Day always occurs on the first day of the year. It is the only day that does so.

gs provides a function to create schedules where the events occur only on certain days of the year. The function is called on_yday() and it accepts numeric input depending on which day of the year you wish to schedule events for. For example, on_yday(2) would create a schedule of events occurring on the second day of every year. But in our case we want to create a schedule of events occurring on only the first day of every year, so we do so as follows.

on_yday(1)

You may be familiar with the similar yday() function from the lubridate package. This is one of a few functions lubridate provides for accessing the properties of a date or date-time object. The lubridate::yday() function returns whatever day of the year a date or datetime occurs on. gs provides equivalent functions which (instead of extracting the property from a datetime) allow you to create a schedule of events that meet the given input. So by computing on_yday(1) above, we created a schedule of all the days occurring on the first day of every year.

By itself, this isn't that useful. What will help make it more so is that we can make this schedule into an object like so:

on_new_years_day <- on_yday(1)

We now have the on_new_years_day schedule object that we can put to use. You can think of this schedule as encompassing all the possible occurrences of New Year's Day throughout time. It is not limited to just one particular date or dates.

Using schedules

Now that we have a schedule object, we can begin using it.

Testing for events

The first thing we can use schedules for is to find out whether certain dates fall on them or not. Lets create an arbitrary set of dates using base R and call them my_dates:

my_dates <- seq.Date(from = as.Date("2001-01-01"),
                     to = as.Date("2001-01-10"),
                     by = "1 day")

my_dates

We can then use the happen() function provided by gs to find out which (if any) of these dates fall on New Year's Day. happen() takes a schedule object as its first argument and a date or vector of dates as the second argument. The idea is that the syntax is readable. We are asking for the events which 'happen', 'on New Year's day' from 'my_dates':

happen(on_new_years_day, my_dates)

We can see from this that only the first of our dates fall on New Year's Day. In this example you can validate this simply by inspecting the dates.

Note that you do not have to make the on_new_years_day schedule into an object if you don't want to. We could have achieved the same result as follows:

happen(on_yday(1), my_dates)

But, for reasons that will become clear, it is often more useful to create schedules objects. Whenever you do so, I also recommend giving them a descriptive name and starting them with the prefix 'in_' or 'on_', as we did above. Among other things, this will make the syntax natural and readable.

Getting events

The next useful thing we can do with a schedule object is get the events from it. This is done using the schedule_days() function, which accepts a schedule as its first argument.

One snag with doing this however is that the total number of New Year's Days is theoretically infinite. So if we try and run schedule_days(on_new_years_day) we will get an error. What we can do is place date limits on the resulting output using the from and to arguments, which can both accept a date value:

schedule_days(on_new_years_day, 
              from = as.Date("2001-01-01"), 
              to = as.Date("2001-01-10"))

This gives us all the occurrences of New Year's Day within the boundaries we have specified. Let's expand these boundaries so we get more occurrences of New Year's Day. Here I extend the boundaries from the start of 1995 to the end of 2005.

schedule_days(on_new_years_day, 
              from = as.Date("1995-01-01"), 
              to = as.Date("2005-12-31"))

Because this is a little cumbersome to type, schedule_days() allows you to abbreviate to and from to only the numeric years you wish to use as boundaries. These go from the start of the from year to the end of the to year. That means this code is equivalent to what we just ran:

schedule_days(on_new_years_day, from = 1995, to = 2005)

If the events we desire only occur in a single year, we can use the during argument as a further numeric shortcut, which is the equivalent of setting to and from to the start and end of one particular year:

schedule_days(on_new_years_day, during = 2005)

Further basic schedules

The on_yday() function isn't the only type of schedule that you can create. gs provides a whole host of functions to create schedules. As explained these all follow from the syntax of the accessor functions found in lubridate. In each case the same function text name is prefixed by either on_ or in_ depending on how one would naturally say it.

For example, the on_mday() function creates a schedule of events occurring on the specified days of every month. Imagine your job pays your salary on the 25th of each month; if you wanted to create a schedule of your paydays you would do so as follows:

on_payday <- on_mday(25)

If instead you get paid weekly every Friday, you can use the on_wday() function which accepts either a day number (where 1 is Sunday by default), a day name or a day abbreviation as the first argument. This means each of these are equivalent and create the schedule of your weekly paydays:

on_weekly_payday <- on_wday(6)
on_weekly_payday <- on_wday("Friday")
on_weekly_payday <- on_wday("Fri")

If we wanted to create a schedule of events occurring in a particular month, we would use the in_month() function. For example, if we wanted to create a schedule of events occurring in December, we could do so in any of the following ways, each of which produces the same schedule:

in_dec <- in_month(12)
in_dec <- in_month("Dec")
in_dec <- in_month("December")

I won't go over all the available functions here. You can see them for yourself in the package reference documentation. They all follow the same principle of generating schedules from the inputs you provide to them.

Each of the functions will accept multiple inputs which in each case creates a schedule of events occurring on both the inputs. For example on_wday("Mon", "Wed", "Fri") creates a schedule of events occurring on all of those days of the week. For convenience the on_weekday() and on_weekend() functions are also provided in the package without you having to create them.

Joining schedules

By itself, New Year's Day isn't that interesting (from a calendar perspective). We could have achieved the same thing using base R. The power of gs comes when dealing with more intricate schedules. For this, we need something more. gs allows you to compose more complex schedules by combining basic ones.

Intersecting schedules

Say that, instead of New Year's Day, we were interested in Christmas day. Ask yourself, what is uniquely special about Christmas day (from a calendar perspective)? You may think of using on_yday() again and say that Christmas occurs on the 359th day of the year and so create the schedule on_yday(359). But this wouldn't work because in a leap year Christmas day is the 360th day of the year. The only way to create a schedule of events on Christmas is to schedule them on December 25th.

gs provides some functions to help us along the way. We have the in_month() function which creates a schedule of events in a given month. We could use that, but in_month(12) would give us all the events in December and we are only interested in the 25th one. We could also use the on_mday() function. But on_mday(25) is going to give us a schedule of events occurring on the 25th of every month, not just December.

What we need is the intersection of these two schedules. For this purpose gs provides the only_occur() function, which accepts two schedules and returns a single schedule which is the intersection of the two inputs.

on_christmas_day <- only_occur(on_mday(25), in_month(12))

Again the syntax is designed to be readable. We are creating a schedule called on_christmas_day where the events 'only occur' on 25th day of the month and 'only occur' in the twelfth month of the year.

Now that we have this new on_christmas_day object, we can do with it the same set of things we did with the on_new_years_day object. We can test if certain dates fall upon it with the happen() function. Or we can get the events from it using the schedule_days() function:

schedule_days(on_christmas_day, from = 2000, to = 2004)

If I wanted to create a schedule of Boxing Day days, the process is the same. Boxing day occurs on Dec 26th:

on_boxing_day <- only_occur(on_mday(26), in_month(12))

schedule_days(on_boxing_day, from = 2000, to = 2004)

Uniting schedules

Sometimes, instead of finding the intersection between schedules, you will instead want to unite schedules. For example, now that we have the schedules for on_new_years_day and on_christmas say we wanted to create a schedule of public holidays. To do this, gs provides the also_occur() function which takes two schedules as input and returns a single unified schedule as its output.

on_public_holidays <- also_occur(on_christmas_day, on_new_years_day)

schedule_days(on_public_holidays, from = 2000, to = 2004)

We now have a new schedule of public holidays available for us to use further.

One thing to take note of is that the schedules we made use of in this step were themselves created in earlier steps. This can be done an arbitrary number of times and allows huge flexibility when creating your own schedules.

The syntax of the also_occur() and only_occur() functions also lend themselves to piping (%>%) using the magrittr package. For example, if I wanted to include Boxing Days in my public holiday schedule I could re-make it as follows:

on_public_holidays <-
  on_christmas_day %>% 
  also_occur(on_new_years_day) %>% 
  also_occur(on_boxing_day)

schedule_days(on_public_holidays, from = 2000, to = 2004)

Again the syntax is readable. We are creating a public holidays schedule, where the events occur on Christmas day, and 'also occur' on New Year's Day and 'also occur' on Boxing Day.

Inverting schedules

Once you have a schedule it is often useful to have way to invert it so as to get the events that do not occur on that schedule. This can be done using the dont_occur() function. For example, say that instead of focusing on holidays, you instead wanted to get a schedule of working (or business) days. These are the days that occur on neither weekends nor on public holidays. You could get that schedule of events as follows. First you would create the schedule of non-working days as follows:

on_non_working_days <- 
  on_weekend() %>% 
  also_occur(on_public_holidays)

And then invert it using dont_occur():

on_business_days <- dont_occur(on_non_working_days)

You then have a new schedule of business days to work with that can be used in exactly the same ways already shown.

Nth occurrences within periods

One of the most common schedule patterns are when events take place on the nth occurrence of another schedule within a given period. For example, in the United States (US), Martin Luther King Jr. Day is celebrated on the third Monday in January.

This date rule has a few components. 'Monday', 'January' and 'third occurrence within a given month'. Mondays don't have their own schedule, but instead are created by on_wday("Mon"). January also doesn't have it's own schedule, but is instead created using in_month("Jan").

in_jan <- in_month("Jan")
on_monday <- on_wday("Mon")

To find the third occurrence of a scheduled event within a period gs provides the on_third() function, which accepts a schedule as its first argument and a period type string as its second argument (called within_given =). So the schedule of every third Monday of the month is created as follows:

on_third_monday_month <- on_third(on_monday, within_given = "month")

But because Martin Luther King Jr. Day doesn't occur every month, it only occurs in January, we have to refine our schedule one step further using only_occur():

on_mlk_jr_day <- 
  on_third_monday_month %>% 
  only_occur(in_jan)

We now have our schedule can can call on it for our purposes as before. Say that we wanted to get the occurrences of Martin Luther King Jr. Day from 2010 to 2020:

schedule_days(on_mlk_jr_day, from = 2010, to = 2020)

on_third() isn't the only function provided for this purpose. There is also on_first(), on_second(), on_fourth() and on_last(). These are all convenience functions of on_nth() which instead accepts an integer as its first argument. Positive integers encode the nth occurring event within the period; negative integers encode the nth last occurring events of the period.

Say you wanted to get the occurrences of Memorial Day in the US, which occurs on the last Monday in May. This time, let's do it in one step:

on_memorial_day <-
  on_last(on_wday("Mon"), within_given = "month") %>% 
  only_occur(in_month("May"))

schedule_days(on_memorial_day, from = 2010, to = 2020)

As a somewhat bizarre example, say that you play for a football team whose matches are on the 12th either Monday, Wednesday or Friday of every quarter. You could find your match days as follows:

on_monday_wednesday_friday <- on_wday("Mon", "Wed", "Fri")

on_football_days <- on_nth(12, 
                           on_monday_wednesday_friday, 
                           within_given = "quarter")

schedule_days(on_football_days, from = 2019, to = 2020)

Equally bizarrely, let's instead say they occur on the 12th last either Monday, Wednesday or Friday of every quarter. For this we do the same thing but use a negative integer:

on_football_days <- on_nth(-12, 
                           on_monday_wednesday_friday, 
                           within_given = "quarter")

schedule_days(on_football_days, from = 2019, to = 2020)

Conclusion

This vignette has provided an introduction to the gs package, which implements a grammar of recurring calendar events in R.

gs has greater capabilities than just those explained above. The remainder of the package documentation provides more detail on both the features shown and those we have not had time to get to.

Because gs is a grammar, it provides the building blocks for you as a user to combine the different parts in new and interesting ways. You can then make your own schedules based on your own needs.



jameslairdsmith/gs documentation built on July 19, 2023, 12:49 a.m.