Base
R
provides two broad classes for representing dates,
POSIXt
and Date
.
Objects of these classes
conform with
international
standards^[Refer to http://www.iso.org/iso/home/standards/iso8601.htm]
by marking a "day" with the instant of
time that begins the day.
It is straightforward to calculate
elapsed time in units of "days" using those objects.
However, frequent business-use cases express elapsed time
in units of "months" or "years"
and the definition of "month" or "year" in units of "days"
is ambiguous.
The "mondate" package adopts a different perspective. It is based on three principles:
mondate
objects constructed according to these principles
make it straightforward to calculate
elapsed time in units of "months" and "years."
The purpose of this vignette is to highlight the usefulness of the "mondate" package in everyday and business situations that inherently express time in units of "months" or "years." Technical looks "under the mondate hood" will only be touched on for clarification.
The four major benefits of the mondate package are:
The "age" of an event plays many important roles in business use cases.
By default, Date
objects are measured in units of "days" and POSIXt
objects in
units of "seconds".
But sometimes it is more convenient to measure elapsed time in units of
"months" or "years".
This is where mondate
comes in.
If my "birth event" took place on February 29, 1996 then my age on February 28, 2006 was 10:
require(mondate) YearsBetween("1996-02-29", "2006-02-28")
or in the US
YearsBetween("2/29/1996", "2/28/2006")
and
MonthsBetween("2/29/1996", "2/28/2006")
which also results when subtracting two mondate
s
m1 <- mondate.ymd(1996, 2) m2 <- mondate.ymd(2006, 2) m2 - m1
Suppose ABC Company invoices a customer in late October 2015 and has a policy of recognizing that invoice to have been sent on the 1st of November. This code calculates the ages of that invoice in months as of the ends of 2015 and 2016:
invoiceDate = as.Date("2015-11-01") asof <- mondate.ymd(2015:2016) ages <- asof - invoiceDate print(ages)
The last example in this section is actuarial in nature. Suppose ABC Insurance Co. stores the date of insured losses in the variable (or data base field) DateOfLoss. Here are 10 random dates of loss after the end of 2010:
# generate 10 random dates after 2010 set.seed(1) z <- rexp(10, .1) DateOfLoss <- as.Date(mondate.ymd(2010) + z) names(DateOfLoss) <- paste0("Claim", 1:10) print(DateOfLoss)
Here are the four quarter-end "as-of dates" in 2013:
# Quarter-ends in 2013 asof <- mondate.ymd(2013, 3 * 1:4) names(asof) <- paste0("Q", 1:4) print(asof)
Comment:
"names" were assigned to DateOfLoss
and asof
so that
R
will automatically embellish the
data.frame below with row and column headers.
Here are the ages of the 10 losses as of each quarter end, stored in a data.frame with each claim's date of loss:
# a matrix of ages in units of months Ages <- round(sapply(asof, `-` , DateOfLoss), 1) # code ages as "not available" if the evaluation date preceeds # the Date of Loss (one instance) Ages[Ages <= 0] <- NA data.frame(DateOfLoss, Ages)
mondate
s can act arithmetically in (almost always) the same way
their underlying numeric
can
act.^[The underlying numeric
measures the number of months since
the close of business 1999-12-31 (".mondate.origin").]
In particular,
use subtraction to measure the magnitude of the interval between
two dates in units of months.
For example, the following two calculations yield the same result:
mondate("2015-12-31") - mondate("2014-12-31") mondate("2015-12-31") - as.Date("2015-01-01")
Why are the results identical
even though the subtrahends would appear to be a day apart?
The answer is that the two objects,
as.Date("2015-01-01")
and mondate("2014-12-31")
represent the same instant in time,
i.e.,
the moment that separates events occuring in 2014 from
events occurring in 2015.
This points out a new feature in mondate v1.0.
![]()
Date
s can be subracted directly frommondate
s.
mondate
enables dates to be read and displayed in more than one format.
The built-in formats that are automatically recognized are currently
The order can be changed and new formats added using base::options
for display ("writing") and
set.mondate.displayFormats
for "reading".
By default,
mondate
looks at your value of Sys.getlocale("LC_TIME")
at startup.
If "United States" appears, then USa format is selected,
otherwise, EUa is selected.
This default can be changed globally for all mondate
objects in the session or for indivisual objects.
This vignette is being written in the US,
so today's date,
r format(Sys.Date(), "%B %d, %Y")
,
will be represented using the first format above by default:
mondate(Sys.Date())
That default can be changed to the
international standard format^[ibid.]
"YYYY-MM-DD" using
base::options
and the name
mondate.default.displayFormat
:
options(mondate.default.displayFormat = "%Y-%m-%d") mondate(Sys.Date())
French users of the format "dd/mm/YYYY" can establish that default as follows:
options(mondate.default.displayFormat = "%d/%m/%Y") mondate(Sys.Date())
The
options
approach modifies the default display format for all mondates in the R session. To set the display format for just one instance of a mondate object, use thedisplayFormat
argument during the object's creation.
Here we create the first 6 month-ends of 2015 to be displayed in the French format above despite the fact that the default format is first changed to the ISO standard:
options(mondate.default.displayFormat = "%Y-%m-%d") mondate(Sys.Date()) m <- mondate.ymd(2015, 1:6, displayFormat = "%d/%m/%Y") print(m)
options(mondate.displayFormat = NULL)
More creative formats can be used, as for instance to display just the year and month, as was done in "Example 3" above.
options(mondate.default.displayFormat = NULL)
As previously mentioned,
the mondate
package has the four formats
paste(get.mondate.displayFormats(), collapse=", ")
for detecting whether a character string represents a "date."
To inform mondate
of another format for converting character to date,
use set.mondate.displayFormats
^[This function sets the options
value of
mondate.displayFormats
]
with the value(s) of your choice.
To set the French format "dd/mm/yyyy" as Priority One
for detecting dates,
add that format
to the head of the
current list of detectable formats.
The
code
below accomplishes
that.^[This example is given in ?set.mondate.displayFormats]:
set.mondate.displayFormats(c("%d/%m/%Y", get.mondate.displayFormats()), clear = TRUE)
Contining,
suppose dates in a spreadsheet are saved to a csv file in France
and the read.csv
function results in this data.frame:
data <- data.frame( cbind(Invoice=c("A", "B", "C"), datechar = c("28/11/2015", "29/11/2015", "30/11/2015"))) print(data)
The character dates can be converted automatically to Date
objects
via mondate
as follows
data$InvoiceDate <- as.Date(mondate(data$datechar)) print(data)
For more information on the codes to use when formatting dates,
see the
R
help page for the strptime
function.
To add addtional defaults according to your value of
Sys.getlocale("LC_TIME")
,
contact the
author^[chiefmurphy at gmail].
(All are welcome to visit the package's public repository at
https://github.com/chiefmurph/mondate.)
Sequences of dates in units of days
or weeks is easily accomplished using the base
R's Date
class:
seq(as.Date("2015-11-01"), by = "day", length.out = 5)
Month-sequences can similarly be generated with Date
s,
which does work well
for most dates.
Results can be disappointing,
however,
for dates near the end of the month.
Compare these two sequences
starting from the first and last days of January:
seq(as.Date("2015-01-01"), by = "month", length.out = 5) seq(as.Date("2015-01-31"), by = "month", length.out = 5)
All dates in the first sequence are the first days of the month,
but some dates in the second sequence "leak" into subsequent months.
This behavior is well documented in
R
help^[see ?seq.POSIXt
]:
Using "month" first advances the month without changing the day: if this results in an invalid day of the month, it is counted forward into the next month
Perhaps the major purpose of the mondate package
is to avoid this
shortcoming.^["Under the hood" mondate
represent dates
relative to the
percent of the month that has transpired by the close of business that day.
Kudos to Damien Laker! See the "Thank You" section at the end.]
Sequences of month ends can be accomplished in various "mondate" ways. Here are two:
seq(mondate("2015-01-31"), by = "month", length.out = 5) mondate.ymd(2015, 1:5)
The display format in the first sequence inherits from the format of the character representation of the beginning date. The display format in the second sequence is based on the author's locale (see "Date Formatting" section above). Also note that each of the objects generated above are of class "mondate".
It is often more convenient to generate month sequences from
Date
objects, and produceDate
objects, without having to resort to amondate
object in between. For that purpose theseqmondate
generic function was written.
seqmondate(x)
generates sequences of class(x) for a variety of classes:
Date
, POSIXt
, and mondate
.
For any other class(x) seqmondate(x)
will produce a sequence of
mondate
s, if possible.
By default, 'by = "month"' is assumed.
The first sequence below generates the same month-ends as in Example 8
but this time the objects generated are Date
s.
(d <- seqmondate(as.Date("2015-01-01"), length = 5)) class(d)
It has been said that there are always multiple ways to do things in R and this is no exception. Here are two ways to generate sequences of year-end dates.
seqmondate("2010-12-31", by = "year", length = 6) mondate.ymd(2010:2015)
If month- and year-end dates are intended to represent "as of" dates, it is preferable to create them as
mondate
objects rather than, say,Date
objects if those dates will be used for "date aging" in units of months/years.
Sidebar on "cut" for numerics
Acut
of anumeric
'x' is a collection of (half-open,half-closed] intervals that "cover" 'x'. By "cover" is meant that every value in 'x' is contained in some interval^[thus, not an "open cover" in the topological sense], with the exception that R excludes the minimum value of 'x' by default.^[unless you setinclude.lowest = TRUE
] By default, the right endpoint is assumed to be closed.^[change the interval to left-closed by settingright = FALSE
]A
cut
in R is represented by afactor
. Thecut
function elegantly enunciates thenumeric
intervals by clearly identifying the (open,closed] borders in the factor's levels.
R
Definition:
A cut
of a set of dates 'x' by "months" is
a collection of contiguous months such that every date in x is contained in
some month.
This correspondence between a date and its neighboring members in its 'cut' can be an important factor in the statistical analysis of events occurring during similar time periods.
There is a cut
method for mondate
s when the 'breaks' argument is
numeric
and so determines
the borders between intervals, orcharacter
and so identifies that the cover is to be a set of
day-, week-, month-, year-, or
quarter-intervals.First we will define some cuts.
Then we will see how one might use a cut
.
Because a mondate
is fundamentally a
numeric
^[the "mondate class" is defined via
setClass("mondate", contains = "numeric"
, etc.],
the following two commands --
the first on numeric
, the second on mondate
--
are fundamentally the same.
The only difference is how the cuts' levels display.
cut(seq(from = 180.5, to = 185.5, by = .5), breaks = 180:186) cut(seq(from = mondate(180.5), to = mondate(185.5), by = .5), breaks = 180:186)
In the month intervals above,
if one were to label the interval with one of the endpoints,
it seems natural to choose the closed endpoint.
That is the 'mondate' convention when 'breaks' is character
.
This bears repeating:
The 'mondate' convention is to label a character cut (breaks = "days", "months", ...) with the closed endpoint of the interval. As with
cut.default
, the closed endpoint is determined by the argumentright
: whenTRUE
the right endpoint labels the interval, whenFALSE
the left endpoint labels the interval.
We begin with examples of mondate
cuts,
with breaks
being numeric
and character
.
The following two commands generate the same cut.
The first explicitly sets the break points with the
month-ends beginning 2014-12-31 and ending six months later.
The second implicitly sets the same break points.
As with cut.default
,
the labels of the first cut clearly enunciate the
(open,closed] monthly intervals.
The labels of the second cut only display
the closed endpoint.
cut(seq(mondate("2015-01-15"), mondate("2015-06-15"), by = .5), breaks = mondate.ymd(2014) + 0:6) cut(seq(mondate("2015-01-15"), mondate("2015-06-15"), by = .5), breaks = "month", include.lowest = TRUE)
In the case that breaks is character
it is unfortunate to have to
set include.lowest = TRUE
,
opposite its default value
FALSE
.^[Excluding the minimum value of 'x' would be somewhat random
-- forgive the colloquialism --
given
that other values of 'x' are likely to in the same time interval.
In the case of character
breaks,
include.lowest=FALSE
throws an error.]
Other cut
arguments as well have default values that may seem
counterintuitive
for cutting dates.
Perhaps the most troubling default is right = TRUE
for Date
objects
because it violates
the basic principle that Date
objects begin on,
and can be considered synonymous with,
the instant beginning the day,
i.e., the left endpoint.
For those and other reasons
a new cutmondate
generic exists in mondate v1.0.
The
cutmondate
work onDate
,mondate
, and other objects with arguments that are more appropriate for their class.
Additionally,
three new arguments were added to cut.mondate
,
which we will cover in due course.
We now turn our attention to the cutmondate
methods.
The 'cutmondate' collection of methods are most effective when 'breaks' defines a cover in terms of months or multiple months.
When the object being cut is a
Date
orPOSIXt
, the breakpoints are assumed to begin the period, right = FALSE by default, and the levels are labeled by the first date in the period. If necessary, set labels to the right endpoint withright = TRUE
.
Here we regenerate the same DateOfLoss dates from Example 3, and cut them into month intervals.
set.seed(1) z <- rexp(10, .1) monDOL <- mondate.ymd(2010) + z DateOfLoss <- as.Date(monDOL) print(DateOfLoss) cutmondate(DateOfLoss)
The "28 Levels" says that it takes 28 contiguous months to cover 'DateOfLoss'.
Note that the levels are labeled with the first day of each month because
in this case
right=FALSE
by default.
Specify right=TRUE
and the levels are labeled by
the last day of the month,
which occurs by default in the second, mondate
case below:
cutmondate(DateOfLoss, right = TRUE) cutmondate(monDOL)
Before tackling the final examples,
it is important to point out three new arguments for cut.mondate
(and therefore for cutmondate
)
that do not appear in cut.default
or cut.Date
:
See the help for cut.mondate
for details behind these arguments.
We will show use of the first and third.
The 'startmonth' argument enables fiscal year cuts!
Suppose ABC's fiscal year is July 1 through June 30. The dates of loss in the previous examples can be cut into fiscal years by setting startmonth = 7.
Here we show two ways to cut DateOfLoss by fiscal year, the choice depending on whether the company identifies its fiscal year with the beginning day or ending day of the period.
cutmondate(DateOfLoss, breaks = "year", startmonth = 7) cutmondate(DateOfLoss, breaks = "year", startmonth = 7, right = TRUE)
Continuing, suppose ABC Company conventually refers to a fiscal year not by "the beginning date" but by the first calendar year. The dates can be cut and the abbreviated labels automatically generated in the single function call
cutmondate(mondate(DateOfLoss, displayFormat = "%Y"), breaks = "year", right = FALSE, startmonth = 7)
In the final example we aggregate and plot data by fiscal year.
ABC Insurance Co. records loss amounts associated with the dates of loss at regular intervals. Suppose the amounts as of 2016-06-30 are
(LossAmount <- round(rnorm(10, 1000, 100), -1))
ABC's actuaries want to aggregate loss by age. The C-Suite wants aggregations by fiscal year. Everyone wants visuals! No problem.
First, cut the loss dates into fiscal years (FY),
but this time also use
attr.breaks = TRUE
.
To make the break points available for subsequent date aging, set
attr.breaks = TRUE
incutmondate
.
FY <- cutmondate(mondate(DateOfLoss, displayFormat = "%Y"), breaks = "year", right = FALSE, startmonth = 7, attr.breaks = TRUE) asof <- mondate.ymd(2016, 6) age <- asof - attr(FY, "breaks")[FY] (data <- data.frame(DateOfLoss, LossAmount, FY, FYage = age))
Calculate loss totals --
here we use aggregate
--
and plot those totals --
here we use base R
graphics.
The first plot is by FY, the second by FY age.
(LossByFY <- aggregate(LossAmount ~ FY, data, sum)) barplot(LossByFY$LossAmount, names.arg = LossByFY$FY, ylab = "$thousands", xlab = "Fiscal Year", main = paste("Total Loss\n As of", asof), col = "blue") # and (LossByFYage <- aggregate(LossAmount ~ FYage, data, sum)) barplot(LossByFYage$LossAmount, names.arg = LossByFYage$FYage, ylab = "$thousands", xlab = "Fiscal Year Age (months)", main = paste("Total Loss\n As of", asof), col = "blue")
If fiscal year coincides with calendar year
(January 1 through December 31)
then base R's cut
method for Date
works well
for calculating accident year (AY),
and calculating age by month is even more straightforward.
ABC wants to calculate the age of the losses as of the end of 2016.
First, cut the DateOfLoss's into accident year using base cut
.
Using the fact that base cut
will label each cut with the first day of the period,
subtract that date from the asof("2016-12-31") yields the age in months:
AY <- cut(DateOfLoss, breaks = "years") AYage <- mondate("2016-12-31") - as.Date(AY) print(AYage)
```
The mondate package represents dates in a way that
enables date aging and sequencing in a mathematically "invertible"
manner.^[Invertible in the sense that retracing a monthly sequence
from the end should
produce the same sequence but in reverse order.
That is not always possible with Date
sequences by "month".
Consider that
seq(as.Date("2015-01-31"), by = "month", length.out = 2)
yields
"2015-01-31" "2015-03-03"
whereas
seq(as.Date("2015-03-03"), by = "-1 month", length.out = 2)
yields
"2015-03-03" "2015-02-03"]
A mondate
object is not appropriate for all situations.
For example,
a mondate
halfway through the month of February
falls on the close of business of the 14th day
(in most cases)
but falls on the 15th day for April.
If a time period other than month or year is more suited
to the situation,
use an
R
object other than mondate
.
Factors that associate similar events by date can be created
by
R's
cut
methods.
A cut
method exists for mondate
objects but the preferable
function to use for cutting those and other
date-representing objects (Date
, POSIXt
)
is cutmondate
because the arguments default to values
intuitively appropriate for the object.
The "startmonth" arguments allows the creation of fiscal year cuts.
Many thanks to the
R
Development team for their work on
Date
and POSIXt
objects and methods.
A special thank you goes out to Gabor Grothendieck for his suggestion of,
and help with,
cut.mondate
.
Finally, the "mondate perspective" was motivated by Damien Laker in his somewhat obscure 2008 paper Time Calculations for Annualizing Returns: The Need for Standardization^[The Journal of Performance Measurement, Summer 2008] where he states the obvious :-)
"Annualization calculations based on whole months never wind up accidentally calculating that a year is anything other than a year long."
Mr. Laker's Recommended Method is based on two cases:^[Ibid. Although the terms "start on" and "finishes on" are not specifically defined in Mr. Laker's paper, the spirit is intuitively understood and reflected in the package.]
The mondate package embraces this end-of-business, month-centric, a-year-equals-twelve-months perspective.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.