mkseas: Make a date into a seasonal factor

mkseas    R Documentation

Make a date into a seasonal factor

Description

Discretizes a date within a year into a bin (or factor) for analysis, such as 11-day groups or by month.

Usage

mkseas(x, width = 11, start.day = 1, calendar, year)

Arguments

x

A data.frame with a date column (of Date or POSIXct class)

It can also be an integer specifying the Julian day (specify year to determine the leap year)

If it is omitted, the full number of days will be calculated for the year, determined by either year or calendar

width

either a numeric or a character value; if numeric, it specifies the number of days in each bin (default is 11 days); if character, it specifies a common calendar division, such as "mon" for months; see details below

start.day

the start of the season, specified either as a Date giving a month and day (year is ignored; day of month is ignored if width relates to a month), or as a numeric day of year, between 1 and the number of days in the year for the calendar

calendar

used to determine the number of days per year and per bin; if not specified, a proleptic Gregorian calendar is assumed; see year.length

year

required if x is omitted, or if x is a Julian day integer and width is non-numeric; used to calculate leap year

Details

This useful date function groups days of a year into discrete bins (or into a factor). Statistical and plotting functions can be applied to a variable contained within each bin. An example of this would be to find the monthly temperature averages, where month is the bin.

If width is an integer, each bin (except for the last) will be exactly width days wide. Because the number of days in a year varies, and is not always evenly divisible by width, the number of days in the last bin will vary. mkseas requires the last bin to have at least 20% of the number of observations for a leap year; otherwise it is merged into the second-to-last bin (which will then have extra days). If width is a non-integer numeric value (e.g. 366/12), the width of each bin varies slightly. Using width = 366/12 is slightly different from width = "mon". Leap years only affect the last bin.
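The fixed-width rule above can be illustrated in base R. This is only a sketch and not the code used by mkseas: bin_days is a hypothetical helper, and it assumes the 20% threshold is measured against width, which the text above does not state precisely.

```r
# Hypothetical helper (not part of seas): assign each day of a year
# to a fixed-width bin, merging a short trailing bin into the previous one.
# ASSUMPTION: "short" means fewer than min.frac * width days.
bin_days <- function(width, year.len = 366L, min.frac = 0.2) {
  day <- seq_len(year.len)
  bin <- (day - 1L) %/% width + 1L      # 1-based bin index for each day
  last <- max(bin)
  if (last > 1L && sum(bin == last) < min.frac * width) {
    bin[bin == last] <- last - 1L       # merge the short last bin
  }
  bin
}

n11 <- table(bin_days(11))  # 3 leftover days are kept as their own bin
n14 <- table(bin_days(14))  # 2 leftover days are merged into bin 26
length(n11); n11[[length(n11)]]
length(n14); n14[[length(n14)]]
```

Under these assumptions, an 11-day width keeps a short final bin, while a 14-day width ends with one longer bin.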

Other common classifications based on the Gregorian calendar can be used if width is a character value. All of these systems are arbitrary: the bins have different numbers of days, and leap years affect the number of days in February. The most common, of course, is by month ("mon"). Meteorological quarterly seasons ("DJF") are based on grouping three months, starting with December. This style of grouping is commonly used in climate literature, and is preferred over the season names ‘winter’, ‘spring’, ‘summer’, and ‘autumn’, which apply to only one hemisphere. The less common annual quarterly divisions ("JFM") are similar, except that grouping begins with January. Zodiac divisions ("zod") are included for demonstrative purposes, and are based on the Tropical birth dates (common in Western-culture horoscopes) starting with Aries (March 21).

Here is the complete list of options for the width argument:

  • numeric: the width of each bin (or group) in days

  • 366/n: divide the year into n sections

  • "mon": month intervals (abbreviated month names)

  • "month": month intervals (full month names)

  • "DJF": meteorological quarterly divisions: DJF, MAM, JJA, SON

  • "JFM": annual quarterly divisions: JFM, AMJ, JAS, OND

  • "JF": annual six divisions: JF, MA, AJ, JA, SO, ND

  • "zod": zodiac intervals (abbreviated symbol names)

  • "zodiac": zodiac intervals (full zodiac names)
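To see why a 366/n width differs from calendar months, the bin sizes can be compared in base R. This is a sketch under the assumption that a fractional width is applied by flooring (day - 1) / width; the actual rounding used by mkseas may differ.

```r
# Twelve equal sections of a 366-day year (ASSUMED flooring rule)
day <- 1:366
section <- floor((day - 1) / (366 / 12)) + 1
n_section <- as.integer(table(section))

# Calendar month lengths for a leap year (2004), for comparison
dates <- seq(as.Date("2004-01-01"), as.Date("2004-12-31"), by = "day")
n_month <- as.integer(table(format(dates, "%m")))

# Side-by-side day counts: sections alternate 31/30, months do not
rbind(section = n_section, month = n_month)
```

The sections alternate between 31 and 30 days, whereas the months follow the familiar irregular pattern with a 29-day February.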

If a non-Gregorian calendar is used (see year.length), the number of days in a year can be set using the calendar attribute of the date column (using attr). For example, attr(x$date,"calendar") <- "365_day" sets the dates to use a calendar with 365 days per year, where February always has 28 days. If this attribute is not set, a normal Gregorian calendar is assumed. Calendars with 360 days per year (or 30 days per month) are incorrectly handled, since February cannot have 30 days; however, this can be forced by including a duplicate February date in x for each year.

Value

Returns a factor with one value for each date given in x. The factor also has four attributes: width, start.day, calendar (assumed to have 366 days, unless a calendar attribute is set on the Date), and an array days showing the maximum number of days in each bin.

See examples for its application.

Locale warning

Month names generated using "mon" or "month" are locale specific, and depend on your operating system and system language settings. Normally, abbreviated month names should have three characters or fewer, with no trailing periods. However, Microsoft-based operating systems have an inconsistent set of abbreviated month names between locales. For example, abbreviated month names in English locales have three letters with no period at the end, while French locales have 3–4 letters with a period at the end. If your OS is POSIX, you should have consistent month names in any locale. Otherwise, the inconsistency can be fixed by setting options(seas.month.len = 3), which forces the month names to be three characters long.

To avoid any issues supporting locales, or to use English month names, simply switch to a C locale: Sys.setlocale(locale="C").
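The effect of the locale on abbreviated month names can be checked in base R without calling mkseas. The sketch below switches only the LC_TIME category to "C" temporarily and then restores the previous setting; truncating with substr is shown as a generic workaround and is not necessarily how the seas.month.len option is applied internally.

```r
# First day of each month in 2005
d <- as.Date(sprintf("2005-%02d-01", 1:12))

# Abbreviated month names in the current locale (may vary by system)
abbr_local <- format(d, "%b")

# Temporarily switch the time locale to "C" for English names
old <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
abbr_c <- format(d, "%b")
invisible(Sys.setlocale("LC_TIME", old))   # restore the previous locale

# Generic workaround: force names to at most three characters
abbr_3 <- substr(abbr_local, 1, 3)
abbr_c
```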

Note

The phase of the Gregorian solar year (begins Julian day 1, or January 1st) is not in sync with the phase of "DJF" (begins Julian day 335/336) or "zod" (begins Julian day 80/81). If either of these systems is to be used, ensure that there are several years of data, or that the data begin on the same Julian day as the system.

For instance, if one year's worth of data beginning on Julian day 1 is factored into "DJF" bins, the first bin will mix data from the first two months (January and February) with data from the last month (December). The last three bins will each have a continuous set of data. If the values are not perfectly periodic, the first bin will have higher variance, due to the mixing of data separated by nearly a year.
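The mixing described above can be made concrete with a base R sketch. It does not call mkseas; the month-to-season lookup vector is an illustrative stand-in for the "DJF" grouping, applied to a single calendar year (2005).

```r
# One calendar year of daily dates, starting on Julian day 1
dates <- seq(as.Date("2005-01-01"), as.Date("2005-12-31"), by = "day")
m <- as.integer(format(dates, "%m"))

# Illustrative month -> meteorological season lookup (Jan = 1, ..., Dec = 12)
lookup <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
            "JJA", "JJA", "SON", "SON", "SON", "DJF")
djf <- factor(lookup[m], levels = c("DJF", "MAM", "JJA", "SON"))

# Largest gap (in days) between consecutive dates within each bin
gaps <- tapply(as.integer(dates), djf, function(d) max(diff(d)))
gaps
```

Only the "DJF" bin contains a long gap (from the end of February to the start of December); the other three bins are continuous.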

Author(s)

Mike Toews

References

https://en.wikipedia.org/wiki/Solar_calendar

See Also

mkann, seas.sum

Examples

# Demonstrate the number of days in each category
ylab <- "Number of days"

barplot(table(mkseas(width="mon", year=2005)),
        main="Number of days in each month",
        ylab=ylab)

barplot(table(mkseas(width="zod", year=2005)),
        main="Number of days in each zodiac sign",
        ylab=ylab)

barplot(table(mkseas(width="DJF", year=2005)),
        main="Number of days in each meteorological season",
        ylab=ylab)

barplot(table(mkseas(width=5, year=2004)),
        main="5-day categories", ylab=ylab)

barplot(table(mkseas(width=11, year=2005)),
        main="11-day categories", ylab=ylab)

barplot(table(mkseas(width=366 / 12, year=2005)),
        main="Number of days in 12-section year",
        sub="Note: not exactly the same as months")

# Application using synthetic data
dat <- data.frame(date=as.Date(paste(2005, 1:365), "%Y %j"),
  value=(-cos(1:365 * 2 * pi / 365) * 10 + rnorm(365) * 3 + 10))
attr(dat$date, "calendar") <- "365_day"

dat$d5 <- mkseas(dat, 5)
dat$d11 <- mkseas(dat, 11)
dat$month <- mkseas(dat, "mon")
dat$DJF <- mkseas(dat, "DJF")

plot(value ~ date, dat)
plot(value ~ d5, dat)
plot(value ~ d11, dat)
plot(value ~ month, dat)
plot(value ~ DJF, dat)

head(dat)

tapply(dat$value, dat$month, mean, na.rm=TRUE)
tapply(dat$value, dat$DJF, mean, na.rm=TRUE)

dat[which.max(dat$value),]
dat[which.min(dat$value),]

# start on a different day
st.day <- as.Date("2000-06-01")

dat$month <- mkseas(dat, "mon", start.day=st.day)
dat$d11 <- mkseas(dat, 11, start.day=st.day)
dat$DJF <- mkseas(dat, "DJF", start.day=st.day)

plot(value ~ d11, dat,
     main=.seasxlab(11, start.day=st.day))
plot(value ~ month, dat,
     main=.seasxlab("mon", start.day=st.day))
plot(value ~ DJF, dat,
     main=.seasxlab("DJF", start.day=st.day))

seas documentation built on May 2, 2022, 5:08 p.m.