cutYears: Transform a vector of year values into an ordered factor of...

cutYears    R Documentation

Transform a vector of year values into an ordered factor of year groups.

Description

This function is a wrapper around cut(). Given a vector of strings or integers that represent years, and a vector of breakpoints, it returns a factor in which each level represents a group of years. Unlike cut(), it returns pretty labels for the levels: "1975-79" instead of "[1975,1980)", and so on. Also unlike cut(), it ensures that all of the data are accounted for in the levels of the factor that it creates: data will never be dropped from a factor that cutYears() returns.
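To make the contrast in labels concrete, the following sketch uses only base R. It shows the interval-style labels that cut() produces and how the same grouping can be relabeled by hand. The label strings passed to labels are written out manually here as an illustration of the target format; this is not how cutYears() is implemented.

```r
# Base R only; cutYears() itself is not called here.
years  <- c(1975, 1979, 1980, 1984)
breaks <- c(1975, 1980, 1985)

# right = FALSE gives left-closed intervals; dig.lab = 4 keeps
# four-digit years from being abbreviated in the labels.
default <- cut(years, breaks = breaks, right = FALSE, dig.lab = 4)
levels(default)    # "[1975,1980)" "[1980,1985)"

# The same grouping with hand-written labels and ordered levels.
relabeled <- cut(years, breaks = breaks, right = FALSE,
                 labels = c("1975-79", "1980-84"),
                 ordered_result = TRUE)
levels(relabeled)  # "1975-79" "1980-84"
```

Writing the labels by hand becomes tedious as the number of breaks grows, which is the step that a wrapper of this kind automates.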

Usage

cutYears(x, breaks, levelsBoundedByData = TRUE, shortLabels = TRUE)

Arguments

x

Vector of four-digit integers, or of four-character strings that can be converted to integers, e.g., "1900".

breaks

Numeric vector of cutpoints (years) that separate the groups.

levelsBoundedByData

Logical. Ensures that the lowest and highest levels of the returned factor will contain some data. Also ensures that the label for the highest factor level reports the maximum year in x, rather than a higher year.

shortLabels

Logical. If FALSE, the second year in each label will always have four digits: for example, "1975-1999". If TRUE (the default), the second year in each label will typically have two digits: for example, "1975-99". But even if shortLabels is TRUE, the second year in a label will have four digits if it isn't in the same century as the first year. For example, cutYears() will always produce a label like "1975-2001" instead of "1975-01".
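The labeling rule described for shortLabels can be sketched with a small helper. The function yearLabel() below is hypothetical and is not part of the package; it also assumes that "same century" means that the two years share their first two digits, which may not match the package's exact rule.

```r
# Hypothetical helper, not part of the package: builds one label
# from the first and last year of a group.
yearLabel <- function(first, last, shortLabels = TRUE) {
  # Assumption: two years are in the same century if their
  # first two digits match.
  sameCentury <- first %/% 100 == last %/% 100
  if (shortLabels && sameCentury) {
    sprintf("%d-%02d", first, last %% 100)
  } else {
    sprintf("%d-%d", first, last)
  }
}

yearLabel(1975L, 1999L)                       # "1975-99"
yearLabel(1975L, 1999L, shortLabels = FALSE)  # "1975-1999"
yearLabel(1975L, 2001L)                       # "1975-2001"
```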

Details

By default, cutYears() differs from cut() in the following ways:

  • Accepts only x vectors in which every value has four characters or four digits.

  • Returns a factor that has better labels for groups of years: for example, "1975-79" rather than "[1975,1980)".

  • Returns factor levels that encompass all values of x. Consequently, cutYears() will never convert year values to NA, as cut() will often do.

  • Returns an ordered factor by default.

  • Drops levels that are outside the bounds of x (when levelsBoundedByData is TRUE, the default). For example, if x ranges from 1975 to 1985, the factor returned by cut() may have many levels that contain no data, including, say, "(1900,1905]". (The exact levels returned by cut() depend on the arguments passed to it, especially the breaks argument.) But in a case like this, the lowest factor level returned by cutYears() will contain 1975, and the highest factor level returned by cutYears() will contain 1985.
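The cut() behavior that the last two points contrast against can be reproduced with base R alone. In this sketch the years and breaks are made up for illustration, and cutYears() is not called.

```r
# Base R only: years from 1975 to 1985, breaks from 1900 to 1980.
years <- rep(1975:1985, each = 2)
fac <- cut(years, breaks = seq(1900, 1980, by = 5), dig.lab = 4)

nlevels(fac)         # 16 levels, starting at "(1900,1905]"
sum(table(fac) > 0)  # 2: only two levels contain any data
sum(is.na(fac))      # 10: 1981 to 1985 fall outside the breaks
```

Most of the sixteen levels are empty, and every value above the last break becomes NA. These are the two outcomes that the list above says cutYears() avoids by default.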

See Also

cut(), Hmisc::cut2()

Examples

# Three observations for each year from 1975 to 1993.
years <- rep(1975:1993, each = 3)

# Breaks that span the full range of the data.
fac1a <- cut(     years, breaks = seq(1975, 1993, by = 3))
fac1b <- cutYears(years, breaks = seq(1975, 1993, by = 3))
fac1c <- cutYears(years, breaks = seq(1975, 1993, by = 3), shortLabels = FALSE)

table(fac1a)
table(fac1b)
table(fac1c)

# Breaks that stop before the last year in the data.
fac2a <- cut(     years, breaks = seq(1975, 1990, by = 3))
fac2b <- cutYears(years, breaks = seq(1975, 1990, by = 3))
table(fac2a)
table(fac2b)

# Breaks that start well before the first year in the data.
fac3a <- cut(     years, breaks = seq(1955, 1990, by = 3))
fac3b <- cutYears(years, breaks = seq(1955, 1990, by = 3))
table(fac3a)
table(fac3b)

jbullock35/Bullock documentation built on April 1, 2022, 6:21 p.m.