format_cohort_multi: Create consistent, complete multi-year cohorts

View source: R/format_cohort.R

Description

Given a vector of cohort labels, create a factor containing levels for the earliest and latest cohorts in x, and for all cohorts in between. All cohorts, with the possible exception of a first "open" cohort, have the same width, which is controlled by the width argument.

Usage

format_cohort_multi(
  x,
  width = 5,
  origin = 2000,
  break_min = NULL,
  open_first = NULL,
  month_start = "Jan",
  label_year_start = TRUE,
  label_open_multi = NULL
)

Arguments

x

A vector of cohort labels.

width

The width, in whole years, of the cohorts to be created. Defaults to 5.

origin

An integer used to position the cohorts: changing it shifts where the cohorts start and end. Defaults to 2000.

break_min

An integer or NULL (the default). If an integer, it is the year in which the oldest cohort begins.

open_first

Whether the oldest cohort has no lower limit.

month_start

An element of month.name or month.abb. Cohorts start on the first day of this month.

label_year_start

Logical. Whether to label a cohort by the calendar year at the beginning of the cohort or the calendar year at the end. Defaults to TRUE.

label_open_multi

Whether intervals that are open on the left should be interpreted as multi-year or single-year labels.

Details

The elements of x can be single-year labels such as "2020", multi-year labels such as "1950-1960", and intervals that are open on the left, such as "<2000".

As discussed in date_to_cohort_year, single-year labels such as "2000" are ambiguous. Correctly aligning single-year and multi-year cohorts requires knowing which month the single-year cohorts start on, which is controlled by the month_start argument, and whether single-year cohorts are labelled according to the calendar year at the start or end of the cohort, which is controlled by the label_year_start argument.
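The alignment rule can be sketched in base R. The helper cohort_start below is hypothetical and is not part of demprep. It assumes that a single-year cohort labelled by its end year begins in the previous calendar year, unless the cohort starts in January, in which case the start-year and end-year labels coincide.

```r
## Hypothetical helper (not part of demprep): first day of the
## single-year cohort with label 'year'
cohort_start <- function(year, month_start = "Jan", label_year_start = TRUE) {
  month <- match(month_start, month.abb)
  ## a cohort starting in January lies within one calendar year,
  ## so labelling by start year or end year gives the same result
  labelled_by_end <- !label_year_start && month > 1L
  year_start <- if (labelled_by_end) year - 1L else year
  as.Date(sprintf("%d-%02d-01", year_start, month))
}

## cohort "2000" labelled by the year in which it starts
cohort_start(2000L, month_start = "Jul")
## 2000-07-01

## cohort "2000" labelled by the year in which it ends
cohort_start(2000L, month_start = "Jul", label_year_start = FALSE)
## 1999-07-01
```

Under the second reading, cohort "2000" begins before 1 July 2000, so it belongs with a multi-year cohort ending in 2000 rather than with one starting in 2000.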

open_first defaults to TRUE if a value for break_min is supplied, or if any intervals in x are open, and to FALSE otherwise.

The location of the cohorts can be shifted by using different values for origin.

If x contains NA, then the levels of the factor created by format_cohort_multi also contain NA.

There is a (slightly obscure) combination of settings that makes an open interval such as "<2010" ambiguous. The settings are:

  1. x contains a mix of single-year labels such as "2018" and multi-year labels such as "2020-2025".

  2. month_start is not January.

  3. label_year_start is FALSE.

With these settings, it is unclear whether "<2010" should be treated as a type of single-year label, in which case it refers to the period before "2009-<month_start>-01", or as a type of multi-year label, in which case it refers to the period before "2010-<month_start>-01". Supplying a value for label_open_multi removes the ambiguity. When label_open_multi is TRUE, open intervals are interpreted as a type of multi-year label, and when label_open_multi is FALSE, they are interpreted as a type of single-year label.
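The two readings of an open interval can be made concrete with a small base R sketch. The helper open_cutoff is hypothetical and is not part of demprep. It covers only the ambiguous case described above (month_start is not January and label_year_start is FALSE) and returns the date before which the open cohort lies.

```r
## Hypothetical helper (not part of demprep): upper limit of an open
## interval such as "<2010" when 'label_year_start' is FALSE and
## 'month_start' is not January
open_cutoff <- function(year, month_start, label_open_multi) {
  month <- match(month_start, month.abb)
  ## multi-year reading: the label is the year of the cutoff itself;
  ## single-year reading: the label is an end year, so the cutoff
  ## falls in the previous calendar year
  year_cutoff <- if (label_open_multi) year else year - 1L
  as.Date(sprintf("%d-%02d-01", year_cutoff, month))
}

open_cutoff(2010L, month_start = "Jul", label_open_multi = TRUE)
## 2010-07-01

open_cutoff(2010L, month_start = "Jul", label_open_multi = FALSE)
## 2009-07-01
```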

Value

A factor with the same length as x.

See Also

Other functions for reformatting cohort labels are

format_cohort_year, format_cohort_custom, format_cohort_quarter, format_cohort_month

date_to_cohort_year calculates cohorts from dates.

Examples

format_cohort_multi(x = c(2000, 2005, NA, 2004))

format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"))

## contains open interval
format_cohort_multi(x = c("2000", "2005-2010", NA, "<1995"))

## changing the interpretation of the labels results in the
## reclassification of cohort "2000"
format_cohort_multi(x = c(2000, 2005, NA, 2004),
                    month_start = "Jul",
                    label_year_start = FALSE)
format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"),
                    month_start = "Jul",
                    label_year_start = FALSE)

## 'break_min' is higher than the minimum of 'x'
format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"),
                    break_min = 2005)

## 'break_min' is lower than the minimum of 'x'
format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"),
                    break_min = 1990)

## 'break_min' supplied, but 'open_first' is FALSE
format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"),
                    break_min = 1990,
                    open_first = FALSE)

## non-default value for 'width'
format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"),
                    width = 10)

## non-default value for 'origin', to shift labels by one year
format_cohort_multi(x = c("2000", "2005-2010", NA, "1995-1999"),
                    width = 10,
                    origin = 2001)

## supply value for 'label_open_multi' to remove
## ambiguity of the "<2021" label
format_cohort_multi(x = c("2025", "2030-2035", "<2021"),
                    month_start = "Jul",
                    label_year_start = FALSE,
                    label_open_multi = FALSE)

johnrbryant/demprep documentation built on Dec. 31, 2021, 11:58 a.m.