R/births.birth_via_cesarean.R

#' Birth via Cesarean
#'
#' A logical column indicating whether the birth was delivered via cesarean section.
#'
#' Data on the use of cesarean section are not available before 1989.
#'
#' This column is derived from several source fields, because the data set has changed over time.
#' From 1989 to 2010, a pair of fields named UME_PRIMC and UME_REPEC gave Yes/No answers that
#' indicated whether the birth occurred via primary (i.e. the mother had none previously) or repeat
#' cesarean section. From 2004 forward, a field named ME_ROUT identifies whether the birth occurred
#' via spontaneous vaginal delivery, forceps assist, vacuum assist, or cesarean section. From 2005
#' forward, a field named DMETH_REC recodes the delivery method to a simpler "vaginal" or
#' "cesarean" value set. In the years where more than one of these fields exists, a given record
#' may populate some or all of them, so the calculation of this column must evaluate every field.
#'
#' Because of the complexity of the source data used to calculate this field, additional reference
#' data sets of cesarean section proportions in each year are included and used as a check against
#' this column for quality assurance. See the code in the examples section.
#'
#' @section Data Quality Tests:
#'
#' This column is tested for the following quality assumptions prior to packaging:
#' \enumerate{
#'   \item non-NA values exist in each year after 1988
#' }
#'
#' @examples
#' b = copy(births)
#' ext.birth_year(b)
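#'
#' # A hedged sketch of the quality assumption listed above: every birth year
#' # after 1988 should carry at least one non-NA value. The helper name is
#' # hypothetical, it assumes birth_year is numeric, and it is not the
#' # package's own test code.
#' has_values_by_year = function(d) {
#'   # keep only the years in which the column is expected to be populated
#'   d = d[which(d$birth_year > 1988), ]
#'   # one logical per year: TRUE when any record in that year is non-NA
#'   tapply(!is.na(d$birth_via_cesarean), d$birth_year, any)
#' }
#' all(has_values_by_year(b))
#'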
#' library(dplyr)
#' library(ggplot2)
#' bagg = b %>%
#'   group_by(birth_year, birth_via_cesarean) %>%
#'   summarize(cases = sum(cases)) %>%
#'   group_by(birth_year) %>%
#'   mutate(
#'     birth_via_cesarean = coalesce(as.character(birth_via_cesarean), 'NA'),
#'     method = ordered(
#'       birth_via_cesarean,
#'       levels=c('FALSE','NA','TRUE'),
#'       labels=c('vaginal', 'unknown', 'cesarean')
#'     ),
#'     proportion = cases / sum(cases)
#'   )
#'
#' ref = bind_rows(list(
#'   transmute(HHS_cesarean_1989, y=Year, p=AllAges, dataset='HHS_cesarean_1989'),
#'   transmute(HHS_cesarean_1996, y=Year, p=AllAges, dataset='HHS_cesarean_1996'),
#'   transmute(CDC_cesarean_2013, y=Year, p=TotalCesareanRate, dataset='CDC_cesarean_2013')
#' ))
#'
#' ggplot() +
#'   geom_bar(data=bagg, aes(birth_year, proportion, fill=method), stat='identity') +
#'   geom_point(data=ref, aes(x=y, y=p, shape=dataset, color=dataset), alpha=0.6) +
#'   scale_y_continuous(labels=scales::percent) +
#'   scale_color_manual(values = c(
#'     HHS_cesarean_1989="blue", HHS_cesarean_1996="red", CDC_cesarean_2013="black"
#'   )) +
#'   ggtitle("Delivery method proportion by year",
#'           paste(sep="\n",
#'                 "Stacked bars show proportions of delivery method as indicated by the births",
#'                 "dataset. Points show reference aggregates of cesarean section proportions as",
#'                 "calculated from granular data that are not publicly available.",
#'                 "Each reference dataset is included in this package under the same name."
#'           )
#'   )
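#'
#' # A minimal sketch of how the source fields described above might be
#' # combined into one logical value. This is NOT the package's actual
#' # derivation. Assumptions made for illustration only: the function name is
#' # hypothetical, each input has already been recoded to logical
#' # (TRUE = cesarean, FALSE = vaginal, NA = not populated), and the newest
#' # field takes precedence when several are populated.
#' derive_cesarean_sketch = function(ume_primc, ume_repec, me_rout, dmeth_rec) {
#'   # primary OR repeat cesarean; R's three-valued logic keeps NA when
#'   # neither field confirms a cesarean and at least one is missing
#'   legacy = ume_primc | ume_repec
#'   # take the first non-NA answer, newest field first
#'   dplyr::coalesce(dmeth_rec, me_rout, legacy)
#' }
#'
#' derive_cesarean_sketch(
#'   ume_primc = c(TRUE,  FALSE, NA,   NA),
#'   ume_repec = c(FALSE, FALSE, NA,   NA),
#'   me_rout   = c(NA,    NA,    TRUE, NA),
#'   dmeth_rec = c(NA,    NA,    NA,   NA)
#' )
#' # TRUE FALSE TRUE NA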
#'
#' @return a \code{\link{logical}} column
#' @seealso \code{\link{births}}
#' @family births-column
#' @name birth_via_cesarean
NULL
Mikuana/vitalstatistics documentation built on May 7, 2019, 4:57 p.m.