# R/importUKAQ.R

#' Import data from the UK Air Pollution Networks
#'
#' Functions for importing air pollution data from a range of UK networks
#' including the Automatic Urban and Rural Network (AURN), the individual
#' England (AQE), Scotland (SAQN), Wales (WAQN) and Northern Ireland (NI)
#' Networks, and many "locally managed" monitoring networks across England.
#' Files are imported from a remote server operated by Ricardo that provides air
#' quality data files as R data objects. For an up to date list of available
#' sites that can be imported, see [importMeta()].
#'
#' @section Importing UK Air Pollution Data:
#'
#'   This family of functions has been written to make it easy to import data
#'   from across several UK air quality networks. Ricardo have provided .RData
#'   files (R workspaces) of all individual sites and years, as well as
#'   up-to-date meta data. These files are updated daily. This approach
#'   requires an Internet connection to work.
#'
#'   There are several advantages over the web portal approach where .csv files
#'   are downloaded.
#'
#'   -  First, it is quick to select a range of sites, pollutants
#'   and periods (see examples below).
#'
#'   -  Second, storing the data as .RData
#'   objects is very efficient as they are about four times smaller than .csv
#'   files --- which means the data downloads quickly and saves bandwidth.
#'
#'   -  Third, the function completely avoids any need for data manipulation or
#'   setting time formats, time zones etc. The function also has the advantage
#'   that the proper site name is imported and used in
#'   [openair][openair-package] functions.
#'
#'   Users should take care if using data from both [openair][openair-package]
#'   *and* web portals (for example, [UK
#'   AIR](https://uk-air.defra.gov.uk/data/)). One key difference is that the
#'   data provided by openair is date *beginning*, whereas the web portal
#'   provides date *ending*. Hourly concentrations may therefore appear offset
#'   by an hour, for example.
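#'
#'   As a minimal illustrative sketch (the site and year are arbitrary), the
#'   date-beginning convention can be shifted to date-ending for a direct
#'   comparison with a web portal download:
#'
#'   ```r
#'   aq <- importUKAQ("my1", year = 2022)
#'
#'   # shift date-beginning timestamps forward by one hour (3600 seconds) so
#'   # they align with a date-ending download
#'   aq_date_ending <- dplyr::mutate(aq, date = date + 3600)
#'   ```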
#'
#'   The data are imported by stacking sites on top of one another and will have
#'   field names `site`, `code` (the site code) and `pollutant`.
#'
#'   By default, the function returns hourly average data. However, annual,
#'   monthly, daily and 15 minute data (for SO2) can be returned using the
#'   option `data_type`. Annual and monthly data provide whole network
#'   information including data capture statistics.
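#'
#'   For example, whole-network monthly statistics (no `site` needs to be
#'   supplied) could be imported with something like:
#'
#'   ```r
#'   monthly <- importUKAQ(data_type = "monthly", year = 2022, source = "aurn")
#'   ```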
#'
#'   All units are expressed in mass terms for gaseous species (ug/m3 for NO,
#'   NO2, NOx (as NO2), SO2 and hydrocarbons; and mg/m3 for CO). PM10
#'   concentrations are provided in gravimetric units of ug/m3 or scaled to be
#'   comparable with these units. Over the years a variety of instruments have
#'   been used to measure particulate matter and the technical issues of
#'   measuring PM10 are complex. In recent years the measurements rely on FDMS
#'   (Filter Dynamics Measurement System), which is able to measure the volatile
#'   component of PM. In cases where the FDMS system is in use there will be a
#'   separate volatile component recorded as 'v10' and non-volatile component
#'   'nv10', which is already included in the absolute PM10 measurement. Prior
#'   to the use of FDMS the measurements used TEOM (Tapered Element Oscillating
#'   Microbalance) instruments and these concentrations have been multiplied by
#'   1.3 to provide an estimate of the total mass including the volatile
#'   fraction.
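#'
#'   As a brief sketch, for a site and year where FDMS PM10 is reported (the
#'   example site is illustrative), the components can be imported alongside
#'   the absolute PM10 concentration:
#'
#'   ```r
#'   pm <- importUKAQ("my1", year = 2022, pollutant = c("pm10", "nv10", "v10"))
#'   ```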
#'
#'   Some sites report hourly and daily PM10 and / or PM2.5. When `data_type =
#'   "daily"` and there are both hourly and 'proper' daily measurements
#'   available, these will be returned as e.g. "pm2.5" and "gr_pm2.5"; the
#'   former corresponding to data based on original hourly measurements and the
#'   latter corresponding to daily gravimetric measurements.
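#'
#'   A short sketch of checking which daily particulate series a site reports
#'   (site and year are illustrative):
#'
#'   ```r
#'   daily <- importUKAQ("my1", year = 2022, data_type = "daily")
#'
#'   # where both are available, columns such as "pm2.5" (derived from hourly
#'   # data) and "gr_pm2.5" (daily gravimetric) will be present
#'   names(daily)
#'   ```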
#'
#'   The function returns modelled hourly values of wind speed (`ws`), wind
#'   direction (`wd`) and ambient temperature (`air_temp`) if available
#'   (generally from around 2010). These values are modelled using the WRF model
#'   operated by Ricardo.
#'
#'   The BAM (Beta-Attenuation Monitor) instruments that have been incorporated
#'   into the network throughout its history have been scaled by 1.3 if they
#'   have a heated inlet (to account for loss of volatile particles) and 0.83 if
#'   they do not have a heated inlet. The few TEOM instruments in the network
#'   after 2008 have been scaled using VCM (Volatile Correction Model) values to
#'   account for the loss of volatile particles. The object of all these scaling
#'   processes is to provide a reasonable degree of comparison between data sets
#'   and with the reference method and to produce a consistent data record over
#'   the operational period of the network; however, there may be some
#'   discontinuity in the time series associated with instrument changes.
#'
#'   No corrections have been made to the PM2.5 data. The volatile component of
#'   FDMS PM2.5 (where available) is shown in the 'v2.5' column.
#'
#' @param site Site code of the site to import, e.g., `"my1"` is Marylebone
#'   Road. Site codes can be discovered through the use of [importMeta()].
#'   Several sites can be imported at once. For example, `site = c("my1",
#'   "nott")` imports both Marylebone Road and Nottingham. Sites from different
#'   networks can be imported through also providing multiple `source`s. Site
#'   codes can be upper or lower case.
#' @param year Year(s) to import. To import a series of years use, e.g.,
#'   `2000:2020`. To import several specific years use `year = c(2000, 2010,
#'   2020)`.
#' @param source The network to which the `site`(s) belong, defaulting to
#'   `"aurn"`. Providing a single network will attempt to import all of the
#'   given `site`s from that network. Alternatively, a vector of sources can be
#'   provided of the same length as `site` to indicate the network to which each
#'   `site` belongs. Available networks include:
#'   - `"aurn"`: The UK Automatic Urban and Rural Network.
#'   - `"aqe"`: The Air Quality England Network.
#'   - `"saqn"`: The Scottish Air Quality Network.
#'   - `"waqn"`: The Welsh Air Quality Network.
#'   - `"ni"`: The Northern Ireland Air Quality Network.
#'   - `"local"`: Locally managed air quality networks in England.
#' @param data_type The type of data to be returned, defaulting to `"hourly"`
#'   data. Alternative data types include:
#'   - `"daily"`: Daily average data.
#'   - `"monthly"`: Monthly average data with data capture information for the whole network.
#'   - `"annual"`: Annual average data with data capture information for the whole network.
#'   - `"15_min"`: 15-minute average SO2 concentrations.
#'   - `"8_hour"`: 8-hour rolling mean concentrations for O3 and CO.
#'   - `"24_hour"`: 24-hour rolling mean concentrations for particulates.
#'   - `"daily_max_8"`: Maximum daily rolling 8-hour maximum for O3 and CO.
#'   - `"daqi"`: Daily Air Quality Index (DAQI). See
#'   [here](https://uk-air.defra.gov.uk/air-pollution/daqi?view=more-info) for
#'   more details of how the index is defined. Note that this `data_type` is not
#'   available for locally managed monitoring networks.
#' @param pollutant Pollutants to import. If omitted will import all pollutants
#'   from a site. To import only NOx and NO2 for example use \code{pollutant =
#'   c("nox", "no2")}. Pollutant names can be upper or lower case.
#' @param hc Include hydrocarbon measurements in the imported data? Defaults to
#'   `FALSE` as most users will not be interested in using hydrocarbon data.
#' @param meta Append the site type, latitude and longitude of each selected
#'   `site`? Defaults to `FALSE`.
#' @param meteo Append modelled meteorological data, if available? Defaults to
#'   `TRUE`, which will return wind speed (`ws`), wind direction (`wd`) and
#'   ambient temperature (`air_temp`). The variables are calculated using the
#'   WRF model run by Ricardo Energy & Environment and are available for
#'   most but not all networks. Setting `meteo = FALSE` is useful if you have
#'   other meteorological data to use in preference, for example from
#'   the `worldmet` package.
#' @param ratified Append `qc` column(s) to hourly data indicating whether each
#'   species was ratified (i.e., quality-checked)?  Defaults to `FALSE`.
#' @param to_narrow Return the data in a "narrow"/"long"/"tidy" format? By
#'   default the returned data is "wide" and has a column for each
#'   pollutant/variable. When `to_narrow = TRUE` the data are returned with a
#'   column identifying the pollutant name and a column containing the
#'   corresponding concentration/statistic. Defaults to `FALSE`.
#' @param verbose Print messages to the console if hourly data cannot be
#'   imported? Default is `FALSE`. `TRUE` is useful for debugging as the
#'   specific `year`(s), `site`(s) and `source`(s) which cannot be imported will
#'   be returned.
#' @param progress Show a progress bar when many sites/years are being imported?
#'   Defaults to `TRUE`.
#'
#' @export
#' @return a [tibble][tibble::tibble-package]
#' @author David Carslaw, Trevor Davies, and Jack Davison
#' @family import functions
#' @examples
#' \dontrun{
#' # import a single site from the AURN
#' importUKAQ("my1", year = 2022)
#'
#' # import sites from another network
#' importUKAQ(c("bn1", "bn2"), year = 2022, source = "aqe")
#'
#' # import sites across multiple networks
#' importUKAQ(c("my1", "bn1", "bn2"),
#'   year = 2022,
#'   source = c("aurn", "aqe", "aqe")
#' )
#'
#' # get "long" format hourly data with a ratification flag
#' importUKAQ(
#'   "card",
#'   source = "waqn",
#'   year = 2022,
#'   to_narrow = TRUE,
#'   ratified = TRUE
#' )
#'
#' # import other data types, filtering by pollutant
#' importUKAQ(
#'   data_type = "annual",
#'   pollutant = c("no2", "pm2.5", "pm10"),
#'   source = c("aurn", "aqe")
#' )
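#'
#' # import pre-calculated DAQI data for the whole AURN
#' importUKAQ(year = 2022, data_type = "daqi", source = "aurn")
#'
#' # append site type, latitude and longitude to the returned data
#' importUKAQ("my1", year = 2022, meta = TRUE)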
#' }
importUKAQ <-
  function(site = "my1",
           year = 2022,
           source = "aurn",
           data_type = "hourly",
           pollutant = "all",
           hc = FALSE,
           meta = FALSE,
           meteo = TRUE,
           ratified = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    # warn if source == "local"
    if ("local" %in% source) {
      cli::cli_warn(
        c(
          "i" = "'local' data is associated with locally managed air quality network sites in England.",
          "!" = "These sites are not part of the AURN national network, and therefore may not have the same level of quality control applied to them."
        ),
        .frequency = "regularly",
        .frequency_id = "lmam"
      )
    }

    # obtain correct URL info for the source
    url_domain <- dplyr::case_match(
      source,
      "aurn" ~ "https://uk-air.defra.gov.uk/openair/R_data/",
      "aqe" ~ "https://airqualityengland.co.uk/assets/openair/R_data/",
      "saqn" ~ "https://www.scottishairquality.scot/openair/R_data/",
      "waqn" ~ "https://airquality.gov.wales/sites/default/files/openair/R_data/",
      "ni" ~ "https://www.airqualityni.co.uk/openair/R_data/",
      "local" ~ "https://uk-air.defra.gov.uk/openair/LMAM/R_data/"
    )

    url_abbr <- dplyr::case_match(
      source,
      "aurn" ~ "_AURN_",
      "aqe" ~ "_AQE_",
      "saqn" ~ "_SCOT_",
      "waqn" ~ "_WAQ_",
      "ni" ~ "_NI_",
      "local" ~ "_LMAM_"
    )

    # detect allowed types
    allowed_types <- c(
      "hourly",
      "daily",
      "15min",
      "monthly",
      "annual",
      "daqi",
      "15_min",
      "24_hour",
      "8_hour",
      "daily_max_8"
    )

    if (!tolower(data_type) %in% allowed_types) {
      cli::cli_warn(
        c(
          "!" = "'{data_type}' not recognised. Setting {.arg data_type} to 'hourly'",
          "i" = "{.arg data_type} should be one of: {allowed_types}"
        )
      )

      data_type <- "hourly"
    }

    # Import Annual / Monthly Stats?
    if (data_type %in% c("annual", "monthly")) {
      if (missing(site)) {
        site <- "all"
      }

      # create file paths
      source <- unique(source)
      files <- paste0(url_domain, "summary_", data_type, url_abbr)
      files <- unique(files)

      # import statistics
      aq_data <- purrr::pmap(
        list(files, year, source),
        readSummaryData,
        data_type = data_type,
        to_narrow = to_narrow,
        hc = hc,
        .progress = ifelse(progress, "Importing Statistics", FALSE)
      ) %>%
        purrr::list_rbind()
    }

    # Import pre-calculated DAQI?
    if (data_type == "daqi") {
      if ("local" %in% source) {
        cli::cli_abort(c("!" = "{.arg data_type} 'DAQI' is not available for locally managed networks"))
      }
      if (missing(site)) {
        site <- "all"
      }

      # create file paths
      source <- unique(source)
      files <- paste0(url_domain, "annual_DAQI", url_abbr)
      files <- unique(files)

      # import DAQI
      aq_data <-
        purrr::pmap(list(files, year, source),
          readDAQI,
          .progress = ifelse(progress, "Importing DAQI", FALSE)
        ) %>%
        purrr::list_rbind()
    }

    # Import any other stat
    if (!data_type %in% c("annual", "monthly", "daqi")) {
      site <- toupper(site)
      if (length(source) == 1) {
        source <- rep(source, times = length(site))
      } else if (length(source) != length(site)) {
        cli::cli_abort("Length of {.arg source} ({length(source)}) not equal to 1 or length of {.arg site} ({length(site)}).")
      }

      # deal with additional paths needed for local data
      if ("local" %in% source) {
        # get pcodes for file paths
        pcodes <-
          importMeta("local", all = TRUE) %>%
          dplyr::distinct(.data$site, .keep_all = TRUE) %>%
          select("code", "pcode")

        # get sites and pcodes
        site_info <-
          data.frame(
            code = site,
            source = source,
            url_data = url_domain
          ) %>%
          merge(pcodes) %>%
          tidyr::crossing(year = year)
      } else {
        site_info <-
          data.frame(
            code = site,
            source = source,
            url_data = url_domain
          ) %>%
          dplyr::mutate(pcode = rep(NA, times = length(site))) %>%
          tidyr::crossing(year = year)
      }

      aq_data <-
        purrr::pmap(
          .l = site_info,
          .f = purrr::possibly(~ readUKAQData(
            site = ..1,
            lmam_subfolder = ..4,
            year = ..5,
            data_type,
            pollutant = pollutant,
            hc = hc,
            to_narrow = to_narrow,
            source = ..2,
            url_data = ..3,
            verbose = verbose
          ), quiet = !verbose),
          .progress = ifelse(progress, "Importing AQ Data", FALSE)
        ) %>%
        purrr::list_rbind()

      if (nrow(aq_data) == 0) {
        cli::cli_abort("No data returned. Check {.arg site}, {.arg year} and {.arg source}.",
          call = NULL
        )
      }

      if (ratified && data_type == "hourly") {
        aq_data <-
          add_ratified(
            aq_data = aq_data,
            source = source,
            to_narrow = to_narrow
          )
      }
    }

    # filter annual/monthly/DAQI data
    if (data_type %in% c("annual", "monthly", "daqi")) {
      aq_data <-
        filter_site_pollutant(
          aq_data,
          site = site,
          pollutant = pollutant,
          to_narrow = to_narrow,
          data_type = data_type
        )
    }

    # check to see if met data needed
    if (!meteo) {
      aq_data <- aq_data %>%
        select(-any_of(c("ws", "wd", "air_temp")))
    }

    # add meta data?
    if (meta) {
      aq_data <- add_meta(source = source, aq_data)
    }

    # arrange output by source and site
    aq_data <-
      dplyr::arrange(aq_data, dplyr::across(dplyr::any_of(c("source", "site"))))

    # return data
    return(as_tibble(aq_data))
  }

#' Import data from individual UK Air Pollution Networks
#'
#' These functions act as wrappers for [importUKAQ()] to import air pollution
#' data from a range of UK networks including the Automatic Urban and Rural
#' Network (AURN), the individual England (AQE), Scotland (SAQN), Wales (WAQN)
#' and Northern Ireland (NI) Networks, and many "locally managed" monitoring
#' networks across England. While [importUKAQ()] allows for data to be imported
#' more flexibly, including across multiple monitoring networks, these functions
#' are provided for convenience and back-compatibility.
#'
#' @inheritSection importUKAQ Importing UK Air Pollution Data
#'
#' @param site Site code of the site to import, e.g., `"my1"` is Marylebone
#'   Road. Site codes can be discovered through the use of [importMeta()].
#'   Several sites can be imported at once. For example, `site = c("my1",
#'   "nott")` imports both Marylebone Road and Nottingham.
#'
#' @inheritParams importUKAQ
#' @rdname importUKAQ-wrapper
#' @order 1
#' @family import functions
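#' @examples
#' \dontrun{
#' # these wrappers are equivalent to calling importUKAQ() with the
#' # corresponding source, e.g., for Marylebone Road in 2022:
#' importAURN("my1", year = 2022)
#' importUKAQ("my1", year = 2022, source = "aurn")
#' }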
#' @export
importAURN <-
  function(site = "my1",
           year = 2009,
           data_type = "hourly",
           pollutant = "all",
           hc = FALSE,
           meta = FALSE,
           meteo = TRUE,
           ratified = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    if (missing(site) & data_type %in% c("annual", "monthly", "daqi")) {
      site <- "all"
    }

    importUKAQ(
      site = site,
      year = year,
      data_type = data_type,
      pollutant = pollutant,
      hc = hc,
      meta = meta,
      meteo = meteo,
      ratified = ratified,
      to_narrow = to_narrow,
      progress = progress,
      verbose = verbose,
      source = "aurn"
    )
  }

#' @rdname importUKAQ-wrapper
#' @order 2
#' @export
importAQE <-
  function(site = "yk13",
           year = 2018,
           data_type = "hourly",
           pollutant = "all",
           meta = FALSE,
           meteo = TRUE,
           ratified = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    if (missing(site) & data_type %in% c("annual", "monthly", "daqi")) {
      site <- "all"
    }

    importUKAQ(
      site = site,
      year = year,
      data_type = data_type,
      pollutant = pollutant,
      meta = meta,
      meteo = meteo,
      ratified = ratified,
      to_narrow = to_narrow,
      progress = progress,
      verbose = verbose,
      source = "aqe"
    )
  }

#' @rdname importUKAQ-wrapper
#' @order 3
#' @export
importSAQN <-
  function(site = "gla4",
           year = 2009,
           data_type = "hourly",
           pollutant = "all",
           meta = FALSE,
           meteo = TRUE,
           ratified = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    if (missing(site) & data_type %in% c("annual", "monthly", "daqi")) {
      site <- "all"
    }

    importUKAQ(
      site = site,
      year = year,
      data_type = data_type,
      pollutant = pollutant,
      meta = meta,
      meteo = meteo,
      ratified = ratified,
      to_narrow = to_narrow,
      progress = progress,
      verbose = verbose,
      source = "saqn"
    )
  }

#' @rdname importUKAQ-wrapper
#' @order 4
#' @export
importWAQN <-
  function(site = "card",
           year = 2018,
           data_type = "hourly",
           pollutant = "all",
           meta = FALSE,
           meteo = TRUE,
           ratified = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    if (missing(site) & data_type %in% c("annual", "monthly", "daqi")) {
      site <- "all"
    }

    importUKAQ(
      site = site,
      year = year,
      data_type = data_type,
      pollutant = pollutant,
      meta = meta,
      meteo = meteo,
      ratified = ratified,
      to_narrow = to_narrow,
      progress = progress,
      verbose = verbose,
      source = "waqn"
    )
  }

#' @rdname importUKAQ-wrapper
#' @order 5
#' @export
importNI <-
  function(site = "bel0",
           year = 2018,
           data_type = "hourly",
           pollutant = "all",
           meta = FALSE,
           meteo = TRUE,
           ratified = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    if (missing(site) & data_type %in% c("annual", "monthly", "daqi")) {
      site <- "all"
    }

    importUKAQ(
      site = site,
      year = year,
      data_type = data_type,
      pollutant = pollutant,
      meta = meta,
      meteo = meteo,
      ratified = ratified,
      to_narrow = to_narrow,
      progress = progress,
      verbose = verbose,
      source = "ni"
    )
  }

#' @rdname importUKAQ-wrapper
#' @order 6
#' @export
importLocal <-
  function(site = "ad1",
           year = 2018,
           data_type = "hourly",
           pollutant = "all",
           meta = FALSE,
           to_narrow = FALSE,
           verbose = FALSE,
           progress = TRUE) {
    if (missing(site) & data_type %in% c("annual", "monthly", "daqi")) {
      site <- "all"
    }

    importUKAQ(
      site = site,
      year = year,
      data_type = data_type,
      pollutant = pollutant,
      meta = meta,
      meteo = TRUE,
      ratified = FALSE,
      to_narrow = to_narrow,
      progress = progress,
      verbose = verbose,
      source = "local"
    )
  }
