read_forecast: Read forecast data from multiple files

View source: R/read_forecast.R

read_forecast R Documentation

Read forecast data from multiple files

Description

read_forecast generates file names based on the arguments given, reads data from them, and optionally performs a transformation on those data. By default the function returns nothing, because of the large volumes of data that may be read from the files. Optionally, the data that are read can be returned to the calling environment and/or written to files.

Usage

read_forecast(
  dttm,
  fcst_model,
  parameter,
  lead_time = seq(0, 48, 3),
  members = NULL,
  members_out = members,
  lags = NULL,
  vertical_coordinate = c("pressure", "model", "height", NA),
  file_path = getwd(),
  file_format = NULL,
  file_template = "vfld",
  file_format_opts = list(),
  transformation = c("none", "interpolate", "regrid", "xsection", "subgrid"),
  transformation_opts = NULL,
  param_defs = get("harp_params"),
  output_file_opts = sqlite_opts(),
  return_data = FALSE,
  merge_lags = TRUE,
  show_progress = TRUE,
  stop_on_fail = FALSE,
  is_forecast = TRUE,
  start_date = NULL,
  end_date = NULL,
  by = "6h"
)

Arguments

dttm

A vector of date time strings to read. Can be in YYYYMMDD, YYYYMMDDhh, YYYYMMDDhhmm, or YYYYMMDDhhmmss format. Can be numeric or character. seq_dttm can be used to generate a vector of equally spaced date-time strings.
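As an illustration of the YYYYMMDDhh format, the following base R sketch builds a vector of equally spaced date-time strings. The helper make_dttm is hypothetical and is not part of harp; in practice seq_dttm would be used for this.

```r
# Hypothetical helper (not part of harp): build equally spaced
# date-time strings in YYYYMMDDhh format using base R only.
make_dttm <- function(start, end, by_hours) {
  to_time <- function(x) {
    # Accept numeric or character input, e.g. 2019021700 or "2019021700"
    as.POSIXct(sprintf("%.0f", as.numeric(x)), format = "%Y%m%d%H", tz = "UTC")
  }
  times <- seq(to_time(start), to_time(end), by = by_hours * 3600)
  format(times, "%Y%m%d%H")
}

dttm <- make_dttm(2019021700, 2019021718, by_hours = 6)
print(dttm)
```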

fcst_model

The name of the forecast model(s) to read. Can be expressed as a character vector if more than one model is wanted, or as a named list of character vectors for a multimodel ensemble.

parameter

The name of the forecast parameter(s) to read from the files. Should either be harp parameter names (see show_param_defs), or in the case of netcdf files can be the name of the parameters in the files. If reading from vfld files, set to NULL to read all parameters.

lead_time

The lead times to read in. If a numeric vector is passed, the lead times are assumed to be in hours. Otherwise a character vector may be passed with a letter after each value to denote the time units: d = days, h = hours, m = minutes, s = seconds.
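The unit convention above can be sketched in base R as follows. The helper lead_time_to_seconds is hypothetical and only illustrates how the unit letters map to durations; it is not how harp parses lead times internally. Treating a character value with no unit letter as hours is an assumption of this sketch.

```r
# Hypothetical helper (not part of harp): convert lead times to seconds.
# Numeric input is taken to be in hours; character input carries a unit
# letter: d = days, h = hours, m = minutes, s = seconds.
lead_time_to_seconds <- function(x) {
  if (is.numeric(x)) {
    return(x * 3600)
  }
  seconds_per_unit <- c(d = 86400, h = 3600, m = 60, s = 1)
  has_unit <- grepl("[dhms]$", x)
  value    <- as.numeric(sub("[dhms]$", "", x))
  # Assumption: a string without a unit letter is treated as hours
  unit     <- ifelse(has_unit, substring(x, nchar(x)), "h")
  unname(value * seconds_per_unit[unit])
}

lead_time_to_seconds(seq(0, 6, 3))
lead_time_to_seconds(c("1d", "6h", "30m", "45s"))
```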

members

For ensemble forecasts, a numeric vector giving the member numbers to read. If more than one forecast model is to be read in, the members may be given as a single vector, in which case they are recycled for each forecast model, or as a named list, with the forecast models (as given in fcst_model) as the names. For multimodel ensembles this would be a named list of named lists. If file names do not include the ensemble member, i.e. all members are in the same file, setting members to NULL will read all members from the files.

members_out

If the members are to be renumbered on output, the new member numbers are given in members_out. Should have the same structure as members.
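The structures described for members and members_out can be sketched in base R as follows. The model names model_a and model_b are hypothetical, and the recycling step only illustrates what "recycled for each forecast model" means; it is not harp's own code.

```r
# Hypothetical model names, for illustration only
fcst_model <- c("model_a", "model_b")

# A single vector of members is recycled for each forecast model ...
members_vector <- 0:2
members_recycled <- setNames(
  rep(list(members_vector), length(fcst_model)),
  fcst_model
)

# ... which is equivalent to giving the named list explicitly
members <- list(model_a = 0:2, model_b = 0:2)

# members_out has the same structure as members; here the members of
# model_b are renumbered to 3, 4, 5 on output
members_out <- list(model_a = 0:2, model_b = 3:5)

# Check that the two lists share names and lengths
same_structure <- identical(names(members), names(members_out)) &&
  identical(lengths(members), lengths(members_out))
print(same_structure)
```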

lags

A named list of members of an ensemble forecast model that are lagged and the amount by which they are lagged. The list names are the names of those forecast models, as given in fcst_model, that have lagged members, and the lags are given as vectors that are the same length as the members vector. If the lags are numeric, it is assumed that they are in hours, but the units may be specified with a letter after each value where d = days, h = hours, m = minutes and s = seconds. lags is primarily used to generate the correct file names for lagged members - for example a lag of 1 hour will generate a file name with a date-time 1 hour earlier than the date-time in the sequence (start_date, end_date, by = by) and a lead time 1 hour longer.
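The effect of a lag on the file date-time and lead time can be sketched in base R as follows. The helper lagged_file_info is hypothetical and assumes lags and lead times in hours and date-times in YYYYMMDDhh format; it illustrates the arithmetic only, not harp's file name generation.

```r
# Hypothetical helper (not part of harp): for a lagged member, the file
# date-time is `lag` hours earlier and the lead time is `lag` hours longer.
lagged_file_info <- function(dttm, lead_time, lag) {
  t0 <- as.POSIXct(dttm, format = "%Y%m%d%H", tz = "UTC")
  data.frame(
    file_dttm        = format(t0 - lag * 3600, "%Y%m%d%H"),
    file_lead_time   = lead_time + lag,
    stringsAsFactors = FALSE
  )
}

# A member lagged by 1 hour, wanted at 2019021706 with lead times 0, 3, 6
lagged <- lagged_file_info("2019021706", lead_time = c(0, 3, 6), lag = 1)
print(lagged)
```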

vertical_coordinate

For upper air data to be read the vertical coordinate in the files must be given. By default, this is "pressure", but may also be "height" or "model" for model levels. If reading from vfld files, set to NA to only read surface parameters.

file_path

The parent path to all forecast data. All file names are generated to be under the file_path directory. The default is the current working directory.

file_format

The format of files to read. harpIO includes functions to read 'vfld', 'netcdf', 'grib' and 'fa' format files. If set to NULL, an attempt will be made to guess the format of the files. However, you may write your own function called read_<file_format>, and read_forecast will attempt to use that instead. See the vignette on writing read functions for more information.

file_template

A template for the file names. For available built in templates see show_file_templates. If anything else is passed, it is returned unmodified, or with substitutions made for dynamic values. Available substitutions are {YYYY} for year, {MM} for 2 digit month with leading zero, {M} for month with no leading zero, and similarly {DD} or {D} for day, {HH} or {H} for hour, {mm} or {m} for minute. Also {LDTx} for lead time and {MBRx} for ensemble member, where x is the length of the string including leading zeros. Note that the full path to the file will always be file_path/template.
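The substitutions described above can be sketched in base R as follows. The helper fill_template, the template string and the /data/forecasts path are all hypothetical; the sketch handles only {YYYY}, {MM}, {DD}, {HH}, {LDT2} and {MBR3}, and is not harp's own template parser.

```r
# Hypothetical helper (not part of harp): fill a small subset of the
# template substitutions for one date-time, lead time and member.
fill_template <- function(template, dttm, lead_time, member) {
  subs <- c(
    "{YYYY}" = substr(dttm, 1, 4),
    "{MM}"   = substr(dttm, 5, 6),
    "{DD}"   = substr(dttm, 7, 8),
    "{HH}"   = substr(dttm, 9, 10),
    "{LDT2}" = sprintf("%02d", as.integer(lead_time)), # 2 characters wide
    "{MBR3}" = sprintf("%03d", as.integer(member))     # 3 characters wide
  )
  for (key in names(subs)) {
    template <- gsub(key, subs[[key]], template, fixed = TRUE)
  }
  template
}

file_name <- fill_template(
  "{YYYY}/{MM}/{DD}/{HH}/mbr{MBR3}/fc+{LDT2}.nc", # hypothetical template
  dttm = "2019021706", lead_time = 3, member = 1
)

# The full path is always file_path/template
full_path <- file.path("/data/forecasts", file_name)
print(full_path)
# "/data/forecasts/2019/02/17/06/mbr001/fc+03.nc"
```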

file_format_opts

A list of options specific to the file format. For netcdf this can be generated by netcdf_opts and for grib by grib_opts.

transformation

The transformation to apply to the data once read in. "none" will return the data in its original form, "interpolate" will interpolate to points at latitudes and longitudes supplied in transformation_opts, "regrid" will regrid the data to a new domain given in transformation_opts, and "xsection" will interpolate to a vertical cross section between two points given in transformation_opts.

transformation_opts

Options for the transformation. For transformation = "interpolate" these can be generated by interpolate_opts, for transformation = "regrid" these can be generated by regrid_opts, and for transformation = "xsection" these can be generated by xsection_opts.

param_defs

A list of parameter definitions that includes the file format to be read. By default the built in list harp_params is used. Modifications and additions to this list can be made using modify_param_def and add_param_def respectively.

output_file_opts

Options for output files. read_forecast can output data with transformation = "interpolate" as sqlite files. The options for the sqlite files can be set with sqlite_opts. Most importantly, the path argument in sqlite_opts must not be NULL for data to be output to sqlite files.

return_data

By default read_forecast does not return any data, since many GB of data could be read in. Set to TRUE to return the data read in to the calling environment.

merge_lags

Logical. Whether to merge the lagged members when return_data = TRUE. The default is TRUE. When TRUE, the forecast time and lead time for the lagged members are adjusted to fit with the unlagged members.

show_progress

Some files may contain a lot of data. Set to TRUE to show progress when reading these files.

stop_on_fail

Logical. Set to TRUE to make execution stop if there are problems reading a file. Missing files are always skipped regardless of this setting. The default value is FALSE.

is_forecast

Logical. When TRUE (the default), data are read on the basis of the forecast initialization time (from dttm) and lead_time. When FALSE the date-times from dttm are used to choose what data to read and lead_time is ignored. This is useful for analysis data where many dates are in the same file. read_analysis also provides this functionality.

start_date, end_date, by

[Deprecated] The use of start_date, end_date and by is no longer supported. dttm together with seq_dttm should be used to generate equally spaced date-times.

Value

When return_data = TRUE, a harp_fcst object.

Examples

if (requireNamespace("harpData", quietly = TRUE)) {

  # Read all parameters from vfld files for a deterministic model
  read_forecast(
    dttm        = seq_dttm(2019021700, 2019021718, "6h"),
    fcst_model  = "AROME_Arctic_prod",
    file_path   = system.file("vfld", package = "harpData"),
    return_data = TRUE
  )

  # Ensure height corrections to 2m temperature are done and keep the
  # uncorrected data
  read_forecast(
    dttm                = seq_dttm(2019021700, 2019021718, "6h"),
    fcst_model          = "AROME_Arctic_prod",
    file_path           = system.file("vfld", package = "harpData"),
    transformation_opts = interpolate_opts(
      correct_t2m    = TRUE,
      keep_model_t2m = TRUE
    ),
    return_data = TRUE
  )

  # Read 10m wind speed from the MEPS_prod ensemble
  read_forecast(
    dttm          = seq_dttm(2019021700, 2019021718, "6h"),
    fcst_model    = "MEPS_prod",
    parameter     = "S10m",
    lead_time     = seq(0, 12, 3),
    members       = seq(0, 10),
    file_path     = system.file("vfld", package = "harpData"),
    file_template = "vfld_eps",
    return_data   = TRUE
  )

  # Read vertical profiles of temperature and dewpoint temperature
  read_forecast(
    dttm          = seq_dttm(2019021700, 2019021718, "6h"),
    fcst_model    = "MEPS_prod",
    parameter     = c("T", "Td"),
    lead_time     = seq(0, 12, 3),
    members       = seq(0, 10),
    file_path     = system.file("vfld", package = "harpData"),
    file_template = "vfld_eps",
    return_data   = TRUE
  )

  # Read ensemble data from MEPS_prod and lagged ensemble data from
  # CMEPS_prod
  read_forecast(
    dttm          = seq_dttm(2019021700, 2019021718, "6h"),
    fcst_model    = c("MEPS_prod", "CMEPS_prod"),
    parameter     = c("T", "Td"),
    lead_time     = seq(0, 12, 3),
    members       = list(
      MEPS_prod = seq(0, 10),
      CMEPS_prod = c(0, 1, 3, 4, 5, 6)
    ),
    lags          = list(CMEPS_prod = c(0, 0, 2, 2, 1, 1)),
    file_path     = system.file("vfld", package = "harpData"),
    file_template = "vfld_eps",
    return_data   = TRUE
  )
}

andrew-MET/harpIO documentation built on March 7, 2024, 7:48 p.m.