getIndex: Get an Index of Available Argo Float Profiles
In argoFloats: Analysis of Oceanographic Argo Floats

getIndex

R Documentation

Get an Index of Available Argo Float Profiles

Description

This function gets an index of available Argo float profiles, typically for later use as the first argument to getProfiles(). The source for the index may be (a) a remote data repository, (b) a local repository (see the keep argument), or (c) a cached RDA file that contains the result of a previous call to getIndex() (see the age parameter).

Usage

getIndex(
  filename = "core",
  server = argoDefaultServer(),
  destdir = argoDefaultDestdir(),
  age = argoDefaultIndexAge(),
  quiet = FALSE,
  keep = FALSE,
  debug = 0L
)

Arguments

`filename`	character value that indicates the file name to be downloaded from a remote server, or (if `server` is set to NULL) the name of a local file. For the remote case, the value of `filename` must be taken from the first column of the table given in “Details”, or (for some file types) from the nicknames given in the middle column. The downloaded file name is based on the full file name; nicknames are expanded to full file names before saving. The downloaded file is in gzipped format (indicated by a file name ending in `.gz`), and `getIndex()` processes it to produce an R archive file (ending in `.rda`) that is stored locally. The `.gz` file is discarded unless `keep` is set to TRUE. (See also the documentation on the `server` parameter, next, and the subsection entitled “Using a previously downloaded index”.)
`server`	an indication of the source for `filename`. There are two possibilities for this. (1) If `server` is `NULL`, then `filename` is taken to be the name of a local index file (ending in suffix `.gz`) that was previously downloaded from a server. The easiest way to get such a file is with a previous call to `getIndex()` with `keep` set to TRUE. (2) If `server` is a character vector (as it is by default), it is taken to represent remote servers to be tried as sources for an index file. The use of multiple servers is a way to avoid errors that can result if a server refuses a download request. As of March 2023, the three servers known to work are `"https://data-argo.ifremer.fr"`, `"ftp://ftp.ifremer.fr/ifremer/argo"` and `"ftp://usgodae.org/pub/outgoing/argo"`. These may be referred to with nicknames `"ifremer-https"`, `"ifremer"` and `"usgodae"`. Any URL that can be used in `curl::curl_download()` is a valid value, provided that the file structure is identical to the mirrors listed above. See `argoDefaultServer()` for how to provide a default value for `server`.
`destdir`	character value indicating the directory in which to store downloaded files. The default value is to compute this using `argoDefaultDestdir()`, which returns `~/data/argo` by default, although it also provides ways to set other values using `options()`. Set `destdir=NULL` if `filename` is a filename with full path information. File clutter is reduced by creating a top-level directory called `data`, with subdirectories for various file types; see “Examples”.
`age`	numeric value indicating how old a downloaded file must be (in days), for it to be considered out-of-date. The default, `argoDefaultIndexAge()`, limits downloads to once per day, as a way to avoid slowing down a workflow with a download that might take a minute or so. Setting `age=0` will force a new download, regardless of the age of the local file; `age` is also set to 0 if `keep` is `TRUE`. The value of `age` is ignored if `server` is NULL (see “Using a previously downloaded index” in “Details”).
`quiet`	logical value indicating whether to silence some progress indicators. The default is to show such indicators.
`keep`	logical value indicating whether to retain the raw index file as downloaded from the server. This is `FALSE` by default, indicating that the raw index file is to be discarded once it has been analyzed and used to create a cached file (which is an RDA file). Note that if `keep` is `TRUE`, then the supplied value of `age` is converted to 0, to force a new download.
`debug`	integer value indicating level of debugging. If this is less than 1, no debugging is done. Otherwise, some functions will print debugging information. If a function call fails, the first step should be to rerun the function with `debug=1`, to see if the output suggests a problem in the call.
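
As a minimal sketch of how these arguments combine, the following builds the argument list for a call that tries two of the server nicknames listed above, forces a fresh download, and retains the raw `.gz` file. The temporary `destdir` is an assumption made for illustration, and the call itself is guarded so that it is attempted only in an interactive session with the argoFloats package installed, because it needs network access.

```r
# Arguments for a call that tries two servers in turn, forces a fresh
# download (age = 0) and retains the raw .gz file (keep = TRUE).
destdir <- file.path(tempdir(), "argo") # illustrative location, not the default
dir.create(destdir, recursive = TRUE, showWarnings = FALSE)
args <- list(
    filename = "core",                      # nickname for ar_index_global_prof.txt.gz
    server = c("ifremer-https", "usgodae"), # tried as alternative sources
    destdir = destdir,
    age = 0,                                # implied by keep = TRUE; stated for clarity
    keep = TRUE,
    debug = 1L                              # print some debugging information
)

# The download needs network access, so it is only attempted interactively,
# and only if the package is installed.
if (interactive() && requireNamespace("argoFloats", quietly = TRUE)) {
    index <- do.call(argoFloats::getIndex, args)
    print(index)
}
```

If the call fails, the `debug=1` output is the first place to look for a problem in the arguments.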

Details

Using an index from a remote server

The first step is to construct a URL for downloading, based on the server and filename arguments. That URL will be a string ending in .gz or .txt, and from this the name of a local file is constructed by changing the suffix to .rda and prepending the directory specified by destdir. If an .rda file of that name already exists, and is less than age days old, then no downloading takes place. This caching procedure is a way to save time, because the download can take from a minute to an hour, depending on the bandwidth of the connection to the server.

The resultant .rda file, which is named in the return value of this function, holds a list named index that holds the following elements:
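
The caching rule just described can be sketched in a few lines of base R. This is not the package's own code: cacheFile() and isFresh() are hypothetical helpers, and the assumption that only the final suffix is replaced means the exact file name produced by getIndex() may differ.

```r
# Sketch of the caching rule described above; NOT the package's own code.
# cacheFile() and isFresh() are hypothetical helpers.
cacheFile <- function(url, destdir) {
    # Swap a trailing .gz or .txt suffix for .rda, and prepend destdir.
    file.path(destdir, sub("\\.(gz|txt)$", ".rda", basename(url)))
}
isFresh <- function(path, age) {
    # TRUE if the file exists and was modified less than 'age' days ago.
    if (!file.exists(path)) {
        return(FALSE)
    }
    fileAge <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
    fileAge < age
}

url <- paste0("https://data-argo.ifremer.fr", "/", "ar_index_global_prof.txt.gz")
rda <- cacheFile(url, tempdir())
if (isFresh(rda, age = 1)) {
    message("cached file is recent enough; no download needed")
} else {
    message("cached file is missing or too old; a download would be needed")
}
```

With this rule, an age of 0 can never be satisfied by an existing file, which is consistent with age=0 forcing a new download.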

ftpRoot, the FTP root stored in the header of the source file (see next paragraph).
server, the URL at which the index was found, and from which getProfiles() can construct URLs from which to download the NetCDF files for individual float profiles.
filename, the argument provided here.
header, the preliminary lines in the source file that start with the # character.
data, a data frame containing the items in the source file. The names of these items are determined automatically from "core", "bgcargo" and "synthetic" files.
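
To illustrate how such a file can be inspected in R without the argoFloats package, the sketch below saves and reloads a mock list that has the element names given above. All of the values, including the single data-frame column, are invented for the demonstration and do not come from a real index.

```r
# Build a MOCK list with the element names listed above, purely for
# illustration; the values (and the data-frame column) are invented.
index <- list(
    ftpRoot = "ftp://example.invalid/argo",
    server = "https://example.invalid",
    filename = "core",
    header = c("# mock header line 1", "# mock header line 2"),
    data = data.frame(file = c("mock/profile_001.nc", "mock/profile_002.nc"))
)
rda <- tempfile(fileext = ".rda")
save(index, file = rda)

# Load into a fresh environment, to avoid overwriting workspace objects.
e <- new.env()
load(rda, envir = e)
names(e$index)      # element names of the list
nrow(e$index$data)  # number of rows in the data frame
```

For a real file, replacing rda with the file name reported by getIndex() should be all that is needed, assuming the structure is as listed above.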

Some expertise is required in deciding on the value for the filename argument to getIndex(). As of March 2023, the FTP sites ftp://usgodae.org/pub/outgoing/argo and ftp://ftp.ifremer.fr/ifremer/argo contain multiple index files, as listed in the left-hand column of the following table. The middle column lists nicknames for some of the files. These can be provided as the filename argument, as alternatives to the full names. The right-hand column describes the file contents. Note that the servers also provide files with names similar to those given in the table, but ending in .txt. These are uncompressed equivalents of the .gz files that offer no advantage and take longer to download, so getIndex() is not designed to work with them.

File Name	Nickname	Contents
`ar_greylist.txt`	-	Suspicious/malfunctioning floats
`ar_index_global_meta.txt.gz`	-	Metadata files
`ar_index_global_prof.txt.gz`	`"argo"` or `"core"`	Argo data
`ar_index_global_tech.txt.gz`	-	Technical files
`ar_index_global_traj.txt.gz`	`"traj"`	Trajectory files
`argo_bio-profile_index.txt.gz`	`"bgc"` or `"bgcargo"`	Biogeochemical Argo data (without S or T)
`argo_bio-traj_index.txt.gz`	`"bio-traj"`	Bio-trajectory files
`argo_synthetic-profile_index.txt.gz`	`"synthetic"`	Synthetic data, successor to `"merge"`
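
The nickname column of the table can be expressed as a named character vector. The following is a sketch transcribed from the table; nicknames and expandNickname() are hypothetical names, and the way getIndex() handles nicknames internally may differ.

```r
# Hypothetical lookup table transcribed from the table above; the package's
# internal handling of nicknames may differ.
nicknames <- c(
    argo = "ar_index_global_prof.txt.gz",
    core = "ar_index_global_prof.txt.gz",
    traj = "ar_index_global_traj.txt.gz",
    bgc = "argo_bio-profile_index.txt.gz",
    bgcargo = "argo_bio-profile_index.txt.gz",
    "bio-traj" = "argo_bio-traj_index.txt.gz",
    synthetic = "argo_synthetic-profile_index.txt.gz"
)
expandNickname <- function(filename) {
    # Return the full name for a known nickname; pass other values through.
    if (filename %in% names(nicknames)) unname(nicknames[filename]) else filename
}
expandNickname("bgc")                         # a nickname is expanded
expandNickname("ar_index_global_meta.txt.gz") # a full name is unchanged
```

Files without a nickname, such as the metadata and technical indices, have to be requested by their full names.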

Using a previously downloaded index

In some situations, it can be desirable to work with a local index file that has been copied directly from a remote server. This can be useful if there is a desire to work with the files in R separately from the argoFloats package, or with Python, etc. It can also be useful for group work, in which it is important for all participants to use the same source file.

This need can be handled with getIndex(), by specifying filename as the full path name to the previously downloaded file, and at the same time specifying server as NULL. This works both for the raw files as downloaded from the server (which end in .gz) and for the R-data-archive files produced by getIndex() (which end in .rda). Since the .rda files load an order of magnitude faster than the .gz files, this is usually the preferred approach. However, if the .gz files are preferred, perhaps because part of a software chain uses Python code that works with such files, then it should be noted that calling getIndex() with keep=TRUE will save the .gz file in the destdir directory.
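
A call of this kind might look like the following sketch. The path is hypothetical (it assumes a raw index file was saved earlier in the default directory, for example with keep=TRUE), and the call is skipped if the package or the file is not available.

```r
# Hypothetical path to a raw index file saved earlier, e.g. with keep = TRUE.
localIndex <- path.expand(file.path("~/data/argo", "ar_index_global_prof.txt.gz"))

# Only attempt the call if the package is installed and the file exists.
if (requireNamespace("argoFloats", quietly = TRUE) && file.exists(localIndex)) {
    # server = NULL means that filename is read as a local file.
    index <- argoFloats::getIndex(filename = localIndex, server = NULL)
    print(index)
} else {
    message("skipping: argoFloats or ", localIndex, " is not available")
}
```

Sharing one such file among the members of a group is a simple way to make sure that everyone works from the same index.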

Value

An object of class argoFloats with type="index", which is suitable as the first argument of getProfiles().

Author(s)

Dan Kelley and Jaimie Harbin

References

Kelley, D. E., Harbin, J., & Richards, C. (2021). argoFloats: An R package for analyzing Argo data. Frontiers in Marine Science, 8, 635922. doi:10.3389/fmars.2021.635922

argoFloats documentation built on Oct. 18, 2023, 1:06 a.m.