batch_opm: Batch-convert PM data
In opm: Analysing Phenotype Microarray and Growth Curve Data


Batch-convert from OmniLog(R) CSV (or previous opm YAML or JSON) to opm YAML (or JSON). It is possible to add metadata to each set of raw data and to aggregate the curves; these additional data will then be included in the output files.

  batch_opm(names, md.args = NULL, aggr.args = NULL,
    force.aggr = FALSE, disc.args = NULL,
    force.disc = FALSE, gen.iii = opm_opt("gen.iii"),
    force.plate = FALSE, device = "mypdf", dev.args = NULL,
    plot.args = NULL, csv.args = NULL,
    table.args = list(sep = "\t", row.names = FALSE), ...,
    proc = 1L, outdir = "", overwrite = "no",
    output = c("yaml", "json", "csv", "xyplot", "levelplot", "split", "clean"),
    combine.into = NULL, verbose = TRUE, demo = FALSE)

`md.args`	If not `NULL` but a list, passed as arguments to `include_metadata` with the data read from each individual file as additional argument `object`. If `NULL`, metadata are not included (but may already be present in the case of YAML input).
`aggr.args`	If not `NULL` but a list, passed as arguments to `do_aggr` with the data read from each individual file as additional argument `object`. If `NULL`, aggregation does not take place (but aggregated data may already be present in the case of YAML input).
`force.aggr`	Logical scalar. If `FALSE`, do not aggregate already aggregated data (which can be present in YAML input).
`disc.args`	If not `NULL` but a list, passed as arguments to `do_disc` with the data read from each individual file as additional argument `object`. If `NULL`, discretisation does not take place (but discretised data may already be present in the case of YAML input).
`force.disc`	Logical scalar. If `FALSE`, do not discretise already discretised data (which can be present in YAML input).
`force.plate`	Logical scalar passed as `force` argument to `read_opm`.
`device`	Character scalar describing the graphics device used for outputting plots. See `Devices` from the grDevices package and `mypdf` from the pkgutils package for possible values. The extension of the output files is created from the device name after a few adaptations (such as converting `postscript` to `ps`).
`dev.args`	List. Passed as additional arguments to `device`.
`plot.args`	List. Passed as additional arguments to the plotting function used.
`csv.args`	If not `NULL` but a list, used for specifying ways to use `csv_data` entries directly as `metadata`. The list can contain character vectors used for selecting and optionally renaming CSV entries or functions that can be applied to an entire data frame containing all CSV entries. Note that this argument has nothing to do with `csv` output.
`table.args`	Passed to `write.table` from the utils package if `output` is set to `csv`. Do not confuse this with `csv.args`.
`...`	Optional arguments passed to `batch_process` in addition to `verbose` and `demo`. Note that `out.ext`, `fun` and `fun.args` are set automatically. Alternatively, these are parameters passed to `batch_collect`.
`output`	Character scalar determining the main output mode, one of:
  clean: Apply `clean_filenames` from the pkgutils package.
  csv: Create CSV files, by default one per input file.
  json: Create JSON files, by default one per input file.
  levelplot: Create graphics files, by default one per input file, containing the output of `level_plot`.
  split: Split multiple-plate new-style or old-style CSV files with `split_files`.
  yaml: Create YAML files, by default one per input file.
  xyplot: Create graphics files, by default one per input file, containing the output of `xy_plot`.
  The `clean` mode might be useful for managing OmniLog(R) CSV files, which can contain many special characters.
`combine.into`	Empty or character scalar modifying the output mode unless it is ‘clean’ or ‘split’. If non-empty, one output file is created per plate type encountered in the input, named after that plate type, instead of one per input file (the default). Thus `combine.into` should be given as a template string for `sprintf` from the base package with one placeholder for the plate type, and without a file extension.
`names`	Character vector with names of files in one of the formats accepted by `read_single_opm`, or names of directories containing such files, or both; or convertible to such a vector. See the `include` argument of `read_opm` and `explode_dir` for how to select subsets from the input files or directories.
`gen.iii`	Logical or character scalar. If `TRUE`, invoke `gen_iii` on each plate. This is automatically done with CSV input if the plate type is given as OTH (which is usually the case for plates run in ID mode). If a character scalar, it is used as the `to` argument of `gen_iii` to set other plate types unless it is empty.
`demo`	Logical scalar. Do not read files, but print a vector with the names of the files that would be (attempted to) read, and return them invisibly?
`proc`	Integer scalar. The number of processes to spawn. Cannot be set to more than 1 if running under Windows. See the `cores` argument of `do_aggr` for details.
`outdir`	Character vector. Directories in which to place the output files. If empty or only containing empty strings, the directory of each input file is used.
`overwrite`	Character scalar. If ‘yes’, conversion is always tried if `infile` exists and is not empty. If ‘no’, conversion is not tried if `outfile` exists and is not empty. If ‘older’, conversion is tried if `outfile` does not exist or is empty or is older than `infile` (with respect to the modification time).
`verbose`	Logical scalar. Print conversion and success/failure information?
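The three `overwrite` modes amount to a small decision rule. The base-R sketch below restates it; `should_convert` is a hypothetical helper, not an opm function, and it assumes (as the description of ‘yes’ states explicitly) that a missing or empty input file is never converted in any mode.

```r
# Hypothetical helper (not part of opm): would a conversion be attempted
# under the documented `overwrite` modes?
should_convert <- function(infile, outfile,
    overwrite = c("no", "yes", "older")) {
  overwrite <- match.arg(overwrite)
  # a file counts only if it exists and is not empty
  non_empty <- function(f) file.exists(f) && file.info(f)$size > 0
  if (!non_empty(infile))
    return(FALSE) # assumption: nothing to convert in any mode
  switch(overwrite,
    yes = TRUE, # always try
    no = !non_empty(outfile), # skip if outfile exists and is not empty
    older = !non_empty(outfile) || # otherwise compare modification times
      file.info(outfile)$mtime < file.info(infile)$mtime
  )
}
```

Checking the input file first keeps each mode's branch limited to the condition that actually distinguishes it.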

This function is for batch-converting many files; for writing a single object to a YAML file (or string), see to_yaml.

A YAML document can comprise scalars (single values of some type), sequences (ordered collections of some values, without names) and mappings (collections assigning a name to each value), in a variety of combinations (e.g., mappings of sequences). The output of batch_opm is one YAML document per plate which represents a mapping with the following components (key-value pairs):

metadata: Arbitrarily nested mapping of arbitrary metadata entries. Empty if no metadata have been added.
csv_data: Non-nested mapping containing the OmniLog(R) run information read from the input CSV file (character scalars) together with the measurements. The most important entry is most likely the plate type.
measurements: A mapping whose values are sequences of floating-point numbers of the same length and in the appropriate order. The keys are ‘hours’, which refers to the time points, and the well coordinates, ranging between ‘A01’ and ‘H12’.
aggregated: A mapping, only present if curve parameters have been estimated. Its keys correspond to those of ‘measurements’ with the exception of ‘hours’. The values are themselves mappings, whose keys indicate the respective curve parameter and whether the value is the point estimate or the upper or lower limit of the confidence interval. The values of these secondary mappings are floating-point numbers.
aggr_settings: A mapping, only present if curve parameters have been estimated. Its keys are ‘software’, ‘version’ and ‘options’. The values of the first two are character scalars. The value of ‘options’ is an arbitrarily nested mapping with arbitrary content.
discretized: A mapping, only present if curve parameters have been estimated and also discretised. Its keys correspond to those of ‘measurements’ with the exception of ‘hours’. The values are logical scalars.
disc_settings: A mapping, only present if curve parameters have been estimated and also discretised. Its keys are ‘software’, ‘version’ and ‘options’. The values of the first two are character scalars. The value of ‘options’ is an arbitrarily nested mapping with arbitrary content.
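To make this layout concrete, the base-R sketch below builds one plate as the kind of nested named list a YAML parser typically returns and checks the invariants listed above. `plate` and `check_plate` are hypothetical, all values are made up for illustration, and only two wells are shown.

```r
# Hypothetical parsed plate; values are illustrative only.
plate <- list(
  metadata = list(strain = "example"),
  csv_data = list(`Plate Type` = "PM01"),
  measurements = list(
    hours = c(0, 0.25, 0.5), # time points
    A01 = c(10, 12, 15), # one sequence per well
    A02 = c(11, 11, 13)
  )
)

# Check the documented structure of one plate mapping.
check_plate <- function(p) {
  stopifnot(all(c("metadata", "csv_data", "measurements") %in% names(p)))
  m <- p$measurements
  stopifnot("hours" %in% names(m))
  # all sequences must have the same length
  stopifnot(length(unique(vapply(m, length, integer(1)))) == 1L)
  # well coordinates range from A01 to H12
  wells <- setdiff(names(m), "hours")
  stopifnot(all(grepl("^[A-H](0[1-9]|1[0-2])$", wells)))
  # optional components must use the same keys, minus 'hours'
  for (key in c("aggregated", "discretized"))
    if (!is.null(p[[key]]))
      stopifnot(setequal(names(p[[key]]), wells))
  invisible(TRUE)
}
```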

Details of the contents can be found in the documentation of the classes of the objects from which the YAML output is generated. In the case of YAML input with several plates per file, batch_opm generates YAML output files containing a sequence of mappings as described above, one per plate, to keep a 1:1 relationship between input and output files.

Attempting to generate YAML from input data with a wrong character encoding might cause R to crash or hang. This problem was observed with CSV files that were generated on a different operating system and contained special characters such as German umlauts. It is then necessary to specify the encoding used in these files explicitly (and correctly); see the ‘file.encoding’ option of opm_opt for how to do this.

JSON, which is almost a subset of YAML, can also be generated, but has more restrictions. It is only recommended if a YAML parser is unavailable. It is also more delicate regarding the encoding of character strings.

When YAML files generated with the yaml package (on which the opm implementation is based), or JSON files generated with the rjson package, are read using other programming languages, a potential problem is that these formats, and YAML in general, lack a native representation of NA values. Such entries are likely to be misread as ‘NA’ character scalars (if the rjson package or a version of the yaml package prior to 2.1.7 was used) or as .na, .na.real, .na.logical or .na.character character scalars (if more recent versions of the yaml package were used). Input functions in other programming languages should perform the corresponding conversions. opm itself translates these values when converting a list to an OPM object.
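The kind of conversion meant here can be sketched as a recursive walk over the parsed data; the logic carries over to other languages. `restore_na` and `na_markers` are hypothetical names, not part of opm (which performs its own translation), and plain ‘NA’ strings are deliberately left alone because they may be genuine text.

```r
# Hypothetical mapping from yaml NA markers to typed R NA values.
na_markers <- list(".na" = NA, ".na.real" = NA_real_,
  ".na.logical" = NA, ".na.character" = NA_character_)

restore_na <- function(x) {
  if (is.list(x))
    return(lapply(x, restore_na)) # recurse into nested mappings
  if (!is.character(x))
    return(x) # numbers and logicals need no conversion
  if (length(x) == 1L && x %in% names(na_markers))
    return(na_markers[[x]]) # a lone marker becomes a typed NA
  x[x %in% names(na_markers)] <- NA_character_ # markers inside vectors
  x
}
```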

See as.data.frame regarding the generated CSV.

The function invisibly returns a matrix which describes each attempted file conversion. See batch_process for details.

http://www.yaml.org/

http://www.json.org/

http://www.biolog.com/

Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C. A., Keseler, I. M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L. A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D. S., Karp, P. D. (2016). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research 44, D471–D480. [opm YAML usage example]

utils::read.csv yaml::yaml.load_file grDevices::Devices

pkgutils::mypdf

Other io-functions: batch_collect, batch_process, collect_template, explode_dir, file_pattern, glob_to_regex, read_opm, read_single_opm, split_files, to_metadata

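# locate the example OmniLog(R) CSV files shipped with opm
test.files <- opm_files("omnilog")
if (length(test.files) > 0) { # if the files are found
  # count the files already present in the temporary output directory
  num.files <- length(list.files(outdir <- tempdir()))
  # convert the first file using the default output mode (YAML)
  x <- batch_opm(test.files[1], outdir = outdir)
  # exactly one new file, plus a matrix describing the conversion
  stopifnot(length(list.files(outdir)) == num.files + 1, is.matrix(x))
  stopifnot(file.exists(x[, "outfile"]))
  stopifnot(test.files[1] == x[, "infile"])
  # clean up
  unlink(x[, "outfile"])
} else {
  warning("opm example data files not found")
}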