batch_opm: Batch-convert PM data

Description

Batch-convert from OmniLog(R) CSV (or previous opm YAML or JSON) to opm YAML (or JSON). It is possible to add metadata to each set of raw data and to aggregate the curves; these additional data will then be included in the output files.

Usage

  batch_opm(names, md.args = NULL, aggr.args = NULL,
    force.aggr = FALSE, disc.args = NULL,
    force.disc = FALSE, gen.iii = opm_opt("gen.iii"),
    force.plate = FALSE, device = "mypdf", dev.args = NULL,
    plot.args = NULL, csv.args = NULL,
    table.args = list(sep = "\t", row.names = FALSE), ...,
    proc = 1L, outdir = "", overwrite = "no",
    output = c("yaml", "json", "csv", "xyplot", "levelplot", "split", "clean"),
    combine.into = NULL, verbose = TRUE, demo = FALSE)

Arguments

md.args

If not NULL but a list, passed as arguments to include_metadata with the data read from each individual file as additional argument ‘object’. If NULL, metadata are not included (but may already be present in the case of YAML input).

aggr.args

If not NULL but a list, passed as arguments to do_aggr with the data read from each individual file as additional argument ‘object’. If NULL, aggregation does not take place (but aggregated data may already be present in the case of YAML input).

force.aggr

Logical scalar. If FALSE, do not aggregate already aggregated data (which can be present in YAML input).

disc.args

If not NULL but a list, passed as arguments to do_disc with the data read from each individual file as additional argument ‘object’. If NULL, discretisation does not take place (but discretised data may already be present in the case of YAML input).

force.disc

Logical scalar. If FALSE, do not discretise already discretised data (which can be present in YAML input).

force.plate

Logical scalar passed as force argument to read_opm.

device

Character scalar describing the graphics device used for outputting plots. See Devices from the grDevices package and mypdf from the pkgutils package for possible values. The extension of the output files is created from the device name after a few adaptations (such as converting postscript to ps).

dev.args

List. Passed as additional arguments to device.

plot.args

List. Passed as additional arguments to the plotting function used.

csv.args

If not NULL but a list, used for specifying ways to use csv_data entries directly as metadata. The list can contain character vectors used for selecting and optionally renaming CSV entries or functions that can be applied to an entire data frame containing all CSV entries. Note that this argument has nothing to do with csv output.

table.args

Passed to write.table from the utils package if output is set to csv. Do not confuse this with csv.args.
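As a rough illustration of what the default table.args imply, the following base R sketch calls write.table with the same settings (tab separator, no row names). The data frame and its column names are invented for the example, and the way batch_opm combines these arguments internally is an assumption here, not taken from its source.

```r
# Invented example data; real output would come from the converted plates.
df <- data.frame(Well = c("A01", "A02"), A = c(250.1, 13.7))

# The default value of 'table.args' as given in the Usage section.
table.args <- list(sep = "\t", row.names = FALSE)

# Assumed calling pattern: the list is spliced into a write.table call.
outfile <- tempfile(fileext = ".csv")
do.call(write.table, c(list(x = df, file = outfile), table.args))

# One header line plus one tab-separated line per row.
lines <- readLines(outfile)
print(lines)
unlink(outfile)
```

Any other write.table argument, such as quote = FALSE, could be added to the list in the same way.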

...

Optional arguments passed to batch_process in addition to verbose and demo. Note that out.ext, fun and fun.args are set automatically. Alternatively, these are parameters passed to batch_collect.

output

Character scalar determining the main output mode.

clean

Apply clean_filenames from the pkgutils package.

csv

Create CSV files, by default one per input file.

json

Create JSON files, by default one per input file.

levelplot

Create graphics files, by default one per input file, containing the output of level_plot.

split

Split multiple-plate new style or old style CSV files with split_files.

yaml

Create YAML files, by default one per input file.

xyplot

Create graphics files, by default one per input file, containing the output of xy_plot.

The clean mode might be useful for managing OmniLog(R) CSV files, which can contain a lot of special characters.

combine.into

Empty or character scalar modifying the output mode unless it is ‘clean’ or ‘split’. If non-empty, a single output file is created per plate type encountered in the input, instead of one per input file (the default). combine.into should therefore be given as a template string for sprintf from the base package, with one placeholder for the plate type and without a file extension.
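The following sketch shows how such a sprintf template expands, using base R only. The template and the plate-type names are made up for illustration; how batch_opm normalises plate-type names and which file extension it appends are not shown.

```r
# Hypothetical template: "%s" is the single placeholder for the plate type.
template <- "combined_%s"

# Illustrative plate types; the real ones are taken from the input data.
plate.types <- c("PM01", "PM02")

# One base name (without file extension) per plate type.
base.names <- sprintf(template, plate.types)
print(base.names)
```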

names

Character vector with names of files in one of the formats accepted by read_single_opm, or names of directories containing such files, or both; or convertible to such a vector. See the include argument of read_opm and explode_dir for how to select subsets from the input files or directories.

gen.iii

Logical or character scalar. If TRUE, invoke gen_iii on each plate. This is automatically done with CSV input if the plate type is given as OTH (which is usually the case for plates run in ID mode). If a character scalar, it is used as the to argument of gen_iii to set other plate types unless it is empty.

demo

Logical scalar. If TRUE, do not read files, but print a vector with the names of the files that would be read (or attempted to be read), and return them invisibly.

proc

Integer scalar. The number of processes to spawn. Cannot be set to more than 1 if running under Windows. See the cores argument of do_aggr for details.

outdir

Character vector. Directories in which to place the output files. If empty or only containing empty strings, the directory of each input file is used.

overwrite

Character scalar. If ‘yes’, conversion is always tried if infile exists and is not empty. If ‘no’, conversion is not tried if outfile exists and is not empty. If ‘older’, conversion is tried if outfile does not exist or is empty or is older than infile (with respect to the modification time).
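The three overwrite rules can be summarised as a small decision function. This is a minimal sketch in base R that mirrors the wording above; the helper should_convert is hypothetical and is not the code used by batch_opm or batch_process.

```r
# Hypothetical helper: TRUE if a conversion would be tried under the
# documented rules, FALSE otherwise.
should_convert <- function(infile, outfile,
    overwrite = c("no", "yes", "older")) {
  overwrite <- match.arg(overwrite)
  # A file counts only if it exists and is not empty.
  non_empty <- function(f) file.exists(f) && file.size(f) > 0
  switch(overwrite,
    yes = non_empty(infile),
    no = !non_empty(outfile),
    older = !non_empty(outfile) ||
      file.mtime(outfile) < file.mtime(infile)
  )
}

# Small demonstration with temporary files.
infile <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".yml")
writeLines("dummy input", infile)
before <- should_convert(infile, outfile, "no") # outfile does not exist yet
writeLines("dummy output", outfile)
after <- should_convert(infile, outfile, "no") # outfile exists and is non-empty
print(c(before = before, after = after))
```

With ‘older’, the same pair of files would be converted again only after the input file has been modified more recently than the output file.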

verbose

Logical scalar. Print conversion and success/failure information?

Details

This function is for batch-converting many files; for writing a single object to a YAML file (or string), see to_yaml.

A YAML document can comprise scalars (single values of some type), sequences (ordered collections of some values, without names) and mappings (collections assigning a name to each value), in a variety of combinations (e.g., mappings of sequences). The output of batch_opm is one YAML document per plate which represents a mapping with the following components (key-value pairs):

metadata

Arbitrarily nested mapping of arbitrary metadata entries. Empty if no metadata have been added.

csv_data

Non-nested mapping containing the OmniLog(R) run information read from the input CSV file (character scalars) together with the measurements. The most important entry is most likely the plate type.

measurements

A mapping whose values are sequences of floating-point numbers of the same length and in the appropriate order. The keys are ‘hours’, which refers to the time points, and the well coordinates, ranging between ‘A01’ and ‘H12’.

aggregated

A mapping, only present if curve parameters have been estimated. Its keys correspond to those of ‘measurements’ with the exception of ‘hours’. The values are themselves mappings, whose keys indicate the respective curve parameter and whether this is the point estimate or the upper or lower confidence interval. The values of these secondary mappings are floating-point numbers.

aggr_settings

A mapping, only present if curve parameters have been estimated. Its keys are ‘software’, ‘version’ and ‘options’. The values of the first two are character scalars. The value of ‘options’ is an arbitrarily nested mapping with arbitrary content.

discretized

A mapping, only present if curve parameters have been estimated and also discretised. Its keys correspond to those of ‘measurements’ with the exception of ‘hours’. The values are logical scalars.

disc_settings

A mapping, only present if curve parameters have been estimated and also discretised. Its keys are ‘software’, ‘version’ and ‘options’. The values of the first two are character scalars. The value of ‘options’ is an arbitrarily nested mapping with arbitrary content.

Details of the contents should be obvious from the documentation of the classes of the objects from which the YAML output is generated. In the case of YAML input with several plates per file, batch_opm generates YAML output files containing a sequence of mappings as described above, one per plate, to keep a 1:1 relationship between input and output files.
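To make the layout of one plate document more concrete, the sketch below builds a nested R list by hand that mimics what a YAML parser might return, and checks it against the description above. All values are invented, only two wells are shown, the ‘Plate Type’ key name is an illustrative assumption, and the helper check_plate is hypothetical. It also assumes that the components not marked as "only present if" are always there.

```r
# Hand-made stand-in for one parsed plate document (invented values).
plate <- list(
  metadata = list(),
  csv_data = list(`Plate Type` = "PM01"),
  measurements = list(
    hours = c(0, 0.25, 0.5),
    A01 = c(10, 12, 15),
    A02 = c(9, 9, 11)
  )
)

# Hypothetical check: required components present, 'hours' present,
# and all measurement sequences of the same length.
check_plate <- function(x) {
  required <- c("metadata", "csv_data", "measurements")
  missing.keys <- setdiff(required, names(x))
  if (length(missing.keys) > 0)
    stop("missing components: ", paste(missing.keys, collapse = ", "))
  m <- x$measurements
  if (!"hours" %in% names(m))
    stop("'measurements' lacks the 'hours' key")
  if (length(unique(lengths(m))) != 1)
    stop("measurement sequences differ in length")
  invisible(TRUE)
}

ok <- check_plate(plate)
```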

Attempting to generate YAML from input data with a wrong character encoding might cause R to crash or hang. This problem was observed with CSV files that were generated on a different operating system and contained special characters such as German umlauts. It is then necessary to explicitly (and correctly) specify the encoding used in these files; see the ‘file.encoding’ option of opm_opt for how to do this.

JSON, which is almost a subset of YAML, can also be generated, but has more restrictions. It is only recommended if a YAML parser is unavailable. It is also more delicate regarding the encoding of character strings.

When YAML files generated with the help of the yaml package (on which the opm implementation is based), or JSON files generated with the help of the rjson package, are read using other programming languages, a potential problem is that these packages, and YAML in general, lack a native representation of NA values. Such entries are likely to be misunderstood as ‘NA’ character scalars (if the rjson package or the yaml package prior to version 2.1.7 is used) or as .na, .na.real, .na.logical or .na.character character scalars (if more recent versions of the yaml package are used). Input functions in other programming languages should perform the corresponding conversions. opm translates these values when converting a list to an OPM object.
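The kind of conversion meant here can be sketched in a few lines. The example is written in R for consistency with the rest of this page, but the same idea carries over to other languages. The helper restore_na and the sample data are hypothetical, and this is not the translation code used by opm itself.

```r
# Placeholder strings named in the text above.
na.strings <- c("NA", ".na", ".na.real", ".na.logical", ".na.character")

# Hypothetical helper: walk a nested list and turn placeholder strings
# back into missing values.
restore_na <- function(x) {
  if (is.list(x))
    return(lapply(x, restore_na))
  if (is.character(x))
    x[x %in% na.strings] <- NA_character_
  x
}

# Invented stand-in for a parsed document fragment.
parsed <- list(
  A01 = list(A = 123.4, lambda = ".na.real"),
  note = c("ok", "NA")
)
cleaned <- restore_na(parsed)
str(cleaned)
```

A limitation of this simple approach is that a genuine character value ‘NA’ cannot be told apart from a placeholder.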

See as.data.frame regarding the generated CSV.

Value

The function invisibly returns a matrix which describes each attempted file conversion. See batch_process for details.

References

http://www.yaml.org/

http://www.json.org/

http://www.biolog.com/

Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C. A., Keseler, I. M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L. A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D. S., Karp, P. D. 2016 The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research 44, D471–D480 [opm YAML usage example].

See Also

utils::read.csv yaml::yaml.load_file grDevices::Devices

pkgutils::mypdf

Other io-functions: batch_collect, batch_process, collect_template, explode_dir, file_pattern, glob_to_regex, read_opm, read_single_opm, split_files, to_metadata

Examples

test.files <- opm_files("omnilog")
if (length(test.files) > 0) { # if the files are found
  num.files <- length(list.files(outdir <- tempdir()))
  x <- batch_opm(test.files[1], outdir = outdir)
  stopifnot(length(list.files(outdir)) == num.files + 1, is.matrix(x))
  stopifnot(file.exists(x[, "outfile"]))
  stopifnot(test.files[1] == x[, "infile"])
  unlink(x[, "outfile"])
} else {
  warning("opm example data files not found")
}
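
A further sketch, using only arguments listed under Usage, previews a JSON conversion in demo mode without reading or writing any files. It assumes that the opm package and its example files are installed and is skipped otherwise.

```r
# Run only if the opm package is available.
have.opm <- requireNamespace("opm", quietly = TRUE)
if (have.opm) {
  library(opm)
  test.files <- opm_files("omnilog")
  if (length(test.files) > 0) {
    # demo = TRUE: only report which files would be processed
    batch_opm(test.files, outdir = tempdir(), output = "json",
      overwrite = "older", demo = TRUE)
  }
}
```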

opm documentation built on May 2, 2019, 6:08 p.m.