Batch-collect information from a series of input files or
batch-convert data from input files to data in output
files. Alternatively, turn a mixed file/directory list
into a list of files, create a regular expression
matching certain file extensions, convert a wildcard
pattern to a regular expression, or split files. These
functions are not normally called directly by an
opm user but by the other IO functions of the
package such as collect_template or batch_opm. One can use their demo
argument directly for testing the results of the applied
file name patterns.
explode_dir(names, include = NULL, exclude = NULL,
ignore.case = TRUE, wildcard = TRUE, recursive = TRUE,
missing.error = TRUE, remove.dups = TRUE)
batch_collect(names, fun, fun.args = list(), proc = 1L,
..., use.names = TRUE, simplify = FALSE, demo = FALSE)
batch_process(names, out.ext, io.fun, fun.args = list(),
proc = 1L, outdir = NULL,
overwrite = c("yes", "older", "no"), in.ext = "any",
compressed = TRUE,
literally = inherits(in.ext, "AsIs"), ...,
verbose = TRUE, demo = FALSE)
file_pattern(type = c("both", "csv", "yaml", "json", "yorj", "lims", "nolims",
"any", "empty"),
compressed = TRUE, literally = inherits(type, "AsIs"))
split_files(files, pattern, outdir = "", demo = FALSE,
single = TRUE, wildcard = FALSE, invert = FALSE,
include = TRUE, format = opm_opt("file.split.tmpl"),
compressed = TRUE, ...)
glob_to_regex(object)
## S3 method for class 'character'
glob_to_regex(object)
## S3 method for class 'factor'
glob_to_regex(object)
names |
Character vector containing file names or directories, or convertible to such. |
object |
Character vector or factor. |
include |
If a character scalar, used as regular
expression or wildcard (see the For |
exclude |
Like |
ignore.case |
Logical scalar. Ignore differences
between uppercase and lowercase when using |
wildcard |
Logical scalar. Are |
recursive |
Logical scalar. Traverse directories
recursively and also consider all subdirectories? See
|
missing.error |
Logical scalar. If a file/directory does not exist, raise an error or only a warning? |
remove.dups |
Logical scalar. Remove duplicates from
|
fun |
Collecting function. Should use the file name as first argument. |
fun.args |
Optional list of arguments to |
... |
Optional further arguments passed from
|
proc |
Integer scalar. The number of processes to
spawn. Cannot be set to more than 1 core if running under
Windows. See the |
simplify |
Logical scalar. Should the resulting list be simplified to a vector or matrix if possible? |
use.names |
Logical scalar. Should |
out.ext |
Character scalar. The extension of the output file names (without the dot). |
outdir |
Character vector. Directories in which to place the output files. If empty or only containing empty strings, the directory of each input file is used. |
in.ext |
Character scalar. Passed through
|
type |
Character scalar indicating the file types to
be matched by extension. For historical reasons,
both means either CSV or YAML
or JSON, whereas yorj means either
YAML or JSON. CSV also
includes the LIMS CSV format
introduced in 2014, which can be specifically selected
using lims or excluded using nolims.
Alternatively, directly the extension or extensions, or a
list of file names (not |
compressed |
Logical scalar. Shall compressed files
also be matched? This affects the returned pattern as
well as the pattern used for extracting file extensions
from complete file names (if
|
literally |
Logical scalar. Interpret |
demo |
Logical scalar. In the case of
For |
files |
Character vector or convertible to such.
Names of the files to be split. In contrast to functions
such as |
pattern |
Regular expression or shell globbing
pattern for matching the separator lines if Conceptually each of the sections into which a file is split comprises a separator line followed by non-separator lines. That is, separator lines followed by another separator line are ignored. Non-separator lines not preceded by a separator line are treated as a section of their own, however. |
single |
Logical scalar. If there is only one group per file, i.e. only one output file would result from the splitting, create this file anyway? Such cases could be recognised by empty character vectors as values of the returned list (see below). |
invert |
Logical scalar. Invert pattern matching,
i.e. treat all lines that do not match
|
format |
Character scalar determining the output
file name format. It is passed to
Getting |
io.fun |
Conversion function. Should accept
|
overwrite |
Character scalar. If ‘yes’,
conversion is always tried if |
verbose |
Logical scalar. Print conversion and success/failure information? |
Other functions that call explode_dir have a demo argument which,
if set to TRUE, causes the respective function to do no real work
but print the names of the files that it would process in normal
running mode.
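The idea behind such a demo argument can be illustrated with a minimal base-R sketch. The helpers explode_sketch and process_sketch are hypothetical and not part of opm; they only mimic expanding a mixed file/directory list and, in demo mode, reporting the selected files instead of processing them. Matching the globbing pattern against the full path is an assumption made here.

```r
# Hypothetical helper, NOT opm's implementation: expand a mixed vector of
# files and directories into file names, optionally filtered by a glob.
explode_sketch <- function(names, include = NULL, recursive = TRUE) {
  is.dir <- dir.exists(names)
  from.dirs <- unlist(lapply(names[is.dir], list.files, full.names = TRUE,
    recursive = recursive), use.names = FALSE)
  result <- unique(c(names[!is.dir & file.exists(names)], from.dirs))
  if (!is.null(include)) # assumption: the pattern is applied to the full path
    result <- result[grepl(utils::glob2rx(include), result,
      ignore.case = TRUE)]
  result
}

# Hypothetical processing function with a 'demo' switch
process_sketch <- function(names, include = NULL, demo = FALSE) {
  files <- explode_sketch(names, include)
  if (demo) { # do no real work, only report what would be processed
    message(paste(files, collapse = "\n"))
    return(invisible(files))
  }
  # the "real work" here is simply counting the lines of each file
  vapply(files, function(f) length(readLines(f, warn = FALSE)), integer(1))
}

td <- file.path(tempdir(), "demo_sketch")
dir.create(td, showWarnings = FALSE)
writeLines(letters, file.path(td, "a.csv"))
writeLines(LETTERS[1:3], file.path(td, "b.txt"))
shown <- process_sketch(td, include = "*.csv", demo = TRUE)
counts <- process_sketch(td, include = "*.csv")
```

Running the demo call first makes it possible to check the file selection before any output is produced.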
glob_to_regex changes a shell globbing wildcard
into a regular expression. This is just a slightly
extended version of glob2rx from the utils
package, but more conversion steps might need to be added
here in the future. In particular, explode_dir
and the IO functions calling
that function internally use glob_to_regex. Some
hints on using globbing patterns are given in the
following.
The globbing search patterns used here contain only two special characters, ‘?’ and ‘*’, and are thus easier to master than regular expressions. ‘?’ matches exactly one arbitrary character, whereas ‘*’ matches zero or more arbitrary characters. Some examples:
a?c: Matches abc, axc, a c etc. but not abbc, abbbc, ac etc.
a*c: Matches abc, abbc, ac etc. but not abd etc.
ab*: Matches abc, abcdefg, abXYZ etc. but not acdefg etc.
?bc: Matches abc, Xbc, ‘ bc’ etc. but not aabc, abbc, bc etc.
Despite their simplicity, globbing patterns are often sufficient for selecting file names.
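The behaviour of ‘?’ and ‘*’ can be checked with a short sketch that relies only on utils::glob2rx and grepl, so opm is not needed. The four patterns are assumptions chosen to be consistent with the matches listed above; glob_to_regex itself is not called here.

```r
# Assumed example patterns and some candidate strings
patterns <- c("a?c", "a*c", "ab*", "?bc")
candidates <- c("abc", "axc", "a c", "abbc", "ac", "abd", "abcdefg",
  "Xbc", "bc")

# Convert each glob into an anchored regular expression
regexes <- vapply(patterns, utils::glob2rx, character(1))

# Logical matrix: one row per candidate, one column per pattern
hits <- sapply(regexes, grepl, x = candidates)
dimnames(hits) <- list(candidates, patterns)
print(hits)
```

Each column of the resulting matrix shows which candidates a pattern selects, which is a quick way to test a pattern before passing it as include or exclude.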
split_files subdivides each file into sections
which are written individually to newly generated files.
Sections are determined with patterns that match the
start of a section. This function might be useful for
splitting OmniLog(R) multiple-plate CSV files before inputting them
with read_opm, even though that function
could also input such files directly. It is used in one
of the running modes of batch_opm for
splitting files. See also the ‘Examples’. The
newly generated files are numbered accordingly; they are
not named after any csv_data entry
because there is no guarantee that it is present.
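How sections can be derived from lines that match the start of a section is shown in the following base-R sketch. The function split_lines_sketch is hypothetical and works on a character vector in memory; it is not how split_files is implemented, and it writes no files.

```r
# Hypothetical helper: group lines into sections, each starting at a line
# that matches 'pattern'.
split_lines_sketch <- function(lines, pattern) {
  is.sep <- grepl(pattern, lines)
  # every separator line increments the counter and thus opens a new group;
  # lines before the first separator end up in group 0
  sections <- unname(split(lines, cumsum(is.sep)))
  # drop groups consisting of a separator line only, i.e. separators
  # that are directly followed by another separator (or by nothing)
  sep.only <- vapply(sections, function(s) {
    length(s) == 1L && grepl(pattern, s[1L])
  }, logical(1))
  sections[!sep.only]
}

lines <- c("preamble", "Data File,a.csv", "1,2", "Data File,b.csv",
  "Data File,c.csv", "3,4")
(sections <- split_lines_sketch(lines, "^Data File"))
```

In this sketch the leading non-separator line forms a section of its own, and the separator line that is immediately followed by another separator is discarded.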
explode_dir returns a character vector (which
would be empty if all existing files, if any, had been
unselected).
batch_collect returns a list, potentially
simplified to a vector, depending on the output of
fun and the value of simplify. See also demo.
In normal mode, batch_process creates an invisibly
returned character matrix in which each row corresponds
to a named character vector with the keys infile, outfile,
before and after. The
latter two describe the result of the action(s) before
and after attempting to convert infile to outfile.
The after entry is the empty
string if no conversion was tried (see overwrite),
ok if conversion was successful and a message
describing the problems otherwise. For the results of the
demo mode see above.
file_pattern yields a character scalar, holding a
regular expression. glob_to_regex yields a vector
of regular expressions.
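What a regular expression matching file extensions might look like can be sketched in base R. The function ext_pattern_sketch and its set of compression suffixes (gz, bz2, xz) are assumptions for illustration only; the expression actually returned by file_pattern may differ.

```r
# Hypothetical helper: build a regular expression that matches file names
# ending in one of the given extensions, optionally followed by a
# compression suffix. Extensions are assumed to contain no regex
# metacharacters.
ext_pattern_sketch <- function(exts, compressed = TRUE) {
  result <- sprintf("\\.(%s)", paste(exts, collapse = "|"))
  if (compressed) # assumed set of compression suffixes
    result <- paste0(result, "(\\.(gz|bz2|xz))?")
  paste0(result, "$")
}

(x <- ext_pattern_sketch(c("csv", "yml", "json")))
(y <- ext_pattern_sketch("csv", compressed = FALSE))
files <- c("plate.csv", "plate.yml.gz", "plate.txt", "plate.csv.gz")
grepl(x, files, ignore.case = TRUE)
grepl(y, files, ignore.case = TRUE)
```

As with file_pattern, allowing compressed files lengthens the pattern, and the result can be passed to functions such as grepl or list.files.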
split_files creates a list of character vectors,
each vector containing the names of the newly generated
files. The names of the list are the input file names.
The list is returned invisibly.
base::list.files base::Sys.glob utils::glob2rx base::regex base::split base::strsplit base::file.rename
Other io-functions: batch_opm, collect_template, read_opm, read_single_opm, to_metadata
# explode_dir()
# Example with temporary directory
td <- tempdir()
tf <- tempfile()
(x <- explode_dir(td))
write(letters, tf)
(y <- explode_dir(td))
stopifnot(length(y) > length(x))
unlink(tf)
(y <- explode_dir(td))
stopifnot(length(y) == length(x))
# Example with R installation directory
(x <- explode_dir(R.home(), include = "*/doc/html/*"))
(y <- explode_dir(R.home(), include = "*/doc/html/*", exclude = "*.html"))
stopifnot(length(x) == 0L || length(x) > length(y))
# batch_collect()
# Read the first line from each of the OPM test data set files
f <- opm_files("testdata")
if (length(f) > 0) { # if the files are found
  x <- batch_collect(f, fun = readLines, fun.args = list(n = 1L))
  # yields a list with the input files as names and the result from each
  # file as values (exactly one line)
  stopifnot(is.list(x), identical(names(x), f))
  stopifnot(sapply(x, is.character), sapply(x, length) == 1L)
} else {
  warning("test files not found")
}
# For serious tasks, consider first trying the function in 'demo' mode.
# batch_process()
# Read the first line from each of the OPM test data set files and store it
# in temporary files
pf <- function(infile, outfile) write(readLines(infile, n = 1), outfile)
infiles <- opm_files("testdata")
if (length(infiles) > 0) { # if the files are found
  x <- batch_process(infiles, out.ext = "tmp", io.fun = pf,
    outdir = tempdir())
  stopifnot(is.matrix(x), identical(x[, 1], infiles))
  stopifnot(file.exists(x[, 2]))
  unlink(x[, 2])
} else {
  warning("test files not found")
}
# For serious tasks, consider first trying the function in 'demo' mode.
# file_pattern()
(x <- file_pattern())
(y <- file_pattern(type = "csv", compressed = FALSE))
stopifnot(nchar(x) > nchar(y))
# constructing pattern from existing files
(files <- list.files(pattern = "[.]"))
(x <- file_pattern(I(files))) # I() causes 'literally' to be TRUE
stopifnot(grepl(x, files, ignore.case = TRUE))
# glob_to_regex()
x <- "*what glob2rx() can't handle because a '+' is included*"
(y <- glob_to_regex(x))
(z <- glob2rx(x))
stopifnot(!identical(y, z))
# Factor method
(z <- glob_to_regex(as.factor(x)))
stopifnot(identical(as.factor(y), z))
## split_files()
# Splitting an old-style CSV file containing several plates
(x <- opm_files("multiple"))
if (length(x) > 0) {
  outdir <- tempdir()
  # For old-style CSV, use either "^Data File" as pattern or "Data File*"
  # with 'wildcard' set to TRUE:
  (result <- split_files(x, pattern = "^Data File", outdir = outdir))
  stopifnot(is.list(result), length(result) == length(x))
  stopifnot(sapply(result, length) == 3)
  result <- unlist(result)
  stopifnot(file.exists(result))
  unlink(result) # tidy up
} else {
  warning("opm example files not found")
}
## One could split new-style CSV as follows (if x is a vector of file names):
# split_files(x, pattern = '^"Data File",')
## note the correct setting of the quotes
## A pattern that covers both old and new-style CSV is:
# split_files(x, pattern = '^("Data File",|Data File)')
## This is used by batch_opm() in 'split' mode and by the 'run_opm.R' script