knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
A collection of utilities. These are functions that are commonly used in R packages I write or that solve some tiny problem in a generic way but are too small to release as their own package. Kind of a dumping ground, actually. I expect that over time functions may migrate out of here to other packages.
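Currently there are four main groupings of functions: general utilities, logging helpers built on the `futile.logger` package, list functions, and file apply functions.

Most useful among the general utilities are the character concatenation operators `%p%` and `%pp%`, which act like `paste0(x,y)` and `paste(x,y)`, respectively.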
    library("JefferysRUtils")

    "Paste" %p% "No" %p% "Spaces."
    "Paste" %pp% "With" %pp% "Spaces."
    "Paste" %p% "Mixed" %pp% "Spaces."

    ("Paste" %pp% "across" %pp% "line breaks."
        %pp% "Requires '()' if operator starts second line.")

    (
        c( "Probably shouldn't", "I wouldn't" )
        %pp% c( "use with", "apply to" )
        %pp% "vectors."
        %pp% "But it will work (like nested 'paste()')."
    )
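To show how operators like these behave, here is a minimal base R sketch. The two definitions are assumptions made for illustration, not the package's actual source; they simply wrap `paste0()` and `paste()` as user-defined infix operators.

```r
# Hypothetical stand-ins for the package operators (assumed definitions).
`%p%`  <- function(x, y) paste0(x, y)  # concatenate without a separator
`%pp%` <- function(x, y) paste(x, y)   # concatenate with a single space

noSpaces   <- "Paste" %p% "No" %p% "Spaces."
withSpaces <- "Paste" %pp% "With" %pp% "Spaces."
mixed      <- "Paste" %p% "Mixed" %pp% "Spaces."

# User-defined %op% operators share one precedence level and associate
# left to right, so mixed chains are evaluated in reading order.
print(c(noSpaces, withSpaces, mixed))
```

Because all `%op%` operators have the same precedence, the mixed chain is evaluated as `("Paste" %p% "Mixed") %pp% "Spaces."`.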
Functions are provided to support logging to both a file and the screen, with each filtered at its own level. This may be supported directly by `futile.logger` at some point.
You have to initialize the file and screen loggers before use. Logging in tandem is then done with separate logging commands, one for each level, e.g. `sayInfo("Message")`.
Currently only provides a version of the base S3 function `merge` that works on lists.
To allow applying a function to a file in one step, two apply-like functions are defined:

`fileLineApply` applies a supplied function to each line of a file or connection (read as text). Basically implements `sapply(readLines(file), FUN, ...)`.

`fileBlockApply` applies a supplied function to a vector of lines from a file or connection. Basically implements `FUN(readLines(file), ...)`.
Both functions are implemented in a way that makes them work even on large files, possibly files larger than would otherwise fit in memory. Connections are supported so functions can be applied to compressed files or files being read from URLs. Additionally, a `filter` flag is provided to allow returning values selected by a logical function (or, for `fileBlockApply`, an index-returning function).
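As a rough point of reference, the two base R idioms these functions are compared to look like this when written out by hand. This is a sketch of the one-shot equivalents only, not of the package's implementation; `nchar` and `grep` are arbitrary example functions.

```r
content <- c("One line", "Two lines.", "", "Four")

# Per-line application: FUN is called once for each line.
con <- textConnection(content)
perLine <- sapply(readLines(con), nchar, USE.NAMES = FALSE)
close(con)

# Per-block application: FUN is called once on the whole vector of lines.
con <- textConnection(content)
perBlock <- grep("line", readLines(con), value = TRUE)
close(con)

print(perLine)
print(perBlock)
```

Both calls read the whole input into memory first, which is the limitation the chunked reading in the package functions is meant to avoid.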
For example, a simple file grep that returns lines from a (fake) file connection:
    content <- c( "One line", "Two lines.", "", "Four" )
    con <- textConnection( content )
    fileBlockApply( con, "grep", pattern="line", value=TRUE )
Applying a function by line is often not necessary, but it may be simpler when a function is complex and may be faster as it only requires one pass over the file. Here I have a function that converts a comma-separated string of key=integer entries into a named vector. I could rewrite the function to apply successive vector operations on a vector of inputs, but it is easier to just apply the function.
    content <- c( "A=1,B =2, c = 3", "C=4", "B=1,A=", "" )
    con <- textConnection( content )

    parseKeyValues <- function(x, sep= "\\s*,\\s*", valSep= "\\s*=\\s*") {
        pairs <- strsplit(x, sep)[[1]]
        kv <- strsplit(pairs, valSep)
        values <- as.integer(sapply(kv, `[`, 2))
        names(values) <- toupper(sapply(kv, `[`, 1))
        return(values)
    }

    fileLineApply( con, "parseKeyValues" )
To support large files, these file apply functions read in a file in blocks of `chunkSize` lines, keeping only the results after processing each chunk. For example, this works well when applying `grep()` to a very large file returns only a few lines. However, by default all lines from a file are read in one chunk, as most files are small enough. For large files, you need to set a reasonable chunk size given the available memory and typical line length.
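The chunked reading described here can be sketched in base R with `readLines(con, n = ...)`. The helper `blockApplySketch` below is hypothetical and written only for illustration; it is not the package's code, and it omits the filtering and unlisting options.

```r
# Hypothetical helper, not part of the package: apply FUN to successive
# blocks of at most chunkSize lines, keeping only each block's result.
blockApplySketch <- function(con, FUN, chunkSize, ...) {
    results <- list()
    repeat {
        lines <- readLines(con, n = chunkSize)
        if (length(lines) == 0) break    # connection exhausted
        results[[length(results) + 1]] <- FUN(lines, ...)
    }
    results
}

content <- c("One line", "Two lines.", "", "Four")
con <- textConnection(content)
byBlock <- blockApplySketch(con, grep, chunkSize = 2,
                            pattern = "line", value = TRUE)
close(con)

str(byBlock)
```

Only one block of lines is held in memory at a time, so peak memory depends on `chunkSize` and on the size of the accumulated results rather than on the file size.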
Internally, the results from each block are stored as a list element and are joined together only when results are returned. To get the raw list of results split by block, set `unlist=FALSE`. Joining results together is done with `unlist(recursive=FALSE)`, and this may do unexpected things if your results are matrices or each line is a vector.
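To make the caveat concrete, here is what `unlist(recursive = FALSE)` does in plain base R to two kinds of per-block results. The inputs are made up for illustration and do not come from the package.

```r
# Per-block results as a list of lists: one level is removed cleanly.
blocksOfLists <- list(list(1:2, 3:4), list(5:6))
joinedLists <- unlist(blocksOfLists, recursive = FALSE)

# Per-block results as matrices: dimensions are dropped and the values
# are flattened into one plain vector.
blocksOfMatrices <- list(matrix(1:4, nrow = 2), matrix(5:8, nrow = 2))
joinedMatrices <- unlist(blocksOfMatrices, recursive = FALSE)

str(joinedLists)
str(joinedMatrices)
```

The list case keeps each inner vector intact, while the matrix case loses its shape entirely.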
Note: Setting `chunkSize` to a small number is not something you should do, but it is done in this example for expository purposes.
    content <- c( "One line", "Two lines.", "", "Four" )

    con <- textConnection( content )
    fileBlockApply( con, function (x) { lengths(strsplit(x, "\\s+")) > 1 }, chunkSize= 2 )

    # Have to re-create the connection for each pass
    con <- textConnection( content )
    fileBlockApply( con, function (x) { lengths(strsplit(x, "\\s+")) > 1 }, chunkSize= 2, unlist=FALSE )
If a function returns an index, it will be relative to the start of each block, not the file.
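The difference between block-relative and file-relative positions can be reproduced in base R. This sketch emulates chunking with `split()` instead of reading from a connection, and the offset arithmetic is an illustration rather than the package's code.

```r
content <- c("One line", "Two lines.", "", "Four")
chunkSize <- 2

# Split the lines into blocks of chunkSize, as chunked reading would.
blocks <- split(content, ceiling(seq_along(content) / chunkSize))

# grep() on each block returns positions within that block.
blockIndex <- lapply(blocks, function(x) grep("F", x))

# Adding each block's starting offset recovers positions within the file.
offsets <- (seq_along(blocks) - 1) * chunkSize
fileIndex <- unlist(Map(`+`, blockIndex, offsets), use.names = FALSE)

print(unlist(blockIndex, use.names = FALSE))
print(fileIndex)
```

Here `"Four"` is the second line of the second block, so the block-relative index is 2 while its position in the file is 4.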
```{r Index returned is relative to block}
con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F" )

con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F", chunkSize= 2 )

con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F", chunkSize= 2, unlist= FALSE )
```

#### Selection with filtering functions

It is easy to select based on the result of a logical function by setting `filter= TRUE`. `fileBlockApply` also supports selection by index-returning functions. The block-relative offset is automatically handled, whether or not you keep the block structure.

```r
content <- c( "One line", "Two lines.", "", "Four" )

# Logical function
con <- textConnection( content )
fileLineApply( con, function (x) { lengths(strsplit(x, "\\s+")) > 1 } )

# Filtering by logical function
con <- textConnection( content )
fileLineApply( con, function (x) { lengths(strsplit(x, "\\s+")) > 1 }, filter= TRUE )

# Indexes are relative to file chunks
con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F")
con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F", chunkSize= 2)

# Can filter by index regardless of file chunking (fileBlockApply only).
con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F", filter= TRUE )
con <- textConnection( content )
fileBlockApply( con, "grep", pattern= "F", chunkSize= 2, filter= TRUE )
```
If applying a function to a line results in neither a list nor a single-element vector, it is probably best to keep the block structure and manually merge results. Unlisting destroys non-list sub-structures like vectors or objects without remorse. It will interact especially problematically with `fileLineApply`, which simplifies to arrays when possible. Setting `.simplify=FALSE` will preserve per-line structure as a list.
    # Returns a vector for each line, simplifies into a matrix.
    con <- textConnection( content )
    fileLineApply( con, function (x) { c(nchar(x), 42) }, chunkSize= 2, unlist=FALSE )

    # Without simplifying, get a list of lists: one element per line within each block
    con <- textConnection( content )
    fileLineApply( con, function (x) { c(nchar(x), 42) }, chunkSize= 2, unlist=FALSE, .simplify = FALSE )

    # Unlisting only unlists one level of lists
    con <- textConnection( content )
    fileLineApply( con, function (x) { c(nchar(x), 42) }, chunkSize= 2, .simplify = FALSE )

    # However, unlisting the matrix result *does* flatten the matrix
    con <- textConnection( content )
    fileLineApply( con, function (x) { c(nchar(x), 42) }, chunkSize= 2)
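The simplification at play can be seen with plain `sapply()` and `lapply()`. This sketch assumes `fileLineApply` follows `sapply()` semantics when simplifying, which the description suggests but does not spell out; `perLineFun` is a made-up example function.

```r
content <- c("One line", "Two lines.", "", "Four")
perLineFun <- function(x) c(nchar(x), 42)

# sapply() simplifies equal-length vector results into a matrix,
# with one column per input line.
simplified <- sapply(content, perLineFun, USE.NAMES = FALSE)

# lapply() never simplifies, so each line keeps its own vector.
asList <- lapply(content, perLineFun)

# Flattening the matrix loses the per-line grouping.
flattened <- as.vector(simplified)

print(dim(simplified))
print(length(asList))
print(flattened)
```

Once flattened, nothing in the result marks where one line's values end and the next begin, which is why keeping the list form is safer for vector-valued functions.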