splitToDF | R Documentation |
Read a dataframe from a character vector, using a regular expression with named fields to extract values from matching items. The named fields become columns in the result, and each matching item in the input yields a row in the result. FIXME (Eventually): when the stringi package regexp code can handle named subexpressions, use stri_extract_all_regex(..., simplify=TRUE)
splitToDF(rx, s, namedOnly = TRUE, validOnly = TRUE, guess = TRUE, ...)
rx: |
Perl-type regular expression with named fields, as
described in |
s: |
character vector. Each element must match |
namedOnly: |
if |
validOnly: |
if |
guess: |
if |
...: |
additional parameters to |
A data.frame. Each column is a vector and corresponds to a named field in rx, going from left to right. Each row in the data.frame corresponds to an item in s which matches rx. If no items of s match rx, the function returns NULL. If guess is TRUE, columns have been converted to their guessed types.
This function serves a similar purpose to read.csv, except that the rules for splitting input lines into columns are much more flexible. Any format which can be described by a regular expression with named fields can be handled. For example, logfile messages often contain extra text, variable field positions, and interspersed unrelated messages, which prevent direct use of functions like read.csv or scan to extract what is really just a dataframe with syntactic sugar and interleaved junk.
For example, if input lines look like this:
s = c(
  "Mar 10 06:25:11 SG [62442.231077] pps-gpio: PPS @ 1425968711.000018004: pre_age = 163, post_age = 1130",
  "Mar 10 06:25:11 SG [62442.23108] usb-debug: device 45 disconnected",
  "Mar 10 06:25:12 SG [62443.2311] pps-gpio: PPS @ 1425968712.000011015: pre_age = 1055, post_age = 11655",
  "Mar 10 06:25:13 SG [62444.2] dbus[2872]: [system] Successfully activated service 'org.freedesktop.PackageKit'",
  "Mar 10 06:25:13 SG [62444.23] pps-gpio: PPS @ 1425968713.000011275: pre_age = 160, post_age = 12120"
)
and we wish to extract timestamps and pre_age and post_age from the pps-gpio messages as a data.frame, we can use this regular expression:
rx = "pps-gpio: PPS @ (?<ts>[0-9]+\\.[0-9]*): pre_age = (?<preAge>[0-9]+), post_age = (?<postAge>[0-9]+)"
splitToDF(rx, s) then gives:
          ts preAge postAge
1 1425968711    163    1130
2 1425968712   1055   11655
3 1425968713    160   12120
where the first column is numeric and the others are integer.
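The extraction described above can be approximated in base R. The following is a minimal sketch only, not the package's implementation: the helper name splitToDFSketch is hypothetical, it supports only the guess argument, and it assumes that type guessing behaves like type.convert. It relies on the capture.start, capture.length and capture.names attributes that regexpr returns when perl = TRUE.

```r
# Hypothetical helper (not the package's splitToDF): named-capture
# extraction into a data.frame using base R only.
splitToDFSketch <- function(rx, s, guess = TRUE) {
  m <- regexpr(rx, s, perl = TRUE)        # first match per element, -1 if none
  idx <- which(m > 0)                     # indices of items that match rx
  if (length(idx) == 0) return(NULL)      # mirrors the documented NULL result

  starts <- attr(m, "capture.start")[idx, , drop = FALSE]
  lens   <- attr(m, "capture.length")[idx, , drop = FALSE]
  nms    <- attr(m, "capture.names")
  named  <- which(nzchar(nms))            # keep only the named groups

  # One character column per named group, in left-to-right order
  cols <- lapply(named, function(j) {
    substring(s[idx], starts[, j], starts[, j] + lens[, j] - 1)
  })
  names(cols) <- nms[named]

  df <- as.data.frame(cols, stringsAsFactors = FALSE)
  # Assumption: "guessing" column types is approximated by type.convert
  if (guess) df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

s <- c(
  "Mar 10 06:25:11 SG [62442.231077] pps-gpio: PPS @ 1425968711.000018004: pre_age = 163, post_age = 1130",
  "Mar 10 06:25:11 SG [62442.23108] usb-debug: device 45 disconnected",
  "Mar 10 06:25:12 SG [62443.2311] pps-gpio: PPS @ 1425968712.000011015: pre_age = 1055, post_age = 11655"
)
rx <- "pps-gpio: PPS @ (?<ts>[0-9]+\\.[0-9]*): pre_age = (?<preAge>[0-9]+), post_age = (?<postAge>[0-9]+)"
df <- splitToDFSketch(rx, s)
```

In this sketch the non-matching usb-debug line is simply skipped, so df has one row per pps-gpio message. Calling the helper with guess = FALSE leaves every column as character.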
John Brzustowski jbrzusto@REMOVE_THIS_PART_fastmail.fm