knitr_cawk_engine | R Documentation
The one true Awk now has support for using CSV columns as fields.
knitr_cawk_engine(options)

options: knitr chunk options (required parameter)
CSV support will eventually land in the default Awk on most distros. Until then, to use this knitr engine, you will need to build it from the csv branch:
    git clone git@github.com:onetrueawk/awk.git
    cd awk
    git checkout csv
    make

Then move the resulting a.out to cawk somewhere on your PATH.
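As a quick sanity check that the build works, you can pipe a small CSV snippet through it (a minimal sketch; it assumes the csv branch's --csv flag is what enables CSV-aware field splitting, which is what this engine relies on):

    # a quoted comma should stay inside the field if CSV mode is on
    printf 'name,qty\nwidget,"1,200"\n' | cawk --csv '{ print $2 }'
    # expected output:
    #   qty
    #   1,200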
You can find an example Quarto document that has a cawk section via:

    system.file("examples/cawk-test.qmd", package="knitrengines")
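One way to try it out (a sketch assuming the quarto R package and the Quarto CLI are installed) is to copy the example somewhere writable and render it:

    # copy the bundled example to a temp dir, then render it
    qmd <- system.file("examples/cawk-test.qmd", package = "knitrengines")
    tmp <- file.path(tempdir(), "cawk-test.qmd")
    file.copy(qmd, tmp)
    quarto::quarto_render(tmp)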
Any chunk with the language cawk will be processed by this engine. Because quite a bit of Awk's behavior is driven from the command line, the engine needs a way to set various command-line options. We do that via knitr chunk options (either inline in the chunk header or in Quarto document structured comments).
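For example, the same option can be set either way (a minimal sketch; awk.csv is described below):

    ```{cawk, awk.csv=TRUE}
    { print $1 }
    ```

or, with Quarto structured comments:

    ```{cawk}
    #| awk.csv: true
    { print $1 }
    ```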
The following chunk options are supported:
awk.csv
: when set to TRUE, this tells cawk to assume the input files are CSV files. This is not set by default, though I'm open to feedback on that life choice.
awk.var.NAME
: CSV-enabled Awk knows nothing about column names; you reference each column by field number. That's not great for "data analysis" work (it isn't great for anything but one-liners, tbh), so one can use the -v CLI option to map variable names to field numbers. Defining awk.var.logdate=1 lets you use $logdate in the Awk script instead of $1.
awk.file.#
: Awk processes stdin and files, and you can't use data from a previous chunk without shoving it into a file. Awk can also work with multiple input files. You can specify the relative or full path(s) via something like awk.file.1="this.csv". The # part of awk.file.# is ignored; it just needs to be unique. The files are passed to the Awk CLI in the order they appear in the chunk options. A combined example follows this list.
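Putting the options together (a sketch; logs.csv and its column layout of date,host,bytes are hypothetical):

    ```{cawk}
    #| awk.csv: true
    #| awk.var.logdate: 1
    #| awk.var.bytes: 3
    #| awk.file.1: "logs.csv"
    # print the date of every record larger than 1 KB
    $bytes > 1024 { print $logdate }
    ```

Under the hood this should translate into an invocation along the lines of cawk --csv -v logdate=1 -v bytes=3 '...' logs.csv (an assumption about how the engine assembles the command line).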
Author(s): Bob Rudis