A friendly, focused alternative to using regular expressions for path parsing.
The purpose of the dirdf package is to let you, the user, write a path specification that we can apply to file paths, extracting out relevant chunks into data frame columns. The most obvious mechanism for doing so is a regular expression, and indeed, dirdf lets you provide a regex argument.
But for most reasonable directory/file naming conventions, regex is overkill; its power is wasted on something like YYYY-MM/DD/LocationId/SubjectId.csv, yet you still have to pay the price of regexes being difficult to write and to read, and easy to get subtly wrong.
Path templates are a friendlier alternative. A path template is a string that consists of variable names and delimiters. A variable name is any contiguous run of alphanumeric characters (optionally, with a trailing ? character); delimiters are everything else.
For example:
Year-Month/Day/FirstName_MiddleInitial?_LastName.ext
In this example, Year, Month, Day, FirstName, MiddleInitial, LastName, and ext are variable names. All of the dash, slash, underscore, and period characters between them are considered delimiters.
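The split between variable names and delimiters can be illustrated with a few lines of base R. This is only a sketch of the rule described above, not dirdf's own code; split_template is a hypothetical helper name, and it ignores any markup beyond the trailing ? character.

```r
# Hypothetical helper (not part of dirdf): split a template into its
# variable names and the delimiters between them.
split_template <- function(template) {
  # A variable name is a run of alphanumerics with an optional trailing "?"
  m <- gregexpr("[[:alnum:]]+\\??", template)
  delims <- regmatches(template, m, invert = TRUE)[[1]]
  list(
    variables  = regmatches(template, m)[[1]],
    # Drop the empty pieces before the first and after the last variable
    delimiters = delims[nzchar(delims)]
  )
}

parts <- split_template("Year-Month/Day/FirstName_MiddleInitial?_LastName.ext")
parts$variables   # the seven variable names, MiddleInitial still carrying its "?"
parts$delimiters  # the dash, slashes, underscores, and period between them
```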
When parsed, this template will match each variable to any number of non-slash characters, up until the next delimiter. (Slash will never be considered part of a variable match, as we consider it the path separator.)
The trailing question mark makes MiddleInitial? optional; both its value and its preceding delimiter (_ in this case) can be omitted from target paths, in which case the resulting value for that variable will be NA (or in some edge cases, "").
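To make the matching rules concrete, here is one way a template could be translated into a regular expression in base R. It is a sketch under stated assumptions, not how dirdf is implemented: template_to_regex and parse_paths are hypothetical names, each variable is assumed to match lazily up to the next delimiter, and an empty match is simply reported as NA.

```r
# Hypothetical helper (not dirdf's implementation): translate a path
# template into an anchored Perl-style regular expression.
template_to_regex <- function(template) {
  m      <- gregexpr("[[:alnum:]]+\\??", template)
  vars   <- regmatches(template, m)[[1]]
  delims <- regmatches(template, m, invert = TRUE)[[1]]  # length(vars) + 1 pieces

  optional <- endsWith(vars, "?")
  # Quote delimiters literally with \Q...\E (Perl-style regex syntax)
  quoted <- ifelse(nzchar(delims), paste0("\\Q", delims, "\\E"), "")
  before <- quoted[seq_along(vars)]

  # Each variable lazily matches non-slash characters; an optional variable
  # is grouped with its preceding delimiter so that both can be absent.
  pieces <- ifelse(
    optional,
    paste0("(?:", before, "([^/]*?))?"),
    paste0(before, "([^/]*?)")
  )
  list(
    names = sub("\\?$", "", vars),
    regex = paste0("^", paste(pieces, collapse = ""), quoted[length(quoted)], "$")
  )
}

# Hypothetical helper: apply the generated regex and collect the captures
# into a data frame with one column per variable.
parse_paths <- function(paths, template) {
  spec <- template_to_regex(template)
  hits <- regmatches(paths, regexec(spec$regex, paths, perl = TRUE))
  rows <- lapply(hits, function(h) {
    # A path that does not match the template yields a row of NAs
    if (length(h) == 0) return(rep(NA_character_, length(spec$names)))
    values <- h[-1]                           # drop the full match
    values[!nzchar(values)] <- NA_character_  # simplification: empty means missing
    values
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- spec$names
  out
}

res <- parse_paths(
  c("1860-02/01/Abel_Magwitch.csv", "1847-10/13/Bertha_A_Mason.csv"),
  "Year-Month/Day/FirstName_MiddleInitial?_LastName.ext"
)
res
```

In this sketch the first path has no middle initial, so its MiddleInitial value is NA, while the second path yields "A". Treating every empty match as NA is a shortcut taken here for brevity.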
Surrounding a variable name with tildes (~), as in ~MiddleInitial~, marks that variable to be dropped. An optional field can be dropped as well, e.g. ~MiddleInitial?~.
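library(dirdf)

# Template with an optional MiddleInitial field
template <- "Year-Month/Day/FirstName_MiddleInitial?_LastName.ext"
paths <- c(
  "1860-02/01/Abel_Magwitch.csv",
  "1847-10/13/Bertha_A_Mason.csv"
)
# Extract the template variables from each path into data frame columns
dirdf_parse(paths, template)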