dirdf_parse: Path Metadata Parsing


Description

Creates a data frame from metadata encoded in paths and file names. It accepts either a template, or a regular expression together with column names. Similar to dirdf(), but it takes a vector of pathnames and matches them directly, rather than calling base::dir() on a directory and matching the results. This is helpful if you want to filter or transform the set of paths before matching, e.g. to remove irrelevant file names such as ‘.gitignore’, ‘.DS_Store’, or ‘desktop.ini’.
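The filtering step mentioned above can be sketched in base R as follows. The file names are hypothetical stand-ins for the result of dir(), and the final dirdf_parse() call is left as a comment so that the snippet runs without the package installed.

```r
# Hypothetical listing, standing in for the result of dir(); not shipped with dirdf.
pathnames <- c(
  "2013-06-26_BRAFWTNEG_Plasmid-Cellline-100_A01.csv",
  ".gitignore",
  "2013-06-26_BRAFWTNEG_Plasmid-Cellline-50_B02.csv",
  ".DS_Store",
  "desktop.ini"
)

# Drop files that carry no metadata before attempting to match.
irrelevant <- c(".gitignore", ".DS_Store", "desktop.ini")
kept <- pathnames[!(basename(pathnames) %in% irrelevant)]

# The filtered vector can then be handed to dirdf_parse(), for example:
# dirdf_parse(kept, template = "Year-Month-Day_Assay_Plasmid-Type-Fraction_WellNumber?.extension")
```

Filtering on basename() rather than on the full path means the same exclusion list also works when the pathnames include directory components.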

Usage

dirdf_parse(pathnames, template = NULL, regexp = NULL, colnames = NULL,
  missing = NA_character_, ignore.case = FALSE, perl = TRUE)

Arguments

pathnames

character vector of pathname(s), e.g. the result of calling base::dir().

template

template character string, e.g. "Country/Province/City/StationID_Date.ext".

regexp

regular expression used to parse the file names. Exactly one of the arguments regexp and template must be specified, i.e. one of them must be non-NULL and the other NULL.

colnames

character vector containing the names of the columns in the data frame. Not required if using template or if regexp uses named capturing groups (see examples), but may still be used to override column names.

missing

value to use for unmatched optional template elements or regexp capturing groups.

ignore.case, perl

If regexp is used, these are passed to base::regexpr(). Note that unlike regexpr(), the default value for perl is TRUE (to make it more convenient to use named capture groups, which are only supported in Perl mode).
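To illustrate why Perl mode matters for named capture groups, here is a minimal base R sketch that calls regexpr() directly. It only shows how the group names and positions become available with perl = TRUE; it is not a description of how dirdf_parse() is implemented internally, and the input string is a made-up example.

```r
# Made-up input; the pattern mirrors the date part of the regexp in the examples.
x <- "2013-06-26_BRAFWTNEG"
pattern <- "^(?P<Year>\\d{4})-(?P<Month>\\d{2})-(?P<Day>\\d{2})"

# With perl = TRUE, the match carries capture.start, capture.length
# and capture.names attributes describing each group.
m <- regexpr(pattern, x, perl = TRUE)
starts <- as.vector(attr(m, "capture.start"))
lens <- as.vector(attr(m, "capture.length"))

# Extract each captured field and label it with its group name.
fields <- substring(x, starts, starts + lens - 1)
names(fields) <- attr(m, "capture.names")
print(fields)
```

Because the names travel with the match, a regexp written this way does not need a separate colnames vector.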

See Also

dirdf()

Examples

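# Example data set 1, shipped with the package
path1 <- system.file(package = "dirdf", "examples", "dataset_1")
pathnames1 <- dir(path1)

# Three equivalent ways to describe the file names:
# a template, a regexp with named groups, and a regexp with separate column names.
# A trailing "?" in the template marks an optional element.
template1 <- "Year-Month-Day_Assay_Plasmid-Type-Fraction_WellNumber?.extension"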
regex1 <- paste0(
  "^(?P<Year>\\d{4})-(?P<Month>\\d{2})-(?P<Day>\\d{2})",
  "_(?P<Assay>[a-zA-Z0-9]+)_(?P<Plasmid>[a-zA-Z0-9]+)",
  "-(?P<Type>[a-zA-Z0-9]+)-(?P<Fraction>[a-zA-Z0-9\\-]+)",
  "(?:_(?P<WellNumber>\\w+))?\\.csv$"
)
regex1a <- paste0(
  "^(\\d{4})-(\\d{2})-(\\d{2})_([a-zA-Z0-9]+)_([a-zA-Z0-9]+)",
  "-([a-zA-Z0-9]+)-([a-zA-Z0-9\\-]+)(?:_(\\w+))?\\.csv$"
)
names_regex1a <- c("Year", "Month", "Day", "Assay", "Plasmid", "Type", "Fraction", "WellNumber")

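# Parse with the template
dirdf_parse(pathnames1, template1)
# Parse with named capture groups (column names come from the group names)
dirdf_parse(pathnames1, regexp = regex1)
# Parse with unnamed capture groups and explicit column names
dirdf_parse(pathnames1, regexp = regex1a, colnames = names_regex1a)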

# Example data set 2, parsed with a simpler template
path2 <- system.file(package = "dirdf", "examples", "dataset_2")
pathnames2 <- dir(path2)
template2 <- "Date_Assay_Experiment_WellNumber?.extension"
dirdf_parse(pathnames2, template2)

ropenscilabs/dirdf documentation built on May 27, 2019, 8:32 p.m.