import_layout_from_paths: Import an experiment layout from nested directories
In benjbuch/summerr: General Helpers for Lab Fun


Some data is organized in folders that contain files with similar or even identical names. Given a list of paths pointing to those files, the layout of the experiment is established from the nesting of folders.

import_layout_from_paths(
  paths,
  pivot = "[0-9]_[A-Z]+[0-9]+",
  relative_to = getwd()
)

`paths`	A character vector or list of file paths. See Details.
`pivot`	A regular expression describing the name of a single folder in each tree up to which "groups" and from which "replicates" are established. See Details.
`relative_to`	If not `NULL`, this subpath is hidden from the paths for all operations; use "" to suppress hiding. `NULL` hides the subpath that is common to all elements of `paths`.

paths can be a character vector such as c("path1", "path2", "path3") or a list of character vectors, e.g. as a result of averaging data, such as list(c("path1", "path2"), "path3"). The latter case is usually not exposed to the user; the algorithm is then applied to "path1" and "path3".
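To make the two accepted input shapes concrete, here is a minimal sketch that reduces either form to one path per entry. The helper name first_path_per_entry is hypothetical and not part of summerr, and taking the first path of each list element is an assumption inferred from the example above.

```r
# Hypothetical helper (not part of summerr): keep one representative path
# per entry, assuming the first path of each list element is the one used.
first_path_per_entry <- function(paths) {
  vapply(paths, function(p) as.character(p)[1], character(1), USE.NAMES = FALSE)
}

# A plain character vector is returned unchanged.
first_path_per_entry(c("path1", "path2", "path3"))

# A list of character vectors is reduced to the first path of each element.
first_path_per_entry(list(c("path1", "path2"), "path3"))
```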

pivot must be a unique match for each path. Nesting above and within the matched folder can differ from path to path; care is taken to handle the grouping accordingly.

If pivot is a regular expression with lookahead or lookbehind, these elements are kept in the path.

If pivot has multiple matches in the path, it is advisable to call this function with relative_to = (common path), since this subpath is removed from the paths before a match is sought. Alternatively, relative_to = NULL will automatically consume the longest shared path between all paths.
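Before calling the function, it can help to check how many folders in each path match pivot. The following is a minimal sketch in base R; count_pivot_folders is a hypothetical helper, and matching the regular expression against each path component separately is an assumption about how "a single folder" is interpreted.

```r
# Hypothetical check (not part of summerr): count the path components that
# match the pivot expression. Any count other than 1 signals a problem.
count_pivot_folders <- function(paths, pivot = "[0-9]_[A-Z]+[0-9]+") {
  parts <- strsplit(paths, "/", fixed = TRUE)
  vapply(parts, function(p) sum(grepl(pivot, p, perl = TRUE)), integer(1))
}

example_paths <- c(
  "folderA/0_A1/1/file.x",      # one matching folder
  "0_B2/folderA/0_A1/1/file.x"  # two matching folders: "0_B2" and "0_A1"
)
count_pivot_folders(example_paths)
```

For the second path, hiding the leading "0_B2" via relative_to would leave a single match.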

relative_to is expanded (like all paths) before the regular expression is looked for.
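The idea of a subpath shared by all paths can be sketched as a naive longest common directory prefix. This is an illustration only, not the package's implementation: common_dir is a hypothetical name, paths are assumed to use "/" as separator, and no path expansion is performed here.

```r
# Hypothetical sketch (not part of summerr): longest run of leading folders
# shared by all paths; the file name itself is never consumed.
common_dir <- function(paths) {
  parts <- strsplit(paths, "/", fixed = TRUE)
  n <- max(min(lengths(parts)) - 1L, 0L)
  shared <- character(0)
  for (i in seq_len(n)) {
    level <- vapply(parts, `[`, character(1), i)
    if (length(unique(level)) != 1L) break
    shared <- c(shared, level[1])
  }
  paste(shared, collapse = "/")
}

common_dir(c("folderX/folderA/0_A1/1/file.x",
             "folderX/folderB/0_A1/1/file.x"))
```

Note that such a naive prefix can extend into or beyond the pivot folder when all paths share it, so a real implementation needs additional care.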

relative_to is preserved as attr(., "dir") for future use.

Sample groups by nesting parent folders

Sample groups are determined from the enclosing (parent) folders. For example, in

/common1/folder1/folderA/0_A1/...
                      ../0_A2/...
              ../folderB/0_A1/...
/common1/folder2/folderA/0_A1/...
              ../folderB/0_A1/...
                        /0_A2/...

two nested groupings are identified: (1) "folder1" and "folder2" as grp_0, and (2) "folderA" and "folderB" as grp_1. The common path "common1" will not be used as a grouping variable. The first parent is always used as a grouping variable. The full (unique) grouping is returned under group.
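The split of a path around its pivot can be sketched as follows. split_at_pivot is a hypothetical helper, not a summerr function; it assumes "/" separators and exactly one folder matching pivot, and it does not reproduce the numbering of the grp_ columns.

```r
# Hypothetical sketch (not part of summerr): split one path into the folders
# above the pivot (group candidates), the pivot itself, and everything below.
split_at_pivot <- function(path, pivot = "[0-9]_[A-Z]+[0-9]+") {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  at <- which(grepl(pivot, parts, perl = TRUE))
  stopifnot(length(at) == 1L)  # pivot must be a unique match
  list(
    parents  = parts[seq_len(at - 1L)],
    pivot    = parts[at],
    children = parts[-seq_len(at)]
  )
}

res <- split_at_pivot("folder1/folderA/0_A1/1/targetfile")
res$parents  # "folder1" "folderA"
res$pivot    # "0_A1"
```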

Grouping makes it possible to analyze multiple directories simultaneously, each with its own set of parameters enclosed at an appropriate level of nesting.

Replicates by nesting subfolders

Sample replicates are determined from the enclosed (child) folders. For example, in

../0_A1/1/targetfile
       /2/targetfile
../0_A2/1/targetfile

the first pivot ("0_A1") contains two replicates, the second pivot ("0_A2") one.
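The replicate count in this example can be reproduced with a few lines of base R. This is a sketch under the assumption that the paths start at the pivot folder and that the first subfolder below it identifies the replicate; the variable names mirror the columns of the returned layout but are only illustrative.

```r
# Illustrative only: count distinct first-level subfolders per pivot folder.
rep_paths <- c("0_A1/1/targetfile", "0_A1/2/targetfile", "0_A2/1/targetfile")
parts <- strsplit(rep_paths, "/", fixed = TRUE)

pivot <- vapply(parts, `[`, character(1), 1)  # folder matching the pivot
sub_1 <- vapply(parts, `[`, character(1), 2)  # first subfolder below it

n_replicates <- vapply(split(sub_1, pivot),
                       function(x) length(unique(x)), integer(1))
n_replicates
```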

A tibble with the experiment layout as determined from the paths. Its columns are grp_N, ..., grp_0, specifying the sample groups; sub_1, ..., sub_N, specifying the subfolder nesting; pivot; replicate and n_replicates, based on sub_1; the path of the file relative to relative_to; and findex, its original position in paths.

demo_paths <- c("folderA/0_A1/1/file.x", "folderA/0_A1/2/file.x",
  "folderA/0_A2/1/file.x", "folderB/0_A1/1/file.x")
import_layout_from_paths(demo_paths, relative_to = NULL)
import_layout_from_paths(paste0("folderX/", demo_paths), relative_to = NULL)
import_layout_from_paths(paste0("folderX/", demo_paths[1:2]), relative_to = NULL)
import_layout_from_paths(paste0("folderY/folderX/", demo_paths), relative_to = NULL)

# more complex scenarios
import_layout_from_paths(paste0(c("folderY/", "folderX/"), demo_paths), relative_to = NULL)
import_layout_from_paths(paste0(c("folderZ/folderY/", "folderX/"), demo_paths), relative_to = NULL)