prepare_dataset: Extract Sample Attributes from Filenames

View source: R/io.R

prepare_dataset    R Documentation

Extract Sample Attributes from Filenames

Description

Find files matching a pattern in a given directory and build a data frame of standard sample attributes from fields in the filenames. Nested directory structures are supported. Alternatively, use load_dataset to load a spreadsheet of sample attributes explicitly, for example when more than one locus is to be analyzed from a single sequencer sample (i.e., multiplexed samples). The locusmap argument here can also handle multiplexed samples by automatically matching locus names. If the given directory path does not exist, or if no matching files are found, an error is thrown.

Usage

prepare_dataset(
  dp = cfg("prep_dataset_path"),
  pattern = cfg("prep_dataset_pattern"),
  ord = cfg("prep_dataset_order"),
  autorep = cfg("prep_dataset_autorep"),
  locusmap = NULL
)

Arguments

dp

directory path to search for matching data files.

pattern

regular expression used to parse filenames. The pattern should contain exactly three capture groups, one each for Replicate, Sample, and Locus.

ord

integer vector giving order of the fields Replicate, Sample, and Locus in filenames. For example, if Locus is the first field followed by Replicate and Sample, set ord=c(3, 1, 2).

autorep

logical allowing for automatic handling of any duplicates found, labeling them as replicates. FALSE by default.

locusmap

list of character vectors, each list item name being the locus text given in the filenames, and each vector being a set of separate locus names. Each entry with a locus name text matching one of these list items will be replaced in the final output with several separate entries, one for each locus name in the corresponding vector. (For example, locusmap=list(ABCD=c("A", "B", "C", "D")) would take a filename with "ABCD" in the locus field and split it out into four entries for the four loci.)
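
The locusmap expansion described above can be sketched in base R. This is only an illustration of the documented behavior, not the package's own code: the helper name expand_loci, the example filenames, and the Filename column are assumptions made for the sketch.

```r
# Hypothetical sample attributes, as they might look after filename parsing.
attrs <- data.frame(
  Filename = c("1-S1-ABCD.fastq", "1-S2-E.fastq"),
  Replicate = c(1L, 1L),
  Sample = c("S1", "S2"),
  Locus = c("ABCD", "E"),
  stringsAsFactors = FALSE
)

# Same form as the locusmap example in the argument description.
locusmap <- list(ABCD = c("A", "B", "C", "D"))

# expand_loci is an illustrative helper, not a chiimp function.
expand_loci <- function(attrs, locusmap) {
  # Look up each row's locus text; text with no entry in the map is kept as-is.
  loci <- lapply(attrs$Locus, function(txt) {
    if (txt %in% names(locusmap)) locusmap[[txt]] else txt
  })
  # Repeat each row once per mapped locus name, then overwrite the Locus column.
  expanded <- attrs[rep(seq_len(nrow(attrs)), lengths(loci)), , drop = FALSE]
  expanded$Locus <- unlist(loci)
  rownames(expanded) <- NULL
  expanded
}

expanded <- expand_loci(attrs, locusmap)
```

In this sketch the single "ABCD" row becomes four rows that share the same filename, while the "E" row passes through unchanged.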

Value

data frame of metadata for all files found
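
To give a rough idea of how a three-group pattern and the ord argument could combine into such a data frame, here is a minimal base R sketch. It is not the package's implementation: the filenames, the pattern, and the Filename column are assumptions for illustration, and ord is interpreted as in the ord=c(3, 1, 2) example given under Arguments.

```r
# Hypothetical filenames with Locus first, followed by Replicate and Sample.
filenames <- c("A-1-S1.fastq", "A-2-S1.fastq", "B-1-S2.fastq")

# An assumed pattern with exactly three capture groups, one per field.
pattern <- "^([A-Za-z0-9]+)-([0-9]+)-([A-Za-z0-9]+)\\.fastq$"

# Locus is the first field, then Replicate, then Sample.
ord <- c(3, 1, 2)

# regexec() yields the full match followed by each group; drop the full match.
matches <- regmatches(filenames, regexec(pattern, filenames))
fields <- do.call(rbind, lapply(matches, function(m) m[-1]))

# Label the captured columns according to ord.
colnames(fields) <- c("Replicate", "Sample", "Locus")[ord]

metadata <- data.frame(
  Filename = filenames,
  Replicate = as.integer(fields[, "Replicate"]),
  Sample = fields[, "Sample"],
  Locus = fields[, "Locus"],
  stringsAsFactors = FALSE
)
```

The sketch assumes every filename matches the pattern; a filename that did not match would need separate handling before the rows are bound together.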


ShawHahnLab/chiimp documentation built on Aug. 20, 2023, 1:41 a.m.