process_GM: Clean and reformat peak data files from GeneMapper software.


View source: R/functions_clean_GM.R


Function process_GM cleans up raw peak data files exported from GeneMapper and converts them from a flat table into a nested list organised by sample and then by target. It can also optionally format them for use with the online tool T-REX.


process_GM(peaks_file_path, TREX = FALSE, targets, label_file_path = NULL,
  desired_columns = NULL, write_path = paste(getwd(), "/", sep = ""))



peaks_file_path: A character vector with one element, containing the file path (absolute or relative to the working directory) of the GeneMapper peaks file.


TREX: Should files containing peak and label data be written to disk for uploading to T-REX? If TRUE, files will be created and written to the working directory as "TREX_peaks.txt" and "TREX_label.txt", unless an alternative is supplied to write_path (see below). The label file will contain a column FileName, as required by T-REX, followed by columns sample_ref and plate_well as created by function split_filename(). Defaults to FALSE.


targets: A named character vector. The elements (values) should be labels for the targets of the TRFLP, while the names must be the capitalised first letter of the dye colour as referenced by GeneMapper. For example, for a primer pair with an attached red fluorophore that targets domain Archaea, the corresponding entry would be "R" = "archaea". Element "standard" is special; it refers to the colour of the size standard included in the fragment analysis run. This is usually orange, so the corresponding entry in the targets vector would be "O" = "standard".
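As a minimal sketch of such a vector, assume a run with the red archaeal primer pair from the example above, an orange size standard, and a second, blue-labelled bacterial primer pair. The "B" = "bacteria" entry is a hypothetical addition for illustration only.

```r
# Names are the capitalised first letter of the dye colour;
# values are the labels used for each target.
targets <- c("R" = "archaea",   # example entry from the text above
             "B" = "bacteria",  # hypothetical second primer pair
             "O" = "standard")  # dye colour of the size standard

# Look up which dye carries the size standard
standard_dye <- names(targets)[targets == "standard"]
print(standard_dye)
```

Keeping the dye letter as the name and the target as the value means a dye code read from a GeneMapper file can be translated with a simple lookup such as targets[["R"]].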


label_file_path: The file path of an optional label file (in .csv format) containing additional informative columns to be included in the T-REX label file. The file must include one column named file_name, whose entries are identical to the file names in the peaks file (though the order may be arbitrary). See the T-REX documentation for further information on label files.
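One possible way to prepare such a label file is sketched below. Only the file_name column comes from the description above; the file names, the extra column site and its values are invented for illustration and must be replaced with the names that actually appear in the peaks file.

```r
# Hypothetical file names and metadata. file_name is the only
# column required by the description above.
labels <- data.frame(
  file_name = c("sampleA_A01.fsa", "sampleB_A02.fsa"),
  site      = c("bore_1", "bore_2"),
  stringsAsFactors = FALSE
)

# Write to a temporary location; in practice, choose your own path
label_file_path <- file.path(tempdir(), "labels.csv")
write.csv(labels, label_file_path, row.names = FALSE)

# Read the file back to confirm that the required column is present
check <- read.csv(label_file_path, stringsAsFactors = FALSE)
print(names(check))
```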


desired_columns: A character vector containing the names of columns to be retained from the original GeneMapper peak file. Used to remove blank columns. Included for potential expansion to allele and marker data; for now, avoid passing a value to this argument.


write_path: A character vector with one element, specifying the file path for writing the T-REX label and peak files. Defaults to the working directory. Note that supplied alternatives can include a file name prefix, e.g. "C:/user/data/set1_", which will result in files named "set1_TREX_label.txt" and "set1_TREX_peaks.txt" in directory "C:/user/data/". If you do not use a prefix, then the trailing forward slash must be supplied, e.g. "C:/user/data/"; supplying "C:/user/data" will result in files "dataTREX_label.txt" and "dataTREX_peaks.txt" in directory "C:/user/".
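The prefix behaviour described above is what you would get if the supplied path were pasted directly in front of the fixed file name. The sketch below reproduces that under this assumption; trex_label_path() is a hypothetical helper written for illustration, not a function from the package.

```r
# Assumption: the output path is built by pasting write_path directly
# in front of the fixed file name, with no separator added.
trex_label_path <- function(write_path) {
  paste0(write_path, "TREX_label.txt")
}

with_prefix <- trex_label_path("C:/user/data/set1_")  # prefix supplied
with_slash  <- trex_label_path("C:/user/data/")       # trailing slash, no prefix
no_slash    <- trex_label_path("C:/user/data")        # missing trailing slash

print(c(with_prefix, with_slash, no_slash))
```

In the last case the final path component "data" is treated as a file name prefix, so the file ends up one directory higher than intended.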


Peak data files exported from GeneMapper contain some empty or redundant columns (particularly for TRFLP purposes) as well as poorly formatted data. This function removes empty columns, separates peak and dye identifiers, and reformats the flat data frame into a nested list with two levels: sample (top level) and target (second level).

Optionally, process_GM() can also write T-REX label and peak files, which can be uploaded directly to the online TRFLP analysis tool T-REX without further manual formatting.


A list nested to two levels. The top level is a list of samples, named by their sample_ref as extracted from file_name by function split_filename(). The second level is a list of peak data frames, one per target and named as supplied to argument targets. Each data frame contains the peak data for that sample and target, with rows numbered by the peak number extracted from the original file.
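To illustrate how a structure of this shape can be navigated, the sketch below builds a small mock-up by hand. The sample names, the column names size and height, and all values are invented for illustration; the real data frames will contain whichever columns are retained from the GeneMapper file.

```r
# Hypothetical helper that builds a small peak data frame
mock_peaks <- function(sizes, heights) {
  data.frame(size = sizes, height = heights)
}

# Mock-up of the return value: samples at the top level,
# targets at the second level.
result <- list(
  sampleA = list(
    archaea  = mock_peaks(c(101.2, 250.7), c(530, 1200)),
    standard = mock_peaks(c(50, 100), c(800, 790))
  ),
  sampleB = list(
    archaea  = mock_peaks(c(99.8), c(410)),
    standard = mock_peaks(c(50, 100), c(810, 805))
  )
)

# Extract the peak data frame for one sample and one target
archaea_A <- result[["sampleA"]][["archaea"]]
print(archaea_A)

# Count peaks per sample for a single target
n_archaea <- sapply(result, function(s) nrow(s[["archaea"]]))
print(n_archaea)
```

Because every sample holds the same set of named targets, functions such as sapply() or lapply() can be applied across the top level to summarise or reshape one target at a time.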

mixtrak/wellington.aquifer.ecol documentation built on Nov. 30, 2017, 4:25 a.m.