process: cghRA array processing


Description

These functions implement the cghRA workflow as a sequence of process sub-function calls. Each of them relies on cghRA.array and cghRA.regions methods, so custom processing can easily be achieved by calling these methods directly if the steps argument is not flexible enough for your purpose.

Custom steps can be added as well, modeled on the existing ones, by defining a function called process.NAME and adding "NAME" to the steps vector in the call to process. Step functions must accept at least an input parameter, which receives the value returned by the previous step, thus forming a pipeline.
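The chaining described above can be illustrated with a minimal, self-contained sketch. The runSteps function below is a toy stand-in written for this example, not the dispatch code used by cghRA, and the "center" and "clip" steps are hypothetical; they operate on a plain numeric vector instead of a cghRA.array object.

```r
# Toy stand-in for the step dispatch (NOT cghRA's implementation):
# each step name is resolved to a function called "process.<name>",
# and the value returned by one step becomes the 'input' of the next.
runSteps <- function(input, steps, ...) {
  for (step in steps) {
    stepFun <- get(paste("process", step, sep = "."), mode = "function")
    input <- stepFun(input = input, ...)
  }
  input
}

# Hypothetical custom step: center the values on their median
process.center <- function(input, ...) {
  input - stats::median(input, na.rm = TRUE)
}

# Hypothetical custom step: clip the values to [-limit, limit]
process.clip <- function(input, limit = 2, ...) {
  pmin(pmax(input, -limit), limit)
}

# Made-up log-ratios, for illustration only
logRatios <- c(-3.0, -0.5, 0.0, 0.5, 4.0)
result <- runSteps(logRatios, steps = c("center", "clip"), limit = 2)
print(result)
```

Each step accepts ... so that arguments meant for other steps (here limit) can travel through the whole pipeline without raising errors. With the real package, a step written this way would be defined in the global environment and enabled by adding its name to the steps vector.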

The tk.process function is a wrapper for process, built around a Tcl-Tk interface for more user-friendliness.

The process function is a multi-core command line interface that dispatches its arguments to individual process.core calls, and should be the preferred entry point even on single-core computers. process.log is a wrapper for process.core which captures warnings and errors into a log file.

The process.default function is a common way for process and tk.process to obtain default values for complex arguments like 'segmentArgs' and 'modelizeArgs'. It can be used to retrieve the profiles proposed by tk.process and pass them to process.

Usage

  process(inputs, logFile = "process.log", cluster = NA, ...)
  process.log(..., logFile)
  process.core(input, inputName, steps = c("parse", "mask", "replicates", "waca",
    "export", "spatial", "segment", "fill", "modelize", "export", "fittest", "export",
    "applyModel", "export"), ...)
  process.parse(input, design, probeParser = Agilent.probes, probeArgs = list(), ...)
  process.probes(input, design, ...)
  process.regions(input, ...)
  process.mask(input, ...)
  process.replicates(input, replicateFun = stats::median, ...)
  process.waca(input, ...)
  process.spatial(input, outDirectory, ...)
  process.segment(input, segmentArgs = process.default("segmentArgs"), ...)
  process.fill(input, ...)
  process.modelize(input, modelizeArgs = process.default("modelizeArgs"), ...)
  process.applyModel(input, ...)
  process.fittest(input, ...)
  process.export(input, outDirectory, ...)
  tk.process(globalTopLevel, localTopLevel)
  process.default(argName, profileName)

Arguments

inputs

List of inputs to dispatch to each node (preferably named). The default workflow expects it to be a character vector naming raw data files to be parsed.

logFile

Single character value, the path to the log file in which messages, warnings and errors are written. If the file already exists, it will be emptied first. The behavior when logFile is set to NA or "" depends on cluster: if cluster is FALSE (unparallelized mode), messages and errors are passed to the R console rather than logged in a file; if cluster is anything else, they are silently ignored.

cluster

Arguments to be passed to makeCluster as a list, for parallel processing (requires the optional parallel package). Remote machines are not handled properly in the current version of process, so you should limit it to "spec", defining how many processors can be used on the local machine as an integer value. The FALSE value requests the unparallelized mode, slower but more suitable for error tracking. The NA default value tries to detect the CPU count on the local machine if parallel is installed, and otherwise switches to the unparallelized mode.

...

Further arguments to be passed to process sub-functions, depending on the steps chosen (see below). The default workflow expects at least design and outDirectory to be provided.

input

A single input to process on one node. The default workflow expects it to be a single character value naming a raw data file to be parsed.

inputName

Single character value, the name of the input currently processed (for logging only).

steps

Ordered character vector, naming the processing steps to apply. Custom steps can be named as well, as long as a function named "process.[step]" exists in the global environment. Each step will take as input the output of the previous step, the first step taking the value of the input argument as input.

probeParser

The function to parse probeFiles into cghRA.probes objects, such as Agilent.probes for Agilent FeatureExtraction arrays.

probeArgs

A list of arguments to pass to probeParser (apart from 'file' which is always provided).

design

Single character value, the path and name of the RDT design file, as produced by tk.design.

replicateFun

The function to apply to replicate groups, if the "replicates" step is to be applied. This function must take a vector of numeric values (logRatios) as input, and return a single representative value (typically the median or the mean).

outDirectory

Single character value, the directory in which to produce the output files.

segmentArgs

Character vector, the arguments to be passed to the DNAcopy method of the cghRA.array class. Arguments are defined as a character string that will be parsed; multiple values define multiple segmentation profiles to apply sequentially.

modelizeArgs

Single character value, the arguments to be passed to the model.auto method of the cghRA.array class. Arguments are defined as a character string that will be parsed.

argName

Single character value, 'segmentArgs' or 'modelizeArgs', the argument to get the default value for. If missing, the list of profiles and arguments handled is returned.

profileName

Single character value, altering the default values returned. If missing, the default profile is returned.

globalTopLevel

This argument should be filled only when embedding this Tcl-Tk interface in another one. It is the top level of the embedding interface, generally a call to tktoplevel.

localTopLevel

This argument should be filled only when embedding this Tcl-Tk interface in another one. It is the local top level to use to build this interface, generally a tkframe or ttkframe.
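As an illustration of how these arguments fit together, here is a hedged sketch of a call to process in unparallelized mode with a custom replicateFun. The file paths and the robustMean function are hypothetical placeholders, and the call itself is only attempted when cghRA is installed and the files exist.

```r
# Hypothetical replicate summary: mean of the finite values only,
# taking a numeric vector and returning a single representative value
robustMean <- function(x) {
  x <- x[is.finite(x)]
  if (length(x) == 0L) return(NA_real_)
  mean(x)
}

# Hypothetical paths, to be replaced with real files and directories
processArgs <- list(
  inputs = c(sampleA = "raw/sampleA.txt", sampleB = "raw/sampleB.txt"),
  design = "design/array.rdt",   # RDT design file, as produced by tk.design
  outDirectory = "results",      # where output files are written
  logFile = "process.log",       # messages, warnings and errors go here
  cluster = FALSE,               # unparallelized mode, easier to debug
  replicateFun = robustMean      # forwarded to the "replicates" step via ...
)

# Only attempt the run when the package and the files are available
filesReady <- all(file.exists(processArgs$inputs, processArgs$design)) &&
  dir.exists(processArgs$outDirectory)
if (requireNamespace("cghRA", quietly = TRUE) && filesReady) {
  do.call(cghRA::process, processArgs)
}
```

Naming the elements of inputs is optional, but the names make log entries easier to attribute to a sample.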

Value

Only process.default returns something: if argName is provided, it returns the default value for the queried argument, otherwise a list of the profiles available for each handled argument. When several profiles are handled, the first value in the list is the default one (returned when profileName is missing).
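A short sketch of querying these defaults follows. It assumes that process.default is exported by the installed version of cghRA, and it makes no assumption on the structure of the returned values beyond what is stated above, simply displaying them with str.

```r
# Collect the documented return values, only if cghRA is installed
defaults <- NULL
if (requireNamespace("cghRA", quietly = TRUE)) {
  defaults <- list(
    # No argName: profiles available for each handled argument
    profiles = cghRA::process.default(),
    # argName only: value of the default profile
    segmentArgs = cghRA::process.default("segmentArgs"),
    modelizeArgs = cghRA::process.default("modelizeArgs")
  )
  str(defaults)
}
```

The profile names shown in the profiles element are the values that can be passed as profileName to obtain a non-default profile.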

Processing steps

The complete workflow involves the following steps:

parse

Read a raw data file and return a cghRA.array object.

probes

Read a cghRA.probes object stored in a RDT file and return a cghRA.array object.

regions

Read one or several cghRA.regions object(s) stored in RDT file(s).

mask

Discard flagged probes (saturated, high background, ...) in a cghRA.array object. Any TRUE value in a column whose name begins with "flag_" is enough to discard a probe (turn its logRatio into NA). See the cghRA.array$maskByFlag() method for further details.

replicates

Replace replicated probe groups (same "name") by a single representative value (all logRatios are turned to NA except for the first one, which will hold the representative value). See the cghRA.array$replicates() method for further details.

waca

Apply the WACA algorithm to the logRatios. See the cghRA.array$WACA() method for further details.

spatial

Produce a PNG file to visually check spatial biases. See the cghRA.array$spatial() method for further details.

segment

Compute regions with similar logRatios along the genome, using the CBS algorithm. See the cghRA.array$DNAcopy() method for further details.

fill

Extend segments to the right to join consecutive segments. See the cghRA.regions$fillGaps() method for further details.

modelize

Fit a copy number model to segments, in order to convert logRatios to true copy numbers. If segmentArgs contains multiple values, each segmentation profile will lead to distinct "copies" and "regions" files numbered according to its position in segmentArgs. See the cghRA.regions$model.auto() method for further details.

applyModel

Convert a modelized cghRA.regions object into a cghRA.copies object.

fittest

If multiple segmentation profiles have been used, select the fittest model ("copies" and "regions" files duplicated without number). For further details on the STM score used for fittest model selection, see the model.auto function of the cghRA.copies package.

clean

Erase "copies" and "regions" files of the different segmentation profiles tested, as "fittest" should have saved the best.
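The effect of the "mask" and "replicates" steps described above can be reproduced on a toy data frame. This is only a sketch of the documented behavior, not the code of the maskByFlag and replicates methods; the probe values are made up, and dropping masked values before summarizing a replicate group is an assumption of this example.

```r
# Toy probe table with hypothetical values (cghRA.array objects are richer)
probes <- data.frame(
  name = c("p1", "p2", "p2", "p3", "p3", "p3"),
  logRatio = c(0.10, 0.40, 0.60, -1.00, -0.80, 5.00),
  flag_saturated = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  flag_background = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# "mask": any TRUE in a column named "flag_*" turns the logRatio into NA
flagColumns <- grep("^flag_", names(probes), value = TRUE)
flagged <- rowSums(as.matrix(probes[, flagColumns, drop = FALSE])) > 0
probes$logRatio[flagged] <- NA

# "replicates": within each group of probes sharing a name, the first
# probe holds the representative value and the others are set to NA
replicateFun <- stats::median
for (probeName in unique(probes$name)) {
  rows <- which(probes$name == probeName)
  if (length(rows) > 1L) {
    # Assumption of this sketch: masked (NA) values are dropped first
    values <- probes$logRatio[rows]
    values <- values[!is.na(values)]
    representative <- if (length(values) > 0L) replicateFun(values) else NA_real_
    probes$logRatio[rows] <- NA
    probes$logRatio[rows[1]] <- representative
  }
}
print(probes[, c("name", "logRatio")])
```

Running "mask" before "replicates", as in the default steps vector, keeps flagged measurements such as the saturated p3 probe out of the representative value.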

Author(s)

Sylvain Mareschal

See Also

tk.design, cghRA.array


cghRA documentation built on May 2, 2019, 3:34 a.m.