concensusDataSetFromFile: Set up analysis container for ConcensusGLM


Description

Set up analysis container for ConcensusGLM using path to data as input. Performs validity checks on the input. Optionally annotates input with experimental metadata from annotation_filename.

Usage

concensusDataSetFromFile(data_filename, annotation_filename = NULL,
  output_path = ".", controls = NULL, rename = NULL, test = FALSE,
  checkpoint = FALSE, threshold = 100, spike_in = "^intcon",
  pseudostrains = TRUE, ...)

Arguments

data_filename

Character. Path to a table of counts.

annotation_filename

Character. Optional. Path to experimental annotations. These will be joined to the input data and used for batch correction.

output_path

Character. Path to directory where you want the analysis output. This is also where checkpoints and logs from cluster execution are stored. Default is current working directory.

controls

Named list of Characters with elements positive and negative. Adds logical columns called positive_control and negative_control to the data, based on regular expression matches.

test

Logical. Run in test mode? If so, only reads the first 5 million lines of data_filename.

checkpoint

Logical. Save intermediate results as checkpoints?

threshold

Numeric. Strains below this total count threshold will be discarded; plates below 1000 x threshold will be discarded. Default 100. Set to 0 to skip this filtering.

spike_in

Character. A regular expression to match spike-in controls.

pseudostrains

Logical. Make pseudostrains such as "total", which is the sum of all non-spike-ins.

Details

This creates the concensusDataSet object from data_filename on which downstream analysis is carried out.
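
For example, a typical call might look like the following sketch; the file names, compound names and regular expressions are hypothetical and should be replaced with your own:

library(concensusGLM)

## Hypothetical inputs: a counts CSV and a plate annotation CSV
cds <- concensusDataSetFromFile(
  data_filename       = "counts.csv",
  annotation_filename = "plate-annotations.csv",
  output_path         = "analysis",
  controls            = list(negative = "^DMSO$", positive = "^rifampicin$"),
  checkpoint          = TRUE
)
class(cds)   # should include "concensusDataSet"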

The input CSV should be found at data_filename. It must have at least the headers "id", "compound", "concentration", "strain", "plate_name", "count" and "well", with one row per strain-compound-concentration-plate_name-well combination. Together, "id", "plate_name" and "well" define unique experimental samples: "id" refers to sequencing (technical) replicates, if any, while "plate_name" and "well" identify biological replicates (recommended). Any given condition should have at least 2 replicates of some kind. If "row" and "column" are present but "well" is not, "well" is constructed by concatenating "row" and "column".
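
As an illustration, a valid input has (at least) the columns below; the values are invented, and the commented line shows how "well" would be derived from "row" and "column" if needed:

## Toy example of the expected layout (all values invented)
counts <- data.frame(
  id            = "run1",
  compound      = "rifampicin",
  concentration = 0.5,
  strain        = c("strainA", "strainB"),
  plate_name    = "plate01",
  well          = "A1",
  count         = c(1520, 310),
  stringsAsFactors = FALSE
)

## If the file has "row" and "column" instead of "well":
## counts$well <- paste0(counts$row, counts$column)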

First, this function loads the CSV at data_filename; this may take some time for a large file. It then checks that the minimum set of headers is present.
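
Conceptually, the load-and-check step amounts to something like this sketch (not the package's literal code; file name hypothetical):

counts   <- read.csv("counts.csv", stringsAsFactors = FALSE)
required <- c("id", "compound", "concentration", "strain",
              "plate_name", "count", "well")
missing  <- setdiff(required, names(counts))
if (length(missing) > 0) {
  stop("Missing required columns: ", paste(missing, collapse = ", "))
}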

It then checks for either a negative_control column or a list of control compounds supplied to the controls argument.
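
That check can be thought of roughly as follows, where counts stands for the loaded data and controls for the argument described above (a sketch, not the implementation):

has_negatives <- "negative_control" %in% names(counts) ||
  (!is.null(controls) && !is.null(controls$negative))
if (!has_negatives) {
  stop("Provide a negative_control column or a 'negative' element in controls")
}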

Under-represented (assumed to be spurious) strains and plates (as defined by threshold) are removed.
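
A rough base-R sketch of this filtering (the package's internals may differ):

threshold <- 100   # the default

## Drop strains whose total count falls below threshold
strain_totals <- tapply(counts$count, counts$strain, sum)
counts <- counts[counts$strain %in% names(strain_totals)[strain_totals >= threshold], ]

## Drop plates whose total count falls below 1000 * threshold
plate_totals <- tapply(counts$count, counts$plate_name, sum)
counts <- counts[counts$plate_name %in% names(plate_totals)[plate_totals >= 1000 * threshold], ]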

Then, pseudo-strains are built if requested (the default). So far, the only pseudo-strain defined is "total", which is the total count per well of non-spike-in strains.
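
A sketch of how the "total" pseudo-strain could be derived, assuming the data frame has exactly the minimal columns listed above (the aggregate() call is illustrative only):

spike_in  <- "^intcon"   # the default spike-in pattern
non_spike <- counts[!grepl(spike_in, counts$strain), ]
totals <- aggregate(count ~ id + compound + concentration + plate_name + well,
                    data = non_spike, FUN = sum)
totals$strain <- "total"
counts <- rbind(counts, totals[, names(counts)])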

If defined, annotations (such as experimental metadata) are loaded from annotation_filename. This CSV file needs at least one column name in common with data_filename (case-sensitive), since the next step is a join on the shared columns. Also, every observation in the input data that you want to keep must be annotated.
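
In effect this is a merge on the shared column names, along these lines (file name hypothetical):

annotations <- read.csv("plate-annotations.csv", stringsAsFactors = FALSE)
shared <- intersect(names(counts), names(annotations))
counts <- merge(counts, annotations, by = shared)   # rows without a matching annotation are dropped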

Finally, it adds a negative_control column if necessary and a positive_control column if defined, and checks that there are at least 2 negative control observations.
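
A hedged sketch of that final step, assuming the regular expressions in controls are matched against the compound column:

if (!"negative_control" %in% names(counts)) {
  counts$negative_control <- grepl(controls$negative, counts$compound)
}
if (!is.null(controls$positive)) {
  counts$positive_control <- grepl(controls$positive, counts$compound)
}
stopifnot(sum(counts$negative_control) >= 2)   # at least 2 negative-control observations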

Value

list of class "concensusDataSet"

See Also

newWorkflow, pipeline, concensusDataSetFromFile

