load_data: Load and verify all required data for the KOMODO2 workflow
In fcampelo/KOMODO2-CRAN: Kegg Orthology EnrichMent Online DetectiOn


View source: R/load_data.R

This script represents the first step of the LCFD workflow of KOMODO2. It separates the data loading, which can be the longest step of a workflow, from the analysis itself, which is faster and can be redone multiple times.

load_data(defs, cores = NULL)

`defs`	either a KOMODO2-type list object (see Details) or a path to a text file containing the required definitions.
`cores`	positive integer, how many CPU cores to use (multicore acceleration does not work on Windows systems). Note that setting this parameter will override any 'type' field from 'defs'.
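
Since multicore acceleration is unavailable on Windows, a caller may want to choose the `cores` value defensively before calling `load_data()`. The following is a minimal sketch under that assumption; `pick_cores` is a hypothetical helper and not part of KOMODO2.

```r
# Hypothetical helper (not part of KOMODO2): choose a safe value for 'cores'.
pick_cores <- function(requested = 2L) {
  # Multicore acceleration does not work on Windows, so fall back to 1 core.
  if (.Platform$OS.type == "windows") return(1L)
  available <- parallel::detectCores()
  # detectCores() can return NA when the count cannot be determined.
  if (is.na(available)) available <- 1L
  as.integer(max(1L, min(requested, available)))
}

n_cores <- pick_cores(2L)
```

The result could then be passed as `load_data(defs, cores = n_cores)`.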

The script expects a 'KOMODO2'-type list, which is a list object containing at least the following fields:

test.path (char string): path to the folder containing annotation files of the test group
back.path (char string): path to the folder containing annotation files of the background group
x.path (char string): path to the file containing the genomes' attributes (for correlation test)
y.path (char string): path to the folder containing the genomes and their annotations (for correlation test)
ontology (char string): which ontology to use. Currently accepts "GO" or "Gene Ontology", "KEGG" and "other".
dict.path (char string): file with the dictionary (terms and their meaning) of the ontology, if 'ontology' is set as "other".
type (char string): comparison module to use. Accepts "significance" (compares two groups of genomes within an ontology) or "correlation" (establishes how much a variable explains the variations seen in the genomes).
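
As an illustration of the fields listed above, a list using them might be assembled as in the sketch below. All paths are hypothetical placeholders, and the use of empty strings for fields not needed by the chosen `type` is an assumption. The pre-check at the end is only illustrative and is not the package's own validation.

```r
# Hypothetical paths, for illustration only; replace with real locations.
defs <- list(test.path = "data/test_group/",
             back.path = "data/background_group/",
             x.path    = "",   # assumed unused when type = "significance"
             y.path    = "",   # assumed unused when type = "significance"
             ontology  = "GO",
             dict.path = "",   # only needed when ontology = "other"
             type      = "significance")

# Field names taken from the list in this Details section.
required <- c("test.path", "back.path", "x.path", "y.path",
              "ontology", "dict.path", "type")

# Illustrative pre-check: report any documented field that is absent.
missing_fields <- setdiff(required, names(defs))
if (length(missing_fields) > 0) {
  stop("Missing fields: ", paste(missing_fields, collapse = ", "))
}
```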

The input definitions can also be passed as a file path. In that case, the file must be in a 'field = value' format. Blank lines and lines starting with '#' are ignored. The required fields are the same as those described for the 'KOMODO2' list above.
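
To make the 'field = value' format concrete, the sketch below writes a small definitions file to a temporary location and parses it by skipping blank lines and '#' comments. `read_defs_sketch` is a hypothetical helper written for illustration; it is not the parser used by the package, and the paths in the file are placeholders.

```r
# Hypothetical helper (not the package's parser): read 'field = value' lines.
read_defs_sketch <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # Drop blank lines and comment lines starting with '#'.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Split each line at the first '=' only, so values may contain '='.
  pos <- regexpr("=", lines, fixed = TRUE)
  if (any(pos < 1)) stop("Every non-comment line must be 'field = value'")
  fields <- trimws(substr(lines, 1, pos - 1))
  values <- trimws(substr(lines, pos + 1, nchar(lines)))
  # All values are kept as character strings in this sketch.
  stats::setNames(as.list(values), fields)
}

# Write an example file (hypothetical paths) and read it back.
tmp <- tempfile(fileext = ".txt")
writeLines(c("# Example definitions file",
             "",
             "test.path = data/test_group/",
             "back.path = data/background_group/",
             "ontology = GO",
             "type = significance"), tmp)
defs_from_file <- read_defs_sketch(tmp)
```

In practice the file path itself would be passed directly as the `defs` argument; the parser above only shows how such a file is laid out.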

Updated `defs` list containing the information loaded from the files.

## Not run: 
# Build an input list:

fpath1 <- system.file("extdata", "gene2GO", package="KOMODO2")
fpath2 <- system.file("extdata", "metadata/GO_metadata_Pan_proxy.txt", package="KOMODO2")
fpath3 <- system.file("extdata", "trees/tree_genome_IDs.nwk", package="KOMODO2")

defs <- list(annotation_files_dir = fpath1,
             output_dir = "./results/GO_Pan_proxy/",
             dataset.info = fpath2,
             x.column = 2,
             ontology = "GO",
             dict.path = "",
             column = "GO",
             denominator.column = "",
             tree_path = fpath3,
             tree_type = "newick",
             linear_model_cutoff = 0.5,
             type = "correlation")

out_list <- load_data(defs, cores = 2)

## End(Not run)