clean_data: Clean data for the KOMODO2 workflow
In fcampelo/KOMODO2-CRAN: Kegg Orthology EnrichMent Online DetectiOn


View source: R/clean_data.R

Description

This function implements the second step of the LCFD workflow of KOMODO2. It handles data inconsistencies, including missing values, outliers and undesired characters, and performs data merging. It also preprocesses the data to allow more flexible inputs from the user, for example by automatically converting common annotation outputs to a single standard format.
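The internals of clean_data() are not shown on this page. As a rough illustration of three of the operations listed above (stripping undesired characters, handling missing values, and merging), the base-R sketch below cleans a toy annotation table and merges it with genome metadata. The helper clean_annotation_sketch(), its column names (genome_id, GO, phenotype) and the character whitelist are all hypothetical assumptions; this is not how KOMODO2 implements clean_data().

```r
# Hypothetical helper: NOT part of KOMODO2. It only illustrates the kind of
# cleaning described above, under assumed column names.
clean_annotation_sketch <- function(annot, meta, by = "genome_id") {
  # Normalise character columns: trim whitespace and remove characters
  # outside a conservative whitelist (letters, digits, '_', ':', '.', '-').
  chr_cols <- vapply(annot, is.character, logical(1))
  annot[chr_cols] <- lapply(annot[chr_cols], function(x) {
    x <- trimws(x)
    x <- gsub("[^A-Za-z0-9_:.-]", "", x)
    x[which(x == "")] <- NA_character_  # empty strings become explicit NAs
    x
  })
  # Drop rows with a missing value in any column.
  annot <- annot[stats::complete.cases(annot), , drop = FALSE]
  # Inner join: keep only genomes present in both tables (sorted by key).
  merge(annot, meta, by = by)
}

# Toy input: a stray space, an undesired '*' and an empty annotation.
annot <- data.frame(genome_id = c("g1", " g2", "g3", "g4"),
                    GO = c("GO:0008150", "GO:0003674*", "", "GO:0005575"),
                    stringsAsFactors = FALSE)
meta <- data.frame(genome_id = c("g1", "g2", "g4"),
                   phenotype = c(0.1, 0.5, 0.9),
                   stringsAsFactors = FALSE)
res <- clean_annotation_sketch(annot, meta)
print(res)
```

Here genome g3 is dropped because its annotation is empty, " g2" is trimmed so that it matches the metadata, and the trailing "*" is stripped from its GO term. Outlier handling is deliberately left out, because the criterion clean_data() uses for outliers is not documented on this page.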

Usage

clean_data(defs)

Arguments

defs: an enriched KOMODO2-type list object (see Details).

Details

The function expects an enriched KOMODO2-type list, such as the one generated by load_data().

Value

An updated defs list containing information from the parsed genome maps (e.g., for the test and background genomes if type == "significance").

Examples

## Not run: 
# Build an input list:
fpath1 <- system.file("extdata", "gene2GO", package="KOMODO2")
fpath2 <- system.file("extdata", "metadata/GO_metadata_Pan_proxy.txt", package="KOMODO2")
fpath3 <- system.file("extdata", "trees/tree_genome_IDs.nwk", package="KOMODO2")

defs <- list(annotation_files_dir = fpath1,
             output_dir = "./results/GO_Pan_proxy/",
             dataset.info = fpath2,
             x.column = 2,
             ontology = "GO",
             dict.path = "",
             column = "GO",
             denominator.column = "",
             tree_path = fpath3,
             tree_type = "newick",
             linear_model_cutoff = 0.5,
             type = "correlation")

defs <- load_data(defs, cores = 2)
defs <- clean_data(defs)

## End(Not run)
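Since clean_data() returns an updated defs list, one simple way to see what it contributes is to compare the element names before and after the call. The helper added_fields() below is hypothetical (not part of KOMODO2), and the toy field names are placeholders rather than the real fields KOMODO2 creates.

```r
# Hypothetical helper (not part of KOMODO2): names present in `after`
# but not in `before`, e.g. entries added to defs by clean_data().
added_fields <- function(before, after) {
  setdiff(names(after), names(before))
}

# In a real session (requires KOMODO2 and the example data above):
# defs_loaded <- load_data(defs, cores = 2)
# defs_clean  <- clean_data(defs_loaded)
# added_fields(defs_loaded, defs_clean)

# Toy demonstration with placeholder field names:
before <- list(type = "correlation", output_dir = "./results/")
after  <- c(before, list(cleaned_table = data.frame(x = 1:2)))
print(added_fields(before, after))
```

This only detects new names. Entries that clean_data() modifies in place would need a value comparison, for instance with identical() on each shared element.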