clean_data: Clean data for the KOMODO2 workflow

View source: R/clean_data.R

Description

This script implements the second step of the LCFD workflow of KOMODO2. It handles data inconsistencies, including missing values, outliers and undesired characters, and performs data merging. It also preprocesses data to allow more flexible input from the user, for example by automatically converting common annotation outputs to a single standard format.
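
As a rough illustration of the kinds of operations named above (removing undesired characters, dropping missing values, merging), the following base R sketch works on two toy data frames. It is not KOMODO2's internal code: the objects `annotation`, `metadata` and `cleaned`, their column names, and the character filter are all assumptions made for the example.

```r
# Toy illustration only; this is NOT how clean_data() is implemented.
# All object and column names below are hypothetical.
annotation <- data.frame(
  genome_id = c("g1 ", "g2", "g#3", "g4"),
  go_term   = c("GO:0008150", NA, "GO:0003674", "GO:0005575"),
  stringsAsFactors = FALSE
)
metadata <- data.frame(
  genome_id = c("g1", "g2", "g3"),
  x_value   = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Remove undesired characters: keep only letters, digits and underscores
annotation$genome_id <- gsub("[^A-Za-z0-9_]", "", annotation$genome_id)

# Drop rows whose annotation value is missing
annotation <- annotation[!is.na(annotation$go_term), , drop = FALSE]

# Merge annotation with metadata; rows without a match on either side are dropped
cleaned <- merge(annotation, metadata, by = "genome_id")
```

Here `merge()` performs an inner join by default, so a genome that lacks either an annotation or a metadata row does not appear in `cleaned`.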

Usage

clean_data(defs)

Arguments

defs

an enriched KOMODO2-type list object (see Details).

Details

The script expects enriched 'KOMODO2'-type lists, which are generated by [load_data()].
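
Because the function expects a list of a particular shape, it can be useful to fail early if obvious fields are absent. The sketch below is a hypothetical helper, `check_defs_fields()`, that is not part of KOMODO2. The field names are copied from the Examples section of this page; treating them as required, and the choice of which ones to check, are assumptions. The fields that load_data() adds to the list are not documented here and are not checked.

```r
# Hypothetical helper; not part of KOMODO2.
# Field names come from the Examples section of this page;
# treating them as required is an assumption.
check_defs_fields <- function(defs,
                              expected = c("annotation_files_dir", "output_dir",
                                           "dataset.info", "x.column", "ontology",
                                           "tree_path", "tree_type", "type")) {
  if (!is.list(defs)) stop("'defs' must be a list")
  missing_fields <- setdiff(expected, names(defs))
  if (length(missing_fields) > 0) {
    stop("'defs' is missing field(s): ", paste(missing_fields, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage with a toy list (placeholder paths, for illustration only)
toy_defs <- list(annotation_files_dir = "dir", output_dir = "out",
                 dataset.info = "meta.txt", x.column = 2, ontology = "GO",
                 tree_path = "tree.nwk", tree_type = "newick",
                 type = "correlation")
ok <- check_defs_fields(toy_defs)
```

A check like this only confirms that names are present; it says nothing about whether the values are valid paths or supported options.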

Value

updated defs list containing information from parsed genome maps (e.g., for test and back genomes if 'type == "significance"')

Examples

## Not run: 
# Build an input list:
fpath1 <- system.file("extdata", "gene2GO", package="KOMODO2")
fpath2 <- system.file("extdata", "metadata/GO_metadata_Pan_proxy.txt", package="KOMODO2")
fpath3 <- system.file("extdata", "trees/tree_genome_IDs.nwk", package="KOMODO2")

defs <- list(annotation_files_dir = fpath1,
             output_dir = "./results/GO_Pan_proxy/",
             dataset.info = fpath2,
             x.column = 2,
             ontology = "GO",
             dict.path = "",
             column = "GO",
             denominator.column = "",
             tree_path = fpath3,
             tree_type = "newick",
             linear_model_cutoff = 0.5,
             type = "correlation")

defs <- load_data(defs, cores = 2)
defs <- clean_data(defs)

## End(Not run)

fcampelo/KOMODO2-CRAN documentation built on March 7, 2020, 6:35 a.m.