Workflow4Metabolomics (W4M) provides tools to help people process metabolomic datasets. Its preferred channel is the web-based interface Galaxy, which is particularly suited to analysts with no experience in software programming or programming languages in general. Galaxy is also compatible with a wide variety of languages, so tools coded in different programming languages can be combined; this is a major advantage given the diversity of steps (and thus tools) needed for comprehensive metabolomic data processing.
Nonetheless, a significant part of the tools provided by W4M are coded in R. Because of the diversity of tools, W4M chose to adopt common standards across its tools, in particular regarding data table formats and basic data checks. It also intends to gather utility functions that help build R-based Galaxy tools following W4M standards, along with functions to easily handle basic actions on the W4M table data format.
The W4MRUtils package is meant to gather such functions, to enable W4M R-based Galaxy tools to have easy access to these standardised approaches. Some functions are closely linked to Galaxy integration, such as the parse_args function, but a large variety of functions can be helpful independently of Galaxy.
To use these functions, it is useful to understand the choices made by W4M, in particular regarding the formats, as these will often be used as input parameters for functions.
The main format aspect to consider is the W4M 3-tables format. Metabolomic data, similarly to other types of 'omics data, can be divided into 3 main types of information:
- the sample metadata (sampleMetadata in short): it corresponds to information regarding the samples that have been analysed with an analytical device; this covers any information about the samples that is not the analytical measures themselves (examples: studied biological groups, batches of analysis)
- the data matrix (dataMatrix in short): it corresponds to the intensities measured using the analytical device; it is a table with intensity values for each sample and for each analytical variable
- the variable metadata (variableMetadata in short): it corresponds to information regarding the variables that have been measured through the analytical device; this covers any information about the variables that is not the analytical measures themselves (examples: m/z ratios for MS data, chemical shift for NMR data)
These 3 types of information can be stored in 3 distinct tables. In the package, the chosen format is the following:
These 3 tables are standard inputs and/or outputs for several of the functions found in this package. Consequently, when you need to specify sample-wise or variable-wise information, you will generally be invited to store it in one of these tables (the same goes for function outputs: the generated information is found in these tables).
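As a rough illustration of the three tables described above, the sketch below builds a toy dataset in base R and checks that identifiers agree across tables. The layout used here (identifiers in the first column of each table, variables in rows and samples in columns for dataMatrix) is an assumption made for this example, and check_three_tables is a hypothetical helper, not a function of the package.

```r
# Toy 3-tables dataset; identifiers and values are made up for illustration.
# Assumed layout: the first column of each table holds identifiers.
data_matrix <- data.frame(
  variable = c("v1", "v2", "v3"),
  s1 = c(10.5, 0.0, 3.2),
  s2 = c(11.1, 0.4, 2.9),
  stringsAsFactors = FALSE
)
sample_metadata <- data.frame(
  sample = c("s1", "s2"),
  group = c("control", "treated"),
  batch = c(1, 1),
  stringsAsFactors = FALSE
)
variable_metadata <- data.frame(
  variable = c("v1", "v2", "v3"),
  mz = c(101.02, 203.05, 305.11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when identifiers agree across the three tables,
# i.e. dataMatrix columns match sampleMetadata rows and dataMatrix rows
# match variableMetadata rows, in the same order.
check_three_tables <- function(dm, sm, vm) {
  same_samples <- identical(colnames(dm)[-1], as.character(sm[[1]]))
  same_variables <- identical(as.character(dm[[1]]), as.character(vm[[1]]))
  same_samples && same_variables
}

tables_ok <- check_three_tables(data_matrix, sample_metadata, variable_metadata)
print(tables_ok)
```

Keeping the identifiers in the same order across tables makes it possible to look up sample-wise or variable-wise information by position as well as by name.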
There are several types of functions in the package. Overall, you will find functions enabling the following:
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
input <- file.path(tempdir(), "./input.csv")

# Mock commandArgs() so the example sees fixed command-line arguments.
commandArgs <- function() { #nolint
  list(
    "--input", input,
    "--output", file.path(tempdir(), "./output.csv"),
    "--threshold", "2"
  )
}

# Write a small example input table.
write.csv(
  data.frame(x = c(10, 2, 1, 4, 7, 9), y = c("a", "b", "c", "d", "e", "f")),
  input,
  row.names = FALSE
)
library(W4MRUtils)

TOOL_NAME <- "A Test Tool" #nolint

logger <- get_logger(TOOL_NAME)

logger$info("Parsing parameters...")
args <- optparse_parameters(
  input = optparse_character(),
  output = optparse_character(),
  threshold = optparse_numeric(default = 10),
  args = commandArgs()
)
logger$info("Parsing parameters OK.")

show_galaxy_header(
  TOOL_NAME,
  "1.2.0",
  args = args,
  logger = logger,
  show_sys = FALSE
)
check_parameters <- function(args, logger) {
  logger$info("Checking parameters...")
  check_one_character(args$input)
  check_one_character(args$output)
  check_one_numeric(args$threshold)
  if (args$threshold < 1) {
    logger$errorf(
      "The threshold is too low (%s). Cannot continue",
      args$threshold
    )
    stopf("Threshold = %s is too low.", args$threshold)
  }
  if (args$threshold < 3) {
    logger$warningf(
      paste(
        "The threshold is very low (%s).",
        "This may lead to erroneous results."
      ),
      args$threshold
    )
  }
  logger$info("Parameters OK.")
}

read_input <- function(args, logger) {
  logger$infof("Reading file in %s ...", args$input)
  return(read.csv(args$input))
}

process_input <- function(args, logger, table) {
  # Keep the rows whose first column is above the threshold;
  # every other row counts as filtered out.
  kept <- which(table[, 1] > args$threshold)
  n_filtered <- nrow(table) - length(kept)
  logger$infof(
    "%s values filtered (%s%%)",
    n_filtered,
    round(n_filtered / nrow(table) * 100, 2)
  )
  if (n_filtered == nrow(table)) {
    logger$warning("All values have been filtered.")
  }
  return(table[kept, ])
}

write_output <- function(args, logger, output) {
  logger$info("Writing output file...")
  write.table(output, args$output)
}

# check_parameters(args, logger$sublogger("Param Checking"))
# input <- read_input(args, logger$sublogger("Inputs Reader"))
# print(input)
# output <- process_input(args, logger$sublogger("Inputs Processing"), input)
# print(output)
# write_output(args, logger$sublogger("Output Writer"), output)
# show_galaxy_footer(TOOL_NAME, "1.2.0", logger = logger, show_packages = FALSE)
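To make the filtering step concrete, here is a minimal base R sketch applied to the same values as the example input table, with the threshold of 2 given on the mock command line. It does not call W4MRUtils, and the variable names are chosen for this illustration only.

```r
# Same values as the example input table; threshold as in "--threshold 2".
example_table <- data.frame(
  x = c(10, 2, 1, 4, 7, 9),
  y = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)
threshold <- 2

# Rows whose first column is strictly above the threshold are kept.
kept <- which(example_table[, 1] > threshold)
n_filtered <- nrow(example_table) - length(kept)
pct_filtered <- round(n_filtered / nrow(example_table) * 100, 2)

filtered_table <- example_table[kept, ]
print(filtered_table$y)  # rows a, d, e and f are kept
print(n_filtered)        # 2 rows are filtered out (x = 2 and x = 1)
print(pct_filtered)      # 33.33
```

Because the comparison is strict, the row with x equal to the threshold is filtered out along with the row below it.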