This function lets you create wrappers of projection or clustering tools, which you can then include in benchmark pipelines.
WrapTool(
  name,
  type,
  r_packages = NULL,
  python_modules = NULL,
  fun.build_model.single_input = NULL,
  fun.build_model.batch_input = NULL,
  fun.build_model = NULL,
  fun.extract = function(model) model,
  fun.apply_model.single_input = NULL,
  fun.apply_model.batch_input = NULL,
  fun.apply_model = NULL,
  prevent_parallel_execution = TRUE,
  use_python = !is.null(python_modules) && length(python_modules) > 0,
  use_original_expression_matrix = FALSE,
  use_knn_graph = FALSE
)
name
string: name of the tool.

type
string: type of the tool (either 'projection' or 'clustering').

r_packages
string vector: names of all required R packages.

python_modules
optional string vector: names of required Python modules (accessed via reticulate).

fun.build_model.single_input
optional function: modelling function which accepts a single coordinate matrix as input data. Minimal signature is function(input).

fun.build_model.batch_input
optional function: modelling function which accepts a list of coordinate matrices as input data. Minimal signature is function(input).

fun.build_model
optional function: different parameter name for fun.build_model.single_input, to be used when the tool does not distinguish between single and batch input (the batch version is then auto-generated).

fun.extract
optional function: modelling function which accepts a model generated by a fun.build_model... function and extracts the result of applying it to the original input data. Default value is function(model) model.

fun.apply_model.single_input
optional function: modelling function which accepts a model generated by a fun.build_model... function and a single new coordinate matrix, and applies the model to the new data. Signature is function(model, input).

fun.apply_model.batch_input
optional function: modelling function which accepts a model generated by a fun.build_model... function and a list of new coordinate matrices, and applies the model to the new data. Signature is function(model, input).

fun.apply_model
optional function: different parameter name for fun.apply_model.single_input.

prevent_parallel_execution
logical: whether running the tool in parallel on multiple CPU cores should be prevented. Default value is TRUE.

use_python
logical: whether the tool uses Python modules (via reticulate). Default value is !is.null(python_modules) && length(python_modules) > 0.

use_original_expression_matrix
logical: whether the tool uses the original expression matrix apart from the output of the preceding dimension-reduction tool. Default value is FALSE.

use_knn_graph
logical: whether the tool uses a k-nearest-neighbour graph. Default value is FALSE.
This function returns a wrapper function that can be used in constructing a benchmark pipeline using Fix, Module and Subpipeline.
To create a wrapper, you need to specify a handful of components (as arguments to WrapTool). name is a unique string identifier; it is also included in the name of the wrapper (for example, FlowSOM will have the wrapper wrapper.clustering.FlowSOM). type specifies whether it is a projection tool (for dimension reduction or denoising) or a clustering tool.
The string vector r_packages specifies the names of all required R packages and python_modules specifies the names of required Python modules (these will be accessed via reticulate, the R/Python interface).
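For illustration, here is a hedged sketch of how a wrapper for a Python-based projection tool might be declared (the module name 'umap', the type token 'projection' and the call into reticulate are assumptions; the modelling function used here is explained below):

## Hedged sketch: a Python-based projection tool accessed via reticulate
## (module name 'umap' and type token 'projection' are assumptions)
wrapper.projection.UMAP <- WrapTool(
  name           = 'UMAP',
  type           = 'projection',
  python_modules = 'umap',
  ## use_python defaults to TRUE here, since python_modules is non-empty
  fun.build_model = function(input, latent_dim = 2) {
    umap  <- reticulate::import('umap')
    model <- umap$UMAP(n_components = as.integer(latent_dim))
    model$fit_transform(input)  # returns the embedded coordinate matrix
  }
)

Since fun.extract defaults to function(model) model, the coordinate matrix returned by fit_transform is used directly as the projection.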
Modelling functions are the ones that do the actual work of transforming the input data. At least one of them (fun.build_model) needs to be specified. fun.build_model.single_input takes a single coordinate matrix of data and returns a model. The model is an object from which the desired result (a projection coordinate matrix or a vector of cluster indices per data point) can be extracted. fun.build_model.batch_input, instead, takes a list of multiple coordinate matrices (one per sample) as input and returns a model. If the tool does not distinguish between a single input matrix and multiple input matrices (it would simply concatenate the inputs and apply fun.build_model.single_input), fun.build_model.batch_input can be left unspecified and will be auto-generated. In that case, you can specify the function summarily as fun.build_model.
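To make that auto-generated batch behaviour concrete, here is a hedged sketch of how a batch-input function can defer to a single-input one by concatenating the samples row-wise (build_single and build_batch are hypothetical helper names):

## Hedged sketch of the concatenate-then-delegate pattern described above;
## 'build_single' and 'build_batch' are hypothetical helper names
build_single <- function(input, n_clusters = 20) {
  stats::kmeans(input, centers = n_clusters)  # model built on one matrix
}
build_batch <- function(input, n_clusters = 20) {
  ## 'input' is a list of coordinate matrices, one per sample
  build_single(do.call(rbind, input), n_clusters = n_clusters)
}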
fun.extract is a function that takes a model object (generated by fun.build_model...) as input and extracts the results of the model applied to the original input data. fun.apply_model.single_input takes a model object and a new coordinate matrix as input; it returns the result of applying the previously trained model to the new data. fun.apply_model.batch_input takes a model object and a list of coordinate matrices as input and applies the model to the new data. Results of the ...batch_input functions should not be split up into lists according to the sizes of the original inputs: they always return a single coordinate matrix or cluster vector (the splitting per sample is implemented automatically). The minimal signature of a fun.build_model... function is function(input).
Other arguments, with their default values, can (and should) be included: that way, changes in other parameters can be tested. For example, a simple signature of a fun.build_model... function for the dimension-reduction tool t-SNE might be function(input, latent_dim = 2, perplexity = 2), allowing the user to alter the target dimensionality or the perplexity parameter.
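Putting this together, a hedged sketch of a full t-SNE wrapper might look as follows (it assumes the Rtsne package provides the implementation and that 'projection' is the accepted type token; it is not necessarily the wrapper shipped with SingleBench):

## Hedged sketch of a projection-tool wrapper around Rtsne;
## the type token 'projection' is an assumption
wrapper.projection.tSNE <- WrapTool(
  name       = 'tSNE',
  type       = 'projection',
  r_packages = 'Rtsne',
  fun.build_model = function(input, latent_dim = 2, perplexity = 2) {
    Rtsne::Rtsne(input, dims = latent_dim, perplexity = perplexity)
  },
  fun.extract = function(model) model$Y  # the embedded coordinate matrix
)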
Signatures of the other modelling functions are fixed. For fun.extract it is function(model) and for fun.apply_model... it is function(model, input).
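As an illustration of these fixed signatures, here is a hedged sketch of a clustering wrapper based on stats::kmeans (purely illustrative; the nearest-centroid assignment in fun.apply_model is one possible way of applying a k-means model to new data):

## Hedged sketch of a clustering-tool wrapper around stats::kmeans
wrapper.clustering.kmeans <- WrapTool(
  name       = 'kmeans',
  type       = 'clustering',
  r_packages = 'stats',
  fun.build_model = function(input, n_clusters = 20) {
    stats::kmeans(input, centers = n_clusters)
  },
  fun.extract = function(model) {
    model$cluster  # vector of cluster indices per data point
  },
  fun.apply_model = function(model, input) {
    ## assign each new point to its nearest cluster centroid (Euclidean)
    apply(input, 1, function(x) which.min(colSums((t(model$centers) - x)^2)))
  }
)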
If a clustering tool uses the original high-dimensional expression data as well as a projection (generated in the previous step by some projection method), include the parameter expression in your function signature and set the parameter use_original_expression_matrix to TRUE. expression is either a single matrix or a list of matrices, much like input. input, then, will be the output of the preceding projection tool in that given sub-pipeline.
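A hedged sketch of what such a signature might look like (how the two inputs are combined here is purely illustrative, and the sketch assumes the single-matrix case):

## Hedged sketch: a clustering tool that uses both the projection ('input')
## and the original expression matrix ('expression'); the way they are
## combined is purely illustrative and assumes a single input matrix
wrapper.clustering.ExpressionAware <- WrapTool(
  name       = 'ExpressionAwareKMeans',
  type       = 'clustering',
  r_packages = 'stats',
  use_original_expression_matrix = TRUE,
  fun.build_model = function(input, expression, n_clusters = 20) {
    combined <- cbind(input, scale(expression))  # projection + scaled expression
    stats::kmeans(combined, centers = n_clusters)
  },
  fun.extract = function(model) model$cluster
)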
If your tool uses a k-nearest-neighbour graph (k-NNG), you are encouraged to always use one that was computed at the beginning of your pipeline evaluation. (The k-NNG will be created if SingleBench knows it will run one or more tools that need it.) To do this, set use_knn_graph to TRUE and add the argument knn to the signature of your model-building functions. knn will then be a list of two named matrices: Indices for indices of nearest neighbours (row-wise) and Distances for distances to those neighbours. Warning: the entries in Indices will be 1-indexed and the matrices do not contain a column for the 'zero-th' neighbour (for each point, the zero-th neighbour is itself). To modify the knn object (switch to 0-indexing or include the zero-th neighbour), use the converter kNNGTweak inside your model-building function. For instance, to convert knn to only a matrix of indices that includes the zero-th neighbours, is 0-indexed and has k lowered from its original value to 30, use: knn <- kNNGTweak(knn, only_indices = TRUE, zero_index = TRUE, zeroth_neighbours = TRUE, new_k = 30).
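A hedged sketch of a wrapper that consumes the precomputed k-NN graph follows (the use of the igraph package and Louvain community detection is purely illustrative):

## Hedged sketch: a clustering tool that re-uses the precomputed k-NN graph;
## igraph and Louvain community detection are used purely for illustration
wrapper.clustering.kNNLouvain <- WrapTool(
  name          = 'kNNLouvain',
  type          = 'clustering',
  r_packages    = 'igraph',
  use_knn_graph = TRUE,
  fun.build_model = function(input, knn) {
    idx   <- knn$Indices  # 1-indexed neighbour indices, no zero-th neighbour
    ## connect each point to each of its nearest neighbours
    edges <- cbind(rep(seq_len(nrow(idx)), ncol(idx)), as.vector(idx))
    g     <- igraph::simplify(igraph::graph_from_edgelist(edges, directed = FALSE))
    igraph::cluster_louvain(g)
  },
  fun.extract = function(model) as.integer(igraph::membership(model))
)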
Most tools can accept custom numeric parameters. Any one of the arguments to a model-building function can be chosen as the n-parameter by the user: SingleBench can then do parameter sweeps over different values of these parameters. Dimension-reduction tools, if possible, should have a parameter latent_dim for iterating over latent-space dimensionality. Clustering tools, if possible, should have a parameter n_clusters for iterating over target cluster count. If there is an option to determine the number of clusters automatically, it might be a good idea to use n_clusters = 0 for this.
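A hedged sketch of how the n_clusters = 0 convention might be honoured inside a model-building function (mclust, with its built-in model selection, stands in for any tool that can estimate the cluster count itself):

## Hedged sketch: n_clusters = 0 triggers automatic cluster-count selection;
## mclust is used purely as an illustration of such a tool
fun.build_model <- function(input, n_clusters = 0) {
  if (n_clusters == 0) {
    mclust::Mclust(input)                  # tool picks the number of clusters
  } else {
    mclust::Mclust(input, G = n_clusters)  # fixed number of clusters
  }
}
fun.extract <- function(model) model$classification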
For methods that are designed to run on multiple CPU cores, set prevent_parallel_execution to TRUE (otherwise, SingleBench may attempt to run them in parallel if the user wants repeated runs for stability analysis).