#' @title Call the ClusterONE command-line tool from R
#' @description ClusterONE strives to discover densely connected and possibly
#' overlapping regions within the network you are working with.
#' The interpretation of these regions depends on the context (i.e. what the
#' network represents) and it is left up to you. For instance,
#' in protein-protein interaction networks derived from high-throughput
#' AP-MS experiments, these dense regions usually correspond to protein
#' complexes or fractions of them. ClusterONE works by "growing" dense regions
#' out of small seeds (typically one or two vertices), driven by a quality
#' function called cohesiveness.
#' @param inputFile the name of the network edge file. The columns of this
#' file are separated by tabs, and the elements in its first row are
#' treated as column names.
#' @param inputFormat specifies the format of the input file ("sif" or
#' "edge_list"). Use this option only if ClusterONE fails to detect the
#' format automatically.
#' @param outputFormat specifies the format of the output file ("plain",
#' "csv" or "genepro").
#' @param minDensity sets the minimum density of predicted complexes.
#' "auto" means that the density threshold will be set automatically
#' based on whether the graph is weighted or not, and if not, what its
#' clustering coefficient is. Weighted graphs will have a default density
#' threshold of 0.3, unweighted graphs will have a density threshold of 0.5,
#' unless their global clustering coefficient is less than 0.1, in which
#' case the density threshold is set to 0.6.
#' @param minSize sets the minimum size of the predicted complexes.
#' @param fluff fluffs the clusters as a post-processing step.
#' This is not used in the published algorithm, but it may be useful
#' for your specific problem. The idea is to check whether the external
#' boundary nodes of each cluster connect to more than two thirds of the
#' internal nodes; if so, such external boundary nodes are added to the
#' cluster. Fluffing is applied before the size and density filters.
#' @param haircut apply a haircut transformation as a post-processing
#' step on the detected clusters. This is not used in the published
#' algorithm either, but it may be useful for your specific problem.
#' A haircut transformation removes dangling nodes from a cluster:
#' if the total weight of connections from a node to the rest of the
#' cluster is less than x times the average node weight in the cluster
#' (where x is the argument of the switch), the node will be removed.
#' The process is repeated iteratively until there are no more nodes to
#' be removed. Haircut is applied before the size and density filters.
#' @param maxOverlap specifies the maximum allowed overlap between two
#' clusters, as measured by the match coefficient, which takes the size
#' of the overlap squared, divided by the product of the sizes of the
#' two clusters being considered, as in the paper of Bader and Hogue.
#' @param mergeMethod specifies the method to be used to merge highly
#' overlapping complexes. The following values are accepted: \cr
#' \itemize{
#' \item "single" calculates similarity scores between all pairs of
#' complexes and creates a graph where the nodes are the complexes
#' and two nodes are connected if the corresponding complexes are
#' highly overlapping. Complexes in the same connected component
#' of the graph will then be merged. As its name suggests,
#' this is a single-pass method. \cr
#' \item "multi" calculates similarity scores between all pairs of complexes
#' and stores those pairs that have a score larger than a given threshold.
#' The highest scoring pair is then merged and the similarity of the
#' merged complex towards its neighbors is re-calculated. This is repeated
#' until there are no more highly overlapping complexes in the result.
#' As its name suggests, this is a multi-pass method where similarities
#' are re-calculated after each merge. \cr
#' }
#' @param similarity specifies the similarity function to be used in
#' the merging step. More precisely, this switch controls which scoring
#' function is used to decide whether two complexes overlap significantly
#' or not. The following values are accepted: \cr
#' \itemize{
#' \item "match" calculates the intersection size squared, divided by
#' the product of the sizes of the two complexes. This is also called
#' the **matching score**. This is the default. \cr
#' \item "simpson" or meet/min calculates the Simpson coefficient, i.e. the
#' intersection size over the size of the smaller complex. \cr
#' \item "jaccard" calculates the Jaccard similarity, i.e. the intersection
#' size over the size of the union of the two complexes. \cr
#' \item "dice" calculates the Dice similarity, i.e. twice the intersection
#' size over the sum of the sizes of the two complexes. \cr
#' }
#' @param noFluff do not fluff the clusters; this is the default.
#' For more details about fluffing, see the fluff parameter above.
#' @param noMerge don't merge highly overlapping clusters (in other words,
#' skip the last merging phase). This is useful for debugging purposes only.
#' @param penalty sets a penalty value for the inclusion of each node.
#' When you set this option to x, ClusterONE will assume that each node has
#' an extra boundary weight of x when it considers the addition of the node
#' to a cluster. It can be used to model the possibility of uncharted
#' connections for each node, so nodes with only a single weak connection
#' to a cluster will not be added to the cluster as the penalty value will
#' outweigh the benefits of adding the node. The default penalty value is 2.
#' @param seedMethod specifies the seed generation method to use.
#' The following values are accepted: \cr
#' \itemize{
#' \item "nodes": every node will be used as a seed. \cr
#' \item "unused_nodes": nodes will be tried in descending order of their
#' weights (where the weight of a node is the sum of the weights on its
#' incident edges), and whenever a cluster is found, the nodes in that
#' cluster will be excluded from the list of potential seeds. In other
#' words, the node with the largest weight that does not participate in any
#' of the clusters found so far will be selected as the next seed. \cr
#' \item "edges": every edge will be considered once, each yielding a seed
#' consisting of the two endpoints of the edge. \cr
#' \item "cliques": every maximal clique of the graph will be considered
#' once as a seed. \cr
#' \item "file(*filename*)": seeds will be generated from the given file.
#' Each line of the file must contain a space-separated list of node
#' IDs that will be part of the seed (and of course each line encodes
#' a single seed). If a line contains a single * character only, this
#' means that besides the seeds given in the file, every node that is not
#' part of any of the seeds will also be considered as a potential seed
#' on its own. \cr
#' \item "'single(*node1*,*node2*,...)'": a single seed will be used with the given
#' nodes as members. Node names must be separated by commas or spaces. \cr
#' \item "stdin": seeds will be given on the standard input, one per line.
#' Each line must contain a space-separated list of node IDs that will be
#' part of the seed. It may be useful to use this method in conjunction
#' with the noMerge option if you don't want the result of earlier seedings
#' to influence the result of later ones. \cr
#' }
#' @details The following input file formats are recognised: \cr
#' \itemize{
#' \item *Cytoscape SIF files* \cr
#' When the extension of the input file is .sif, ClusterONE will
#' automatically try to parse the file according to the SIF format of
#' Cytoscape. Each line of the file must follow this format: \cr
#' id1 type id2 \cr
#' where id1 and id2 are the IDs of the two interacting proteins and
#' type is the interaction type (which will silently be ignored by
#' ClusterONE). Each edge will have unit weight. The columns of the
#' input file may be separated by spaces or tabs; however, make sure
#' that you do not mix these separator characters. \cr
#' \item *Weighted edge lists* \cr
#' This is the default file format assumed by ClusterONE unless the
#' file extension suggests otherwise. Each line of the file has the
#' following format: \cr
#' id1 id2 weight \cr
#' where id1 and id2 are the IDs of the interacting proteins and weight
#' is the associated confidence value between 0 and 1. If the weight is
#' omitted, it is considered to be equal to 1. Lines starting with hash
#' marks (#) or percentage signs (\%) are considered as comments and they
#' are silently ignored. \cr \cr
#' If ClusterONE fails to recognise the input format of your file, feel
#' free to specify it using the "inputFormat" option.
#' }
#' The following output file formats are available:
#' \itemize{
#' \item *Plain text output (plain)* \cr
#' A simple and easy-to-parse output format, where each line represents a
#' cluster. Members of the clusters are separated by Tab characters.
#' \item *CSV output (csv)* \cr
#' This format is suitable if you need more details about each cluster
#' and/or you want to import the clusters to Microsoft Excel or OpenOffice.
#' Each line corresponds to a cluster and contains the size, density, total
#' internal and boundary weight, the value of the quality function, a P-value
#' and the list of members for each cluster. Columns are separated by commas,
#' and each individual column may optionally be quoted within quotation marks
#' if necessary.
#' \item *GenePro output (genepro)* \cr
#' Use this format if you want to visualize the clusters later on using the
#' [GenePro](http://wodaklab.org/genepro) plugin of Cytoscape.
#' }
#' @return A matrix of complexes, where each row contains the proteins of
#' a single complex.
#' @export
#' @examples {
#' \dontrun{
#' # Run on the example weighted edge list shipped with the package
#' file = paste0(system.file('extdata', package = 'ClusterOneR'),
#' "/Weighted_edge_lists.tsv")
#' head(file)
#' y = clusterOneR(file)
#' View(y)
#'
#' # Run on your own file "/my/path/myEdgeFile.tsv", which is a
#' # "weighted edge lists" file type.
#' file = "/my/path/myEdgeFile.tsv"
#' y = clusterOneR(file, inputFormat = "edge_list")
#' View(y)
#'
#' # Run on a SIF file (Simple Interaction Format); the path is a placeholder
#' file = "/my/path/myNetwork.sif"
#' y = clusterOneR(file, inputFormat = "sif")
#' View(y)
#' }
#' }
clusterOneR = function(inputFile = system.file("extdata", "Weighted_edge_lists.tsv",
                                               package = "ClusterOneR"),
                       inputFormat = c("edge_list", "sif"),
                       outputFormat = c("plain", "csv", "genepro"),
                       minDensity = "auto",
                       minSize = 3, fluff = NULL, haircut = NULL,
                       maxOverlap = 0.8,
                       mergeMethod = c("single", "multi"),
                       # List every accepted value so that match.arg() can
                       # validate it; "match" stays the default.
                       similarity = c("match", "simpson", "jaccard", "dice"),
                       noFluff = TRUE, noMerge = FALSE,
                       penalty = 2, seedMethod = NULL){
  # match.arg() already stops with an error on values outside the allowed set.
  inputFormat = match.arg(inputFormat)
  outputFormat = match.arg(outputFormat)
  # minDensity must be either the string "auto" or a number.
  if (is.character(minDensity)){
    stopifnot(minDensity == "auto")
  } else{
    stopifnot(is.numeric(minDensity))
  }
  mergeMethod = match.arg(mergeMethod)
  similarity = match.arg(similarity)
  # Logical flags become "" (emit the bare switch) or NULL (omit the switch).
  noFluff = if (noFluff) "" else NULL
  noMerge = if (noMerge) "" else NULL
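  # Map each R argument to its ClusterONE command-line switch.
  switches = c(inputFormat = "-f", outputFormat = "-F",
               minDensity = "-d", minSize = "-s", fluff = "--fluff",
               haircut = "--haircut", maxOverlap = "--max-overlap",
               mergeMethod = "--merge-method",
               similarity = "--similarity", noFluff = "--no-fluff",
               noMerge = "--no-merge", penalty = "--penalty",
               seedMethod = "--seed-method")
  # Look the argument values up by name rather than by position, so the
  # result does not depend on the ordering of as.list(environment()).
  args = mget(names(switches), envir = environment())
  # Arguments that are NULL are left out of the command line; flags holding
  # "" contribute only the bare switch.
  endCMD = unlist(lapply(names(switches), function(x){
    if (is.null(args[[x]])) NULL else trimws(paste(switches[[x]], args[[x]]))
  }))
  endCMD = paste(endCMD, collapse = " ")
  jarFile = system.file("extdata", "cluster_one.jar", package = "ClusterOneR")
  # Quote both paths so that directories containing spaces do not break the
  # command; path.expand() keeps a leading "~" working after quoting.
  CMD = paste("java -jar", shQuote(jarFile), endCMD,
              shQuote(path.expand(inputFile)))
  resJar = system(CMD, intern = TRUE)
  # strSplit() is not a base R function; it is assumed here to be a helper
  # available to this package that splits each line on tabs into a matrix.
  resMat = strSplit(resJar, split = "\t")
  # Return the result explicitly instead of relying on the assignment value.
  resMat
}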
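
# Illustrative sketch (not part of ClusterONE or of clusterOneR()): the four
# overlap scores described under the `similarity` parameter, computed in plain
# R for two complexes given as character vectors of member IDs. The name
# overlapScore() and its arguments are hypothetical and chosen only for this
# example; ClusterONE computes these scores internally, and its exact handling
# of edge cases is not reproduced here.
overlapScore = function(a, b,
                        method = c("match", "simpson", "jaccard", "dice")){
  method = match.arg(method)
  a = unique(a)
  b = unique(b)
  common = length(intersect(a, b))
  switch(method,
         # intersection size squared over the product of the two sizes
         match = common^2 / (length(a) * length(b)),
         # intersection size over the size of the smaller complex
         simpson = common / min(length(a), length(b)),
         # intersection size over the size of the union
         jaccard = common / length(union(a, b)),
         # twice the intersection size over the sum of the two sizes
         dice = 2 * common / (length(a) + length(b)))
}

# Usage: compare two complexes that share two members. A pair scoring above
# `maxOverlap` would be a candidate for merging.
# overlapScore(c("A", "B", "C"), c("B", "C", "D", "E"), method = "match")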