VisuNet works with any rule-based classifier in supported data frame formats.
Input data should be in a data frame format that contains the following columns:
features
- the left-hand side of the rule corresponding to comma-separated attributes and their values, of type, type 'factor' decision
- the right-hand side of the rule corresponding to the decision value, of type 'factor'accuracyRHS
- the rule accuracy, of type 'numeric' supportRHS
- the rule support, of type 'numeric'decisionCoverage
or coverageRHS
- the fraction of correcly classfied objects, of type 'numeric'library(VisuNet) library(knitr) library("kableExtra") library(autoimage) kable(autcon_ruleset[1:5,], format="html", caption = "The sample 'line by line' data frame") %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
You can use the 'line by line' format with the option:
type = "L"
rules <- autcon_ruleset vis_out <- visunet(rules, type = "L")
The rules data frame that is the output of R.ROSETTA can be directly imported in VisuNet. See ?rosetta from the R.ROSETTA package for details.
The R.ROSETTA output format can be used with the option:
type = "RDF"
.
#the rule-based model construction using R.ROSETTA resultsRos <- rosetta(autcon) vis_out <- visunet(resultsRos$main, type = "RDF")
VisuNet is an R package implemented as Shiny Gadgets.
require(VisuNet) #Sample rule set for a classifier of young males with autism and control #'Line by line' data type autcon_ruleset #Run VisuNet #Remember to click DONE once you finish working on VisuNet vis_out <- visunet(autcon_ruleset, type = "L")
The available visunet
parameters are:
ruleSet
- the appropriately formatted set of rules: Input formatstype
- a character string specifying the type of the input data:"RDF"
- the R.ROSETTA output (see R.Rosetta format)"L"
- the "Line by line" file format (see 'line by line' format) The default is "RDF"
NodeColorType
- a character string specifying the color of nodes:
"DL"
- feature discretization levels, option is available for data discretized into three levels: 1 - r icon::fa_circle(color = "#4CABE6", size = 2)
, 2 - r icon::fa_circle( color = "#8E8E8E", size = 2)
and 3 - r icon::fa_circle(color = "#E29402", size = 2)
. In the case of gene expression, data discretization levels correspond to: 1 - under-expressed gene, 2 - no change gene expression and 3 - over-expressed gene."A"
- color of nodes defined by the mean accuracy value for the node.NodeSize
- a character string parametr for scaling the node size:
"DC"
- the mean decision coverage for the node"S"
- the mean support for the nodePlease note that the node scaling is perfomed per decision. See ?visNodes for the node scaling details.
CustObjectNodes
and CustObjectEdges
parameters are optional and can be used when rule network customization is needed.
CustObjectNodes
- a list that contains the customized VisuNet output for nodes. The list needs to contain two variables:nodes
- a customized VisuNet output for nodesCustCol
- the names of variables added/changed in the VisuNet output for nodes.See Node customization for details.
CustObjectEdges
- a list that contains customized VisuNet output for edges. The list needs to contain two variables:edges
- a customized VisuNet output for edgesCustCol
- the names of variables added/changed in the VisuNet output for edges. See Edge customization for details.
VisuNet displays the rule network construct for the 10% of rules with the highest connection score. When only one decision variable is visible in the top 10% of rules, we extend the threshold to obtain rules for all decisions. The initial values of accuracy and support are defined for this set of rules.
The rule networks filtration panel:
min Accuracy
- the minimum accuracy value for the set of rules that create the rule networkMin Decision Coverage
) or support (Min Support
) value for the set of rules that create the rule networkShow top n nodes
- enable to show the exact number of nodes according the connection value from the current rule network. Set "0" to switch off the parameterColor of nodes
- the node color schema. See NodeColorType
in the Run Visunet section for details.The VisuNet output is a collection of lists corresponding to decision variables and an additional list for the combined decision "all".
The lists contain information required to reproduce the rule network, i.e. data frames for nodes
, edges
and RulesSetPerNode
- a list that shows rules for each node. Data frames for nodes
and edges
incorporate essential variables from the visNetwork package and variables that describe the quality of each node/edge obtained from the rules.
Structure of the data frame for nodes:
id
- a unique node id, based on attribute value and left-hand side value of the rule setlabel
- the attribute variable without the "=value" part from the left-hand side of the rule setDiscState
- the attribute valuecolor.background
- the node color (see node color types in section Run Visunet for details)value
- the node sizecolor.border
- the color of the node bordermeanAcc
- the mean accuracy value of all rules that contain the nodemeanSupp
- the mean support value of all rules that contain the nodeNRules
- the number of rules that contain the nodePrecRules
- fraction of rules that contain the nodeNodeConnection
- the total connection value obtained from the rules that contain the nodetitle
- information visible on the tooltipgroup
- the decision value that occurs most frequently (>50%) in rules associated with the node; otherwise group contains all comma-separated decision values corresponding to rules associated with the node. group defines the content of the "Select by decision" drop-down box.Structure of the data frame for edges:
from
, to
- the pair of nodes that create the edge conn
- the connection variable obtained from the edge-associated rules.connNorm
- the connection variable normalized according to the maximum connection variable in the rule networklabel2
- the edge idcolor
- the edge color title
- information visible on the tooltipwidth
- the edge width, defined according to the normalized connection valueRule networks are constructed using the visNetwork package that enables adding and modifying node- and edge- properties. We can add other variables that are implemented in visNetwork. See ?visNodes and ?visEdges for a full list of available options.
We identified 11 genes previously reported in databases of autism associations: SFARI, AutDB and ASD. In this example we would like to mark those genes as stars.
#genes reported in databases of autism associations aut_genes <- c("TSPOAP1", "COX2","NCS1","RHPN1","FLRT2", "BAHD1","NCKAP5L","PPOX", "NGR2", "ATXN8OS","DEPDC1") #create a new variable that contains node information for the "all" decision nodes_RNO <- vis_out$all$nodes #create a new vector of variables: shape. "dot" is the default shape of nodes nodes_RNO$shape <- rep("dot", length(nodes_RNO$label)) #mark selected genes as stars using the label attribute nodes_RNO$shape[which(as.character(nodes_RNO$label) %in% aut_genes)] <- "star" #create the node object list nodesL <- list(nodes = nodes_RNO,CustCol = c("shape")) #rerun VisuNet with the new shape for nodes vis_out2 <- visunet(autcon_ruleset, type = "L", CustObjectNodes = nodesL)
To rerun VisuNet with the customized object for nodes, you need to provide the original rule set and a list CustObjectNodes
that contains the customized VisuNet object for nodes.
CustObjectNodes
includes the customized object for nodes: nodes
and a vector of column names that were changed/added to the object: CustCol
.
vis_out <- readRDS("data/visunet_out_nodes.RDS") visNetwork(nodes = vis_out$nodes, edges = vis_out$edges, width = "100%")%>% visLayout(randomSeed = 123) %>% visInteraction(hover = TRUE, navigationButtons = TRUE) %>% visOptions(selectedBy = list(variable = "group" , multiple = TRUE, main = "Select by decision", style = 'width: 200px; height: 30px; padding-left: 80px; font-size: 15px; color: black; border:none; outline:none;'))
Let’s assume that COX2 controls MAP7 and we would like to show the directionality of this edge in the rule network:
#mark the interaction between COX2 and MAP7 genes edges_RNO <- vis_out$all$edges #create a new vector of variables: arrows. "enabled" is the default variable for edges edges_RNO$arrows <- rep("enabled", length(edges_RNO$label2)) #add direction to the selected edge using the label2 attribute edges_RNO$arrows[which(edges_RNO$label2 == "COX2=3-MAP7=2")] <- "to" #create the edge object list edgesL <- list(edges = edges_RNO,CustCol = c("arrows")) #rerun VisuNet with a new variable for edges vis_out3 <- visunet(autcon_ruleset, type = "L", CustObjectNodes = nodesL, CustObjectEdges = edgesL)
We can rerun VisuNet using customized objects for edges by providing the original rule set and a list CustObjectEdges
that contains the customized VisuNet object for edges.
CustObjectEdges
includes the customized object for edges: edges
and a vector of column names that were changed/added to the object: CustCol
.
We can rerun VisuNet using both customized objects: CustObjectEdges
and CustObjectNodes
.
vis_out <- readRDS("data/visunet_out_edges.RDS") visNetwork(nodes = vis_out$nodes, edges = vis_out$edges, width = '100%')%>% visLayout(randomSeed = 123) %>% visInteraction(hover = TRUE, navigationButtons = TRUE) %>% visOptions(selectedBy = list(variable = "group", multiple = TRUE, main = "Select by decision", style = 'width: 200px; height: 30px; padding-left: 80px; font-size: 15px; color: black; border:none; outline:none;'))
VisuNet can be used with any rule-based model as long as the input data is created in the VisuNet format. To show the universality of VisuNet we created an association rule-based model for the case-control study of autism based on an arulesCBA package.
library("arulesCBA") #create a training set by randomly selecting 48 objects from each decsion class autcon_training <- autcon %>% group_by(decision) %>% sample_n(48, set.seed=1, replace = FALSE) %>% as.data.frame() #create a test set id_training <- rbind(autcon_training, autcon) %>% duplicated() id_test <- which(id_training[97:length(id_training)] == FALSE) autcon_test <- autcon[id_test,] # run the CBA classifier classifier <- CBA(decision ~ ., data = autcon_training) #obtain rules from the model rules <- DATAFRAME(rules(classifier), setStart = '',setEnd = '') #format a left-hand side of rules ruleslhs <- rules$LHS #change discretization levels from intervals to 1,2,3 ruleslhs2 <- gsub( " *\\[-Inf.*?\\) *", "1", ruleslhs) ruleslhs2 <- gsub( " *\\[.*?\\) *", "2", ruleslhs2) ruleslhs2 <- gsub( " *\\[.*?\\Inf] *", "3", ruleslhs2) #select required columns from the rules rules2 <- rules %>% dplyr::select(LHS:confidence) #replace the original left-hand side of rules with the modified version rules2$LHS <- ruleslhs2 #rename columns name to fullfil the VisuNet input format colnames(rules2) <- c('features', 'decision', 'supportRHS', 'accuracyRHS') # format support to obtain intiger numbers rules2$supportRHS <- (rules2$supportRHS * 96) %>% round() #run VisuNet vis <- visunet(rules2, type = 'L')
vis_out <- readRDS("data/arules_vis.RDS") visNetwork(nodes = vis_out$all$nodes, edges = vis_out$all$edges, width = '100%')%>% visLayout(randomSeed = 123) %>% visInteraction(hover = TRUE, navigationButtons = TRUE) %>% visOptions(selectedBy = list(variable = "group", multiple = TRUE, main = "Select by decision", style = 'width: 200px; height: 30px; padding-left: 80px; font-size: 15px; color: black; border:none; outline:none;'))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.