rf_sankey: Sankey Diagram of paths through a random forest


Description

rf_sankey accepts the output of rf_pathfinder and returns the d3network object that sankeyNetwork needs to draw a Sankey diagram of the paths through a random forest. The Examples section shows how to produce such a diagram. The diagram is displayed in a web browser and can be explored interactively with the mouse. Our manuscript introducing these visualisations explains how to interpret them; a preprint is available from arXiv at https://arxiv.org/abs/1706.08702.
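The shape of the object passed on to sankeyNetwork can be sketched with a toy stand-in. The column names below (source, target, value, name, group) are taken from the sankeyNetwork calls in the Examples section; the node labels and counts are invented for illustration, and the real output of rf_sankey may carry additional columns.

```r
# Toy stand-in for the list returned by rf_sankey (all values are invented).
# Column names follow the sankeyNetwork calls in the Examples section.
nodes <- data.frame(
  name  = c("node a", "node b", "node c", "node d"),
  group = c("x1", "x2", "x3", "x4"),
  stringsAsFactors = FALSE
)

# source and target are zero-based row indices into `nodes`
links <- data.frame(
  source = c(0, 0, 1, 2),
  target = c(1, 2, 3, 3),
  value  = c(30, 20, 30, 20)
)

toy <- list(links = links, nodes = nodes)

# Every index must refer to an existing row of `nodes`
stopifnot(all(c(toy$links$source, toy$links$target) < nrow(toy$nodes)))

# Build the widget only if networkD3 is installed; show it only interactively
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(
    Links = toy$links, Nodes = toy$nodes,
    Source = "source", Target = "target", Value = "value",
    NodeID = "name", NodeGroup = "group", units = "Count"
  )
  if (interactive()) print(p)
}
```

The zero-based indexing is the reason the Examples add 1 to links$source before using it to look up rows in the node table.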

Usage

rf_sankey(all.paths.out, all.nodes = FALSE, plot.node.lim = 6)

Arguments

all.paths.out

the output of rf_pathfinder

all.nodes

If TRUE, the network contains all nodes on all paths through the random forest. If FALSE, the network contains only the first plot.node.lim nodes of each path through the random forest.

plot.node.lim

The maximum rank of nodes to plot. The plot always starts with the root nodes of the trees on the far left and proceeds to the right through nodes of successively larger rank until nodes of this rank are reached. This argument only has an effect if all.nodes = FALSE.
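The interaction between all.nodes and plot.node.lim can be illustrated on toy data. This is a minimal sketch of the behaviour described above, not the package's implementation: the paths and the helper limit_paths are hypothetical, and each path is assumed to be a vector of node labels ordered from the root.

```r
# Hypothetical paths: each vector lists the nodes visited from root to leaf.
paths <- list(
  c("x1", "x3", "x2", "x5", "leaf"),
  c("x1", "x4", "leaf"),
  c("x2", "x1", "x3", "x4", "x5", "x6", "x7", "leaf")
)

# Keep at most `plot.node.lim` nodes per path, or every node if all.nodes is TRUE.
# This helper is illustrative only and is not part of forestviews.
limit_paths <- function(paths, all.nodes = FALSE, plot.node.lim = 6) {
  if (all.nodes) return(paths)
  lapply(paths, head, n = plot.node.lim)
}

truncated <- limit_paths(paths, all.nodes = FALSE, plot.node.lim = 4)
lengths(truncated)
# 4 3 4
```

Paths shorter than the limit are left untouched, and with all.nodes = TRUE the limit is ignored entirely.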

See Also

rf_pathfinder, sankeyNetwork

Examples

# example 1:
library(mlbench)
data(Satellite)
library(randomForest)
library(networkD3)
rf.1 <- randomForest(classes ~ ., data = Satellite, mtry = 8, keep.forest = TRUE, ntree = 25, importance = TRUE)
rf.1.all.paths <- rf_pathfinder(rf = rf.1)
nd3 <- rf_sankey(all.paths.out = rf.1.all.paths, all.nodes = FALSE, plot.node.lim = 6)
sankeyNetwork(Links = nd3$links, Nodes = nd3$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', units = 'Count', fontSize = 12, nodeWidth = 30, NodeGroup = NULL)

# example 2:
data(iris)
library(randomForest)
library(networkD3)
rf.2 <- randomForest(Species ~ ., data = iris, mtry = 2, keep.forest = TRUE, ntree = 100, importance = TRUE)
rf.2.all.paths <- rf_pathfinder(rf = rf.2)
nd3.2 <- rf_sankey(all.paths.out = rf.2.all.paths, all.nodes = TRUE)
sankeyNetwork(Links = nd3.2$links, Nodes = nd3.2$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', units = 'Count', fontSize = 12, nodeWidth = 30, NodeGroup = NULL)
# colour links by covariate identity (only a good idea with fewer than 20 covariates)
nd3.2$links$nd3.2_type <- sub(' .*', '', nd3.2$nodes[nd3.2$links$source + 1, 'group'])
# this example only needs five colours; the calls below show how to use
# the predefined d3js categorical colour scales of various sizes:
# d3js: 10 categorical colour scale:
sankeyNetwork(Links = nd3.2$links, Nodes = nd3.2$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', LinkGroup = 'nd3.2_type', NodeGroup = 'group', colourScale = JS("d3.scaleOrdinal(d3.schemeCategory10);"))
# d3js: 20 categorical colour scale (a):
sankeyNetwork(Links = nd3.2$links, Nodes = nd3.2$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', LinkGroup = 'nd3.2_type', NodeGroup = 'group', colourScale = JS("d3.scaleOrdinal(d3.schemeCategory20);"))
# d3js: 20 categorical colour scale (b):
sankeyNetwork(Links = nd3.2$links, Nodes = nd3.2$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', LinkGroup = 'nd3.2_type', NodeGroup = 'group', colourScale = JS("d3.scaleOrdinal(d3.schemeCategory20b);"))
# d3js: 20 categorical colour scale (c):
sankeyNetwork(Links = nd3.2$links, Nodes = nd3.2$nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name', LinkGroup = 'nd3.2_type', NodeGroup = 'group', colourScale = JS("d3.scaleOrdinal(d3.schemeCategory20c);"))
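
The link-colouring step in example 2 can be traced on a small stand-in. The tables below are invented and only mimic the shape of nd3.2$nodes and nd3.2$links; in particular, the format of the group strings (a covariate name followed by a space and further text) is an assumption suggested by the sub() pattern, not something confirmed by the package.

```r
# Toy node and link tables (invented values), shaped like nd3.2$nodes and nd3.2$links
nodes <- data.frame(
  name  = c("n0", "n1", "n2"),
  group = c("Petal.Length 2.45", "Petal.Width 1.75", "Sepal.Length 5.9"),
  stringsAsFactors = FALSE
)
links <- data.frame(
  source = c(0, 0, 1),
  target = c(1, 2, 2),
  value  = c(5, 3, 2)
)

# + 1 converts the zero-based source indices to R's one-based row numbers
source_group <- nodes[links$source + 1, "group"]

# sub(" .*", "", x) drops everything from the first space onward,
# leaving one label per link for use as LinkGroup
links$link_type <- sub(" .*", "", source_group)
links$link_type
# "Petal.Length" "Petal.Length" "Petal.Width"
```

Each link thus inherits the label of the node it starts from, which is what lets LinkGroup and NodeGroup share a colour scale.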

brfitzpatrick/forestviews documentation built on May 14, 2019, 8:17 a.m.