plot.mcfs: Plots various MCFS result components

View source: R/rmcfs.plot.R

Description

Plots various aspects of the MCFS-ID result.

Usage

  ## S3 method for class 'mcfs'
plot(x, type = c("features", "ri", "id", "distances", "cv", "cmatrix", "heatmap"), 
        size = NA, 
        ri_permutations = c("max", "all", "sorted", "none"),
        diff_bars = TRUE,
        features_margin = 10,
        cv_measure = c("wacc", "acc", "pearson", "MAE", "RMSE", "SMAPE"),
        heatmap_norm = c('none', 'norm', 'scale'),
        heatmap_fun = c('median', 'mean'),
        color = c('darkred'),
        gg = TRUE,
        cex = 1, ...)

Arguments

x

'mcfs' S3 object: the result of an MCFS-ID experiment, as returned by the mcfs function.

type
  • features: plots the top features along with their RI values as a horizontal bar plot, with important features in red and unimportant ones in grey.

  • ri: plots the top features with their RI values, together with the maximum RI values obtained in the permutation experiments. Important features are shown in red.

  • id: plots the top ID (interdependency) values obtained from MCFS-ID.

  • distances: plots the distances between subsequent feature rankings obtained during the MCFS-ID experiment; these serve as convergence diagnostics of the algorithm.

  • cv: plots cross-validation results obtained on the top features.

  • cmatrix: plots the confusion matrix obtained from all s·t trees (s projections times t splits).

  • heatmap: plots a heatmap of the top features. Only numeric features can be shown on the heatmap.

size

number of features to plot.

ri_permutations

if type = "ri" and ri_permutations = "max", then it additionally shows horizontal lines that correspond to max RI values obtained from each single permutation experiment.

diff_bars

if type = "ri" or type = "id" and diff_bars = TRUE, it also draws bars showing the differences between the RI (or ID) values.
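The sketch below shows one way to read such difference bars. It assumes (this page does not define them precisely) that each bar is the drop between consecutive values of the sorted RI ranking. The input is the RI_norm column printed in the example output on this page; the helper names are illustrative only.

```r
# RI_norm values of the top 8 features, copied from the example output
ri <- c(A1 = 0.750915770, A2 = 0.746437400, B2 = 0.493405730,
        B1 = 0.403935670, C1 = 0.230295510, C2 = 0.208132670,
        X2 = 0.046503060, X3 = 0.016226858)

# Assumption: a "difference bar" is the drop from one feature's RI to the next one
ri_sorted <- sort(ri, decreasing = TRUE)
drops <- -diff(ri_sorted)
names(drops) <- paste(head(names(ri_sorted), -1),
                      tail(names(ri_sorted), -1), sep = "->")
print(round(drops, 4))

# Position of the largest drop in the ranking
names(which.max(drops))
```

Here the drop from C2 to X2 (about 0.16) separates the six features that the example run reports as important from the random X features. It is not the largest drop in the list, though (A2 to B2 is larger), so a single visual gap should not be read as the cutoff on its own.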

features_margin

if type = "features", then it determines the size of the left margin of the plot.

cv_measure

if type = "cv", it selects the quality measure shown in the plot. For a nominal target, choose weighted ("wacc") or unweighted ("acc") accuracy. For a numeric target, choose one of the prediction quality measures "pearson", "MAE", "RMSE" or "SMAPE".
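To make the measures concrete, here is a minimal base-R sketch. The helper functions are hypothetical, not part of rmcfs. It assumes that "wacc" is the mean of the per-class true positive rates and uses one common SMAPE definition (other definitions exist). The confusion matrix is the one logged in the example output on this page, with the all-zero "other" row and column dropped.

```r
# Hypothetical helpers, not rmcfs functions.
# Confusion matrix convention: rows = actual class, columns = predicted class.
acc <- function(cm) sum(diag(cm)) / sum(cm)

# Assumed definition: mean of per-class true positive rates,
# so every class counts equally regardless of its size
wacc <- function(cm) mean(diag(cm) / rowSums(cm))

# Regression measures; the SMAPE formula below is an assumed, common variant
mae     <- function(y, p) mean(abs(y - p))
rmse    <- function(y, p) sqrt(mean((y - p)^2))
smape   <- function(y, p) mean(2 * abs(y - p) / (abs(y) + abs(p)))
pearson <- function(y, p) cor(y, p, method = "pearson")

# Confusion matrix of all 200 trees, copied from the example output
cm <- matrix(c(1119, 101,  250,
                171, 392,  277,
                110,  52, 2778),
             nrow = 3, byrow = TRUE,
             dimnames = list(actual = c("B", "C", "A"),
                             predicted = c("B", "C", "A")))
round(c(acc = acc(cm), wacc = wacc(cm)), 4)

# Toy numeric target and predictions (made-up values)
y <- c(3.0, 5.0, 2.5)
p <- c(2.5, 5.0, 3.0)
round(c(pearson = pearson(y, p), MAE = mae(y, p),
        RMSE = rmse(y, p), SMAPE = smape(y, p)), 4)
```

The computed accuracy and weighted accuracy agree with the logged Accuracy = 0.8169 and WeightedAccuracy = 0.7242 up to the last printed digit, which is consistent with reading "wacc" as the mean per-class true positive rate.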

heatmap_norm

if type = "heatmap", it selects how the input data are normalized: 'none' (no normalization), 'norm' (normalization to the range [-1, 1]) or 'scale' (standardization: centering by the mean and dividing by the standard deviation).
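The two normalizations can be sketched in base R as follows. The function names are hypothetical. The 'norm' formula assumes plain min-max rescaling to [-1, 1], which is one natural reading of "normalization within range [-1,1]". The 'scale' variant matches what base R's scale() does to each column.

```r
# 'norm' (assumed formula): min-max rescaling of a numeric vector to [-1, 1].
# A constant vector gives NaN because max(x) - min(x) is zero.
norm_pm1 <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1

# 'scale': centre by the mean and divide by the standard deviation
standardize <- function(x) (x - mean(x)) / sd(x)

x <- c(0.2, 0.5, 0.9, 0.4)
norm_pm1(x)
standardize(x)

# Same result as base R's scale() on a single column
all.equal(standardize(x), as.vector(scale(x)))
```

For a whole data set, each function would be applied column by column, for example with sapply(df, norm_pm1) on the numeric columns.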

heatmap_fun

if type = "heatmap", it selects the per-class statistic, 'median' or 'mean', that determines the color intensity of the heatmap.
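As a rough illustration of what heatmap_fun controls, the sketch below computes one summary value per class and feature with stats::aggregate. The data frame and the class_summary helper are hypothetical. This is not the rmcfs implementation, only the kind of class-by-feature table a heatmap would encode as color. Only numeric features are used, matching the heatmap restriction above.

```r
# Toy data: two numeric features and a nominal class (made-up values)
df <- data.frame(X1 = c(0.1, 0.3, 0.8, 0.9, 0.5),
                 X2 = c(1.0, 2.0, 3.0, 5.0, 4.0),
                 class = c("A", "A", "B", "B", "B"))

# Hypothetical helper: one row per class, one column per feature;
# each cell is the chosen per-class statistic of that feature
class_summary <- function(data, fun = c("median", "mean")) {
  fun <- match.arg(fun)
  aggregate(. ~ class, data = data, FUN = match.fun(fun))
}

class_summary(df, "median")
class_summary(df, "mean")
```

The median is less sensitive to outlying samples within a class, while the mean uses every value. This is the trade-off the 'median' and 'mean' options expose.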

color

main color of the following plot types: 'ri', 'id', 'heatmap', 'features' and 'cmatrix'.

gg

if gg = TRUE, the plot is drawn with ggplot2.

cex

font size scaling factor (character expansion).

...

additional plotting parameters.

Examples

  ## Not run:

  # Create input data.
  adata <- artificial.data(rnd_features = 10)
  showme(adata)
  
  # Parametrize and run MCFS-ID procedure.
  result <- mcfs(class~., adata, cutoffPermutations = 0, featureFreq = 10,
                  finalCV = FALSE, finalRuleset = TRUE, threadsNumber = 2)

  # Plot & print out distances between subsequent projections. 
  # These are convergence MCFS-ID statistics.
  plot(result, type = "distances")
  print(result$distances)
  
  # Plot & print out the 50 most important features. When cutoffPermutations > 0,
  # the plot also shows the max RI values from the permutation experiments.
  plot(result, type = "ri", size = 50)
  print(head(result$RI, 50))
  
  # Plot & print out 50 strongest feature interdependencies.
  plot(result, type = "id", size = 50)
  print(head(result$ID, 50))
  
  # Plot features ordered by RI. Parameter 'size' is the number of
  # top features in the chart. By default it is set to cutoff_value + 10.
  plot(result, type = "features", cex = 1)

  # Here 'size' is set to a fixed value of 10.
  plot(result, type = "features", size = 10)
  
  # Plot cv classification results obtained on the top features.
  # The red label in the middle of the x axis marks cutoff_value.
  # This plot needs final CV results, so run mcfs() with finalCV = TRUE first.
  # plot(result, type = "cv", cv_measure = "wacc", cex = 0.8)
  
  # Plot the confusion matrix. This matrix is the result of
  # all classifications performed by all decision trees on all s*t datasets.
  plot(result, type = "cmatrix")
  
  
## End(Not run)

Example output

Loading required package: rJava

  ########################
  ##   rmcfs   1.2.15   ##
  ########################
  If used please cite the following paper: 
  M. Draminski, J. Koronacki (2018), 
  rmcfs: An R Package for Monte Carlo Feature Selection and Interdependency Discovery,
  Journal of Statistical Software, vol 85(12), 1-28, doi:10.18637/jss.v085.i12.
           X1         X2         X3         X4         X5        X6        X7
1  0.20274639 0.11598857 0.61478547 0.42402600 0.14691274 0.3236284 0.8001307
2  0.73928474 0.57860234 0.61486531 0.38133400 0.33880215 0.1292732 0.2453541
3  0.63103498 0.78679008 0.82438874 0.95815380 0.29065329 0.8079233 0.4660874
4  0.49468632 0.01079839 0.84465378 0.68775093 0.71638810 0.1435464 0.8322140
5  0.21478014 0.76021624 0.76233393 0.82596737 0.90614322 0.2186508 0.3729568
6  0.01792957 0.33310221 0.45889654 0.35253950 0.09473219 0.5895532 0.3774475
7  0.70883512 0.33131018 0.05235094 0.95805453 0.49311441 0.5956836 0.9795866
8  0.58961368 0.90688919 0.34141376 0.93895753 0.23073050 0.3482507 0.1696028
9  0.03917050 0.70926578 0.18136785 0.11804200 0.61835473 0.4444953 0.1753490
10 0.42077821 0.80953743 0.98001714 0.05287438 0.44337865 0.6824827 0.8374790
          X8         X9        X10
1  0.9568731 0.10960519 0.96426010
2  0.6124745 0.91389510 0.98430731
3  0.4601556 0.59884861 0.78023657
4  0.1745020 0.01131831 0.78055051
5  0.6778638 0.47877982 0.14869384
6  0.1215115 0.90418703 0.68087372
7  0.2969220 0.27685542 0.44335726
8  0.9159207 0.14095074 0.26334868
9  0.6974066 0.80049172 0.09963931
10 0.4777905 0.11793562 0.72466045


           X7          X8        X9        X10 A1 A2 B1 B2 C1 C2 class
60 0.67379483 0.224148397 0.8012928 0.24526872  0  0  B  B  0  0     B
61 0.09043089 0.841599274 0.1409634 0.43970449  0  0  0  0  C  C     C
62 0.29398599 0.815386671 0.5796068 0.56613591  0  0  0  0  C  C     C
63 0.47855031 0.002660131 0.7085816 0.99932247  0  0  0  0  C  C     C
64 0.42547938 0.827036177 0.9966765 0.48032494  0  0  0  0  0  0     C
65 0.80997500 0.304217760 0.2671738 0.76005871  0  0  0  0  0  0     C
66 0.59737957 0.172429372 0.5357804 0.71559289  0  0  0  0  C  C     C
67 0.34165553 0.532795396 0.5213760 0.65944794  0  0  0  0  C  C     C
68 0.33499330 0.716705573 0.9691289 0.22773926  0  0  0  0  0  0     C
69 0.11139761 0.466204482 0.2114007 0.06602295  0  0  0  0  C  C     C
70 0.74393240 0.841848927 0.7438138 0.61254859  0  0  0  0  0  0     C
class: 'data.frame' size: 70 x 17
Checking input data...
Exporting params...
Exporting input data...
Running MCFS-ID...
*****************************************
********          dmLab           *******
***     ver. 2.2.1     2016.10.27     ***
*****************************************
Created by Michal Draminski [mdramins@ipipan.waw.pl]
http://www.ipipan.eu/staff/m.draminski/
Polish Academy of Sciences - Institute of Computer Science
**************************************************************************
'MCFS-ID' and 'ADX' are developed by Michal Draminski
'rmcfs' developed by Michal Draminski & Julian Zubek
'SLIQ' developed by Mariusz Gromada
**************************************************************************
If you want to use dmLab or 'MCFS-ID' in your work, please cite the paper:
M.Draminski, A.Rada-Iglesias, S.Enroth, C.Wadelius, J. Koronacki, J.Komorowski
'Monte Carlo feature selection for supervised classification', BIOINFORMATICS 24(1): 110-117 (2008)
**************************************************************************

Warning! Value of cutoffPermutations = 0 and cutoffMethod = 'permutations'. Using cutoffMethod = 'mean'.

**************************
*** MCFS-ID Experiment ***
**************************
Loading data: 'input.adx'...
attributes: 17 events: 70
Data loaded.
Nominal target detected - using J48 model.
MCFS-ID param: ID-Graph is ON
MCFS-ID param: finalRuleset is ON
MCFS-ID param:  balance classes is AUTO
Classes = [B, C, A], Sizes = [20, 10, 40], classSizeRatio = 0.25, balanceValue = 1.0
Starting MCFS-ID Procedure: projectionSize(m) = 4, projections(s) = 40, splits(t) = 5
Start time: Sun Dec 15 19:25:31 UTC 2019
Running: 2 threads.
50% 75% 100% 
All 2 threads are finished.
200 trees built in 1.0 s.
Confusion Matrix
			predicted
class	B	C	A	other
B	1119.0	101.0	250.0	0.0
C	171.0	392.0	277.0	0.0
A	110.0	52.0	2778.0	0.0
other	0.0	0.0	0.0	0.0


Accuracy = 0.8169
WeightedAccuracy = 0.7242
True Positive Rate
	B: 0.7612
	C: 0.4666
	A: 0.9448
False Positive Rate
	B: 0.0743
	C: 0.0346
	A: 0.2281

Minimal (based on linear regression angle) RI = 0.0465030
Minimal (based on k-means clustering) RI = 0.4039356
Minimal important (mean based on cutoff methods) RI = 0.0465030
Size of important (mean based on cutoff methods) attributes set = 6


*** Building RIPPER ruleset on top 6 attributes ***
JRIP rules:
===========

(A1 = 0) and (B1 = 0) => class=C (12.0/2.0)
(A1 = 0) => class=B (18.0/0.0)
 => class=A (40.0/0.0)

Number of Rules : 3

RIPPER CV Result (10 folds repeated 3 times)
Confusion Matrix
			predicted
class	B	C	A
B	54.0	6.0	0.0
C	6.0	24.0	0.0
A	0.0	0.0	120.0

Accuracy = 0.9428
WeightedAccuracy = 0.9
True Positive Rate
	B: 0.9
	C: 0.8
	A: 1.0
False Positive Rate
	B: 0.04
	C: 0.0333
	A: 0.0

*** Saving filtered data ***
*** Calculations for input data: 'input' are finished! Processing time: 1.6 s. ***
Reading results...
Done.
  projection distance commonPart mAvg beta1
1         30    0.625          1    0     0
2         40    0.500          1    0     0
   position attribute projections classifiers nodes     RI_norm
11        1        A1          13          56    56 0.750915770
12        2        A2          12          54    54 0.746437400
14        3        B2          13          65    65 0.493405730
13        4        B1           9          45    45 0.403935670
15        5        C1          10          37    37 0.230295510
16        6        C2           6          18    18 0.208132670
2         7        X2          14          39    67 0.046503060
3         8        X3          12          20    40 0.016226858
6         9        X6          10          19    29 0.015134501
5        10        X5           9          13    25 0.013276609
10       11       X10          11          10    13 0.010786221
9        12        X9          11          15    22 0.009913590
1        13        X1          11          15    17 0.008722575
4        14        X4          10          12    17 0.005713617
7        15        X7           6           5     6 0.003090629
8        16        X8          11           7     9 0.002973641
   position edge_a edge_b    weight
1         1     A1     B1 5.8667207
2         2     B2     C1 4.4847684
3         3     A2     B2 4.2930510
4         4     B1     C1 3.2717352
5         5     B1     C2 3.0316665
6         6     A2     B1 2.8349319
7         7     A1     C1 2.3182650
8         8     X4     X9 1.7246404
9         9     X6     X1 1.5309514
10       10     A1     X2 1.4965894
11       11     X6     X3 1.4152018
12       12     X2     X3 1.3423343
13       13     A1     B2 1.3263541
14       14     A2     X2 1.2538404
15       15     A2     C1 1.2396010
16       16     X3     X9 1.2111881
17       17     C1     X2 1.1298822
18       18     X2     X1 1.1089367
19       19     X2     X9 1.0154294
20       20     X5     X2 0.9870153
21       21     X6     X2 0.9807923
22       22     X8     X3 0.9502878
23       23     A2     C2 0.8949746
24       24     B2    X10 0.8338920
25       25     X2     X6 0.8311915
26       26     X3     X8 0.7551038
27       27     C1     B1 0.7397414
28       28     B2     A2 0.7111111
29       29     X5     X3 0.6887644
30       30     C2     B1 0.6780198
31       31     X2    X10 0.6702008
32       32     X3    X10 0.5207051
33       33     C1     X9 0.5171953
34       34     X4     X2 0.5090396
35       35     X1     X2 0.5000000
36       36     B2     X3 0.4437181
37       37     X2     X4 0.4320035
38       38     X6     X5 0.4110417
39       39     X4     X3 0.4109206
40       40     X3     X6 0.4108423
41       41     A2     X5 0.4100743
42       42     X3     X5 0.4061762
43       43     X1     X5 0.4002047
44       44     X7     X4 0.3833016
45       45     X3     X4 0.3677971
46       46     X4     X7 0.3566939
47       47     X7     X9 0.3509262
48       48     A1    X10 0.3494854
49       49     X5     X6 0.3453436
50       50     X5     X4 0.3387522

rmcfs documentation built on Sept. 18, 2021, 5:07 p.m.
