Description Usage Arguments Details Value Note Author(s) References See Also Examples
Measure input importance (including sensitivity analysis) given a supervised data mining model.
1 2 3 4 5 |
M |
fitted model, typically is the object returned by |
data |
training data (the same data.frame that was used to fit the model, currently only used to add data histogram to VEC curve). |
RealL |
the number of sensitivity analysis levels (e.g. 7). Note: you need to use |
method |
input importance method. Options are:
|
measure |
sensitivity analysis measure (used to measure input importance). Options are:
|
sampling |
for numeric inputs, the sampling scan function. Options are:
|
baseline |
baseline vector used during the sensitivity analysis. Options are:
|
responses |
if |
outindex |
the output index (column) of |
task |
the |
PRED |
the prediction function of |
interactions |
numeric vector with the attributes (columns) used by Ith-D sensitivity analysis (2-D or higher, "GSA" method):
|
Aggregation |
numeric value that sets the number of multi-metric aggregation function (used only for "DSA", ""). Options are:
|
LRandom |
number of samples used by DSA and MSA methods. The default value is -1, which means: use a number equal to training set size. If a different value is used (1<= value <= number of training samples), then LRandom samples are randomly selected. |
MRandom |
sampling type used by MSA: "discrete" (default discrete uniform distribution) or "continuous" (from continuous uniform distribution). |
Lfactor |
sets the maximum number of sensitivity levels for discrete inputs. if FALSE then a maximum of up to RealL levels are used (most frequent ones), else (TRUE) then all levels of the input are used in the SA analysis. |
This function provides several algorithms for measuring input importance of supervised data mining models and the average effect of a given input (or pair of inputs) in the model. A particular emphasis is given on sensitivity analysis (SA), which is a simple method that measures the effects on the output of a given model when the inputs are varied through their range of values. Check the references for more details.
A list
with the components:
$value – numeric vector with the computed sensitivity analysis measure for each attribute.
$imp – numeric vector with the relative importance for each attribute (only makes sense for 1-D analysis).
$sresponses – vector list as described in the Value documentation of mining
.
$data – if DSA or MSA, store the used data samples, needed for visualizations made by vecplot.
$method – SA method
$measure – SA measure
$agg – Aggregation value
$nclasses – if task="prob" or "class", the number of output classes, else nclasses=1
$inputs – indexes of the input attributes
$Llevels – sensitivity levels used for each attribute (NA means output attribute)
$interactions – which attributes were interacted when method=GSA.
See also http://www3.dsi.uminho.pt/pcortez/rminer.html
Paulo Cortez http://www3.dsi.uminho.pt/pcortez/
To cite the Importance function, sensitivity analysis methods or synthetic datasets, please use:
P. Cortez and M.J. Embrechts.
Using Sensitivity Analysis and Visualization Techniques to Open Black Box Data Mining Models.
In Information Sciences, Elsevier, 225:1-17, March 2013.
http://dx.doi.org/10.1016/j.ins.2012.10.039
vecplot
, fit
, mining
, mgraph
, mmetric
, savemining
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 |
### dontrun is used when the execution of the example requires some computational effort.
### 1st example, regression, 1-D sensitivity analysis
## Not run:
data(sa_ssin) # x1 should account for 55
M=fit(y~.,sa_ssin,model="ksvm")
I=Importance(M,sa_ssin,method="1D-SA") # 1-D SA, AAD
print(round(I$imp,digits=2))
L=list(runs=1,sen=t(I$imp),sresponses=I$sresponses)
mgraph(L,graph="IMP",leg=names(sa_ssin),col="gray",Grid=10)
mgraph(L,graph="VEC",xval=1,Grid=10,data=sa_ssin,
main="VEC curve for x1 influence on y") # or:
vecplot(I,xval=1,Grid=10,data=sa_ssin,datacol="gray",
main="VEC curve for x1 influence on y") # same graph
vecplot(I,xval=c(1,2,3),pch=c(1,2,3),Grid=10,
leg=list(pos="bottomright",leg=c("x1","x2","x3"))) # all x1, x2 and x3 VEC curves
## End(Not run)
### 2nd example, regression, DSA sensitivity analysis:
## Not run:
I2=Importance(M,sa_ssin,method="DSA")
print(I2)
# influence of x1 and x2 over y
vecplot(I2,graph="VEC",xval=1) # VEC curve
vecplot(I2,graph="VECB",xval=1) # VEC curve with boxplots
vecplot(I2,graph="VEC3",xval=c(1,2)) # VEC surface
vecplot(I2,graph="VECC",xval=c(1,2)) # VEC contour
## End(Not run)
### 3th example, classification (pure class labels, task="cla"), DSA:
## Not run:
data(sa_int2_3c) # pair (x1,x2) is more relevant than x3, all x1,x2,x3 affect y,
# x4 has a null effect.
M2=fit(y~.,sa_int2_3c,model="mlpe",task="class")
I4=Importance(M2,sa_int2_3c,method="DSA")
# VEC curve (should present a kind of "saw" shape curve) for class B (TC=2):
vecplot(I4,graph="VEC",xval=2,cex=1.2,TC=2,
main="VEC curve for x2 influence on y (class B)",xlab="x2")
# same VEC curve but with boxplots:
vecplot(I4,graph="VECB",xval=2,cex=1.2,TC=2,
main="VEC curve with box plots for x2 influence on y (class B)",xlab="x2")
## End(Not run)
### 4th example, regression, DSA:
## Not run:
data(sa_psin)
# same model from Table 1 of the reference:
M3=fit(y~.,sa_psin,model="ksvm",search=2^-2,C=2^6.87,epsilon=2^-8)
# in this case: Aggregation is the same as NY
I5=Importance(M3,sa_psin,method="DSA",Aggregation=3)
# 2D analysis (check reference for more details), RealL=L=7:
# need to aggregate results into a matrix of SA measure
cm=agg_matrix_imp(I5)
print("show Table 8 DSA results (from the reference):")
print(round(cm$m1,digits=2))
print(round(cm$m2,digits=2))
# show most relevant (darker) input pairs, in this case (x1,x2) > (x1,x3) > (x2,x3)
# to build a nice plot, a fixed threshold=c(0.05,0.05) is used. note that
# in the paper and for real data, we use threshold=0.1,
# which means threshold=rep(max(cm$m1,cm$m2)*threshold,2)
fcm=cmatrixplot(cm,threshold=c(0.05,0.05))
# 2D analysis using pair AT=c(x1,x2') (check reference for more details), RealL=7:
# nice 3D VEC surface plot:
vecplot(I5,xval=c(1,2),graph="VEC3",xlab="x1",ylab="x2",zoom=1.1,
main="VEC surface of (x1,x2') influence on y")
# same influence but know shown using VEC contour:
par(mar=c(4.0,4.0,1.0,0.3)) # change the graph window space size
vecplot(I5,xval=c(1,2),graph="VECC",xlab="x1",ylab="x2",
main="VEC surface of (x1,x2') influence on y")
# slower GSA:
I6=Importance(M3,sa_psin,method="GSA",interactions=1:4)
cm2=agg_matrix_imp(I6)
# compare cm2 with cm1, almost identical:
print(round(cm2$m1,digits=2))
print(round(cm2$m2,digits=2))
fcm2=cmatrixplot(cm2,threshold=0.1)
## End(Not run)
### If you want to use Importance over your own model (different than rminer ones):
# 1st example, regression, uses the theoretical sin1reg function: x1=70% and x2=30%
data(sin1reg)
mypred=function(M,data)
{ return (M[1]*sin(pi*data[,1]/M[3])+M[2]*sin(pi*data[,2]/M[3])) }
M=c(0.7,0.3,2000)
# 4 is the column index of y
I=Importance(M,sin1reg,method="sens",measure="AAD",PRED=mypred,outindex=4)
print(I$imp) # x1=72.3% and x2=27.7%
L=list(runs=1,sen=t(I$imp),sresponses=I$sresponses)
mgraph(L,graph="IMP",leg=names(sin1reg),col="gray",Grid=10)
mgraph(L,graph="VEC",xval=1,Grid=10) # equal to:
par(mar=c(2.0,2.0,1.0,0.3)) # change the graph window space size
vecplot(I,graph="VEC",xval=1,Grid=10,main="VEC curve for x1 influence on y:")
### 2nd example, 3-class classification for iris and lda model:
## Not run:
data(iris)
library(MASS)
predlda=function(M,data) # the PRED function
{ return (predict(M,data)$posterior) }
LDA=lda(Species ~ .,iris, prior = c(1,1,1)/3)
# 4 is the column index of Species
I=Importance(LDA,iris,method="1D-SA",PRED=predlda,outindex=4)
vecplot(I,graph="VEC",xval=1,Grid=10,TC=1,
main="1-D VEC for Sepal.Lenght (x-axis) influence in setosa (prob.)")
## End(Not run)
### 3rd example, binary classification for setosa iris and lda model:
## Not run:
data(iris)
library(MASS)
iris2=iris;iris2$Species=factor(iris$Species=="setosa")
predlda2=function(M,data) # the PRED function
{ return (predict(M,data)$class) }
LDA2=lda(Species ~ .,iris2)
I=Importance(LDA2,iris2,method="1D-SA",PRED=predlda2,outindex=4)
vecplot(I,graph="VEC",xval=1,
main="1-D VEC for Sepal.Lenght (x-axis) influence in setosa (class)",Grid=10)
## End(Not run)
### Example with discrete inputs
## Not run:
data(iris)
ir1=iris
ir1[,1]=cut(ir1[,1],breaks=4)
ir1[,2]=cut(ir1[,2],breaks=4)
M=fit(Species~.,ir1,model="mlpe")
I=Importance(M,ir1,method="DSA")
# discrete example:
vecplot(I,graph="VEC",xval=1,TC=1,main="class: setosa (discrete x1)",data=ir1)
# continuous example:
vecplot(I,graph="VEC",xval=3,TC=1,main="class: setosa (cont. x1)",data=ir1)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.