A Set of Configuration Settings for the Subgroup and Pattern Mining Algorithms
Objects are created by calls of the form SDTaskConfig(...).
attributes
:The list of attributes to consider for mining. Either a vector of attribute names, or NULL (the default), which includes all attributes.
discretize
:Boolean, indicating whether to automatically discretize numeric attributes (default discretize = TRUE). Depends on the parameter nbins: if a numeric attribute has at most nbins distinct values in the dataset, those values are used directly; otherwise, equal-frequency discretization is applied to that attribute.
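The rule described for discretize can be sketched in base R as follows. This is only an illustration of the rule as stated above, not the package's internal code; the helper name discretize_sketch is hypothetical.

```r
# Hypothetical helper illustrating the described rule; not part of rsubgroup.
discretize_sketch <- function(x, nbins) {
  distinct <- sort(unique(x))
  if (length(distinct) <= nbins) {
    # Few distinct values: keep each value as its own category.
    return(factor(x, levels = distinct))
  }
  # Otherwise cut at quantiles so that each bin holds roughly the same
  # number of records (equal-frequency discretization).
  probs  <- seq(0, 1, length.out = nbins + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

few  <- discretize_sketch(c(1, 2, 2, 1, 2), nbins = 3)  # 2 distinct values, kept as is
many <- discretize_sketch(1:12, nbins = 3)              # 12 distinct values, binned
table(many)
```

With tied values the quantile breaks can coincide, which is why the sketch removes duplicate breaks; the resulting bins are then only approximately equal in frequency.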
method
:A mining method; one of Beam-Search (beam), BSD (bsd), SD-Map (sdmap), or SD-Map enabling internal disjunctions (sdmap-dis). The default is method = "sdmap".
nbins
:Specifies the number of bins to be used when discretizing numeric attributes (see discretize above).
qf
:A quality function; one of Adjusted Residuals (ares), Binomial Test (bin), Chi-Square Test (chi2), Gain (gain), Lift (lift), Piatetsky-Shapiro (ps), Relative Gain (relgain), or Weighted Relative Accuracy (wracc). The default is qf = "ps".
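To give a feel for what a quality function measures, the sketch below computes three of the listed measures from subgroup counts, using their common textbook definitions. Whether the package applies exactly the same scaling is an assumption; the helper subgroup_quality and its argument names are hypothetical.

```r
# Hypothetical helper; textbook definitions, not taken from rsubgroup.
#   n  = records covered by the subgroup, tp = positives among them
#   N  = records in the dataset,          TP = positives in the dataset
subgroup_quality <- function(n, tp, N, TP) {
  p  <- tp / n   # target share inside the subgroup
  p0 <- TP / N   # target share in the whole dataset
  c(
    ps    = n * (p - p0),        # Piatetsky-Shapiro
    wracc = (n / N) * (p - p0),  # Weighted Relative Accuracy
    lift  = p / p0               # Lift
  )
}

q <- subgroup_quality(n = 200, tp = 170, N = 1000, TP = 700)
print(q)
```

Under these definitions ps and wracc differ only by the constant factor N, so they rank subgroups identically on a given dataset, while lift ignores subgroup size.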
k
:The maximum number (top-k) of patterns to discover, i.e., the best k rules according to the selected quality function. The default is k = 20.
minqual
:The minimal quality (default minqual = 0).
minsize
:The minimal size of a subgroup, as an integer, i.e., the minimal number of database records it must cover (default minsize = 0).
mintp
:The minimal true positive (tp) threshold, as an integer, i.e., the minimal absolute number of true positives in a subgroup. Relevant for binary target concepts only (default mintp = 0).
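The interplay of k, minqual, minsize, and mintp can be pictured as threshold filtering followed by a top-k selection. The base R sketch below uses made-up candidate patterns and a hypothetical helper, filter_top_k; it is not how the mining algorithms work internally (they prune during search), and treating the thresholds as inclusive is an assumption.

```r
# Made-up candidates for illustration only.
candidates <- data.frame(
  description = c("A=a1", "B=b2", "A=a1 AND B=b2", "C=c3"),
  size        = c(200, 50, 30, 400),
  tp          = c(170, 20, 29, 290),
  quality     = c(30, -15, 8, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply the thresholds, then keep the k best patterns.
filter_top_k <- function(cands, k, minqual = 0, minsize = 0, mintp = 0) {
  keep <- cands$quality >= minqual &
          cands$size    >= minsize &
          cands$tp      >= mintp
  kept <- cands[keep, , drop = FALSE]
  kept <- kept[order(kept$quality, decreasing = TRUE), , drop = FALSE]
  head(kept, k)
}

top <- filter_top_k(candidates, k = 2, minsize = 40)
print(top$description)
```

Here "B=b2" fails the quality threshold, "A=a1 AND B=b2" fails the size threshold, and the remaining two patterns are returned in order of decreasing quality.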
maxlen
:The maximal length of a pattern description, i.e., the maximal number of conjoined conditions. This impacts both understandability and efficiency: simpler rules are easier to understand, and a small maxlen restricts the search space (default maxlen = 7).
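A rough count shows how strongly maxlen limits the search space. The sketch assumes a simplified setting in which every attribute has the same number of values and a description uses each attribute at most once; the helper count_descriptions is hypothetical.

```r
# Hypothetical helper: number of conjunctive descriptions with 1..maxlen
# conditions, choosing distinct attributes and one value per attribute.
count_descriptions <- function(n_attributes, n_values, maxlen) {
  lens <- seq_len(min(maxlen, n_attributes))
  sum(choose(n_attributes, lens) * n_values^lens)
}

short <- count_descriptions(n_attributes = 10, n_values = 3, maxlen = 2)
long  <- count_descriptions(n_attributes = 10, n_values = 3, maxlen = 7)
print(c(maxlen_2 = short, maxlen_7 = long))
```

For 10 attributes with 3 values each, limiting descriptions to two conditions leaves 435 candidates, compared with 497451 for maxlen = 7.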
nodefaults
:Ignore default values, i.e., do not include the respective first value (with index 0) of each attribute (default nodefaults = FALSE, i.e., include all values).
relfilter
:Controls whether irrelevant patterns are filtered during pattern mining; enabling it negatively impacts performance (default relfilter = FALSE).
postfilter
:Controls whether a post-processing filter is applied; one (or a vector) of:
Minimum Improvement (Global), min-improve-global: checks the patterns against all possible generalizations.
Minimum Improvement (Pattern Set), min-improve-set: checks the patterns against all their generalizations in the result set.
Relevancy Filter, relevancy: removes patterns that are strictly irrelevant.
Significant Improvement (Global), sig-improve-global: removes patterns that do not significantly improve (default 0.01 level) w.r.t. all their possible generalizations.
Significant Improvement (Set), sig-improve-set: removes patterns that do not significantly improve (default 0.01 level) w.r.t. all generalizations in the result set.
Weighted Covering, weighted-covering: performs weighted covering on the data in order to select a covering set of subgroups while reducing the overlap on the data.
By default no postfilter is set, i.e., postfilter = "".
parfilter
:Provides the minimal improvement value for the postfilter (for min-improve-* filters), or the significance level (P) for sig-improve-* filters.
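As a usage sketch, a configuration combining several of the slots above might look like the following. It assumes the rsubgroup package (and its Java dependency) is installed, that the bundled credit.data dataset and the as.target helper are available, and that the listed attribute names exist in that dataset; the slot values are illustrative choices, not recommendations.

```r
library(rsubgroup)

# Illustrative settings only; slot names follow the list above.
config <- SDTaskConfig(
  attributes = c("checking_status", "credit_amount", "employment", "purpose"),
  method     = "sdmap",            # the documented default
  qf         = "wracc",            # Weighted Relative Accuracy
  k          = 10,                 # keep the 10 best patterns
  minsize    = 20,                 # subgroups must cover at least 20 records
  maxlen     = 3,                  # at most 3 conditions per description
  postfilter = "min-improve-set",
  parfilter  = 0.05                # minimal improvement for the postfilter
)

# Assumed dataset and target; adjust to your own data.
data(credit.data)
result <- DiscoverSubgroups(credit.data, as.target("class", "good"), config)
```

Slots that are not set explicitly keep the defaults listed above.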
See also: DiscoverSubgroups, DiscoverSubgroupsByTask, CreateSDTask.