searchrules: Searching for fuzzy association rules

Description

This function searches the given fsets() object d for all fuzzy association rules that satisfy the given constraints. It returns a list of fuzzy association rules together with statistics that characterize them (such as support and confidence).

Usage

searchrules(
  d,
  lhs = 2:ncol(d),
  rhs = 1,
  tnorm = c("goedel", "goguen", "lukasiewicz"),
  n = 100,
  best = c("confidence"),
  minSupport = 0.02,
  minConfidence = 0.75,
  maxConfidence = 1,
  maxLength = 4,
  numThreads = 1,
  trie = (maxConfidence < 1)
)

Arguments

d

An object of class fsets(): essentially a matrix whose columns represent fuzzy sets and whose values are membership degrees. To create such an object, use fcut() or lcut().

lhs

Indices of fuzzy attributes that may appear on the left-hand side (LHS) of association rules, i.e. in the antecedent.

rhs

Indices of fuzzy attributes that may appear on the right-hand side (RHS) of association rules, i.e. in the consequent.

tnorm

The t-norm used to compute the conjunction of fuzzy attributes. The name may be abbreviated to just the starting letters of "lukasiewicz", "goedel" or "goguen".
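For orientation, the three t-norms named here have well-known textbook definitions, sketched below in base R. The helper names tnorm_goedel, tnorm_goguen and tnorm_lukasiewicz are illustrative only and are not functions of the package; how the package evaluates t-norms internally is not shown here.

```r
# Textbook definitions of the three t-norms (illustrative helper names,
# not functions exported by the package).
tnorm_goedel      <- function(a, b) pmin(a, b)          # minimum
tnorm_goguen      <- function(a, b) a * b               # product
tnorm_lukasiewicz <- function(a, b) pmax(0, a + b - 1)  # bounded difference

# Made-up membership degrees of two fuzzy attributes over three rows
a <- c(0.2, 0.5, 1.0)
b <- c(0.5, 0.5, 0.25)

tnorm_goedel(a, b)
tnorm_goguen(a, b)
tnorm_lukasiewicz(a, b)
```

All three agree on crisp (0/1) degrees and differ only in how they combine partial memberships.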

n

The non-negative number of rules to be found. If zero, the function returns all rules satisfying the given conditions. If positive, only the n best rules are returned. The criterion for what is “best” is specified with the best argument.

best

Specifies the measure according to which the rules are ordered from best to worst. This argument is used mainly in combination with the n argument. Currently, only a single value ("confidence") can be used.
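The interplay of n and best can be pictured as ranking a table of rule statistics and keeping its head. The following base R sketch works on a hypothetical statistics table with made-up values; it only mimics the documented behaviour and is not the package's implementation.

```r
# Hypothetical statistics table; the values are made up for illustration.
stats <- data.frame(
  support    = c(0.10, 0.25, 0.05, 0.30),
  confidence = c(0.80, 0.95, 0.90, 0.78)
)
n <- 2

# Order rules from best to worst by confidence.
ranked <- stats[order(stats$confidence, decreasing = TRUE), ]

# n == 0 means "keep everything"; otherwise keep only the n best rows.
top <- if (n == 0) ranked else head(ranked, n)
top
```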

minSupport

The minimum support degree of a rule. Rules with support below this value are filtered out. It must be a numeric value from the interval [0, 1]. See below for details on how the support degree is computed.

minConfidence

The minimum confidence degree of a rule. Rules with confidence below this value are filtered out. It must be a numeric value from the interval [0, 1]. See below for details on how the confidence degree is computed.

maxConfidence

Maximum confidence threshold. Once a rule is found whose confidence degree is above the maxConfidence threshold, no further rules are produced by adding attributes to its antecedent. For example, if "Sm.age & Me.age => Sm.height" has confidence above the maxConfidence threshold, no other rule containing "Sm.age & Me.age" will be produced, regardless of its interest measures.

If you want to disable this feature, set maxConfidence to 1.
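The effect of the threshold can be pictured as a filter on candidate antecedents. The base R sketch below is only an illustration of the behaviour described above, not the package's search algorithm; the helper extends_saturated and the attribute name "Bi.weight" are hypothetical.

```r
# Antecedents of rules whose confidence exceeded maxConfidence (hypothetical).
saturated <- list(c("Sm.age", "Me.age"))

# Candidate antecedents that a search might consider next (hypothetical).
candidates <- list(
  c("Sm.age", "Me.age", "Bi.weight"),  # extends a saturated antecedent
  c("Sm.age", "Bi.weight")             # does not contain "Sm.age & Me.age"
)

# TRUE if `antecedent` strictly extends any saturated antecedent.
extends_saturated <- function(antecedent, saturated) {
  any(vapply(
    saturated,
    function(s) all(s %in% antecedent) && length(antecedent) > length(s),
    logical(1)
  ))
}

# Keep only candidates that do not extend a saturated antecedent.
kept <- Filter(function(a) !extends_saturated(a, saturated), candidates)
kept
```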

maxLength

Maximum allowed length of a rule, i.e. the maximum total number of predicates on the left-hand and right-hand sides of the rule combined. If negative, the length of rules is unlimited.

numThreads

Number of threads used to run the algorithm in parallel. If greater than 1, the OpenMP library (not to be confused with Open MPI) is used for parallelization. Note that there are known problems with combining OpenMP and other means of parallelization that may be used within R. Therefore, if you plan to use the searchrules function together with an external parallelization mechanism such as the doMC package, make sure that numThreads equals 1. This feature is available only on systems that have the OpenMP library installed.

trie

Whether to use the internal trie mechanism. If FALSE, the output may contain rules that are descendants of a rule whose confidence is above the maxConfidence threshold.

Tries consume a lot of memory, so if you run into problems with insufficient memory, set this argument to FALSE. On the other hand, the result (if n is set to 0) can be very large if trie is set to FALSE.

Details

The function searches the fsets() object d for fuzzy association rules that satisfy the conditions specified by the arguments.

Value

A list of the following elements: rules and statistics.

rules is a list of mined fuzzy association rules. Each element of that list is a character vector with the consequent attribute in the first position.
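Because the consequent comes first, a rule vector has to be rearranged to be read in the usual "antecedent => consequent" form. A minimal sketch in base R, assuming the layout described above; format_rule is an illustrative helper, not a function of the package, and the attribute names are borrowed from the maxConfidence example.

```r
# A rule in the documented layout: consequent first, antecedent after it.
rule <- c("Sm.height", "Sm.age", "Me.age")

# Illustrative helper (not part of the package): render "lhs => rhs".
format_rule <- function(rule) {
  paste(paste(rule[-1], collapse = " & "), "=>", rule[1])
}

format_rule(rule)
```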

statistics is a data frame of statistical characteristics of the mined rules. Each row corresponds to a rule in the rules list. Consider a rule "a & b => c", let \otimes be the t-norm specified by the tnorm argument, and let i range over all rows of d. The columns of the statistics data frame are then as follows:

  • support: the rule's support degree: 1/nrow(d) * \sum_{i} a(i) \otimes b(i) \otimes c(i)

  • lhsSupport: the support of the rule's antecedent (LHS): 1/nrow(d) * \sum_{i} a(i) \otimes b(i)

  • rhsSupport: the support of the rule's consequent (RHS): 1/nrow(d) * \sum_{i} c(i)

  • confidence: the rule's confidence degree: support / lhsSupport
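The formulas above can be checked by hand on a small example. The base R sketch below computes the four statistics for the rule "a & b => c" on a toy matrix of made-up membership degrees, using the minimum (Goedel) t-norm; it is a direct transcription of the formulas, not the package's code.

```r
# Toy membership degrees for three fuzzy attributes over four rows
# (values are made up for illustration).
d <- cbind(
  a = c(1.0, 0.5, 0.25, 0.0),
  b = c(0.5, 0.5, 1.0, 1.0),
  c = c(1.0, 0.25, 0.5, 0.0)
)

tnorm <- pmin  # Goedel t-norm; swap in another t-norm to compare

lhs <- tnorm(d[, "a"], d[, "b"])          # a(i) combined with b(i), per row
support    <- mean(tnorm(lhs, d[, "c"]))  # 1/nrow(d) * sum over all rows
lhsSupport <- mean(lhs)
rhsSupport <- mean(d[, "c"])
confidence <- support / lhsSupport

c(support = support, lhsSupport = lhsSupport,
  rhsSupport = rhsSupport, confidence = confidence)
```

With these toy values the rule has support 0.25 and antecedent support 0.3125, so its confidence is 0.8.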

Author(s)

Michal Burda

See Also

fcut(), lcut(), farules(), fsets(), pbld()

Examples


  library(lfl)
  # Convert the CO2 data set into an fsets object
  d <- lcut(CO2)
  searchrules(d, lhs=1:ncol(d), rhs=1:ncol(d))


beerda/lfl documentation built on Feb. 15, 2023, 8:15 a.m.