FUGEPSD: Fuzzy Genetic Programming-based learning for Subgroup...


View source: R/FuGePSD.R

Description

Performs a subgroup discovery task using the FuGePSD algorithm.

Usage

FUGEPSD(paramFile = NULL, training = NULL, test = NULL,
  output = c("optionsFile.txt", "rulesFile.txt", "testQM.txt"), seed = 0,
  nLabels = 3, t_norm = "product_t-norm", ruleWeight = "Certainty_Factor",
  frm = "Normalized_Sum", numGenerations = 300,
  numberOfInitialRules = 100, crossProb = 0.5, mutProb = 0.2,
  insProb = 0.15, dropProb = 0.15, tournamentSize = 2,
  globalFitnessWeights = c(0.7, 0.1, 0.05, 0.2), minCnf = 0.6,
  ALL_CLASS = TRUE, targetVariable = NA)

Arguments

paramFile

The path of the parameters file. Use NULL if you want to pass the training and test SDEFSR_Dataset variables directly.

training

An SDEFSR_Dataset class variable with training data.

test

An SDEFSR_Dataset class variable with test data.

output

Character vector with the paths where the information file, the rules file and the test quality measures file are stored, respectively. For the rules and quality measures files, the algorithm generates 4 files each, one per fuzzy confidence filter.

seed

An integer to set the seed used to generate random numbers.

nLabels

Number of linguistic labels for numerical variables. By default 3. We recommend an odd number between 3 and 9.

t_norm

A string with the t-norm to use when computing the compatibility degree of the rules. Use 'Minimum/Maximum' to specify the minimum t-norm; otherwise the product t-norm, which is the default, is used.

ruleWeight

String with the method to calculate the rule weight. Possible values are:

  • Certainty_Factor: It uses the Classic Certainty Factor Weight method.

  • Average_Penalized_Certainty_Factor: It uses Penalized Certainty Factor weight II by Ishibuchi.

  • No_Weights: No rule weight is calculated.

  • Default: If none of these is specified, the default method is Penalized Certainty Factor Weight IV by Ishibuchi.

frm

A string specifying the Fuzzy Reasoning Method to use. Possible values are:

  • Normalized_Sum: It uses the Normalized Sum or Additive Combination Fuzzy Reasoning Method.

  • Arithmetic_Mean: It uses the Arithmetic Mean Fuzzy Reasoning Method.

  • Default: By default, the Winning Rule Fuzzy Reasoning Method is selected.

numGenerations

An integer to set the number of generations to perform before stopping the evolutionary process.

numberOfInitialRules

An integer to set the number of individuals (rules) in the initial population.

crossProb

Sets the crossover probability. We recommend a number in [0,1].

mutProb

Sets the mutation probability. We recommend a number in [0,1].

insProb

Sets the insertion probability. We recommend a number in [0,1].

dropProb

Sets the dropping probability. We recommend a number in [0,1].

tournamentSize

Sets the number of individuals that will be chosen in the tournament selection procedure. This number must be greater than or equal to 2.

globalFitnessWeights

A numeric vector of length 4 specifying the weights used in the computation of the Global Fitness Parameter.

minCnf

A value in [0,1] used to filter out rules below a minimum confidence.

ALL_CLASS

If TRUE, the algorithm returns at least the best rule for each target class, even if that rule does not pass the filters. If FALSE, the best rule is returned only as a fallback when no rule passes the filters.

targetVariable

The name or index position of the target variable (or class). It must be a categorical one.
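The interaction between nLabels and t_norm can be illustrated with a small sketch. It assumes uniformly spaced triangular fuzzy labels, which is an assumption about the partition shape and not something stated on this page. The helper names triangular_memberships and compatibility, as well as the variable ranges and values, are hypothetical and not part of SDEFSR.

```r
# Membership of x in each of n_labels triangular fuzzy sets spread
# uniformly over [lo, hi] (assumed partition shape, illustration only).
triangular_memberships <- function(x, lo, hi, n_labels = 3) {
  centers <- seq(lo, hi, length.out = n_labels)
  width <- (hi - lo) / (n_labels - 1)   # distance between adjacent peaks
  pmax(0, 1 - abs(x - centers) / width)
}

# Compatibility degree of one example with a rule antecedent:
# the per-variable membership degrees are combined with the chosen t-norm.
compatibility <- function(degrees, t_norm = "product_t-norm") {
  if (identical(t_norm, "Minimum/Maximum")) min(degrees) else prod(degrees)
}

# Hypothetical example: age = 40 on [30, 80] and nodes = 10 on [0, 50],
# with 3 labels each (label 1 = lowest, label 3 = highest).
age_deg   <- triangular_memberships(40, 30, 80)
nodes_deg <- triangular_memberships(10, 0, 50)

# Antecedent "age is label 1 AND nodes is label 1".
degrees <- c(age_deg[1], nodes_deg[1])
min_degree  <- compatibility(degrees, "Minimum/Maximum")
prod_degree <- compatibility(degrees, "product_t-norm")
```

The minimum t-norm keeps the weakest membership degree, while the product t-norm penalises every partial match, so it never yields a larger compatibility degree than the minimum.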

Details

By default, this function uses the last variable in the SDEFSR_Dataset object as the target variable. To choose a different one, set the targetVariable argument. The target variable MUST be categorical; otherwise an error is thrown. The default behaviour is to find rules for all possible values of the target variable. targetClass restricts the search to rules about a single value of the target variable.

If paramFile is not NULL, the rest of the arguments are ignored and the algorithm tries to read the specified file. See "Parameters file structure" below if you want to use a parameters file.
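When the arguments are passed directly instead of through paramFile, the ranges listed under Arguments can be checked before starting a long run. The following is a minimal sketch of such a check. The function check_fugepsd_args is hypothetical and not part of SDEFSR, and it treats the recommended [0,1] ranges as hard limits, which is an assumption.

```r
# Hypothetical helper (not part of SDEFSR): collects violations of the
# ranges documented under Arguments and returns them as messages.
check_fugepsd_args <- function(crossProb = 0.5, mutProb = 0.2,
                               insProb = 0.15, dropProb = 0.15,
                               tournamentSize = 2,
                               globalFitnessWeights = c(0.7, 0.1, 0.05, 0.2),
                               minCnf = 0.6) {
  in_unit <- function(p) {
    is.numeric(p) && length(p) == 1 && p >= 0 && p <= 1
  }
  problems <- character(0)
  probs <- list(crossProb = crossProb, mutProb = mutProb,
                insProb = insProb, dropProb = dropProb, minCnf = minCnf)
  for (name in names(probs)) {
    if (!in_unit(probs[[name]])) {
      problems <- c(problems, paste(name, "must be in [0,1]"))
    }
  }
  if (tournamentSize < 2) {
    problems <- c(problems, "tournamentSize must be >= 2")
  }
  if (length(globalFitnessWeights) != 4) {
    problems <- c(problems, "globalFitnessWeights must have length 4")
  }
  problems
}

ok_defaults <- check_fugepsd_args()   # the documented defaults
bad_call <- check_fugepsd_args(crossProb = 1.5, tournamentSize = 1)
```

An empty character vector means that every checked argument is within its documented range.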

Value

The algorithm prints the following results to the console:

  1. Information about the parameters used in the algorithm.

  2. Results for each filter:

    1. The rules generated that pass the filter.

    2. The test quality measures for each rule in that filter.

These results are also saved to files: one file with the rules and another with the quality measures, per filter.

How does this algorithm work?

This algorithm is an evolutionary fuzzy system (EFS) based on genetic programming. It starts from a randomly generated initial population in which individuals follow the "chromosome = individual" approach, encoding both the antecedent and the consequent of the rule. Representing the consequent has the advantage of producing rules for all target classes in a single execution of the algorithm.

The algorithm employs a cooperative-competitive approach in which the rules of the population cooperate and compete with each other in order to obtain the optimal solution. It therefore performs two evaluations: one of the individual rules, for competition, and another of the whole population, for cooperation.

The algorithm evolves by generating an offspring population of the same size as the initial one, obtained by applying the genetic operators to the main population. Both populations are then joined and a token competition is performed in order to maintain the diversity of the generated rules. This token competition also reduces the population size by deleting the rules that are not competitive.
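The token competition step can be sketched as follows. This is a common formulation of the technique, in which each training example is a token and fitter rules seize tokens first. It is not taken from the package source, so the actual implementation may differ. The function token_competition and its inputs are hypothetical.

```r
# covers:  logical matrix, one row per rule, one column per training
#          example; covers[i, j] is TRUE when rule i covers example j.
# fitness: numeric vector with one fitness value per rule.
# Returns the indices of the rules that seize at least one token.
token_competition <- function(covers, fitness) {
  seized <- rep(FALSE, ncol(covers))   # tokens already taken
  survivors <- integer(0)
  for (i in order(fitness, decreasing = TRUE)) {   # fittest rules pick first
    new_tokens <- covers[i, ] & !seized
    if (any(new_tokens)) {
      seized <- seized | new_tokens
      survivors <- c(survivors, i)
    }
  }
  survivors
}

covers <- rbind(
  c(TRUE,  TRUE,  FALSE, FALSE),   # rule 1
  c(TRUE,  FALSE, FALSE, FALSE),   # rule 2: its only example goes to rule 1
  c(FALSE, FALSE, TRUE,  FALSE)    # rule 3
)
fitness <- c(0.9, 0.8, 0.4)
kept <- token_competition(covers, fitness)
```

Rule 2 is removed although its fitness is higher than that of rule 3, because it covers nothing that rule 1 does not already cover. This is how the step favours diversity over raw fitness.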

After the evolutionary process, a screening function is applied to the best population. This screening function keeps the rules that reach a minimum level of confidence and sensitivity. The sensitivity threshold is 0.6, and four fuzzy confidence filters are applied: 0.6, 0.7, 0.8 and 0.9.

The user can also force the algorithm to return at least one rule for every target class value, even if it does not pass the screening function. This behaviour is controlled by the ALL_CLASS parameter.
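The screening step and the ALL_CLASS fallback can be sketched as below. The function screen_rules, the column names and the sample rules are hypothetical. The sketch also assumes that the "best" rule is the one with the highest fuzzy confidence, which this page does not specify.

```r
# rules: data frame with one row per rule and (hypothetical) columns
#        class, confidence (fuzzy confidence) and sensitivity.
screen_rules <- function(rules, min_cnf, min_sens = 0.6, all_class = TRUE) {
  passed <- rules$confidence >= min_cnf & rules$sensitivity >= min_sens
  keep <- which(passed)
  # Highest-confidence rule among the given row indices
  # (assumption: "best" means highest fuzzy confidence).
  best_of <- function(idx) idx[which.max(rules$confidence[idx])]
  if (all_class) {
    # Guarantee one rule per target class, even if it fails the filter.
    for (cl in unique(rules$class)) {
      idx <- which(rules$class == cl)
      if (!any(passed[idx])) keep <- c(keep, best_of(idx))
    }
  } else if (length(keep) == 0) {
    # No rule passed at all: fall back to the single best rule.
    keep <- best_of(seq_len(nrow(rules)))
  }
  rules[sort(keep), ]
}

rules <- data.frame(
  class       = c("positive", "positive", "negative"),
  confidence  = c(0.85, 0.65, 0.55),
  sensitivity = c(0.70, 0.80, 0.90),
  stringsAsFactors = FALSE
)

# One result per fuzzy confidence filter, mirroring the four output files.
thresholds <- c(0.6, 0.7, 0.8, 0.9)
results <- lapply(thresholds, function(th) screen_rules(rules, th))
fallback <- screen_rules(rules, 0.9, all_class = FALSE)
```

With all_class = TRUE the "negative" rule is kept in every filter even though its confidence never reaches the threshold, so each result still contains one rule per class.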

Parameters file structure

The paramFile argument points to a file that holds the necessary parameters to execute FuGePSD. This file must contain at least the following parameters (one per line):

  • algorithm Specify the algorithm to execute. In this case, "FUGEPSD"

  • inputData Specify two paths of KEEL files for training and test. If only a file name is given, it is looked up in the working directory.

  • seed Sets the seed for the random number generator

  • nLabels Sets the number of fuzzy labels to create when reading the files

  • nEval Sets the maximum number of rule evaluations before the genetic process stops

  • popLength Sets number of individuals of the main population

  • eliteLength Sets number of individuals of the elite population. Must be less than popLength

  • crossProb Crossover probability of the genetic algorithm. Value in [0,1]

  • mutProb Mutation probability of the genetic algorithm. Value in [0,1]

  • Obj1 Sets the objective number 1.

  • Obj2 Sets the objective number 2.

  • Obj3 Sets the objective number 3.

  • Obj4 Sets the objective number 4.

  • RulesRep Representation of each chromosome of the population. "can" for canonical representation. "dnf" for DNF representation.

  • targetClass Value of the target variable to search for subgroups. The target variable is always the last variable. Use null to search for every value of the target variable

An example of a parameters file could be:

 algorithm = FUGEPSD
 inputData = "banana-5-1tra.dat" "banana-5-1tst.dat"
 outputData = "Parameters_INFO.txt" "Rules.txt" "TestMeasures.txt"
 seed = 23783
 Number of Labels = 3
 T-norm/T-conorm for the Computation of the Compatibility Degree = product_t-norm
 Rule Weight = Certainty_Factor
 Fuzzy Reasoning Method = Normalized_Sum
 Number of Generations = 300
 Initial Number of Fuzzy Rules = 100
 Crossover probability = 0.5
 Mutation probability = 0.2
 Insertion probability = 0.15
 Dropping Condition probability = 0.15
 Tournament Selection Size = 2 
 Global Fitness Weight 1 = 0.7
 Global Fitness Weight 2 = 0.1 
 Global Fitness Weight 3 = 0.05
 Global Fitness Weight 4 = 0.2
 All Class = true
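A parameters file like the one above can also be generated from R. The sketch below only writes the file; the keys follow the example, while the values and the data file names are illustrative placeholders. The call to FUGEPSD is left commented out because it needs the SDEFSR package and the two KEEL data files in the working directory.

```r
# Keys follow the example above; values are illustrative and the data
# file names are placeholders that must exist before running FUGEPSD.
param_lines <- c(
  "algorithm = FUGEPSD",
  'inputData = "banana-5-1tra.dat" "banana-5-1tst.dat"',
  'outputData = "Parameters_INFO.txt" "Rules.txt" "TestMeasures.txt"',
  "seed = 23783",
  "Number of Labels = 3",
  "T-norm/T-conorm for the Computation of the Compatibility Degree = product_t-norm",
  "Rule Weight = Certainty_Factor",
  "Fuzzy Reasoning Method = Normalized_Sum",
  "Number of Generations = 300",
  "Initial Number of Fuzzy Rules = 100",
  "Crossover probability = 0.5",
  "Mutation probability = 0.2",
  "Insertion probability = 0.15",
  "Dropping Condition probability = 0.15",
  "Tournament Selection Size = 2",
  "Global Fitness Weight 1 = 0.7",
  "Global Fitness Weight 2 = 0.1",
  "Global Fitness Weight 3 = 0.05",
  "Global Fitness Weight 4 = 0.2",
  "All Class = true"
)

# Write the file to a temporary location, one parameter per line.
param_path <- file.path(tempdir(), "ParamFile.txt")
writeLines(param_lines, param_path)

# Uncomment once SDEFSR is installed and the data files are available:
# library(SDEFSR)
# FUGEPSD(paramFile = param_path)
```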

Author(s)

Written in R by Angel M. Garcia

References

Carmona, C.J., Ruiz-Rodado, V., del Jesus, M.J., Weber, A., Grootveld, M., Gonzalez, P., and Elizondo, D. (2015). A fuzzy genetic programming-based algorithm for subgroup discovery and the application to one problem of pathogenesis of acute sore throat conditions in humans. Information Sciences, Volume 298, p. 180-197.

Examples

FUGEPSD(training = habermanTra,
         test = habermanTst,
         output = c("parametersFile.txt", "rulesFile.txt", "testQM.txt"),
         seed = 23783,
         nLabels = 3,
         t_norm = "Minimum/Maximum",
         ruleWeight = "Certainty_Factor",
         frm = "Normalized_Sum",
         numGenerations = 20,
         numberOfInitialRules = 15,
         crossProb = 0.5,
         mutProb = 0.2,
         insProb = 0.15,
         dropProb = 0.15,
         tournamentSize = 2,
         globalFitnessWeights = c(0.7, 0.1, 0.3, 0.2),
         ALL_CLASS = TRUE)
## Not run: 
# Execution with a parameters file called 'ParamFile.txt' in the working directory:

FUGEPSD("ParamFile.txt")

## End(Not run)

SDEFSR documentation built on May 29, 2017, 10:59 a.m.