# Genetic Algorithm for Generalized Biclustering

### Description

A flexible framework for finding submatrices of a numeric (often binary) matrix that closely match a user-specified pattern. The pattern is specified via a feature selection function and a bicluster desirability (fitness) evaluation function (see Details).

### Usage

```
GABi(x,nSols=0,convergenceGens=40,popsize=256,mfreq=1,xfreq=0.5,
maxNgens=200,keepBest=FALSE,identityThreshold=0.75,
nsubpops=4,experiod=10,diffThreshold=0.9,verbose=FALSE,maxLoop=1,
fitnessArgs=list(consistency=0.8,featureWeights = rowMeans(x, na.rm = TRUE)),
fitnessFun=getFitnesses.entropy,featureSelFun=featureSelection.basic)
```

### Arguments

`x`
Numeric data input array used to generate binary output array. Each row of the array represents a different variable.

`nSols`
Number of solutions at which to terminate loop.

`convergenceGens`
Number of generations after which to terminate the GA process within each loop if no improvement to the best solution's fitness is seen.

`popsize`
Total number of solutions to be evolved in GA (divided across

`mfreq`
Mutation frequency: probability of flipping each bit in each GA solution is

`xfreq`
Crossover frequency: probability that the crossover operator is applied to each pair of solutions.

`maxNgens`
Maximum number of generations in GA process within each loop.

`keepBest`
Boolean specifying whether or not to pass the best solution from each generation unchanged into the next.

`identityThreshold`
Numeric value specifying the proportion of shared columns from

`nsubpops`
Numeric value specifying the number of distinct subpopulations across which to distribute the GA's population of solutions. For more details on the Island Model of GAs, see Whitley 1995. If

`experiod`
Number of generations after which to exchange solutions between the distinct GA subpopulations. If

`diffThreshold`
Numeric value specifying minimum proportion of values in each row of

`verbose`
Boolean indicating whether or not to print diagnostic messages to R console.

`maxLoop`
Numeric value specifying maximum number of runs of the GA, after which GABi will terminate and return all recovered solutions, even if

`fitnessArgs`
List containing arguments to be used in

`fitnessFun`
Function taking argument

`featureSelFun`
Function taking argument
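The `popsize`, `nsubpops` and `experiod` arguments together describe an island model. The following is a minimal base-R sketch of that idea, not GABi's internal code: the ring-shaped migration rule, the `toy_fitness` function and the `migrate` helper are all assumptions made for illustration, and the small sizes are chosen only to keep the example readable.

```r
# Hypothetical illustration of an island model; not GABi's implementation.
set.seed(1)
n_cols   <- 12  # chromosome length, i.e. number of columns of x (assumed)
popsize  <- 8   # total number of chromosomes
nsubpops <- 4   # number of islands
experiod <- 10  # exchange solutions every 10 generations

# Random binary population: one chromosome per row.
pop <- matrix(rbinom(popsize * n_cols, 1, 0.5), nrow = popsize)

# Assign chromosomes to islands in equal-sized blocks.
island <- rep(seq_len(nsubpops), each = popsize / nsubpops)

# Toy fitness (assumption): number of selected columns.
toy_fitness <- function(p) rowSums(p)

# Ring migration (assumption): the best chromosome of each island
# overwrites the worst chromosome of the next island.
migrate <- function(pop, island, fitness) {
  new_pop <- pop
  n_islands <- max(island)
  for (i in seq_len(n_islands)) {
    src <- which(island == i)
    dst <- which(island == (i %% n_islands) + 1)
    best  <- src[which.max(fitness[src])]
    worst <- dst[which.min(fitness[dst])]
    new_pop[worst, ] <- pop[best, ]
  }
  new_pop
}

for (gen in seq_len(20)) {
  # Selection, crossover and mutation within each island would go here.
  if (gen %% experiod == 0) {
    pop <- migrate(pop, island, toy_fitness(pop))
  }
}
```

Between exchanges the islands evolve independently, which is what keeps them isolated for selection and crossover; the periodic exchange is the only point at which solutions cross island boundaries.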

### Details

GABi uses flexible user-defined (or preset) functions to perform generalized biclustering of a numeric or binary data matrix `x`. It implements a number of features, including an Island Model of population evolution (in which a number of distinct subpopulations are kept isolated for the purposes of selection and crossover) and an iterative loop of solution generation (in which the GA process is rerun with a 'tabu' list, ensuring that previously returned solutions are not selected for in subsequent runs of the GA). The pattern is defined by a fitness function `fitnessFun` and a feature selection function `featureSelFun`, each of which takes a binary chromosome (in which a `1` denotes that the corresponding column of `x` is included in the bicluster). They return, respectively, a desirability score and a list of the features fitting the bicluster pattern across the specified columns.
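To make the chromosome encoding concrete, here is a minimal base-R sketch of how a binary chromosome could be turned into a set of features and a score. The functions `toy_feature_sel` and `toy_fitness` are hypothetical stand-ins, not the package's `featureSelection.basic` or `getFitnesses.entropy`, and the interpretation of `consistency` as a minimum proportion of 1s per row is an assumption.

```r
# Small binary matrix: rows are features, columns are samples.
x <- matrix(0, nrow = 6, ncol = 8,
            dimnames = list(paste0("f", 1:6), paste0("s", 1:8)))
x[1:3, c(2, 4, 5)] <- 1  # planted block of 1s
x[5, 2] <- 1             # isolated background entries
x[6, 7] <- 1

# Chromosome: a 1 means the corresponding column of x is in the bicluster.
chromosome <- c(0, 1, 0, 1, 1, 0, 0, 0)

# Hypothetical feature selection: keep rows whose proportion of 1s
# across the selected columns is at least `consistency`.
toy_feature_sel <- function(chromosome, x, consistency = 0.8) {
  cols <- which(chromosome == 1)
  if (length(cols) == 0) return(integer(0))
  which(rowMeans(x[, cols, drop = FALSE]) >= consistency)
}

# Hypothetical fitness: bicluster area (selected rows times selected columns).
toy_fitness <- function(chromosome, x, consistency = 0.8) {
  feats <- toy_feature_sel(chromosome, x, consistency)
  length(feats) * sum(chromosome)
}

feats <- toy_feature_sel(chromosome, x)  # rows f1, f2, f3
score <- toy_fitness(chromosome, x)      # 3 rows * 3 columns = 9
```

Scoring by area is only one possible notion of desirability; any function with the same inputs and a numeric output could play that role in this sketch.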

### Value

List of biclusters. Each bicluster represents a submatrix satisfying the conditions of the specified pattern, and contains the elements:

`features`
Which rows of the input array

`samples`
Which columns of the input array

`score`
Fitness evaluation of this bicluster (can be used to compare the different biclusters output by the algorithm)
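The sketch below shows one way to work with a result of this shape. The `biclusters` list is built by hand as a stand-in for a real return value, and it assumes that `features` and `samples` hold integer row and column indices; check the actual output with `str()` before relying on that.

```r
# Toy input matrix with two blocks of 1s.
x <- matrix(0, nrow = 5, ncol = 6)
x[1:3, 2:4] <- 1
x[4:5, 5:6] <- 1

# Hand-built stand-in for a list of biclusters (assumed structure).
biclusters <- list(
  list(features = 4:5, samples = 5:6, score = 4),
  list(features = 1:3, samples = 2:4, score = 9)
)

# Rank biclusters by score, highest first.
scores <- vapply(biclusters, function(b) b$score, numeric(1))
ranked <- biclusters[order(scores, decreasing = TRUE)]

# Extract the submatrix of x corresponding to the best bicluster.
best <- ranked[[1]]
sub  <- x[best$features, best$samples, drop = FALSE]
```

Using `drop = FALSE` keeps `sub` a matrix even when a bicluster contains a single row or column.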

### Author(s)

Ed Curry e.curry@imperial.ac.uk

### Examples

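A minimal sketch of a call, assuming the GABi package is installed and that the defaults listed under Usage apply. The simulated matrix and the planted block are made up for illustration, and the call is skipped when the package is not available.

```r
# Simulated sparse binary matrix with a planted block of 1s (illustrative data).
set.seed(123)
x <- matrix(rbinom(30 * 20, 1, 0.1), nrow = 30)
x[1:8, 1:6] <- 1

res <- NULL
if (requireNamespace("GABi", quietly = TRUE)) {
  # All other arguments are left at the defaults shown under Usage.
  res <- GABi::GABi(x)
  # Inspect the structure of the returned list of biclusters.
  str(res, max.level = 2)
}
```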