BigBang: Represents the ensemble of the results of evolving several...

Description

The BigBang object is an attempt to use more of the information in a large collection of solutions instead of relying on a unique solution. Perhaps we are studying the solution landscape, or we would like to “ensemble” solutions from other “small” solutions. For complex problems (or even simple ones), the number of “solutions” may be very large and diverse. In the context of classification for microarray data, we have seen that models assembled from many solutions could be used as “general models” and that the most frequent genes in solutions provide insights into biological phenomena.

Therefore, we designed the BigBang object, which implements methods to run a Galgo object several times, recording relevant information from individual galgos for further analysis. Running a BigBang commonly takes several minutes, hours or perhaps days, depending on the complexity of the fitness function, the data, the goalFitness, the stopping rules in Galgo, and the number of solutions to collect. Parallelism is not explicitly implemented, but some methods have been implemented to make this task easy and possible (see assignParallelFile, loadParallelFiles and mergeBangs).

As in a Galgo object, there are three stopping methods: maxBigBangs, maxSolutions and callBackFunc. maxBigBangs controls the maximum number of galgo evolutions to run; when the current evolution cycle reaches this value, the process ends. Sometimes an evolution ends without reaching the goalFitness; such a result is not called a “solution”. Therefore, maxSolutions controls the maximum number of solutions desired. If onlySolutions==FALSE, all galgo evolutions are saved and counted as “solutions”; nevertheless, the solutions variable in the BigBang object records the real status. callBackFunc may end the process if it returns NA. Keep in mind that any R program can be interrupted by typing Ctrl-C (Esc in Windows). If for some reason the process has been interrupted, the BigBang process can resume the same cycle simply by calling the method blast again. However, the integrity of the object may be at risk if the process is interrupted in critical parts (when the object is being updated at the end of each cycle). Thus, it is recommended to interrupt the process during the galgo “evolution”.
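As an illustration of the callBackFunc stopping rule, the following is a minimal sketch of a callback that ends the process after a time budget. The helper name makeTimeLimitCallback and the one-hour budget are hypothetical and not part of galgo; the only behaviour taken from this page is that the callback receives the BigBang and Galgo objects and that returning NA ends the process.

```r
# Hypothetical helper (not part of galgo): builds a callback that
# returns NA once more than maxSeconds have elapsed since its creation.
makeTimeLimitCallback <- function(maxSeconds) {
  start <- Sys.time()
  function(bigbang, galgo) {
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed > maxSeconds) NA else TRUE
  }
}

stopAfterOneHour <- makeTimeLimitCallback(3600)

# Assumed usage, with galgo loaded and a Galgo object `ga` available:
# bb <- BigBang(galgo = ga, maxBigBangs = 1000, callBackFunc = stopAfterOneHour)
# blast(bb)
```

Because the callback is only consulted after an evolution has finished, a stop requested this way should avoid the critical update at the end of a cycle, unlike an interrupt from the keyboard.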

In the case of variable selection for microarray data, some methods have been proposed that use several independent solutions to design a final solution (or set of better solutions, see XXX references *** MISSING ***).

The configBB.VarSel and configBB.VarSelMisc functions configure a BigBang object together with all its sub-objects for common variable selection problems (e.g. classification, regression, etc.).

Usage

BigBang(id=0,
	galgo=NULL,
	maxBigBangs=10,
	maxSolutions=1,
	collectMode=c("bigbang", "galgos", "chromosomes"),
	onlySolutions=TRUE,
	verbose=1,
	callPreFunc=function(bigbang, galgo) TRUE,
	callBackFunc=function(bigbang, galgo) TRUE,
	callEnhancerFunc=function(chr, parent) NULL,
	data=NULL,
	saveFile=NULL,
	saveFrequency=100,
	saveVariableName=collectMode,
	saveMode=c("unObject+compress", "unObject", "object", "object+compress"),
	saveGeneBreaks=NULL,
	geneNames=NULL,
	sampleNames=NULL,
	classes=NULL,
	gcFrequency=123,
	gcCalls=5,
	call=NULL,
	...)

Arguments

id

A way to identify the object.

galgo

The prototype Galgo object that will be used to run and collect solutions.

maxBigBangs

The maximum number of BigBangs. A bigbang is the evolution of a Galgo object using the method evolve. When the current number of bigbangs has reached maxBigBangs value, the process ends.

maxSolutions

The maximum number of solutions. If the total number of solutions collected reaches maxSolutions, the process ends. A solution is defined as an evolution in which the goalFitness has been reached. When the Galgo object ends and goalFitness has not been reached, the best chromosome is NOT saved unless onlySolutions is FALSE; in that case maxSolutions and maxBigBangs are equivalent.

collectMode

The type of result to collect for further analysis. "galgos" saves every evolved galgo object and thus consumes a lot of memory; collecting more than 100 is probably not advisable. "chromosomes" and "bigbang" save the best chromosome, its fitness, and fitness evolution in the BigBang object. "bigbang" saves the BigBang object to disk whereas "chromosomes" saves only the list of chromosomes.

onlySolutions

If TRUE, only solutions that have reached the goalFitness are saved. Otherwise, all evolutions are saved and counted as “solutions”, and the $solutions variable contains the real status.
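The interaction between maxBigBangs, maxSolutions and onlySolutions can be traced with a small simulation. This is only a sketch of the rules as described on this page, not galgo's implementation; simulateBlast and its reached argument (one logical per evolution, TRUE when goalFitness was reached) are hypothetical names.

```r
# Hypothetical simulation of the documented stopping rules (not galgo code).
# reached: logical vector, TRUE when an evolution reached goalFitness.
simulateBlast <- function(reached, maxBigBangs, maxSolutions, onlySolutions = TRUE) {
  bigbangs <- 0
  saved <- 0
  for (ok in reached) {
    # Stop when either limit has been met.
    if (bigbangs >= maxBigBangs || saved >= maxSolutions) break
    bigbangs <- bigbangs + 1
    # With onlySolutions = FALSE every evolution counts as a "solution".
    if (ok || !onlySolutions) saved <- saved + 1
  }
  c(bigbangs = bigbangs, saved = saved)
}

outcome <- c(TRUE, FALSE, TRUE, TRUE)
strict <- simulateBlast(outcome, maxBigBangs = 10, maxSolutions = 2)
loose  <- simulateBlast(outcome, maxBigBangs = 10, maxSolutions = 2,
                        onlySolutions = FALSE)
```

In this sketch the strict run needs three evolutions to collect two solutions, while the run with onlySolutions = FALSE stops after two because the failed evolution is also counted.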

verbose

Instructs the BigBang to display general information about the process. When verbose==1 this information is printed at every evolution. In general, every verbose number of generations produces a line of output. If verbose==0, nothing is displayed at all.

callPreFunc

A user-function to be called before every evolution. It should receive the BigBang and Galgo objects. If the result is NA, the process ends.

callBackFunc

A user-function to be called after every evolution. It should receive the BigBang and Galgo objects. If the result is NA, the process ends. When callBackFunc is, for instance, plot, the trace of the evolution is nicely displayed in a plot; however, in long runs it can consume time and memory.

callEnhancerFunc

A user-function to be called after every evolution to improve the solution. It should receive a Chromosome and the BigBang object as parameters, and must return a new Chromosome object. If the result is NULL, nothing is saved. The result replaces the original evolved chromosome, which is saved in the evolvedChromosomes list variable in the BigBang object. For functional genomics data, we have included two general routines called geneBackwardElimination and robustGeneBackwardElimination to generate “enhanced” chromosomes.

data

Any user data can be stored in this variable (it is not limited to data; the user can also pass any other named value, such as myData or mama.mia, through the ... argument).

saveFile

The file name where the objects would be saved (see collectMode).

saveFrequency

How often the save operation occurs. Saving is a time-consuming operation, so low values may degrade performance.

saveVariableName

The preferred variable name used for saving (this will be needed when loading).

saveMode

Any combination of the two options compress and unObject. It can be a character vector of length 1 or larger. For example, saveMode="compress+unObject" would call unObject and save the file using compress=TRUE. The vector c("object","compress") (or the shorter c("compress")) would save the BigBang object itself, compressed. It is not recommended to save the raw object because function variables are bound to environments and R will try to save those environments as well, which can waste disk space and saving time. We strongly recommend saveMode="unObject+compress".
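One plausible reading of how a saveMode value decomposes into the two options is sketched below. parseSaveMode is a hypothetical helper written for illustration; galgo's own parsing may differ (for example, it may match substrings rather than split on "+").

```r
# Hypothetical helper (not part of galgo): splits saveMode on "+" and
# reports which of the two documented options are requested.
parseSaveMode <- function(saveMode) {
  tokens <- unlist(strsplit(saveMode, "+", fixed = TRUE))
  list(compress = "compress" %in% tokens,
       unObject = "unObject" %in% tokens)
}

recommended <- parseSaveMode("unObject+compress")
asVector    <- parseSaveMode(c("object", "compress"))
plainObject <- parseSaveMode("object")
```

Under this reading, the single string and the character vector forms are equivalent ways of naming the same options, and "object" simply means that unObject is not applied.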

geneNames

Gene names (if they are discrete and finite).

sampleNames

Sample names (if any).

classes

Class of the original samples (useful for classification problems only).

saveGeneBreaks

In the case of variable selection for microarray data (and other problems with discrete and finite genes), a summary of the genes selected is computed and saved in each evolution. It is used to facilitate the computation for some plots and other methods. For applications with non-finite genes, it may be useful to interpret saveGeneBreaks as the breaks needed to create a histogram based on the genes included in the “best” chromosomes.
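The histogram interpretation can be illustrated with base R alone. The gene indices below are made up, and the breaks 0:100 are chosen for this sketch so that each integer gene index from 1 to 100 gets its own bin; this is not necessarily how galgo applies saveGeneBreaks internally.

```r
# Made-up gene indices pooled from the best chromosomes of several evolutions.
selectedGenes <- c(5, 17, 17, 42, 99)

# Breaks 0:100 give one right-closed bin per integer gene index 1..100.
geneHist <- hist(selectedGenes, breaks = 0:100, plot = FALSE)
geneCounts <- geneHist$counts

# Gene indices that were selected at least once.
selectedAtLeastOnce <- which(geneCounts > 0)
```

A per-gene count vector of this kind is the sort of summary that frequency and rank plots can be built from without revisiting every chromosome.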

gcFrequency

How often the garbage collector would be called. Useful if memory needs to be collected during the process.

gcCalls

How many consecutive calls to make to the garbage collector (we have seen that many consecutive calls to gc() work better [R < 2.0]).

call

Internal use.

...

Other user named values to include in the object.

Class

Package: galgo
Class BigBang

Object
~~|
~~+--BigBang

Directly known subclasses:

public static class BigBang
extends Object

Fields and Methods

Methods:

activeChromosomeSet Focus the analysis to different sets of chromosomes.
addCount Add a chromosome to rank and frequency stability counting.
addRandomSolutions Adds random pre-existing solutions.
as.matrix Prints the representation of the BigBang object.
assignParallelFile Assigns a different saveFile value for parallelization.
blast Evolves Galgo objects saving the results for further analysis.
buildCount Builds the rank and frequency stability counting.
classPredictionMatrix Predicts class for samples from chromosomes.
computeCount Computes the counts for every gene from a set of chromosomes.
confusionMatrix Computes the class confusion matrix from a class prediction matrix.
distanceImportanceNetwork Converts geneImportanceNetwork matrix to distance matrix.
filterSolution Filters solutions.
fitnessSplits Computes the fitness function from chromosomes for different splits.
formatChromosome Converts chromosome for storage in BigBang object.
forwardSelectionModels Gets the “best” models using top-ranked genes and a forward-selection strategy.
geneCoverage Computes the fraction of genes present in the top-rank from the total genes present in chromosomes.
geneFrequency Computes the frequency of genes based on chromosomes.
geneImportanceNetwork Computes the number of times a pair of top-ranked genes is present in models.
geneRankStability Computes the rank history for top-ranked genes.
getFrequencies Computes gene frequencies.
heatmapModels Plots models using heatmap plot.
loadParallelFiles Load all files saved during the parallelization.
meanFitness Computes the “mean” fitness from several solutions.
meanGeneration Computes the mean number of generations required to reach a given fitness value.
mergeBangs Merges the information from other BigBang objects.
pcaModels Plots models in principal components space.
plot Plots the collected information in a BigBang object.
predict Predicts the class or fitting of a new set of samples.
print Prints the representation of a BigBang object.
saveObject Saves the BigBang object into a file in a suitable format.
sensitivityClass Computes the sensitivity of class prediction.
specificityClass Computes the specificity of class prediction.
summary Prints the representation of the BigBang object.

Methods inherited from Object:
as.list, unObject, $, $<-, [[, [[<-, as.character, attach, clone, detach, equals, extend, finalize, getFields, getInstanciationTime, getStaticInstance, hasField, hashCode, ll, load, objectSize, print, save

Author(s)

Victor Trevino. Francesco Falciani Group. University of Birmingham, U.K. http://www.bip.bham.ac.uk/bioinf

References

Goldberg, David E. 1989 Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co. ISBN: 0201157675

See Also

Gene, Chromosome, Niche, World, Galgo, configBB.VarSel(), configBB.VarSelMisc().

Examples

## Not run: 
   library(galgo)

   ## build the hierarchy: genes -> chromosome -> niche -> world -> galgo
   cr <- Chromosome(genes=newCollection(Gene(shape1=1, shape2=100), 5))
   ni <- Niche(chromosomes=newRandomCollection(cr, 10))
   wo <- World(niches=newRandomCollection(ni, 2))
   ga <- Galgo(populations=newRandomCollection(wo, 1), goalFitness=0.75,
               callBackFunc=plot,
               fitnessFunc=function(chr, parent) 5/sd(as.numeric(chr)))

   #evolve(ga) ## not needed here

   bb <- BigBang(galgo=ga, maxSolutions=10, maxBigBangs=10, saveGeneBreaks=1:100)
   blast(bb)
   ## blast() performs evolve() on the ga object up to 10 times;
   ## every time, the galgo is reinitialized and randomized;
   ## finally, the results are saved.
   plot(bb)

   ## a microarray classification example is still missing

## End(Not run)
 

galgo documentation built on May 2, 2019, 4:20 a.m.