Gen.Alg: Find Taxa Separating Two Groups using Genetic Algorithm (GA)


Source: R/Gen.Alg.R

Description

GA-Mantel is a fully multivariate method. It uses a genetic algorithm to search over possible subsets of taxa, and scores the quality of each candidate subset with the Mantel correlation.
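To make the scoring idea concrete, here is a minimal sketch of a Mantel-style score for one taxa subset. It assumes the score is the Pearson correlation between the pairwise sample distances computed from the selected taxa and the pairwise distances computed from the covariates. The function name `mantel_score` and the toy data are hypothetical, and Euclidean distances from base R's `dist()` stand in for the distance options Gen.Alg actually offers.

```r
# Hypothetical sketch of a Mantel-style subset score (not HMP's internal code).
# Assumption: score = Pearson correlation between the sample-to-sample
# distances of the selected taxa and the sample-to-sample covariate distances.
mantel_score <- function(data, covars, subset) {
  # Distances between samples (rows) using only the selected taxa (columns)
  d_data <- dist(data[, subset, drop = FALSE], method = "euclidean")
  # Distances between samples based on the covariates
  d_cov <- dist(covars, method = "euclidean")
  # dist objects hold the lower triangle; correlate the two element-wise
  cor(as.vector(d_data), as.vector(d_cov))
}

# Toy data: 4 samples (rows) x 3 taxa (columns); taxon 1 tracks the groups
data <- matrix(c(0.9, 0.8, 0.1, 0.2,
                 0.1, 0.1, 0.5, 0.4,
                 0.0, 0.1, 0.4, 0.4), nrow = 4)
covars <- matrix(c(0, 0, 1, 1), ncol = 1)   # group membership
mantel_score(data, covars, subset = 1)      # about 0.98
```

A subset whose distances mirror the covariate distances scores close to 1. This is the quantity the GA tries to maximize when it compares candidate subsets.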

Usage

	Gen.Alg(data, covars, iters = 50, popSize = 200, earlyStop = 0, 
	dataDist = "euclidean", covarDist = "gower", verbose = FALSE, 
	plot = TRUE, minSolLen = NULL, maxSolLen = NULL, custCovDist = NULL,
	penalty = 0)

Arguments

data

A matrix of taxonomic counts (columns) for each sample (rows).

covars

A matrix of covariates (columns) for each sample (rows).

iters

The number of iterations to run the GA.

popSize

The number of candidate solutions (taxa subsets) to test on each iteration.

earlyStop

The number of consecutive iterations without a better solution after which the GA stops, regardless of the number of iterations remaining. A value of 0 disables early stopping.

dataDist

The distance metric to use for the data. Either "euclidean" or "gower".

covarDist

The distance metric to use for the covariates. Either "euclidean" or "gower".

verbose

If TRUE, the current status of the GA is printed periodically.

plot

If TRUE, the progress of the scoring statistics is plotted by iteration.

minSolLen

The minimum number of columns to select.

maxSolLen

The maximum number of columns to select.

custCovDist

A custom covariate distance matrix to use in place of calculating one from covars.

penalty

A number between 0 and 1 used to penalize the solutions based on the number of selected taxa using the following formula: score - penalty * ((number of selected taxa)/(number of taxa)).
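Two of the arguments above, penalty and earlyStop, define simple numeric rules. The sketch below writes them out so their effect is easy to check. `penalized_score` follows the penalty formula as given. `stop_iteration` is a hypothetical reading of the earlyStop description. It assumes higher scores are better and that `bestScores[i]` holds the best score found by iteration i. Neither function is part of HMP.

```r
# Hypothetical helpers illustrating the penalty and earlyStop descriptions.

# penalty: score - penalty * (number of selected taxa / number of taxa)
penalized_score <- function(score, nSelected, nTaxa, penalty = 0) {
  stopifnot(penalty >= 0, penalty <= 1)
  score - penalty * (nSelected / nTaxa)
}

# earlyStop: stop once earlyStop consecutive iterations bring no better
# score; earlyStop = 0 runs every iteration. Returns the stopping iteration.
stop_iteration <- function(bestScores, earlyStop = 0) {
  if (earlyStop == 0) return(length(bestScores))
  best <- -Inf
  stall <- 0
  for (i in seq_along(bestScores)) {
    if (bestScores[i] > best) {
      best <- bestScores[i]   # improvement: reset the stall counter
      stall <- 0
    } else {
      stall <- stall + 1      # no improvement on this iteration
      if (stall >= earlyStop) return(i)
    }
  }
  length(bestScores)
}

penalized_score(0.50, nSelected = 10, nTaxa = 100, penalty = 0.2)  # 0.48
penalized_score(0.50, nSelected = 50, nTaxa = 100, penalty = 0.2)  # 0.40

scores <- c(0.2, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6)
stop_iteration(scores, earlyStop = 3)  # 8
stop_iteration(scores, earlyStop = 0)  # 10
```

With the same raw score, a solution that selects half the taxa loses five times as much as one that selects a tenth. A nonzero penalty therefore favours smaller subsets.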

Details

Uses a GA to find the taxa that separate subjects based on group membership or a set of covariates.

Because the scores are based on distance calculations, the data and covariates should be normalized BEFORE they are passed to this function.
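The following is a minimal sketch of one way to normalize the inputs. It converts counts to per-sample proportions, as the Examples section does. It also centres and scales numeric covariates with base R's `scale()`, which is an assumption here because the documentation does not prescribe a method for covariates. The toy `counts` and `covars` values are hypothetical.

```r
# Sketch of input normalization before calling Gen.Alg (toy values).
counts <- matrix(c(10, 30, 60,
                    5,  5, 90), nrow = 2, byrow = TRUE)

# Taxa counts -> per-sample proportions; each row then sums to 1
dataNorm <- counts / rowSums(counts)

# Assumption: numeric covariates are centred and scaled column-wise
covars <- cbind(age = c(30, 50), bmi = c(22, 28))
covarsNorm <- scale(covars)   # each column now has mean 0 and sd 1

rowSums(dataNorm)   # 1 1
```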

This function uses modified code from the rbga function in the genalg package.
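For readers unfamiliar with binary genetic algorithms, here is a minimal sketch of the general technique: each chromosome is a 0/1 vector marking which taxa are selected. It is written in the spirit of a binary GA such as rbga but is not its code or Gen.Alg's code. The operators used (tournament selection, one-point crossover, bit-flip mutation, single-chromosome elitism) and the names `ga_step` and `run_ga` are illustrative assumptions.

```r
# Hypothetical binary GA sketch; not the rbga or Gen.Alg implementation.
# Each row of pop is a 0/1 chromosome; fitness maps a chromosome to a score.
ga_step <- function(pop, fitness, mutationRate = 0.01) {
  scores <- apply(pop, 1, fitness)
  n <- nrow(pop)
  p <- ncol(pop)
  # Tournament selection: return the fitter of two random chromosomes
  pick <- function() {
    i <- sample.int(n, 2)
    pop[i[which.max(scores[i])], ]
  }
  newPop <- matrix(0L, n, p)
  newPop[1, ] <- pop[which.max(scores), ]   # elitism: keep the current best
  for (k in 2:n) {
    a <- pick()
    b <- pick()
    cut <- sample.int(p - 1, 1)             # one-point crossover
    child <- c(a[1:cut], b[(cut + 1):p])
    flip <- runif(p) < mutationRate         # bit-flip mutation
    child[flip] <- 1L - child[flip]
    newPop[k, ] <- child
  }
  newPop
}

run_ga <- function(p, fitness, popSize = 20, iters = 30) {
  pop <- matrix(rbinom(popSize * p, 1, 0.5), popSize, p)
  for (it in seq_len(iters)) pop <- ga_step(pop, fitness)
  pop[which.max(apply(pop, 1, fitness)), ]  # best chromosome found
}

# Toy use: recover a known 8-bit target subset
set.seed(42)
target <- c(1, 0, 1, 0, 0, 1, 0, 0)
best <- run_ga(8, function(x) -sum(x != target), popSize = 30, iters = 40)
best
```

In Gen.Alg the fitness would be the (optionally penalized) Mantel score of the selected taxa rather than this toy Hamming-distance score.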

Because the GA evaluates combinations of taxa and uses the raw data, taxa whose PIs differ only slightly may be selected, while taxa whose PIs differ greatly may not be.

The distance calculations use the vegdist function from the vegan package.
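To show what the "gower" option measures, here is a minimal base-R sketch of the Gower dissimilarity in the form documented for vegan's vegdist: each column is divided by its range, then absolute differences are averaged over the M columns. It is an illustration, not a replacement for vegdist. It assumes no missing values and no constant (zero-range) columns, and the name `gower_dist` is hypothetical. Base R's `dist(x, method = "euclidean")` gives the "euclidean" option directly.

```r
# Hypothetical base-R sketch of the Gower dissimilarity.
# Assumptions: no missing values, no constant (zero-range) columns.
gower_dist <- function(x) {
  x <- as.matrix(x)
  rng <- apply(x, 2, max) - apply(x, 2, min)
  xs <- sweep(x, 2, rng, "/")   # range-standardize each column
  # Sum of absolute differences on scaled data, averaged over columns
  dist(xs, method = "manhattan") / ncol(x)
}

x <- matrix(c(0, 1, 2,
              0, 5, 10), ncol = 2)
gower_dist(x)   # 0.5 between rows 1-2 and 2-3, 1 between rows 1-3
```

The range scaling is why Gower is a common default for covariates measured on different scales.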

Value

A list containing

scoreSumm

A matrix summarizing the scores of the population at each iteration. It can be used to judge whether the GA has converged on a final solution. This data is also plotted if plot is TRUE.

solutions

The final set of solutions, sorted with the highest scoring first.

scores

The scores for the final set of solutions.

time

The time, in seconds, that the GA took to run.

selected

The names of the selected columns.

nonSelected

The columns that were NOT selected by name.

selectedIndex

The selected taxa by column number.
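The last three values describe the same choice of taxa in different forms. The sketch below derives all three from a single solution. It assumes a solution is a 0/1 indicator over the columns of data, which is how a binary GA usually encodes a subset. The function `summarize_solution` and the toy matrix are hypothetical.

```r
# Hypothetical illustration: selected, nonSelected and selectedIndex
# derived from one 0/1 solution vector over the columns of a data matrix.
summarize_solution <- function(solution, data) {
  list(
    selected      = colnames(data)[solution == 1],
    nonSelected   = colnames(data)[solution == 0],
    selectedIndex = which(solution == 1)
  )
}

toyData <- matrix(1:8, nrow = 2,
                  dimnames = list(NULL, c("A", "B", "C", "D")))
summarize_solution(c(0, 1, 0, 1), toyData)
# $selected "B" "D"; $nonSelected "A" "C"; $selectedIndex 2 4
```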

Examples

	## Not run: 
		library(HMP)
		data(saliva)
		data(throat)
		
		### Combine the data into a single data frame
		group.data <- list(saliva, throat)
		group.data <- formatDataSets(group.data)
		data <- do.call("rbind", group.data)
		
		### Normalize the data by subject
		dataNorm <- t(apply(data, 1, function(x){x/sum(x)}))
		
		### Set covars to just be group membership
		memb <- c(rep(0, nrow(saliva)), rep(1, nrow(throat)))
		covars <- matrix(memb, length(memb), 1)
		
		### We use low numbers for speed. The exact numbers to use depend
		### on the data being used, but generally the higher iters and popSize 
		### the longer it will take to run.  earlyStop is then used to stop the
		### run early if the results aren't improving.
		iters <- 500
		popSize <- 200
		earlyStop <- 250
		
		gaRes <- Gen.Alg(dataNorm, covars, iters = iters, popSize = popSize, earlyStop = earlyStop)
	
## End(Not run)


HMP documentation built on Aug. 31, 2019, 5:05 p.m.