# Gen.Alg: Find Taxa Separating Two Groups using Genetic Algorithm (GA)

In HMP: Hypothesis Testing and Power Calculations for Comparing Metagenomic Samples from HMP

## Description

GA-Mantel is a fully multivariate method that uses a genetic algorithm to search over possible taxa subsets using the Mantel correlation as the scoring measure for assessing the quality of any given taxa subset.
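As a rough illustration of the scoring idea, the Mantel correlation for one candidate subset can be sketched as the Pearson correlation between the pairwise sample distances computed from the selected taxa and those computed from the covariates. The helper name `mantelScore` and the toy data are hypothetical, and base R `dist` (Euclidean) stands in for whatever distance functions the package uses internally; this is a sketch of the concept, not the package's own code.

```r
# Hypothetical helper: Mantel-style score for one candidate taxa subset.
mantelScore <- function(data, covars, subset) {
  dData  <- dist(data[, subset, drop = FALSE])  # sample distances on the chosen taxa
  dCovar <- dist(covars)                        # sample distances on the covariates
  cor(as.vector(dData), as.vector(dCovar))      # Pearson correlation of the distances
}

# Toy data (made up): 6 samples, 4 taxa; only taxon 1 differs between the groups.
set.seed(1)
toyData   <- cbind(c(0.1, 0.1, 0.1, 0.9, 0.9, 0.9), matrix(runif(18), 6, 3))
toyCovars <- matrix(c(0, 0, 0, 1, 1, 1), ncol = 1)

scoreInformative <- mantelScore(toyData, toyCovars, 1)    # taxon that tracks the groups
scoreNoise       <- mantelScore(toyData, toyCovars, 2:4)  # taxa unrelated to the groups
```

Here `scoreInformative` equals 1 up to floating-point error, because the distances on taxon 1 are exactly proportional to the group distances, while the subset of unrelated taxa scores lower. A genetic algorithm would evaluate many such subsets and keep the higher-scoring ones.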

## Usage

```r
Gen.Alg(data, covars, iters = 50, popSize = 200, earlyStop = 0,
        dataDist = "euclidean", covarDist = "gower", verbose = FALSE,
        plot = TRUE, minSolLen = NULL, maxSolLen = NULL,
        custCovDist = NULL, penalty = 0)
```

## Arguments

| Argument | Description |
| --- | --- |
| `data` | A matrix of taxonomic counts (columns) for each sample (rows). |
| `covars` | A matrix of covariates (columns) for each sample (rows). |
| `iters` | The number of times to run through the GA. |
| `popSize` | The number of solutions to test on each iteration. |
| `earlyStop` | The number of consecutive iterations without finding a better solution before stopping, regardless of the number of iterations remaining. A value of `0` prevents early stopping. |
| `dataDist` | The distance metric to use for the data. Either `"euclidean"` or `"gower"`. |
| `covarDist` | The distance metric to use for the covariates. Either `"euclidean"` or `"gower"`. |
| `verbose` | When `TRUE`, the current status of the GA is printed periodically. |
| `plot` | A boolean indicating whether to plot the progress of the scoring statistics by iteration. |
| `minSolLen` | The minimum number of columns to select. |
| `maxSolLen` | The maximum number of columns to select. |
| `custCovDist` | A custom covariate distance matrix to use in place of calculating one from `covars`. |
| `penalty` | A number between 0 and 1 used to penalize solutions based on the number of selected taxa, using the formula `score - penalty * ((number of selected taxa) / (number of taxa))`. |
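The penalty formula given for the `penalty` argument can be written out as a small function. The name `penalizedScore` and the numbers below are hypothetical and only illustrate the arithmetic; how the package applies the penalty internally is not shown here.

```r
# Hypothetical helper implementing the formula
#   score - penalty * ((number of selected taxa) / (number of taxa))
penalizedScore <- function(score, penalty, nSelected, nTaxa) {
  score - penalty * (nSelected / nTaxa)
}

# A raw score of 0.80 with 5 of 50 taxa selected and penalty = 0.2:
# 0.80 - 0.2 * (5 / 50) = 0.78
withPenalty <- penalizedScore(0.80, 0.2, 5, 50)

# With penalty = 0 (the default) the score is left unchanged.
noPenalty <- penalizedScore(0.80, 0, 5, 50)
```

Because the penalty term grows with the fraction of taxa selected, a larger `penalty` favors smaller taxa subsets when two solutions have similar raw scores.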

## Details

Uses a GA approach to find taxa that separate subjects based on group membership or a set of covariates.

The data and covariates should be normalized BEFORE use with this function, because the distance functions operate directly on the values they are given.
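One possible way to normalize before calling the function, assuming count data and a single continuous covariate, is sketched below. The toy values, the `age` covariate, and the choice of row proportions plus `scale` are all assumptions for illustration; the appropriate normalization depends on the data at hand.

```r
# Toy count matrix (made up): 3 samples (rows) by 4 taxa (columns).
counts <- matrix(c(10, 0, 5,  5,
                    2, 8, 0, 10,
                    7, 7, 7,  7), nrow = 3, byrow = TRUE)

# Convert each sample's counts to proportions so every row sums to 1.
props <- counts / rowSums(counts)

# Hypothetical continuous covariate, centered and scaled to unit variance.
age    <- c(23, 35, 51)
covars <- scale(matrix(age, ncol = 1))
```

After this step `props` and `covars` would be the inputs passed as `data` and `covars`.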

This function uses modified code from the `rbga` function in the genalg package.

Because the GA looks at combinations of taxa and uses the raw data, taxa with a small difference in their PIs (taxa proportions) may be selected, while taxa with large differences may not be.

The distance calculations use the `vegdist` function from the vegan package.

## Value

A list containing

| Component | Description |
| --- | --- |
| `scoreSumm` | A matrix summarizing the score of the population. This can be used to judge whether the GA has converged to a final solution. This data is also plotted if `plot` is `TRUE`. |
| `solutions` | The final set of solutions, sorted with the highest scoring first. |
| `scores` | The scores for the final set of solutions. |
| `time` | How long, in seconds, the GA took to run. |
| `selected` | The selected columns, by name. |
| `nonSelected` | The columns that were NOT selected, by name. |
| `selectedIndex` | The selected taxa, by column number. |

## Examples

```r
## Not run: 
library(HMP)

data(saliva)
data(throat)

### Combine the data into a single data frame
group.data <- list(saliva, throat)
group.data <- formatDataSets(group.data)
data <- do.call("rbind", group.data)

### Normalize the data by subject
dataNorm <- t(apply(data, 1, function(x){x/sum(x)}))

### Set covars to just be group membership
memb <- c(rep(0, nrow(saliva)), rep(1, nrow(throat)))
covars <- matrix(memb, length(memb), 1)

### We use low numbers for speed. The exact numbers to use depend
### on the data being used, but generally the higher iters and popSize
### the longer it will take to run. earlyStop is then used to stop the
### run early if the results aren't improving.
iters <- 500
popSize <- 200
earlyStop <- 250

gaRes <- Gen.Alg(dataNorm, covars, iters, popSize, earlyStop)

## End(Not run)
```

### Example output

```
Loading required package: dirmult

Attaching package: 'HMP'

The following object is masked from 'package:dirmult':

    weirMoM
```

HMP documentation built on Aug. 31, 2019, 5:05 p.m.