GenAlgForSubsetSelection: Genetic algorithm for subset selection

View source: R/GenAlgForSubsetSelection.R

Description

Uses a genetic algorithm to select n_{Train} individuals from the candidate set so that the chosen optimality criterion is minimized.
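The general scheme can be illustrated with a short, self-contained R sketch. It is not the STPGA implementation: ga_subset and crit are hypothetical names, and the operators are assumptions chosen to mirror the documented arguments. Crossover draws a child from the union of two elite parents; with probability mutprob, mutation swaps a Poisson(mutintensity) number of members for individuals outside the subset; the tabu memory stores order-free labels of the last tabumemsize generations; and the run stops once the best value has improved by no more than tolconv over minitbefstop iterations. niterreg, plotting and parallel evaluation are not modelled.

```r
# Hypothetical sketch of a genetic algorithm for subset selection (not STPGA's code).
# crit(s) returns the value to minimize for a character vector s of identifiers.
ga_subset <- function(candidates, ntoselect, crit, npop = 30, nelite = 5,
                      mutprob = 0.8, mutintensity = 1, niterations = 100,
                      minitbefstop = 20, tolconv = 1e-7, tabumemsize = 1) {
  key <- function(s) paste(sort(s), collapse = "|")  # order-free label of a subset
  pop <- replicate(npop, sample(candidates, ntoselect), simplify = FALSE)
  tabu <- list()            # subset labels of the most recent generations
  best_trace <- numeric(0)  # best criterion value at each iteration
  for (it in seq_len(niterations)) {
    scores <- vapply(pop, crit, numeric(1))
    elite <- pop[order(scores)[seq_len(nelite)]]  # nelite best solutions, best first
    best_trace <- c(best_trace, min(scores))
    n <- length(best_trace)
    # stop if the best value improved by at most tolconv over minitbefstop iterations
    if (n > minitbefstop && best_trace[n - minitbefstop] - best_trace[n] <= tolconv) break
    # tabu memory: keep the labels of the last tabumemsize generations only
    tabu <- tail(c(tabu, list(vapply(pop, key, character(1)))), tabumemsize)
    seen <- unlist(tabu)
    make_child <- function() {
      parents <- sample(nelite, 2, replace = TRUE)
      pool <- union(elite[[parents[1]]], elite[[parents[2]]])
      child <- pool[sample.int(length(pool), ntoselect)]  # crossover
      if (runif(1) < mutprob) {                           # mutation
        nmut <- min(rpois(1, mutintensity), ntoselect, length(candidates) - ntoselect)
        if (nmut > 0) {
          outside <- setdiff(candidates, child)
          child[sample.int(ntoselect, nmut)] <- outside[sample.int(length(outside), nmut)]
        }
      }
      child
    }
    children <- list()
    tries <- 0
    while (length(children) < npop - nelite && tries < 100 * npop) {
      tries <- tries + 1
      child <- make_child()
      if (!(key(child) %in% seen)) children <- c(children, list(child))  # skip tabu solutions
    }
    pop <- c(elite, children)  # elite solutions always survive (elitism)
  }
  list(best = elite[[1]], value = best_trace[length(best_trace)], trace = best_trace)
}
```

Because the elite solutions are always carried over, the trace of best values can never increase, which is the behaviour keepbest = TRUE is meant to guarantee.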

Usage

GenAlgForSubsetSelection(P, Candidates, Test, ntoselect, npop = 100,
                 nelite = 5, keepbest = TRUE, tabu = T, tabumemsize = 1,
                 mutprob = 0.8, mutintensity = 1, niterations = 500,
                 minitbefstop = 200, niterreg = 5, lambda = 1e-06,
                 plotiters = FALSE, plottype = 1, errorstat = "PEVMEAN",
                 C = NULL, mc.cores = 1, InitPop = NULL, tolconv = 1e-07,
                 Vg = NULL, Ve = NULL)

Arguments

P

depending on the criterion, this is either a numeric data matrix or a symmetric similarity matrix. For a data matrix, the rownames should be the union of the identifiers of the candidate (and test) individuals; for a similarity matrix, these identifiers should be used as both row and column names. For methods using the relationships, this is the inverse of the relationship matrix, with row and column names set to the identifiers of the candidate (and test) individuals.

Candidates

vector of identifiers for the individuals in the candidate set.

Test

vector of identifiers for the individuals in the test set.

ntoselect

n_{Train}: number of individuals to select in the training set.

npop

genetic algorithm parameter, number of solutions (population size) at each iteration.

nelite

genetic algorithm parameter, number of solutions selected as elite parents, which generate the next set of solutions.

keepbest

genetic algorithm parameter, TRUE or FALSE. If TRUE then the best solution is always kept in the next generation of solutions (elitism).

tabu

genetic algorithm parameter, TRUE or FALSE. If TRUE then the solutions that are saved in tabu memory will not be retried.

tabumemsize

genetic algorithm parameter, integer>0. Number of generations to hold in tabu memory.

mutprob

genetic algorithm parameter, probability of mutation for each generated solution.

mutintensity

genetic algorithm parameter, mean of the Poisson variable that is used to decide the number of mutations for each cross.

niterations

genetic algorithm parameter, number of iterations.

minitbefstop

genetic algorithm parameter, number of iterations before stopping if no change is observed in the criterion value.

niterreg

genetic algorithm parameter, number of iterations to use regressions; an integer with a minimum value of 1.

lambda

scalar shrinkage parameter (λ>0).

plotiters

plot the convergence: TRUE or FALSE. Default is FALSE.

plottype

type of plot: 1, 2 or 3. Default is 1.

errorstat

the optimality criterion to minimize. Default is "PEVMEAN". User-defined criterion functions can also be supplied.

mc.cores

number of cores to use.

InitPop

a list of initial solutions, for example training samples returned by earlier runs (as in the examples).

tolconv

if the algorithm cannot improve errorstat by more than tolconv over the last minitbefstop iterations, it stops.

C

contrast matrix.

Vg

covariance matrix between traits generated by the relationship K (only for multi-trait version of PEVMEANMM).

Ve

residual covariance matrix for the traits (only for multi-trait version of PEVMEANMM).
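To illustrate what a PEV-type criterion passed through errorstat measures, and how lambda enters it, the R sketch below computes the mean prediction error variance of the test individuals under a ridge regression on the rows of a data matrix P, assuming unit residual variance. This is an assumption-laden illustration: pev_mean is a hypothetical name, and neither its formula nor its argument list is claimed to match STPGA's "PEVMEAN" or the interface the package expects for user-defined criteria.

```r
# Hypothetical PEV-type criterion (illustration only; assumes unit residual variance).
# P: numeric data matrix whose rownames are individual identifiers.
pev_mean <- function(Train, Test, P, lambda = 1e-6) {
  Ptrain <- P[Train, , drop = FALSE]
  Ptest <- P[Test, , drop = FALSE]
  A <- crossprod(Ptrain) + lambda * diag(ncol(P))  # P_train' P_train + lambda * I
  # diagonal of Ptest A^{-1} Ptest': prediction error variance of each test individual
  mean(diag(Ptest %*% solve(A, t(Ptest))))
}
```

Adding individuals to the training set can only shrink A^{-1}, so this quantity never increases as the training set grows; choosing a fixed-size training set that makes it small is the kind of problem the genetic algorithm addresses.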

Value

A list of length nelite+1. The first nelite elements of the list are optimized training samples of size n_{Train}, listed in increasing order of the optimization criterion. The last item of the list is a vector that stores the minimum value of the objective function at each iteration.
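A small R helper can separate the components of the returned list. It relies only on the layout described above; split_ga_result is a hypothetical name, not part of STPGA.

```r
# Hypothetical helper: split a result list laid out as nelite training samples
# (best first) followed by the vector of per-iteration minimum criterion values.
split_ga_result <- function(res) {
  n <- length(res)
  list(best  = res[[1]],   # training sample with the lowest criterion value
       elite = res[-n],    # all nelite optimized training samples
       trace = res[[n]])   # minimum criterion value at each iteration
}
```

For a result such as ListTrain in the examples, plot(split_ga_result(ListTrain)$trace, type = "l") would show the convergence of the criterion.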

Note

The GA does not guarantee convergence to globally optimal solutions, so it is highly recommended to replicate the algorithm to obtain "good" training samples.
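One generic way to replicate a stochastic search is to run it several times and keep the run with the lowest criterion value. The R sketch below assumes each run can be summarized as a list with a solution and its value; best_of_runs and run_once are hypothetical names. With GenAlgForSubsetSelection, run_once could, for instance, return the first training sample as the solution and the final entry of the trace vector as the value.

```r
# Hypothetical helper: call a stochastic optimizer nreps times and keep the best run.
# run_once() must return a list with elements `solution` and `value` (lower is better).
best_of_runs <- function(run_once, nreps) {
  runs <- lapply(seq_len(nreps), function(r) run_once())
  values <- vapply(runs, function(x) x$value, numeric(1))
  runs[[which.min(values)]]
}
```

The wrapper in the examples goes further by feeding the best solutions of earlier runs back in through InitPop, rather than only picking the best independent run.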

Author(s)

Deniz Akdemir

Examples

	## Not run: 
####################################
library(EMMREML)
library(STPGA)
data(WheatData)

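# project the relationship matrix onto its first 5 right singular vectors
svdWheat<-svd(Wheat.K, nu=5, nv=5)
PC50WHeat<-Wheat.K%*%svdWheat$v
plot(PC50WHeat[,1],PC50WHeat[,2])
rownames(PC50WHeat)<-rownames(Wheat.K)
# cluster the individuals into 4 groups; group 4 becomes the test set
DistWheat<-dist(PC50WHeat)
TreeWheat<-hclust(DistWheat)
TreeWheat<-cutree(TreeWheat, k=4)

Test<-rownames(PC50WHeat)[TreeWheat==4]
length(Test)
# all remaining individuals are candidates for the training set
Candidates<-setdiff(rownames(PC50WHeat), Test)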


### Instead of calling the algorithm once, use a wrapper that runs the genetic
### algorithm from multiple starting points: in each round, the best training
### sample of every inner run is carried over as the initial population (InitPop)
### of the next round and of the final, longer run.
repeatgenalg<-function(numrepsouter,numrepsinner){
  StartingPopulation2=NULL
  for (i in 1:numrepsouter){
    print("Rep:")
    print(i)
    StartingPopulation<-lapply(1:numrepsinner, function(x){
      GenAlgForSubsetSelection(P=PC50WHeat,Candidates=Candidates,
        Test=Test, ntoselect=50, InitPop=StartingPopulation2,
        npop=50, nelite=5, mutprob=.5, mutintensity = rpois(1,4),
        niterations=10,minitbefstop=5, tabumemsize = 2,plotiters=TRUE,
        lambda=1e-9,errorstat="CDMEAN", mc.cores=1)})
    # keep the best training sample (first list element) of each inner run:
    # one starting solution per inner run
    StartingPopulation2<-vector(mode="list", length = numrepsinner)
    for (k in 1:numrepsinner){
      StartingPopulation2[[k]]<-StartingPopulation[[k]][[1]]
    }
  }
  ListTrain<-GenAlgForSubsetSelection(P=PC50WHeat,Candidates=Candidates,
    Test=Test,ntoselect=50, InitPop=StartingPopulation2,npop=100,
    nelite=10, mutprob=.5, mutintensity = 1,niterations=300,
    minitbefstop=100, tabumemsize = 1,plotiters=TRUE,
    lambda=1e-9,errorstat="CDMEAN", mc.cores=1)
  return(ListTrain)
}


ListTrain<-repeatgenalg(20, 3)

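### test sample: phenotypes of the test individuals
deptestopt<-Wheat.Y[Wheat.Y$id%in%Test,]

## predictions using the optimized training sample (best solution, ListTrain[[1]])
deptrainopt<-Wheat.Y[(Wheat.Y$id%in%ListTrain[[1]]),]

# incidence matrices linking observations to individuals
Ztrain<-model.matrix(~-1+deptrainopt$id)
Ztest<-model.matrix(~-1+deptestopt$id)

# fit the mixed model with EMMREML and predict the test individuals
modelopt<-emmreml(y=deptrainopt$plant.height,X=matrix(1, nrow=nrow(deptrainopt), ncol=1), 
  Z=Ztrain, K=Wheat.K)
predictopt<-Ztest%*%modelopt$uhat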

## predictions by random training samples of the same size (300 replicates);
## the test data and its incidence matrix do not change between replicates
deptestrs<-Wheat.Y[Wheat.Y$id%in%Test,]
Ztest<-model.matrix(~-1+deptestrs$id)
corvecrs<-c()
for (r in 1:300){
  rs<-sample(Candidates, 50)
  deptrainrs<-Wheat.Y[(Wheat.Y$id%in%rs),]
  Ztrain<-model.matrix(~-1+deptrainrs$id)
  modelrs<-emmreml(y=deptrainrs$plant.height,X=matrix(1, nrow=nrow(deptrainrs), ncol=1), 
    Z=Ztrain, K=Wheat.K)
  predictrs<-Ztest%*%modelrs$uhat
  corvecrs<-c(corvecrs,cor(predictrs, deptestrs$plant.height))
}
## mean accuracy of the random samples vs. accuracy of the optimized sample
mean(corvecrs)
cor(predictopt, deptestopt$plant.height)


plot(PC50WHeat[,1],PC50WHeat[,2], col=rownames(PC50WHeat)%in%ListTrain[[1]]+1,
pch=2*rownames(PC50WHeat)%in%Test+1, xlab="pc1", ylab="pc2")

## End(Not run)

STPGA documentation built on May 29, 2017, 3:44 p.m.