Optimal design for genetical genomics experiments
Description
Main function to search for and display A- and D-optimal designs for single- or two-channel genetical genomics experiments. Simulated annealing or Metropolis-Hastings is used to find the best design.
Usage
designGG ( genotype, nSlides, nTuple, nEnvFactors, nLevels,
Level=NULL, bTwoColorArray=TRUE, initial=NULL, weight=1,
region=NULL, optimality="A", method="SA", nIterations=3000,
n.search=2, endTemp=1e-10, startTemp=1, maxTempStep=0.9,
plotScores=TRUE, directory=NULL, fileName=NULL,
envFactorNames=NULL, writingProcess=TRUE )

Arguments
genotype 
genotype data: an nMarker-by-nRILs matrix with two alleles coded as 0 and 1 (or A and B), or three alleles coded as 0, 0.5 and 1 (or A, H and B), where 0.5 (or H) represents the heterozygous allele. 
nSlides 
total number of slides available for the experiment. 
nTuple 
average number of RILs (or strains) to be assigned onto each condition. 
nEnvFactors 
number of environmental factors, an integer between 1 and 3.
When 
nLevels 
number of levels for each factor, a vector with each
component being an integer. The length of it should equal

Level 
a list which specifies the levels for each factor in the
experiment. There are in total 
bTwoColorArray 
binary variable indicating experiment type: 
initial 
the starting design matrix for the algorithm. If specified, this should
be a list with 2 matrices: 
weight 
a vector with length of 
region 
genome region of biological interest. Default = 
optimality 
type of optimality, i.e. "A" (A-optimality) or "D" (D-optimality). A-optimality minimizes $\mathrm{Trace}((X'X)^{-1})$, which corresponds to the minimum average variance of the parameter estimates. D-optimality minimizes $\det((X'X)^{-1})$, which corresponds to the minimum generalized variance of the parameter estimates. 
method 
method for searching for an optimal design. "SA" uses simulated annealing. "MH" uses Metropolis-Hastings. Default = "SA". 
nIterations 
number of iterations of the simulated annealing method. Default = 3000. 
n.search 
number of simulated annealing optimization runs, each started from a different initial design. Default = 2. A value between 1 and 5 is suggested; larger values increase the computational burden. 
endTemp 
ending temperature of the simulated annealing process. An important optimization parameter. Default = 1e-10. 
startTemp 
starting temperature of simulated annealing process. Default = 1. 
maxTempStep 
maximum temperature decreasing step for simulated annealing process.
The parameter ensures that the multiplicative cooling factor is not
smaller than that. If 
plotScores 
If 
directory 
the directory where the resulting optimal design tables are to be stored.
If 
fileName 
the final optimal design table(s) in 
envFactorNames 
a vector with names for all environmental factor(s). For example, for the
experiment with two environmental factors of temperature and drug treatment:

writingProcess 
If TRUE, it prints how much computation work has been finished in a
file called 
Details
Given the genetic information of the samples available for the experiment
(genotype) and the information about the experimental settings (nEnvFactors,
nSlides, nLevels, etc.), the algorithm searches for an A-optimal or D-optimal
design (see optimality) using simulated annealing (see method). A plot of
the scores at each iteration can also be produced with the plotAllScores
function.
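As a minimal sketch of the two criteria described under optimality (this is not the package's internal code), the scores of a given design matrix X can be computed in base R. The matrix X below is a made-up four-run design with an intercept and two factors coded as -1/+1, chosen only for illustration.

```r
# Hypothetical design matrix: intercept plus two factors coded -1/+1.
X <- cbind(intercept = 1,
           F1 = c(-1,  1, -1, 1),
           F2 = c(-1, -1,  1, 1))

infoInv <- solve(t(X) %*% X)   # (X'X)^{-1}

aScore <- sum(diag(infoInv))   # A-criterion: trace of (X'X)^{-1}
dScore <- det(infoInv)         # D-criterion: determinant of (X'X)^{-1}

aScore   # 0.75, because X'X = 4 * I for this orthogonal design
dScore   # 0.015625, i.e. (1/4)^3
```

Smaller values are better for both criteria, so a search procedure compares candidate designs by these scores.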
Two of the arguments deserve further explanation. region specifies the
genome region that is of major interest to the experimenters. weight defines
the weights of the genetic factor, the environmental factors and their
interaction terms. Prior knowledge about the expected effect sizes of the
factors of interest can be incorporated through the weight parameters. Each
weight is inversely proportional to the expected effect size of the
corresponding parameter.
Example parameter settings:
Suppose we want to design an experiment with two environmental factors
(F1, F2), each with two different levels. The levels are 16 and 24 for F1,
and 5 and 10 for F2. The following commands can then be used:
nEnvFactors <- 2
nLevels <- c ( 2, 2 )
levels <- list ( c(16, 24), c(5, 10) )
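To see which experimental conditions these settings imply, the list of levels can be expanded into a full factorial grid with base R. This is only an illustrative sketch: the names F1 and F2 are labels chosen here, and the condition table returned by designGG may be laid out differently.

```r
# Levels of the two environmental factors from the example above.
levels <- list(F1 = c(16, 24), F2 = c(5, 10))

# One row per combination of factor levels.
conditions <- expand.grid(levels)
conditions

nrow(conditions)   # 4 conditions = 2 levels x 2 levels
```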
The length of the weight parameter depends on the number of environmental
factors:
When nEnvFactors = 0, weight is 1, as there is only one parameter of interest (genotype).
When nEnvFactors = 1, weight = c( $w_Q$, $w_{F1}$, $w_{QF1}$ )
When nEnvFactors = 2, weight = c( $w_Q$, $w_{F1}$, $w_{F2}$, $w_{QF1}$, $w_{QF2}$, $w_{F1F2}$, $w_{QF1F2}$ )
When nEnvFactors = 3, weight = c( $w_Q$, $w_{F1}$, $w_{F2}$, $w_{F3}$,
$w_{QF1}$, $w_{QF2}$, $w_{QF3}$, $w_{F1F2}$, $w_{F1F3}$, $w_{F2F3}$,
$w_{QF1F2}$, $w_{QF1F3}$, $w_{QF2F3}$, $w_{QF1F2F3}$ )
Here $w_Q$ represents the weight for the genotype effect, $w_{F1}$ represents
the weight for the F1 effect, $w_{QF1}$ represents the weight for the
interaction between genotype and F1, etc.
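For instance, with two environmental factors the weight vector has seven entries. The sketch below derives weights from hypothetical expected effect sizes (the numbers are invented for illustration), following the statement above that a weight is inversely proportional to the expected effect size. Rescaling the weights to sum to one is a choice made here, not something this documentation specifies.

```r
# Hypothetical expected effect sizes, in the order used for two
# environmental factors: Q, F1, F2, QF1, QF2, F1F2, QF1F2.
expectedEffect <- c(Q = 2, F1 = 1, F2 = 1, QF1 = 0.5, QF2 = 0.5,
                    F1F2 = 1, QF1F2 = 0.25)

weight <- 1 / expectedEffect      # inversely proportional to effect size
weight <- weight / sum(weight)    # optional rescaling (an assumption)

unname(weight)                    # plain numeric vector of length 7
```

Terms with small expected effects, such as the three-way interaction here, receive the largest weights.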
Note that the simulated annealing algorithm might find a design that is
locally but not globally optimal, so running the optimization process
multiple times is recommended. When n.search > 1, the simulated annealing
optimization is run n.search times; each run starts from a different
initial design and provides a (near-)optimal design. If the optimization
problem is simple, all runs will converge to the same (optimal) design.
Otherwise, the best of the near-optimal designs is selected as the output
of the function. One can also run the algorithm several times with
n.search = 1 to review a few (near-)optimal designs.
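The restart strategy can be sketched on a toy problem in base R. This is not the designGG implementation: the toy design (two factors coded -1/+1 with an interaction), the helper names aScore and annealOnce, the single-flip proposal, and the cooling rule (a geometric factor kept at or above maxTempStep) are all assumptions made for illustration.

```r
# A-criterion for a two-factor design with interaction; Inf if X'X is singular.
aScore <- function(design) {
  X <- cbind(1, design[, 1], design[, 2], design[, 1] * design[, 2])
  xtx <- crossprod(X)
  if (rcond(xtx) < 1e-12) return(Inf)
  sum(diag(solve(xtx)))
}

# One simulated annealing run from a random initial design.
annealOnce <- function(nRuns, nIterations = 500, startTemp = 1,
                       endTemp = 1e-10, maxTempStep = 0.9) {
  design <- matrix(sample(c(-1, 1), 2 * nRuns, replace = TRUE), ncol = 2)
  score <- aScore(design)
  best <- list(design = design, score = score)
  # Geometric cooling factor, not allowed to drop below maxTempStep.
  cooling <- max((endTemp / startTemp)^(1 / nIterations), maxTempStep)
  temp <- startTemp
  for (i in seq_len(nIterations)) {
    candidate <- design
    k <- sample(length(candidate), 1)
    candidate[k] <- -candidate[k]            # flip one factor level
    candScore <- aScore(candidate)
    # Accept improvements always, worse designs with Boltzmann probability.
    if (candScore <= score || runif(1) < exp((score - candScore) / temp)) {
      design <- candidate
      score <- candScore
      if (score < best$score) best <- list(design = design, score = score)
    }
    temp <- temp * cooling
  }
  best
}

set.seed(1)
nRestarts <- 3                               # plays the role of n.search
runs <- lapply(seq_len(nRestarts), function(i) annealOnce(nRuns = 8))
scores <- vapply(runs, function(r) r$score, numeric(1))
bestRun <- runs[[which.min(scores)]]         # keep the best of all restarts
bestRun$score
```

Comparing the entries of scores shows whether the restarts converged to the same score or whether some runs stopped at a worse local optimum.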
Value
An array design table (arrayDesign.csv) and a condition design table (conditionDesign.csv) are generated.
Author(s)
Yang Li <yang.li@rug.nl>, Gonzalo Vera <gonzalo.vera.rodriguez@gmail.com>
Rainer Breitling <r.breitling@rug.nl>, Ritsert Jansen <r.c.jansen@rug.nl>
References
Y. Li, M. Swertz, G. Vera, J. Fu, R. Breitling, and R.C. Jansen. designGG:
An R-package and Web tool for the optimal design of genetical genomics
experiments. BMC Bioinformatics (2009) 10:188.
http://gbic.biol.rug.nl/designGG
Y. Li, R. Breitling and R.C. Jansen. Generalizing genetical
genomics: the added value from environmental perturbation. Trends Genet
(2008) 24:518-524.
E. Wit and J. McClure. Statistics for Microarrays: Design, Analysis
and Inference. (2004) Chichester: Wiley.
See Also
initialDesign, designScore, updateDesign, acceptanceProbability,
experimentDesignTable, plotAllScores, exampleArrayDesignTable,
exampleConditionDesignTable
Examples
library(designGG)
#load genotype data
data(genotype)
#Example 1: single-channel experiment with 2 environmental factors,
#each with 2 levels, and four samples per condition (nTuple=4).
optimalDesign <- designGG ( genotype, nSlides=NULL, nTuple=4, nEnvFactors=2,
nLevels=c(2,2), Level=list(c(16,24),c(5,10)), bTwoColorArray=FALSE,
initial=NULL, weight=1, region=seq(1,20), optimality="A",
method="SA", nIterations=100, n.search=2, endTemp=1e-10,
startTemp=1, maxTempStep=0.9, plotScores=TRUE,
directory=NULL, fileName=NULL, envFactorNames=NULL,
writingProcess=FALSE )
#Example 2: dual-channel experiment with 2 environmental factors,
#each with 2 levels. There are 50 slides available.
optimalDesign <- designGG ( genotype, nSlides=50, nTuple=NULL, nEnvFactors=2,
nLevels=c(2,2), Level=list(c(16,24),c(5,10)), bTwoColorArray=TRUE,
initial=NULL, weight=1, region=seq(1,20), optimality="A",
method="SA", nIterations=100, n.search=2, endTemp=1e-10,
startTemp=1, maxTempStep=0.9, plotScores=TRUE,
directory=NULL, fileName=NULL, envFactorNames=NULL,
writingProcess=FALSE )
#result
optimalDesign$arrayDesign
optimalDesign$conditionDesign
plotAllScores(optimalDesign$plot.obj)
#Use the following commands to see example output tables:
data(exampleArrayDesignTable)
exampleArrayDesignTable
data(exampleConditionDesignTable)
exampleConditionDesignTable
