Dopt.design    R Documentation

Function for creating D-optimal designs with or without blocking from package AlgDesign

Description

Function for conveniently creating a D-optimal design, with or without blocking, based on functions optFederov or optBlock from package AlgDesign. This functionality is still somewhat experimental.

Usage

Dopt.design(nruns, data=NULL, formula=~., factor.names=NULL, nlevels=NULL, 
    digits=NULL, constraint=NULL, center=FALSE, nRepeats=5, seed=NULL, randomize=TRUE, 
    blocks=1, block.name="Blocks", wholeBlockData=NULL, qual=NULL, ...)

Arguments

nruns

number of runs in the requested design

data

data frame or matrix of candidate design points;
if data is specified, factor.names and nlevels are ignored

formula

a model formula (starting with a tilde), for the estimation of which a D-optimal design is sought;
it can contain all column names from data or elements or element names from factor.names, respectively;
usage of the “.”-notation for “all variables” from data or factor.names is possible.
The default formula linearly includes all main effects for the columns of data or the factors from factor.names, respectively, by using the “.”-notation. Note that the variables from wholeBlockData must be explicitly included in the formula and are not covered by the “.”-notation for “all variables”. (Thus, the default formula does not work if wholeBlockData is used.) For quantitative factors, functions quad() and cubic() describe the full quadratic or full cubic model in the listed variables (cf. examples and the expand.formula function from package AlgDesign).
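
As a rough illustration of what a full quadratic model contains, the sketch below writes out, in base R formula notation, the terms that quad(A, B) is understood to stand for: main effects, the two-factor interaction and the pure quadratic terms. The factor names A and B and the small grid are made up for the example, and the hand-written expansion is an assumption based on the AlgDesign documentation of expand.formula rather than a call into that package.

```r
# Hypothetical candidate grid in two quantitative factors A and B
cand <- expand.grid(A = c(-1, 0, 1), B = c(-1, 0, 1))

# Full quadratic model written out by hand (assumed equivalent of ~quad(A, B))
f_quad <- ~ (A + B)^2 + I(A^2) + I(B^2)

# Model matrix with intercept, main effects, squared terms and interaction
X <- model.matrix(f_quad, data = cand)
colnames(X)
```

The number of columns of X is the number of coefficients that the requested design has to be able to estimate, which gives a lower bound for a sensible nruns.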

factor.names

is used for creating a candidate set (for the within-block factors) with the help of function fac.design, if data is not specified. It is a list of vectors which contain
- individual levels
- or (in case of numerical values combined with nlevels) lower and upper scale end values
for each factor.
The element names are used as variable names;
if the list is not named, the variable names are A, B and so forth (from function fac.design).
factor.names can also be a character vector. In this case, nlevels must be specified, and levels are automatically assigned as integers starting with 1, which implies quantitative factors, unless qual=TRUE is specified.

nlevels

can be omitted if the list factor.names explicitly lists all factor levels (which of course defines the number of levels).
For numeric factors for which factor.names only specifies the two scale ends, these are filled with equally-spaced intermediate points, using the nlevels entry as the length.out argument to function seq.
If factor.names is a character vector of factor names only, nlevels is required, and default levels are created.
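
The filling-in of intermediate levels can be pictured with a direct call to seq. This is only a sketch of the behavior described above, using the scale ends and nlevels entry of the first factor from the Examples section; the variable names are made up.

```r
# Scale ends for one numeric factor and the requested number of levels
scale_ends <- c(100, 250)
n_lev <- 4

# Equally spaced levels from the lower to the upper scale end
lev <- seq(scale_ends[1], scale_ends[2], length.out = n_lev)
lev   # 100 150 200 250
```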

digits

is used for creating a candidate set if data is not specified.
It specifies the digits to which numeric design columns are rounded in case of automatic creation of intermediate values. It can consist of one single value (the same for all such factors) or a numeric vector of the same length as factor.names with integer entries.
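
The effect of a vector-valued digits can be sketched in base R as follows. The factor names temp and conc and their scale ends are hypothetical, and the code mimics the described behavior rather than reproducing the internals of the function.

```r
# Hypothetical factors given by scale ends, each to be filled to 4 levels
ends   <- list(temp = c(10, 30), conc = c(0, 1))
digits <- c(1, 2)   # one entry per factor, in the order of the factors

# Fill in intermediate levels, then round each factor to its own number of digits
levs <- Map(function(e, d) round(seq(e[1], e[2], length.out = 4), d),
            ends, digits)
levs
```

Here temp is rounded to one digit and conc to two digits; a single value for digits would apply the same rounding to both.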

constraint

a condition (character string!) used for reducing the candidate set to admissible points only. constraint is evaluated on the specified data set or after automatic creation of a full factorial candidate data set.
The variable names from data or factor.names can be used by the constraint. The variable names from wholeBlockData can NOT be used.
See Syntax and Logic for an explanation of the syntax of general and especially logical R expressions.
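
What the constraint does to a candidate set can be reproduced by hand in base R. The sketch below builds the full factorial for the first example in the Examples section and evaluates the constraint string on it; it illustrates the effect only and is not claimed to be how Dopt.design implements the reduction.

```r
# Full factorial candidate set, built by hand with the levels from the first example
cand <- expand.grid(eins = seq(100, 250, length.out = 4),
                    zwei = seq(10, 30, length.out = 3),
                    drei = seq(-25, 25, length.out = 6))

# The constraint is a character string holding a logical R expression
constraint <- "!(eins>=200 & zwei==30 & drei==25)"

# Evaluate the string within the candidate data and keep the admissible rows
keep <- eval(parse(text = constraint), envir = cand)
admissible <- cand[keep, ]

nrow(cand)        # 72 candidate points
nrow(admissible)  # 70 remain, since two points violate the constraint
```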

center

requests that optimization is run for the centered model; the design is nevertheless output in non-centered coordinates

nRepeats

number of independent repeats of the design optimization process; increasing this number may improve the chance of finding a global optimum, but will also increase search time

seed

seed for generation and randomization of the design (integer number);
here, the seed is needed even if the design is not randomized, because the generation process for the optimum design involves random numbers, even if the order of the final design is not randomized;
if a reproducible design is needed, it is therefore recommended to specify a seed.

In R version 3.6.0 and later, the default behavior of function sample has changed. If you work in a new (i.e., >= 3.6.0) R version and want to reproduce a randomized design from an earlier R version (before 3.6.0), you have to change the RNGkind setting by
RNGkind(sample.kind="Rounding")
before running function Dopt.design.
It is recommended to change the setting back to the new recommended way afterwards:
RNGkind(sample.kind="default")
For an example, see the documentation of the example data set VSGFS.
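
One way to make sure the setting is always changed back is to wrap the two RNGkind calls in a small helper. The function with_old_sampler below is hypothetical (it is not part of DoE.wrapper) and assumes R >= 3.6.0; sample(10) merely stands in for a call such as Dopt.design(..., seed = ...).

```r
# Hypothetical helper, not part of DoE.wrapper: evaluate code under the
# pre-3.6.0 sampling behavior and restore the previous setting afterwards.
# Assumes R >= 3.6.0, where RNGkind() reports the sample kind as third element.
with_old_sampler <- function(code) {
  old <- RNGkind()
  on.exit(suppressWarnings(RNGkind(sample.kind = old[3])))
  suppressWarnings(RNGkind(sample.kind = "Rounding"))  # R warns about the biased sampler
  force(code)  # the code is only evaluated here, after the switch
}

before <- RNGkind()[3]
x1 <- with_old_sampler({ set.seed(42); sample(10) })
x2 <- with_old_sampler({ set.seed(42); sample(10) })
identical(x1, x2)            # same seed, same result
RNGkind()[3] == before       # the previous sample kind is back in place
```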

randomize

logical deciding whether or not the design should be randomized; if it is TRUE, the design (or the additional portion of the design) returned by the workhorse function optFederov is brought into random order after generation. Note that the generation process itself contains a random element by default; if exact repeatability of the returned design is desired, it is necessary to specify a seed (option seed) even in the case randomize=FALSE.

blocks

a single integer giving the number of blocks (default 1, if no blocking is needed)
OR
a vector of block sizes which enable blocks of different sizes;
for a scalar value, nruns must be divisible by blocks so that all blocks have equal size; for a vector value, the block sizes must add up to nruns.
If blocking is requested, the following two options are potentially important.
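
The two ways of specifying blocks and the associated requirements on nruns can be summarized in a few lines of base R. The helper block_sizes is hypothetical and only restates the rules given above; it does not call Dopt.design.

```r
# Hypothetical helper, not part of DoE.wrapper: turn the blocks argument
# into a vector of block sizes and check it against nruns.
block_sizes <- function(nruns, blocks) {
  if (length(blocks) == 1) {
    # scalar: number of blocks, all of equal size
    stopifnot(nruns %% blocks == 0)
    rep(nruns / blocks, blocks)
  } else {
    # vector: explicit block sizes
    stopifnot(sum(blocks) == nruns)
    blocks
  }
}

block_sizes(480, 96)          # 96 blocks of size 5
block_sizes(12, c(3, 4, 5))   # unequal blocks adding up to 12
```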

block.name

character string: name of the blocking variable (used only if blocks are requested)

wholeBlockData

optional matrix or data frame that specifies the whole block characteristics;
can only be used if blocks are requested; if used, it must have as many rows as there are blocks (i.e., one row per block size).
If this is specified, the resulting design is a split-plot design with the whole-plot factors specified in wholeBlockData, the split-plot factors specified in data. Note that usage of this option makes it necessary to explicitly specify a formula.

Since wholeBlockData must be completely specified by the user, optimization is for the split-plot portion of the design only. The rationale is (presumably) that the characteristics of the available blocks are known. If this is not the case, users may want to try out various possible whole block setups, or to proceed sequentially by first optimizing a whole block design for a model with the whole block factors only and subsequently using this model for adding split-plot factors.

qual

optional logical (length 1 or same as number of factors); ignored, if data is specified; overrides automatic determination of whether or not factors are quantitative;
if neither qual nor data are specified, factors are per default quantitative, unless they have non-numeric levels in a list-valued factor.names

...

additional arguments to functions optFederov or optBlock (if blocking is requested) from package AlgDesign;
interesting arguments for optFederov: maxIteration, nullify (calculates a good starting design, especially if set to 1, in which case nRepeats is set to 1);
arguments criterion and augment are not available, neither are evaluateI, space, or rows, and args does not have an effect.

Details

Function Dopt.design creates a D-optimal design, optionally with blocking, and even as a split-plot design. If no blocks are required, calculations are carried out through function optFederov from package AlgDesign. In case of blocked designs, function optBlock from package AlgDesign is behind the calculations. By specifying wholeBlockData, a blocked design becomes a split-plot design. The model formula can refer to both the within block data (only those are referred to by the “.” notation) and the whole block data and interactions between both.
In comparison to direct usage of package AlgDesign, the function adds the possibility of automatically creating the candidate points on the fly, with or without constraints. Furthermore, it embeds the D-optimal designs into the class design. On the other hand, it sacrifices some of AlgDesign's flexibility; of course, users can still use AlgDesign directly.

D-optimal designs are particularly useful if the classical regular designs are too demanding in run size requirements, or if constraints preclude automatic generation of orthogonal designs. Note, however, that the best design in few runs can still be very bad in absolute terms!
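
To give a feel for how a D criterion distinguishes designs, the following sketch compares two four-run designs for a main effects model in two factors, using a commonly used normalized form, det(X'X/n)^(1/p), with n runs and p model coefficients. This normalization and the helper d_crit are assumptions for illustration; the values reported by AlgDesign may be defined differently, so its documentation remains the reference.

```r
# Normalized D criterion for a model matrix X (assumed definition, see text)
d_crit <- function(X) {
  n <- nrow(X)
  p <- ncol(X)
  det(crossprod(X) / n)^(1 / p)
}

# Design 1: full 2^2 factorial with levels -1 and +1
d1 <- expand.grid(A = c(-1, 1), B = c(-1, 1))
# Design 2: one corner replaced by the center point
d2 <- data.frame(A = c(-1, 1, -1, 0), B = c(-1, -1, 1, 0))

D1 <- d_crit(model.matrix(~ A + B, d1))   # orthogonal design, criterion equals 1
D2 <- d_crit(model.matrix(~ A + B, d2))   # smaller value, less information
c(D1 = D1, D2 = D2)
```

An optimization algorithm can only pick the best design among those that the candidate set and run size allow, so a low criterion value for the best design found signals that the setup itself is too restrictive.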

When specifying the design without the data option, a full factorial in the requested factors is the default candidate set of design points. For some situations - especially with many factors - it may be better to start from a restricted candidate set. Such a candidate set can be produced with another R function, e.g. oa.design or FrF2, or can be manually created.

If there are doubts as to whether the process has delivered a design close to the absolute optimum, nRepeats can be increased.

For unblocked designs, it is additionally possible to increase maxIteration. Also, improving the starting value by nullify=1 or nullify=2 may lead to an improved design. These options are handed through to function optFederov from package AlgDesign and are documented there.

Value

The function returns a data frame of S3 class design with attributes attached. The data frame contains the experimental settings. The matrix desnum attached as attribute desnum contains the model matrix of the design, using the formula as specified in the call.
Function Dopt.augment preserves additional variables (e.g. responses) that have been added to the design before augmenting. Note, however, that the response data are NOT used in deciding which points to augment the design with.

The attribute run.order provides the run number in standard order (as returned from function optFederov in package AlgDesign) as well as the randomized actual run order. The third column is always identical to the first.

The attribute design.info is a list of various design properties, with type resolving to “Dopt”, “Dopt.blocked” or “Dopt.splitplot”. In addition to the standard list elements (cf. design), the element quantitative is a vector of nfactor logical values or NAs, and the optional digits element indicates the number of digits to which the data were rounded. For blocked and split-plot designs, the list contains additional information on numbers and sizes of blocks or plots, as well as the number of whole plot factors (which are always the first few factors) and split-plot factors.
The list also contains a list of optimality criteria (as calculated by function optFederov, see the documentation there) with elements D, Dea, A and G.

(Note that replications is always 1 and repeat.only is always FALSE; these elements are only present to fulfill the formal requirements for class design. Note however, that blocked designs do in fact repeat experimental runs if nruns and blocks imply this.)

Warning

Since R version 3.6.0, the behavior of function sample has changed (correction of a biased previous behavior that should not be relevant for the randomization of designs). For reproducing a design that was produced with an earlier R version, please follow the steps described with the argument seed.

Note

This package is still under (slow) development. Reports about bugs and inconveniences are welcome.

Author(s)

Ulrike Groemping

References

Atkinson, A.C. and Donev, A.N. (1992). Optimum experimental designs. Clarendon Press, Oxford.

Fedorov, V.V. (1972). Theory of optimal experiments. Academic Press, New York.

Wheeler, R.E. (2004). Comments on algorithmic design. Vignette accompanying package AlgDesign. ../../AlgDesign/doc/AlgDesign.pdf.

See Also

See also optFederov, fac.design, quad, cubic, Dopt.augment. Furthermore, unrelated to function Dopt.design, see also function gen_design from package skpr for a new general R package for creating D-optimal or other letter optimal designs.

Examples

   ## a full quadratic model with constraint in three quantitative factors 
   plan <- Dopt.design(36,factor.names=list(eins=c(100,250),zwei=c(10,30),drei=c(-25,25)),
                          nlevels=c(4,3,6), 
                          formula=~quad(.), 
                          constraint="!(eins>=200 & zwei==30 & drei==25)")
   plan
   cor(plan)
   y <- rnorm(36)
   r.plan <- add.response(plan, y)
   plan2 <- Dopt.augment(r.plan, m=10)
   plot(plan2)
   cor(plan2)
   
   ## designs with qualitative factors and blocks for
   ## an experiment on assessing stories of social situations
   ## where each subject is a block and receives a deck of 5 stories
   plan.v <- Dopt.design(480, factor.names=list(cause=c("sick","bad luck","fault"), 
             consequences=c("alone","children","sick spouse"),
             gender=c("Female","Male"),
             Age=c("young","medium","old")),
             blocks=96,
             constraint="!(Age==\"young\" & consequences==\"children\")",
             formula=~.+cause:consequences+gender:consequences+Age:cause)
   ## an experiment on assessing stories of social situations
   ## with the whole block (=whole plot) factor gender of the assessor
   ##    not run for saving test time on CRAN
   ## Not run: 
   plan.v.splitplot <- Dopt.design(480, factor.names=list(cause=c("sick","bad luck","fault"), 
             consequences=c("alone","children","sick spouse"),
             gender.story=c("Female","Male"),
             Age=c("young","medium","old")),
             blocks=96,
             wholeBlockData=cbind(gender=rep(c("Female","Male"),each=48)),
             constraint="!(Age==\"young\" & consequences==\"children\")",
             formula=~.+gender+cause:consequences+gender.story:consequences+
                 gender:consequences+Age:cause+gender:gender.story)
## End(Not run)

DoE.wrapper documentation built on Aug. 21, 2023, 5:10 p.m.