RUMdesignSimulator

The package RUMdesignSimulator proposes convenient tools for generating synthetic data for decision theory.
Firstly, Alternatives, Decision Makers and Preference Coefficients are easily generated. Then, experimental designs are generated in format long or wide. In addition, the effect of each variable can be visualized in a 3D graph.

Installation:

devtools::install_github("AntoineDubois/RUMdesignSimulator")

Tutorial

This is an R Markdown present the package RUMdesignSimulator. This package generates experimental designs from real data or from probabilistic distributions. First of all, we need to install and load the package. One should install the package devtools if necessary.

knitr::opts_chunk$set(echo = TRUE)
#devtools::install_github("AntoineDubois/RUMdesignSimulator")
library(RUMdesignSimulator)

Now, we can define the setup of the experiment. Thus, we define the name of the alternatives as well as their attributes. In addition, we define the number of decision makers as well as their characteristics.

DM_att_names <- list("X1", "X2", "X3") # the list of the decision makers' characteristics
AT_names <- list("good1", "good2", "good3", "good4") # the list of the alternatives' names
AT_att_names <- list("Z1", "Z2", "Z3") # the list of the alternatives' attributes
groups <- c(10, 20) # the groups of decision makers

Then, we initialize the instance of the class Experiment. Furthermore, Experiment is a \textit{reference class}. This type of class is the most flexible embeded in R.

FD <- Experiment(DM_att_names=DM_att_names, AT_att_names=AT_att_names, AT_names=AT_names,
                 groups=groups, no_choice=TRUE) # creation of an instance of the call Experiment

Since the instance FD is an instance of the class Experiment, we can use methods to generate decision makers characteristics according to distributions or from data. To know which laws are implemented in this package, we use the function \textbf{information}.

information()

We have chosen the distributions underneath for generating decision makers characteristics.

 # the characteristics of X1 are drawn from a data set
FD$gen_DM_attributes("empirical", data = data.frame(X1=c(0.5, 0, 12, 6, 7.3)), which = "X1")

# the characteristics X2 and X3 follow a standardized normal distribution within the group 1
FD$gen_DM_attributes("normal", which=c("X2", "X3"), group=1) 

# the characteristics X2 and X3 follow a normal distribution with mean 1 and 2 standard deviation within the group 2
FD$gen_DM_attributes("normal", mu=1, sd=2, which=c("X2","X3"), group=2)

FD$X

In addition, we can observe cross efffects between the decision makers' characteristics.

FD$gen_DM_attributes(observation=~X1+X2+X3+I(X1*X2))
FD$X

Similarly, we generate alternatives' attributes

# generation of a random covariance matrix of size 3
sigma <- clusterGeneration::genPositiveDefMat(3)$sigma

# all the attributes are generated by a multivariate normal distribution of mean (-1, 2, 0) and covariance matrix sigma
FD$gen_AT_attributes(mu=c(-1,2,0), sd=sigma) 

# observation of complex effects between the alternatives' attributes
FD$gen_AT_attributes(observation=~Z1+Z2+Z3+I(Z1^2))

FD$Z

and decision makers' preferences

#Generation of beta whose components law's are different:

# generation of the variables from 1 to 4 of the alternatives within the group 1
FD$gen_preference_coefficients("student", heterogeneity=TRUE, location=-2,  scale=1, df=4, which=c(1:4), group=1)

# generation of the variables from 1 to 4 of the alternatives within the group 2
FD$gen_preference_coefficients("student", heterogeneity=FALSE, location=2,  scale=1, df=4, which=c(1:4), group=2) 

# generation of the fifth variable within every group
FD$gen_preference_coefficients("normal", heterogeneity=FALSE, mu=0, sd=2, which=5) 

# rectification, the variable Z2 follows a discrete uniform distribution
FD$gen_preference_coefficients("discrete_uniform", heterogeneity=TRUE, a=1, b=5, which="Z2") 

# generation of the variable Z3 and I(Z1^2) according to the default distribution: the standardized normal distribution
FD$gen_preference_coefficients(heterogeneity=TRUE, which=c("Z3", "I(Z1^2)")) 

FD$beta

Finally, we compute the utility provided to each decision makers by each alternative. To do so, we generate measurement error.

# computation of the decision makers' utility according to the standardized Gumbel distribution
FD$utility()

# computation of the decision makers' utility according to the discrete uniform distribution
FD$utility("discrete_uniform") 

# It is possible to have correlation between alternatives preference (for both student and normal distributions)
FD$utility("normal", mu=0, sd=2) 

# computation of the decision makers' utility according to a student distribution
FD$utility("student", location=0, scale=2, df=4)

Here, we take a look at:

FD$V # the representative utility
FD$Epsilon # the measurement error
FD$U # the utility of each alternative for each decision maker
FD$choice_order # the order of alternative preference for each decision maker

FD$choice # the most usefull alternative for each decision maker

A good advantage of the package RUMdesignSimulator consists in its plot method. The method \textbf{$map(...)} returns a scatter plot. On this graph, the x-axis, y-axis and z-axis represent the value of two parameters (attributes and characteristics) and the utility provided by the optimal alternative for any decision maker.

# Drawing a 3D preference mapping:

# Map representing the choice of the decision makers and the utility provided by this choice according to the value of Z1 and Z3
FD$map("Z1", "Z3") 

FD$map("X1", "Z3")

FD$map("X1", "X2")
# Generation of designs:

# generation of the full factorial design with row data
FFD <- FD$design(choice_set_size=2, clustered=0) 
#by default, name="FuFD", choice_set_size = nb_alternatives
View(FFD)

Henceforth, the alternatives, decision makers, preference coefficients and associated utility are entirely setup. In consequence, we can draw experimental designs. Below, we build a full factorial experimental design where the number of alternatives within each choice set is 2. Moreover, the attributes and characteristics are not treated.

Often, the data is treated. Sometimes, the data is clustered. The number of each cluster is called \textit{level}. Furthermore, the clusters are formed by running k-means algorithms. Finally, after clustering, the new value of an attribute or a characteristic is the average of its cluster.

FFD <- FD$design(name="FuFD",choice_set_size=2, clustered=1, nb_levels_DM=c(3, 3, 4, 2), nb_levels_AT=c(3, 2, 2, 4)) # generation of the full factorial design with glustered data

In addition, after clustering, the new value of an attribute or a characteristic may be the numero of its cluster. This is done by defining \textbf{clustered=2}.

# generation of the full factorial design with categorial data
FFD1 <- FD$design(choice_set_size=2, clustered=2, nb_levels_DM=c(2, 3, 4, 2), nb_levels_AT=c(2, 2, 2, 2))

Unfortunately, the number of questions asked to each decision maker is most of the time too big to be realistic. In consequence, only a random subset of questions can be asked to the decision makers. The result is a \textit{random fractional factorial design}. The number of question asked to each decision maker is \textbf{nb_question=2}.

FFD2 <- FD$design(name="FrFD", choice_set_size=2, clustered=2, nb_levels_DM=c(2, 3, 4, 2), nb_levels_AT=c(2, 2, 2, 2), nb_questions = 2) # Generation a a random fractional factorial design with categorial data

Yet, we want to express this design in wide format.

FFD3 <- FD$design(name="FrFD", choice_set_size=2, clustered=2, nb_levels_DM=c(2, 3, 4, 2), nb_levels_AT=c(2, 2, 2, 2), nb_questions = 2, format="wide") 

Finally, a small summary function calls some elements back.

summary.Exepriment(FD) # a summary of the experimental design

Developpement

Files

Adding new features

Some user may need more tools than the actual ones. Anticipating future needs, we organized the R files so that only one file need to be altered.

To add new distributions: open the file distribution.R add a new distribution reference the new distribution into the function generation*, give it a relevant name for calling

To add new designs: open the file designs.R implement a new design reference the new design into the function call_design*, give it a relevant name for calling

For more information, do not hesitate to contact me at antoine.dubois.fr@gmail.com



AntoineDubois/RUMdesignSimulator documentation built on Dec. 17, 2021, 8:53 a.m.