evolve_model uses a genetic algorithm to estimate a finite-state
machine model, primarily for understanding and predicting decision-making.
evolve_model(data, test_data = NULL, drop_nzv = FALSE,
  measure = c("accuracy", "sens", "spec", "ppv"), states = NULL,
  cv = FALSE, max_states = NULL, k = 2, actions = NULL, seed = NULL,
  popSize = 75, pcrossover = 0.8, pmutation = 0.1, maxiter = 50,
  run = 25, parallel = FALSE, priors = NULL, verbose = TRUE,
  return_best = TRUE, ntimes = 1)

data 
A data.frame with columns named "period" and "outcome" (period is the
time period in which the outcome action was taken); the remaining
columns are predictors, from one to three of them. All of the (3-5)
columns should be named. The period and outcome columns should be
integer vectors, and the predictor columns should be logical vectors.
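As a minimal sketch of the format described above, the following base-R snippet builds a small training data.frame with integer period and outcome columns and two logical predictors. The predictor names are borrowed from the example on this page; the sizes and random values are illustrative assumptions only.

```r
# Illustrative data only: 10 periods repeated for 5 hypothetical sequences.
set.seed(1)
n_periods <- 10L
n_seqs <- 5L
n <- n_periods * n_seqs

cdata <- data.frame(
  period = rep(seq_len(n_periods), times = n_seqs),             # integer time period
  outcome = sample(1:2, n, replace = TRUE),                     # integer action taken
  my.decision1 = sample(c(TRUE, FALSE), n, replace = TRUE),     # logical predictor
  other.decision1 = sample(c(TRUE, FALSE), n, replace = TRUE)   # logical predictor
)

# Check the column types before passing the data to evolve_model().
stopifnot(is.integer(cdata$period), is.integer(cdata$outcome))
stopifnot(all(vapply(cdata[, -(1:2)], is.logical, logical(1))))
```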
test_data 
Optional data.frame with "period" and "outcome" columns; the remaining
columns are predictors, from one to three of them. All of the (3-5)
columns should be named. The outcome variable is the decision the
decision-maker took in that period. This data.frame should be in the
same format and have the same order of columns as the data.frame passed
to the required data argument.
drop_nzv 
Optional logical vector length one specifying whether predictor
variables whose variance in the provided data is near zero should be
dropped before model building. Default is FALSE.
measure 
Optional length one character vector that is either "accuracy", "sens",
"spec", or "ppv". This specifies which measure of predictive performance
to use for training and evaluating the model. The default measure is
"accuracy".
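To make the four measures concrete, here is a minimal sketch of how they could be computed from predicted and observed action vectors. It assumes two actions coded 1 and 2 and treats action 1 as the "positive" class; both the helper name `performance` and that choice of positive class are assumptions for illustration, not necessarily how datafsm defines them internally.

```r
# Hypothetical helper: confusion-matrix measures for a two-action outcome.
performance <- function(predicted, observed, positive = 1L) {
  tp <- sum(predicted == positive & observed == positive)  # true positives
  tn <- sum(predicted != positive & observed != positive)  # true negatives
  fp <- sum(predicted == positive & observed != positive)  # false positives
  fn <- sum(predicted != positive & observed == positive)  # false negatives
  c(accuracy = (tp + tn) / length(observed),
    sens = tp / (tp + fn),   # sensitivity (true positive rate)
    spec = tn / (tn + fp),   # specificity (true negative rate)
    ppv = tp / (tp + fp))    # positive predictive value
}

observed  <- c(1L, 1L, 2L, 2L, 1L, 2L)
predicted <- c(1L, 2L, 2L, 2L, 1L, 1L)
m <- performance(predicted, observed)
print(m)
```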
states 
Optional numeric vector with the number of states.
If not provided, will be set to 
cv 
Optional logical vector length one for whether cross-validation
should be conducted on training data to select optimal number of states.
This can drastically increase computation time because if 
max_states 
Optional numeric vector length one only relevant if

k 
Optional numeric vector length one only relevant if cv == TRUE, specifying the number of folds for cross-validation. 
actions 
Optional numeric vector with the number of actions. If not provided, then actions will be set as the number of unique values in the outcome vector. 
seed 
Optional numeric vector length one. 
popSize 
Optional numeric vector length one specifying the size of the GA population. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. 
pcrossover 
Optional numeric vector length one specifying probability of crossover for GA. This is passed to the GA::ga() function of the GA package. 
pmutation 
Optional numeric vector length one specifying probability of mutation for GA. This is passed to the GA::ga() function of the GA package. 
maxiter 
Optional numeric vector length one specifying max number of
iterations for stopping the GA evolution. A larger number will increase the
probability of finding a very good solution but will also increase the
computation time. This is passed to the GA::ga() function of the GA
package. 
run 
Optional numeric vector length one specifying max number of consecutive iterations without improvement in best fitness score for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. 
parallel 
Optional logical vector length one specifying whether to run the GA evolution in parallel. Depending on the number of cores registered and the memory on your machine, this can make the process much faster, but it only works on Unix-based machines that can fork processes. 
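Because forking is unavailable on Windows, a script can choose the value of parallel from the operating system. The sketch below uses only base R; the commented-out call assumes datafsm is installed and that a suitable data.frame named cdata already exists.

```r
# TRUE on Unix-like systems (e.g. Linux, macOS), FALSE on Windows.
use_parallel <- as.logical(Sys.info()["sysname"] != "Windows")

# Hypothetical call, assuming datafsm is loaded and cdata is prepared:
# res <- evolve_model(cdata, parallel = use_parallel)
```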
priors 
Optional numeric matrix of solution strings to be included in the initialization. The user needs to use a decoder function to translate prior decision models into bits and then provide them. If this is not specified, then random priors are automatically created. 
verbose 
Optional logical vector length one specifying whether helpful messages should be displayed on the user's console or not. 
return_best 
Optional logical vector length one specifying whether to return just the best model or all models. Only relevant if ntimes > 1. Default is TRUE. 
ntimes 
Optional integer vector length one specifying the number of times to estimate model. Default is 1 time. 
This is the main function of the datafsm package. It relies on the
GA package for genetic algorithm optimization. evolve_model takes data
on predictors and data on the outcome. It automatically creates a
fitness function that takes as input the data, an action vector that
evolve_model generates, and a state matrix that evolve_model generates,
and returns a numeric vector of the same length as the outcome.
evolve_model then computes a fitness score for that potential solution
FSM by comparing it to the provided outcome. This is repeated for every
FSM in the population, and the probability of selection for the next
generation is then proportional to the fitness scores. If cv is TRUE,
the function also calls itself recursively while varying the number of
states inside a cross-validation loop in order to estimate the optimal
number of states. If parallel is set to TRUE, then these evaluations
are distributed across the available processors of the computer using
the doParallel package; otherwise, the evaluations of fitness are
conducted sequentially. Because the fitness function that evolve_model
creates must loop through all the data every time it is evaluated, and
many possible solution FSMs need to be evaluated, the fitness function
is implemented in C++ so that it is very fast.
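The evaluation step described above can be sketched in plain R (the package itself does this in C++). This is only an illustration under stated assumptions: the machine starts in state 1 whenever period is 1, each row's predictors have already been collapsed into a single input index, and the state matrix has one row per current state and one column per input index. The names `fsm_predict` and `fsm_accuracy` are hypothetical and are not part of datafsm.

```r
# action_vec[s]   : action emitted while in state s
# state_mat[s, j] : next state when in state s and observing input index j
# inputs          : integer vector, the input index for each row of data
# period          : integer vector; period == 1 resets the machine to state 1
fsm_predict <- function(action_vec, state_mat, inputs, period) {
  predicted <- integer(length(inputs))
  state <- 1L
  for (i in seq_along(inputs)) {
    if (period[i] == 1L) {
      state <- 1L                            # start of a new sequence
    } else {
      state <- state_mat[state, inputs[i]]   # transition on the observed input
    }
    predicted[i] <- action_vec[state]
  }
  predicted
}

# Fitness as accuracy: share of periods where prediction matches the outcome.
fsm_accuracy <- function(action_vec, state_mat, inputs, period, outcome) {
  mean(fsm_predict(action_vec, state_mat, inputs, period) == outcome)
}

# A two-state machine in which input j always leads to state j.
action_vec <- c(1L, 2L)
state_mat <- matrix(c(1L, 1L, 2L, 2L), nrow = 2)  # filled column by column
period  <- c(1L, 2L, 3L, 4L)
inputs  <- c(1L, 2L, 2L, 1L)
outcome <- c(1L, 2L, 2L, 2L)
fit <- fsm_accuracy(action_vec, state_mat, inputs, period, outcome)
print(fit)
```

A genetic algorithm would compute such a score for every candidate FSM in the population and favour higher-scoring candidates when selecting parents for the next generation.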
evolve_model uses a stochastic meta-heuristic optimization routine to
estimate the parameters that define an FSM model. Generalized simulated
annealing or tabu search could work, but they are more difficult to
parallelize. The current version uses the GA package's genetic
algorithm because GAs perform well in rugged search spaces to solve integer
optimization problems, are a natural complement to our binary string
representation of FSMs, and are easily parallelized.
This function evolves the models on training data and then, if a test set is provided, uses the best solution to make predictions on test data. Finally, the function returns the GA object and the decoded version of the best string in the population. See ga_fsm for the details of the slots (objects) that this type of object will have.
Returns an S4 object of class ga_fsm. See ga_fsm for the
details of the slots (objects) that this type of object will have and for
information on the methods that can be used to summarize the calling and
execution of evolve_model(), including summary, print, and plot. Timing
measurement is in seconds.
Luca Scrucca (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53(4), 1-37. URL http://www.jstatsoft.org/v53/i04/.
library(datafsm)
# Create data:
cdata <- data.frame(period = rep(1:10, 1000),
                    outcome = rep(1:2, 5000),
                    my.decision1 = sample(1:0, 10000, TRUE),
                    other.decision1 = sample(1:0, 10000, TRUE))
(res <- evolve_model(cdata, cv=FALSE))
summary(res)
plot(res, action_label = c("C", "D"))
library(GA)
plot(estimation_details(res))
# In scripts, it can make sense to set parallel to
# 'as.logical(Sys.info()['sysname'] != 'Windows')'.
