stableSpec: Stable specifications of constrained structural equation...

Description

Search stable specifications (structures) of constrained structural equation models.

Usage

stableSpec(theData = NULL, nSubset = NULL, iteration = NULL,
  nPop = NULL, mutRate = NULL, crossRate = NULL, longitudinal = NULL,
  numTime = NULL, seed = NULL, co = NULL, consMatrix = NULL,
  threshold = NULL, toPlot = NULL, mixture = NULL, log = NULL)

Arguments

theData

a data frame containing the data to which the model will be fit. If argument longitudinal is TRUE, the data frame should be reshaped such that the first n data points contain the relations that occur in the first two time slices t_0 and t_1. The next n data points contain the relations that occur in time slices t_1 and t_2. In general, the i-th subset of n data points contains the relations in time slices t_(i-1) and t_i. One can use the function dataReshape to reshape longitudinal data. stableSpec uses the foreach package for parallel computation. To parallelize computation, register a parallel backend before calling stableSpec. For details, see the foreach package.
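The stacked layout described for longitudinal data can be illustrated with a small base R sketch. The wide-format data frame, its column names, and the variable counts below are hypothetical, and this is not the implementation of dataReshape; it only shows the idea of stacking consecutive pairs of time slices.

```r
# Hypothetical wide-format data: 2 variables (x, y) observed at
# 3 time slices (t0, t1, t2) for n = 4 subjects.
set.seed(1)
n <- 4
wide <- data.frame(
  x_t0 = rnorm(n), y_t0 = rnorm(n),
  x_t1 = rnorm(n), y_t1 = rnorm(n),
  x_t2 = rnorm(n), y_t2 = rnorm(n)
)
num_var  <- 2
num_time <- 3

# Stack consecutive pairs of time slices: rows 1..n hold (t0, t1),
# rows n+1..2n hold (t1, t2).
pairs <- lapply(seq_len(num_time - 1), function(i) {
  cols  <- ((i - 1) * num_var + 1):((i + 1) * num_var)
  block <- wide[, cols]
  names(block) <- c("x_prev", "y_prev", "x_next", "y_next")
  block
})
stacked <- do.call(rbind, pairs)

dim(stacked)
```

With three time slices and four subjects, the stacked data frame has 2 * n rows, one block of n rows per pair of adjacent slices.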

nSubset

number of subsets to draw. In practice, it is suggested to have at least 25 subsets. The default is 10.

iteration

number of iterations/generations for NSGA-II.

nPop

population size (number of models) in a generation. The default is 50.

mutRate

mutation rate. The default is 0.075.

crossRate

crossover rate. The default is 0.85.

longitudinal

TRUE for longitudinal data, and FALSE for cross-sectional data.

numTime

number of time slices. If the data is cross-sectional, this argument must be set to 1.

seed

integer vector representing seeds that are used to subsample data. The default is an integer vector with range 100:1000 with length equal to nSubset.
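As a minimal sketch, a seed vector of the documented form (integers in 100:1000, one per subset) could be built by hand as follows. The variable names and the outer set.seed() call are assumptions made for reproducibility of the illustration, not part of the package.

```r
# One seed per subset, drawn from the documented range 100:1000.
n_subset <- 10
set.seed(42)  # assumption: fixes the draw so the example is reproducible
the_seed <- sample(100:1000, size = n_subset)

length(the_seed)
range(the_seed)
```

Passing such a vector explicitly makes the subsampling repeatable across runs.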

co

whether to use the "covariance" or the "correlation" matrix. The default is "covariance".

consMatrix

m by 2 binary matrix representing constraints/prior knowledge, where m is the number of constraints. For example, if it is known that variables 2 and 3 do not cause variable 1, then constraint <- matrix(c(2, 1, 3, 1), 2, 2, byrow=TRUE) will be the constraint matrix. If NULL, it is assumed that there are no constraints.
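The constraint matrix from the example above can be written so that each row reads as one "does not cause" statement. This is only a layout sketch in base R; the helper variable names are hypothetical.

```r
# Each row is one constraint: the variable in column 1
# does not cause the variable in column 2.
constraint <- matrix(c(2, 1,   # variable 2 does not cause variable 1
                       3, 1),  # variable 3 does not cause variable 1
                     nrow = 2, ncol = 2, byrow = TRUE)

num_constraints <- nrow(constraint)  # m in the description above
constraint
```

Writing one pair per line with byrow = TRUE keeps the matrix easy to extend when more prior knowledge is available.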

threshold

threshold of stability selection. The default is 0.6.
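To illustrate what a stability threshold does, the following simplified sketch computes how often each candidate edge is selected across subsets and keeps those at or above the threshold. The selection matrix and edge labels are hypothetical, and the use of >= is an assumption; the package's actual procedure is more involved.

```r
# Hypothetical results: rows are data subsets, columns are candidate edges.
# TRUE means the edge appeared in the model selected for that subset.
selected <- rbind(
  c(TRUE,  TRUE,  FALSE),
  c(TRUE,  FALSE, FALSE),
  c(TRUE,  TRUE,  TRUE),
  c(TRUE,  TRUE,  FALSE),
  c(FALSE, TRUE,  FALSE)
)
colnames(selected) <- c("1->2", "2->3", "1->3")

# Selection frequency of each edge across the subsets.
freq <- colMeans(selected)

# Keep edges whose frequency reaches the threshold (assumed comparison: >=).
threshold <- 0.6
stable_edges <- names(freq)[freq >= threshold]
stable_edges
```

Here the first two edges are each selected in four of five subsets and pass the 0.6 threshold, while the third is selected once and is dropped.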

toPlot

if TRUE, a plot of the inferred causal model is generated; otherwise, a graph object is returned. The default is TRUE.

mixture

if the data contains both continuous and categorical (or ordinal) variables, this argument can be set to TRUE. This implies the use of polychoric and polyserial correlation in the SEM computation. Note that the categorical variables should be represented as factor or logical.

log

an optional logfile to monitor the progress of the algorithm.

Details

This function performs an exploratory search over recursive (acyclic) SEM models. Models are scored along two objectives: model fit and model complexity. Since the two objectives often conflict, we use NSGA-II to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures that are both stable and parsimonious, which are then used to infer a causal model.
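The notion of Pareto optimality over two objectives can be sketched in base R. The scores below are hypothetical, both objectives are assumed to be minimized, and this brute-force filter only illustrates non-domination; it is not NSGA-II itself nor the package's code.

```r
# Hypothetical scores for 6 candidate models (lower is better for both).
scores <- data.frame(
  fit        = c(12.0, 8.5, 8.5, 3.1, 2.9, 15.0),  # e.g. a misfit statistic
  complexity = c(2,    3,   4,   5,   7,   6)      # e.g. number of edges
)

# Model i is dominated if some model is no worse on both objectives
# and strictly better on at least one.
is_dominated <- function(i, s) {
  any(
    s$fit <= s$fit[i] & s$complexity <= s$complexity[i] &
      (s$fit < s$fit[i] | s$complexity < s$complexity[i])
  )
}

dominated <- vapply(seq_len(nrow(scores)), is_dominated, logical(1), s = scores)
pareto_front <- scores[!dominated, ]
pareto_front
```

In this toy set, model 3 is dominated by model 2 (same fit, fewer edges) and model 6 by model 1, so the front consists of models 1, 2, 4, and 5.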

Value

a list of the following elements:

Author(s)

Ridho Rahmadi r.rahmadi@cs.ru.nl, Perry Groot, Tom Heskes. Christoph Stich is the contributor for parallel support.

References

Rahmadi, R., Groot, P., Heins, M., Knoop, H., and Heskes, T. (2016). Causality on cross-sectional data: Stable specification search in constrained structural equation modeling. Applied Soft Computing, ISSN 1568-4946, http://www.sciencedirect.com/science/article/pii/S1568494616305130.

Rahmadi, R., Groot, P., Heins, M., Knoop, H., and Heskes, T. (2015). Causality on longitudinal data: Stable specification search in constrained structural equation modeling. Proceedings of AALTD 2015, 101.

Fox, J., Nie, Z., and Byrnes, J. (2015). sem: Structural Equation Models. R package version 3.1-6. https://CRAN.R-project.org/package=sem

Ching-Shih Tsou (2013). nsga2R: Elitist Non-dominated Sorting Genetic Algorithm based on R. R package version 1.0. https://CRAN.R-project.org/package=nsga2R

Kalisch, M., Machler, M., Colombo, D., Maathuis, M. H., and Buehlmann, P. (2012). Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11), 1-26.

Meinshausen, N., and Buehlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.

Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197.

Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. The Journal of Machine Learning Research, 2, 445-498.

Examples

# Cross-sectional data example,
# with an artificial data set of six continuous variables.
# Details about the data set can be found in the documentation.
# As an example, we only run one subset.
# Note that stableSpec() uses foreach to support
# parallel computation, which could issue a warning
# when running sequentially, as in the following example.
# However, the warning can safely be ignored.

the_data <- crossdata6V
numSubset <- 1
num_iteration <- 5
num_pop <- 10
mut_rate <- 0.075
cross_rate <- 0.85
longi <- FALSE
num_time <- 1
the_seed <- NULL
the_co <- "covariance"
# assume that variable 5 does not cause variables 1, 2, and 3
cons_matrix <- matrix(c(5, 1, 5, 2, 5, 3), 3, 2, byrow=TRUE)
th <- 0.1
to_plot <- FALSE
mix <- FALSE

result <- stableSpec(theData = the_data, nSubset = numSubset,
                     iteration = num_iteration,
                     nPop = num_pop, mutRate = mut_rate, crossRate = cross_rate,
                     longitudinal = longi, numTime = num_time, seed = the_seed,
                     co = the_co, consMatrix = cons_matrix, threshold = th,
                     toPlot = to_plot, mixture = mix)

##########################################################
## Parallel computation is possible by
## registering a parallel backend, e.g., with package doParallel.
## For example, add the following lines on top of
## the example above.
#
# library(parallel)
# library(doParallel)
# cl <- makeCluster(detectCores())
# registerDoParallel(cl)
#
## Then call stableSpec() as normal.
## When finished, release the workers with stopCluster(cl).
##
## Note that makeCluster(), detectCores(), and stopCluster()
## are from package parallel, and registerDoParallel()
## is from package doParallel. For more details,
## check the documentation of the aforementioned packages.
###########################################################

stablespec documentation built on May 2, 2019, 10:14 a.m.