generate.dag.data: Generate nonlinear data from DAGs


Description

These functions create distributions on directed acyclic graphs (DAGs). rdag generates a random DAG with a given number of edges by selecting 'nedges' edges at random from the (p choose 2) possible edges; moralize returns the moral graph of a DAG; and generate.dag.data generates nonlinear data by assigning each edge a function built from a cubic polynomial basis with random coefficients.
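To make the edge-sampling idea concrete, here is a minimal base-R sketch. It is not the package's code: the helper name sketch_rdag is hypothetical, and orienting every sampled pair from the lower-numbered to the higher-numbered vertex is an assumption made here to guarantee acyclicity.

```r
# Illustrative sketch only: not the spacejam implementation of rdag().
# sketch_rdag is a hypothetical helper name.
sketch_rdag <- function(p, nedges) {
  stopifnot(nedges <= choose(p, 2))
  # All (p choose 2) vertex pairs (i, j) with i < j, one pair per column.
  pairs <- combn(p, 2)
  # Pick nedges distinct pairs uniformly at random.
  keep <- sample.int(ncol(pairs), nedges)
  # adj[i, j] = 1 means a directed edge i -> j (assumed convention).
  # Every edge has i < j, so adj is strictly upper triangular and acyclic.
  adj <- matrix(0L, p, p)
  adj[t(pairs[, keep, drop = FALSE])] <- 1L
  adj
}

set.seed(1)
A <- sketch_rdag(p = 6, nedges = 5)
```

An adjacency matrix like A can be turned into an 'igraph' object with igraph::graph_from_adjacency_matrix(A, mode = "directed").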

Usage

rdag(p, nedges)
moralize(g)
generate.dag.data(g, n, basesd = 1, basemean = 0, bfuns = function(x){cbind(x, x^2, x^3)}, 
	funclist = NULL, usenorm = T)

Arguments

g

a directed graph, as an 'igraph' object

p

number of vertices

n

number of observations

nedges

number of edges

bfuns

the basis functions for the structural equations. Note that when the basis functions are generated, they are centered and scaled to have variance 1, so similar coefficients correspond to similar amounts of variance explained. Ignored if 'funclist' is supplied.

basesd

standard deviation of the random coefficients assigned to the basis functions. Ignored if 'funclist' is supplied.

basemean

means of the (random) coefficients assigned to the basis functions. Ignored if 'funclist' is supplied.

funclist

p by p list of functions determining the structural equations. funclist[[i]][[j]](x) is the effect of feature j on feature i. If this is omitted, the functions are generated at random from the bases supplied in bfuns.

usenorm

logical. If TRUE (the default), normal errors are used; otherwise uniform errors are used.

Details

Multivariate distributions with complicated conditional dependence structures corresponding to a particular graph are difficult to construct in general. However, constructing complicated distributions from a DAG is straightforward. These functions are meant to facilitate the construction of complicated distributions on a DAG and to obtain the corresponding conditional independence structure.
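The conditional independence structure of a DAG is read off its moral graph: connect ("marry") every pair of parents that share a child, then drop the edge directions. A minimal base-R sketch on an adjacency matrix follows. It is not the package's code: the helper name sketch_moralize is hypothetical, and the convention that adj[i, j] = 1 means i -> j is an assumption.

```r
# Illustrative sketch only: not the spacejam implementation of moralize().
# Assumed convention: adj[i, j] != 0 means a directed edge i -> j.
sketch_moralize <- function(adj) {
  adj <- adj != 0
  # Entry (j, k) counts the children that vertices j and k share.
  shared <- (adj * 1) %*% t(adj * 1)
  # Keep original edges in both directions and add edges between co-parents.
  moral <- adj | t(adj) | (shared > 0)
  diag(moral) <- FALSE
  moral * 1L
}

# v-structure 1 -> 3 <- 2: moralization adds the edge 1 - 2
adj <- matrix(0L, 3, 3)
adj[1, 3] <- 1L
adj[2, 3] <- 1L
M <- sketch_moralize(adj)
```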

generate.dag.data draws Normal(basemean, basesd) coefficients for the basis functions corresponding to each edge. This gives a function f_ij(x_j), which is standardized to have variance 1. Data for each feature are then generated conditional on its parents by

x_i = sum_{j in parents(i)} f_ij(x_j) + noise

where ‘noise’ is N(0,1) by default. These data are returned along with the generated functions, which can be reused in subsequent simulations.
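The generating scheme described above can be sketched in base R as follows. This is a rough illustration, not the package's code: the helper name sketch_dag_data is hypothetical, the vertices are assumed to be in topological order already, adj[j, i] = 1 is assumed to mean j -> i, and "variance 1" is taken to mean the sample variance. Unlike generate.dag.data, the sketch returns only the data matrix.

```r
# Illustrative sketch only: not the spacejam implementation.
# Assumptions: adj[j, i] != 0 means j -> i, and every parent has a
# smaller index than its child (vertices are in topological order).
sketch_dag_data <- function(adj, n, basesd = 1, basemean = 0,
                            bfuns = function(x) cbind(x, x^2, x^3)) {
  p <- ncol(adj)
  X <- matrix(0, n, p)
  for (i in seq_len(p)) {
    xi <- rnorm(n)                          # N(0, 1) noise term
    for (j in which(adj[, i] != 0)) {
      B <- scale(bfuns(X[, j]))             # center and scale each basis column
      beta <- rnorm(ncol(B), mean = basemean, sd = basesd)
      f_ij <- drop(B %*% beta)              # random function of parent j
      f_ij <- f_ij / sd(f_ij)               # standardize f_ij to variance 1
      xi <- xi + f_ij                       # add the effect of parent j
    }
    X[, i] <- xi
  }
  X
}

# Chain 1 -> 2 -> 3
adj <- matrix(0, 3, 3)
adj[1, 2] <- 1
adj[2, 3] <- 1
set.seed(1)
X <- sketch_dag_data(adj, n = 200, basesd = c(1, 0.5, 0.5))
```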

Value

‘rdag’ returns an 'igraph' object.

‘generate.dag.data’ returns a list with two elements: an n by p matrix 'X' and a p by p list ‘funclist’, where funclist[[i]][[j]] is the effect of feature j on feature i.

'moralize' returns an undirected igraph object.
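The indexing convention of 'funclist' (first index is the child, second index is the parent) can be illustrated with a hand-built nested list. This sketch is self-contained and does not call the package; how the package fills entries for pairs without an edge is not stated here, so using a zero function for them is an assumption.

```r
# Illustrative sketch of the funclist layout: funclist[[i]][[j]](x) is the
# effect of feature j on feature i. Filling non-edges with a zero function
# is an assumption made for this example only.
p <- 3
zero <- function(x) rep(0, length(x))
funclist <- lapply(seq_len(p), function(i) lapply(seq_len(p), function(j) zero))

# Chain 1 -> 2 -> 3 with hand-picked effects
funclist[[2]][[1]] <- function(x) sin(x)
funclist[[3]][[2]] <- function(x) x^2 - 1

# Generate data by hand from these structural equations
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- funclist[[2]][[1]](x1) + rnorm(n)
x3 <- funclist[[3]][[2]](x2) + rnorm(n)
X <- cbind(x1, x2, x3)
```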

Author(s)

Arend Voorman

References

Voorman, Shojaie and Witten (2013). Graph Estimation with Joint Additive Models. Submitted to Biometrika. Available on arXiv or from the authors upon request.

See Also

igraph, spacejam, SJ, plot.SJ

Examples

# Create the graph and distribution used in Figure 2 of Voorman, Shojaie and Witten (2013)
library(spacejam)
library(igraph)  # provides the layout and plotting functions used below

p <- 100 # variables
n <- 50  # observations

#Generate Graph
set.seed(20)
g <- rdag(p,80)
mylayout <- layout.fruchterman.reingold(g)

par(mfrow=c(1,2))
plot(g, layout = mylayout, edge.color = "gray50", 
        vertex.color = "red", vertex.size = 3, vertex.label = NA, 
        edge.arrow.size = 0.4)
plot(moralize(g), layout = mylayout, edge.color = "gray50", 
        vertex.color = "red", vertex.size = 3, vertex.label = NA, 
        edge.arrow.size = 0.4)

# Create a distribution on the DAG using cubic polynomials with random normal
# coefficients with standard deviations of 1, 0.5 and 0.5 (i.e. giving more
# weight to linear association than to quadratic or cubic)
data <- generate.dag.data(g,n,basesd=c(1,0.5,0.5))
X <- data$X

#Fit conditional independence graph at one lambda 
fit1 <- SJ(X, lambda = 0.6)

###For additional replications using the same DAG distribution use e.g.
data <- generate.dag.data(g,n,funclist = data$funclist)

spacejam documentation built on May 2, 2019, 9:13 a.m.