minForest: Minimum forest


View source: R/functions.r

Description

Returns the forest that minimises the -2*log-likelihood, AIC, or BIC, using the Chow-Liu algorithm.

Usage

minForest(dataset,homog=TRUE,forbEdges=NULL,stat="BIC",
          cond=NULL,...)

Arguments

dataset

matrix or data frame (nrow(dataset) observations and ncol(dataset) variables).

homog

TRUE for homogeneous covariance structure, FALSE for heterogeneous. This is only meaningful with mixed models. Default is homogeneous (TRUE).

forbEdges

matrix specifying edges that should not be considered. Matrix with 2 columns, each row representing one edge, and each column one of the vertices in the edge. Default is NULL.

stat

measure to be minimized: LR (-2*log-likelihood), AIC, or BIC. Default is BIC, as shown in Usage. It can also be a user-defined function of the form FUN(newEdge, numCat, dataset), where newEdge is a vector of length two, numCat is a vector with the number of levels of each variable (0 if continuous), and dataset is a matrix (n by p).

cond

list of complete sets of vertices, used to specify mandatory edges: every pair of vertices within a set is a mandatory edge. Default is NULL.

...

arguments to be passed to the user function in stat.
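
The forbEdges and cond arguments described above can be built with base R alone. The following is a minimal sketch under stated assumptions: the vertex indexes are invented for illustration, the helper key() is hypothetical, and the commented-out call assumes the gRapHD package and its dsCont data set are available. Treating an edge that is both forbidden and mandatory as a conflict is an assumption of this sketch, not documented behaviour.

```r
# Forbidden edges: a 2-column matrix, one row per edge
# (illustrative indexes: edges 1-4 and 2-5 are excluded).
forbEdges <- rbind(c(1, 4),
                   c(2, 5))

# Mandatory edges: a list of complete vertex sets.
# list(1:3) makes the edges (1,2), (1,3) and (2,3) mandatory.
cond <- list(1:3)

# Expand each complete set into its pairs, one edge per row.
mandatory <- do.call(rbind, lapply(cond, function(s) t(combn(s, 2))))

# Hypothetical helper: an order-independent label for each edge.
key <- function(e) paste(pmin(e[, 1], e[, 2]), pmax(e[, 1], e[, 2]))

# Edges that are both forbidden and mandatory (assumed to be a mistake).
clash <- intersect(key(forbEdges), key(mandatory))

# With gRapHD installed, the arguments would be passed as:
# m <- minForest(dsCont, forbEdges = forbEdges, cond = cond)
```

Here clash is empty, so the two specifications do not contradict each other.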

Details

Returns the tree or forest that minimizes the -2*log-likelihood, AIC, or BIC. If the log-likelihood is used, the result is a tree; if AIC or BIC is used, the result is a tree or a forest. The dataset contains the variables (vertices) in the columns and the observations in the rows. The vertices in the result are numbered according to their indexes in vertNames.
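
Why LR always gives a tree while AIC or BIC may give a forest can be illustrated with a small base R sketch. This is not the package's implementation: chowLiuSketch is a hypothetical name, it handles continuous variables only, and it assumes each edge costs one parameter, with weight equal to the pairwise LR statistic minus the penalty (0 for LR, 2 for AIC, log(n) for BIC). Edges are added greedily in order of decreasing weight as long as the weight is positive and no cycle is formed.

```r
# Sketch only: Chow-Liu style search for continuous data (assumptions above).
chowLiuSketch <- function(dataset, stat = c("BIC", "AIC", "LR")) {
  stat <- match.arg(stat)
  n <- nrow(dataset)
  p <- ncol(dataset)
  penalty <- switch(stat, LR = 0, AIC = 2, BIC = log(n))

  pairs <- t(combn(p, 2))             # all candidate edges, lower index first
  r <- cor(dataset)
  lr <- -n * log(1 - r[pairs]^2)      # pairwise LR statistic, 1 parameter each
  w <- lr - penalty                   # penalised gain of adding the edge

  comp <- seq_len(p)                  # component label of each vertex
  edges <- matrix(integer(0), ncol = 2)
  for (k in order(w, decreasing = TRUE)) {
    if (w[k] <= 0) break              # no remaining edge improves the criterion
    i <- pairs[k, 1]
    j <- pairs[k, 2]
    if (comp[i] != comp[j]) {         # joins two components, so no cycle
      edges <- rbind(edges, c(i, j))
      comp[comp == comp[j]] <- comp[i]
    }
  }
  edges
}

set.seed(1)
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
x[, 2] <- x[, 1] + 0.1 * rnorm(100)   # make variables 1 and 2 strongly related
treeLR <- chowLiuSketch(x, "LR")      # unpenalised: spanning tree
forestBIC <- chowLiuSketch(x, "BIC")  # penalised: weak edges may be dropped
```

With no penalty every weight is positive, so the search stops only when all vertices are connected. With a penalty, weakly related pairs have negative weight and are left out, which can split the result into several trees.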
All discrete variables must be factors. All factor levels must be represented in the data. Missing values are not allowed.
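
The data requirements above can be checked before calling minForest. The helper below is a sketch in base R; the name checkDataset is hypothetical and not part of gRapHD. It also returns a vector in the same convention as numCat (number of levels, 0 if continuous).

```r
# Hypothetical pre-flight check for the requirements listed in Details.
checkDataset <- function(dataset) {
  dataset <- as.data.frame(dataset)
  if (anyNA(dataset)) stop("missing values are not allowed")

  isFac <- vapply(dataset, is.factor, logical(1))
  isNum <- vapply(dataset, is.numeric, logical(1))
  if (!all(isFac | isNum)) stop("discrete variables must be factors")

  # Every declared factor level must occur at least once in the data.
  unused <- vapply(dataset[isFac], function(f) any(table(f) == 0), logical(1))
  if (any(unused)) stop("some factor levels are not represented in the data")

  # Number of levels per variable, 0 for continuous variables.
  vapply(dataset, function(x) if (is.factor(x)) nlevels(x) else 0L, integer(1))
}

df <- data.frame(a = c(0.1, 1.2, -0.3, 0.8, 2.1),
                 b = factor(c("x", "y", "x", "y", "x")))
numCat <- checkDataset(df)
```

If the check fails because of unused levels, droplevels() on the data frame removes them.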

Value

A list containing:

edges

matrix with 2 columns, each row representing one edge, and each column one of the vertices in the edge. Column 1 contains the vertex with lower index.

p

number of variables (vertices) in the model.

stat.minForest

measure used (LR, AIC, or BIC).

statSeq

vector with value of stat.minForest for each edge.

vertNames

vector with the original vertices' names. If there are no names in dataset then the vertices will be named according to the original column indexes in dataset.

numCat

vector with number of levels for each variable (0 if continuous).

homog

TRUE if the covariance is homogeneous.

numP

vector with number of estimated parameters for each edge.

minForest

first and last edges found with minForest.
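
The edges and vertNames components described above can be combined to read the result by variable name. The sketch below uses invented values in place of a real minForest result, so the edge list and the names are assumptions for illustration only.

```r
# Invented stand-ins for the edges and vertNames components of a result.
edges <- rbind(c(1, 3),
               c(2, 3))
vertNames <- c("height", "weight", "age")
p <- length(vertNames)

# Translate vertex indexes into the original variable names.
named <- cbind(from = vertNames[edges[, 1]],
               to   = vertNames[edges[, 2]])

# Symmetric adjacency matrix of the forest.
adj <- matrix(0L, p, p, dimnames = list(vertNames, vertNames))
adj[edges] <- 1L
adj[edges[, 2:1, drop = FALSE]] <- 1L
```

Because column 1 of edges holds the lower vertex index, the first assignment fills the upper triangle of adj and the second fills the lower triangle.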

Author(s)

Gabriel Coelho Goncalves de Abreu (abreu_ga@yahoo.com.br)
Rodrigo Labouriau (Rodrigo.Labouriau@math.au.dk)
David Edwards (David.Edwards@agrsci.dk)

References

Chow, C.K. and Liu, C.N. (1968) Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, Vol. IT-14, 3:462-7.
Edwards, D., de Abreu, G.C.G. and Labouriau, R. (2010). Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests. BMC Bioinformatics, 11:18.

Examples

library(gRapHD)
set.seed(7,kind="Mersenne-Twister")
dataset <- matrix(rnorm(1000),nrow=100,ncol=10)
m <- minForest(dataset,stat="BIC")

#######################################################################
# Example with continuous variables
data(dsCont)
# m1 <- minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
#          1. in this case, there is no use for homog
#          2. no forbidden edges
#          3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsCont,homog=TRUE,forbEdges=NULL,stat="LR")
plot(m1,numIter=1000)

#######################################################################
# Example with discrete variables
data(dsDiscr)
# m1 <- minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
#          1. in this case, there is no use for homog
#          2. no forbidden edges
#          3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsDiscr,homog=TRUE,forbEdges=NULL,stat="LR")
plot(m1,numIter=1000)

#######################################################################
# Example with mixed variables
data(dsMixed)
# m1 <- minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
#          1. it is to be considered homogeneous
#          2. no forbidden edges
#          3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsMixed,homog=TRUE,forbEdges=NULL,stat="LR")
plot(m1,numIter=1000)

#######################################################################
# Example using a user defined function
#   The function userFun calculates the same edge weights as the
# option stat="LR". This means that the final result, using either
# option, is the same.
userFun <- function(newEdge,numCat,dataset)
{
  sigma <- var(dataset[,newEdge])
  v <- nrow(dataset)*log(prod(diag(sigma))/det(sigma))
  return(c(v,1))
}

data(dsCont)
m <- minForest(dsCont,stat="LR")
m1 <- minForest(dsCont,stat=userFun)
identical(m@edges,m1@edges)

#######################################################################
# Example with mandatory edges (the so-called conditional Chow-Liu 
# algorithm).  The edges (1,2), (1,3) and (2,3) are specified as 
# mandatory. The algorithm returns the optimal graph containing the 
# mandatory edges such that only cycles with mandatory edges are 
# allowed.
data(dsCont)
m1 <- minForest(dsCont,cond=list(1:3))
## Not run: plot(m1)  

gRapHD documentation built on Feb. 9, 2018, 6:05 a.m.
