Network Search for Given Node Order

Description

The function implements an MLE-based algorithm to search for optimal networks complying with a given node order. It returns a list of networks, with complexities up to some maximal value, that best fit the data.

Usage

cnSearchOrder(data, pert=NULL, 
	maxParentSet=0, parentSizes=NULL, maxComplexity=0, 
	nodeOrder=NULL, 
	nodeCats=NULL, parentsPool=NULL, fixedParents=NULL, edgeProb=NULL,
	echo=FALSE, softmode=FALSE, dagsOnly = FALSE, classes = NULL, clsdist=1)

Arguments

data

a matrix in row-nodes format or a data.frame in column-nodes format

pert

a binary matrix with the dimensions of data. A value of 1 marks the node in the corresponding sample as perturbed

maxParentSet

an integer, maximal number of parents for all nodes

parentSizes

an integer vector, maximal number of parents per node

maxComplexity

an integer, the maximal network complexity for the search

nodeOrder

a vector specifying a node order; the search is among the networks consistent with this topological order

nodeCats

a list of node categories

parentsPool

a list of parent sets to choose from

fixedParents

a list of parent sets that must be included, one for each node

edgeProb

a square matrix, with dimension equal to the number of nodes, specifying prior edge probabilities

echo

a logical, turns on/off some progress information

softmode

a logical, turns on/off the soft quantization mode

dagsOnly

a logical, selects between catNetwork and DAG-only search

classes

a binary matrix with the dimensions of data that assigns a class to each node-observation

clsdist

class separation distance function; currently 1 (‘chisq’) and 2 (‘kl’) are supported

Details

The data can be a matrix of character categories with rows specifying the node-variables and columns assumed to be independent samples from an unknown network, or a data.frame with columns specifying the nodes and rows being the samples.
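The two accepted layouts can be illustrated with base R alone. This is a minimal sketch; the node names A, B, C and the sample values are made up for illustration and do not come from the package.

```r
# Column-nodes data.frame: each column is a node, each row is a sample.
samples_df <- data.frame(
  A = c("1", "2", "1", "1", "2"),
  B = c("1", "1", "2", "1", "2"),
  C = c("2", "2", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Row-nodes matrix: transpose so that rows are nodes and columns are samples.
samples_mat <- t(as.matrix(samples_df))

dim(samples_df)   # samples by nodes
dim(samples_mat)  # nodes by samples
```

Either object describes the same five samples of three nodes; only the orientation differs.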

The number of node categories is obtained from the sample. If given, nodeCats is used as the list of categories. In that case, nodeCats should include all node categories present in the data.

The function returns a list of networks, one for each admissible complexity within the specified range. The networks in the list are the Maximum Likelihood estimates in the class of networks having the given topological order of the nodes and the given complexity. When maxComplexity is not given, and is thus zero, its value is reset to the maximum possible complexity for the given parent set size. When nodeOrder is not given or is NULL, the order of the nodes in the data is taken, 1,2,....

The parameters parentsPool and fixedParents allow the user to put some exclusion/inclusion constraints on the possible parenthood of the nodes. They should be given as lists of index vectors, one for each node.
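As a rough sketch of what such lists might look like, the following builds one index vector per node for a four-node problem. It assumes that parentsPool lists the allowed parents and fixedParents lists the required parents of each node; how the package treats an empty entry is not stated here. The helper constraintsConsistent is hypothetical and not part of catnet; it only checks that every fixed parent is in the pool and precedes its child in the node order.

```r
numnodes <- 4
nodeOrder <- 1:numnodes

# One index vector per node (assumed meaning: allowed parents of that node).
parentsPool <- list(integer(0), 1L, c(1L, 2L), c(1L, 2L, 3L))
# One index vector per node (assumed meaning: parents that must be present).
fixedParents <- list(integer(0), integer(0), 1L, integer(0))

# Hypothetical helper: TRUE when every fixed parent is in the pool and
# precedes its child in the given node order.
constraintsConsistent <- function(pool, fixed, order) {
  pos <- match(seq_along(order), order)  # position of each node in the order
  all(vapply(seq_along(fixed), function(i) {
    all(fixed[[i]] %in% pool[[i]]) && all(pos[fixed[[i]]] < pos[i])
  }, logical(1)))
}

constraintsConsistent(parentsPool, fixedParents, nodeOrder)

# The lists would then be passed by name, for example:
# cnSearchOrder(data, maxParentSet=2, nodeOrder=nodeOrder,
#               parentsPool=parentsPool, fixedParents=fixedParents)
```

Checking the constraints against the order first avoids asking for a parent that the topological order already rules out.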

The rows in edgeProb correspond to the nodes in the sample. The [i,j]-th element in edgeProb specifies a prior probability for the j-th node to be a parent of the i-th one. In calculating the prior probability of a network, all edges are assumed to be independent Bernoulli random variables. The elements of edgeProb are clipped to the range [0,1], so that zero probabilities effectively exclude the corresponding edges, while ones force them.
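The independent-Bernoulli prior described above can be sketched in a few lines of base R. The function networkLogPrior is a hypothetical helper, not part of catnet, and the package's internal computation may differ; in particular, ignoring the diagonal (self-edges) is an assumption made here. The adjacency matrix adj follows the same convention as edgeProb: adj[i,j] equal to 1 means node j is a parent of node i.

```r
# Hypothetical helper: log prior of a network under independent Bernoulli edges.
networkLogPrior <- function(edgeProb, adj) {
  p <- pmin(pmax(edgeProb, 0), 1)               # clip probabilities to [0, 1]
  offdiag <- row(edgeProb) != col(edgeProb)     # assumption: self-edges ignored
  # log(p) for edges that are present, log(1 - p) for edges that are absent
  sum(ifelse(adj[offdiag] == 1, log(p[offdiag]), log1p(-p[offdiag])))
}

ep <- matrix(0.5, nrow = 3, ncol = 3)
ep[2, 1] <- 1.2   # clipped to 1: forces node 1 to be a parent of node 2
ep[3, 2] <- 0     # excludes node 2 as a parent of node 3

withEdge <- matrix(0, 3, 3)
withEdge[2, 1] <- 1              # network containing the forced edge
withoutEdge <- matrix(0, 3, 3)   # network missing the forced edge

networkLogPrior(ep, withEdge)     # finite log prior
networkLogPrior(ep, withoutEdge)  # -Inf, i.e. prior probability zero
```

A network that omits an edge with probability one, or includes an edge with probability zero, gets a log prior of -Inf, which is how the clipped extremes act as hard constraints.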

Value

A catNetworkEvaluate object

Author(s)

N. Balov

Examples

  cnet <- cnRandomCatnet(numnodes=12, maxpars=3, numcats=2)
  psamples <- cnSamples(object=cnet, numsamples=100)
  nodeOrder <- sample(1:12)
  ## nodeOrder is passed by name; given positionally here it would match parentSizes
  nets <- cnSearchOrder(data=psamples, pert=NULL, 
		maxParentSet=2, maxComplexity=36, nodeOrder=nodeOrder)
  ## next we find the network with the complexity of the original one
  cc <- cnComplexity(object=cnet)
  cnFind(object=nets, complx=cc)
