fit-methods: Fit a Conditional Gaussian Bayesian Network or Discrete...


Description

Learn the structure of a genotype-phenotype network from quantitative trait loci (QTL) data, together with the conditional probability table for each node in the network.

Usage

## Fit a conditional gaussian or a discrete bayesian network using RHugin.
fit.gnbp(geno,pheno,constraints,learn="TRUE",graph,type ="cg",
                  alpha=0.001,tol=1e-04,maxit=0)
## Fit a discrete bayesian network using bnlearn.
fit.dbn(geno,pheno,graph,learn="TRUE",method="hc",whitelist,blacklist)

Arguments

geno

a data frame of column vectors of class factor (or one that can be coerced to that class) and non-empty column names.

pheno

a data frame of column vectors with non-empty column names. The columns must be of class numeric for fit.gnbp if type = "cg", or of class factor for fit.gnbp if type = "db" and for fit.dbn.

constraints

an optional list of constraints on the edges, specifying required and forbidden edges, for fit.gnbp. See details.

learn

a boolean value. If TRUE (default), the network structure will be learnt. If FALSE, only the conditional probabilities will be learnt; a graph must be provided in this case.

graph

graph structure of class "graphNEL", or a data frame with two columns (labeled "from" and "to") containing the set of edges to be included in the graph. Must be provided if learn == FALSE. See details.

type

specify the type of network for fit.gnbp. "cg" for Conditional Gaussian (default) and "db" for Discrete Bayesian.

method

a character string. The score-based or constraint-based algorithms available in the package bnlearn. Valid options are "hc", "tabu", "gs", "iamb", "fast.iamb", "inter.iamb", "mmhc". See details below.

whitelist

a data frame with two columns (labeled "from" and "to"), containing a set of edges to be included in the graph.

blacklist

a data frame with two columns (labeled "from" and "to"), containing a set of edges NOT to be included in the graph.

alpha

a single numeric value specifying the significance level (for use with RHugin). Default is 0.001.

tol

a positive numeric value (optional) specifying the tolerance for the EM algorithm used to learn the conditional probability tables (for use with RHugin). Default value is 1e-04. See learn.cpt for details.

maxit

a positive integer value (optional) specifying the maximum number of iterations of the EM algorithm used to learn the conditional probability tables (for use with RHugin). See learn.cpt for details.
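The class requirements on geno and pheno described above can be checked before fitting. The following is a minimal sketch in base R, not part of geneNetBP: the toy data frame, its column names, and the median split used to discretise the phenotypes are all assumptions made for illustration.

```r
# Toy data standing in for a QTL dataset (hypothetical names and values).
set.seed(1)
toy <- data.frame(
  SNP1   = sample(c("AA", "AB", "BB"), 20, replace = TRUE),
  SNP2   = sample(1:2, 20, replace = TRUE),
  TraitA = rnorm(20),
  TraitB = rnorm(20),
  stringsAsFactors = FALSE
)

# Genotypes: coerce every column to a factor.
geno <- toy[, c("SNP1", "SNP2")]
geno[] <- lapply(geno, as.factor)

# Phenotypes for type = "cg": every column must be numeric.
pheno <- toy[, c("TraitA", "TraitB")]

# Check classes and column names before calling a fitting function.
stopifnot(
  all(vapply(geno, is.factor, logical(1))),
  all(vapply(pheno, is.numeric, logical(1))),
  all(nzchar(names(geno))),
  all(nzchar(names(pheno)))
)

# Phenotypes for type = "db" or fit.dbn: every column must be a factor.
# A median split is an arbitrary choice made here for illustration only.
pheno_discrete <- pheno
pheno_discrete[] <- lapply(pheno, function(x) {
  cut(x, breaks = quantile(x, c(0, 0.5, 1)),
      include.lowest = TRUE, labels = c("low", "high"))
})
str(pheno_discrete)
```

How continuous phenotypes are discretised affects the fitted network, so the split above should be replaced by whatever scheme suits the data.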

Details

The function fit.gnbp fits a conditional Gaussian Bayesian network or a discrete Bayesian network to genotype-phenotype (QTL) data, at the specified significance level alpha, using the PC algorithm implemented in the RHugin package. The conditional probability tables are then learnt for each node in the domain by the EM algorithm implemented in the RHugin package.

Edges between the genotypes at SNP markers are not allowed and the genotypes are constrained to precede the phenotypes. The phenotypes should be either all numeric or all discrete; the function does not currently support a mixture of discrete and continuous phenotypes. Additional domain knowledge in terms of edges should be provided as a list of constraints, the structure of which is described in detail in learn.structure. Briefly, the constraints argument is a list of two elements: directed and undirected. Each of these elements in turn should be a list with two elements: required and forbidden. Each element of required and forbidden must be a character vector of length two specifying the names of the nodes joined by the edge. See learn.structure for details.
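The nested layout of the constraints list described above can be sketched in base R as follows. The node names are hypothetical, and the sketch assumes that required and forbidden are each a list of length-two character vectors, as the description suggests; learn.structure in RHugin remains the authoritative reference.

```r
# Hypothetical node names; replace with column names from geno and pheno.
constraints <- list(
  directed = list(
    required  = list(c("SNP1", "TraitA")),    # SNP1 -> TraitA must be present
    forbidden = list(c("TraitB", "TraitA"))   # TraitB -> TraitA must be absent
  ),
  undirected = list(
    required  = list(),                       # no required undirected edges
    forbidden = list(c("TraitA", "TraitC"))   # no edge between TraitA and TraitC
  )
)

# Sanity check: every edge is a character vector naming exactly two nodes.
is_edge <- function(e) is.character(e) && length(e) == 2
all_edges <- unlist(
  lapply(constraints, function(kind) c(kind$required, kind$forbidden)),
  recursive = FALSE
)
stopifnot(all(vapply(all_edges, is_edge, logical(1))))
str(constraints)
```

Following the Usage section, such a list would be passed to fit.gnbp through its constraints argument.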

Note that this function works on Hugin domains. Since Hugin domains are external pointers and cannot be saved in an R workspace, the RHugin package provides the functions read.rhd and write.rhd for loading and saving Hugin domains. See the RHugin documentation for more information.

The function fit.dbn infers a discrete Bayesian network structure from categorical genotype-phenotype (QTL) data using the score-based and constraint-based algorithms implemented in the bnlearn package. The conditional probability tables are learnt for each node in the inferred network. The phenotypes should ALL be discrete variables. Additional domain knowledge in terms of edges should be provided as a whitelist and a blacklist. Edges between the genotypes at SNP markers are not allowed and the genotypes are constrained to precede the phenotypes.
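A whitelist and a blacklist are plain two-column data frames of edges. The sketch below builds both in base R with hypothetical node names. The final part writes the genotype restrictions described above as an explicit edge list purely to illustrate what they mean; it is not taken from the package source, and the description above indicates these restrictions already apply without the user supplying them.

```r
# Hypothetical node names; replace with column names from your own data.
geno_names  <- c("SNP1", "SNP2", "SNP3")
pheno_names <- c("TraitA", "TraitB")

# Domain knowledge: require SNP1 -> TraitA and forbid TraitB -> TraitA.
whitelist <- data.frame(from = "SNP1", to = "TraitA", stringsAsFactors = FALSE)
blacklist <- data.frame(from = "TraitB", to = "TraitA", stringsAsFactors = FALSE)

# Illustration only: the genotype restrictions expressed as an edge list.
# (a) no edges between distinct genotypes, in either direction
geno_pairs <- expand.grid(from = geno_names, to = geno_names,
                          stringsAsFactors = FALSE)
geno_pairs <- geno_pairs[geno_pairs$from != geno_pairs$to, ]
# (b) no edges from a phenotype to a genotype (genotypes precede phenotypes)
pheno_to_geno <- expand.grid(from = pheno_names, to = geno_names,
                             stringsAsFactors = FALSE)
implied_blacklist <- rbind(geno_pairs, pheno_to_geno)
print(implied_blacklist)
```

Following the Usage section, the first two data frames would be passed to fit.dbn through its whitelist and blacklist arguments.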

The supported algorithms from bnlearn are

  1. Score-based: Hill-Climbing (hc, default), Tabu Search (tabu)

  2. Constraint-based: Grow-Shrink (gs), Incremental Association (iamb), Fast Incremental Association (fast.iamb), Interleaved Incremental Association (inter.iamb)

  3. Hybrid: Max-Min Hill-Climbing (mmhc).

The algorithm can be specified by method. The structure learning functions are called with their default parameters. If different parameter values are desired, it is recommended to learn the network structure independently using the bnlearn package. The inferred structure can then be passed to fit.dbn as the graph argument, with learn="FALSE".
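Learning the structure independently, as recommended above, might look like the following sketch. It assumes the bnlearn package is installed and uses bnlearn's bundled learning.test dataset as a stand-in for discrete QTL data; the restart and perturb values are arbitrary examples of non-default parameters.

```r
# Runs only if bnlearn is available; learning.test is a stand-in dataset.
if (requireNamespace("bnlearn", quietly = TRUE)) {
  data(learning.test, package = "bnlearn")

  # Hill-climbing with non-default settings (values chosen arbitrarily).
  net <- bnlearn::hc(learning.test, restart = 5, perturb = 2)

  # arcs() returns a character matrix with columns "from" and "to";
  # convert it to the two-column data frame form described for graph.
  edges <- as.data.frame(bnlearn::arcs(net), stringsAsFactors = FALSE)
  print(edges)
}
```

With real data, an edge list like edges would be supplied as the graph argument of fit.dbn together with learn="FALSE", so that only the conditional probability tables are learnt.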

Value

fit.gnbp returns an object of class "gpfit" containing the following components.

gp

a pointer to a compiled RHugin domain that is the inferred network structure and the conditional probability tables for each node in the network.

marginal

a list of marginal probabilities for phenotypes (pheno) and genotypes (geno)

gp_nodes

a data frame containing information about nodes for internal use with other functions.

gp_flag

a character string specifying the type of network: "cg" for Conditional Gaussian or "db" for Discrete Bayesian.

fit.dbn returns an object of class "dbnfit" containing the following components.

dbn

an object of class bn. See bn-class for details. This object contains the inferred network structure and the conditional probability tables for each node in the network.

marginal

a list of marginal probabilities for phenotypes (pheno) and genotypes (geno)

dbn_nodes

a data frame containing information about nodes for internal use with other functions.

dbn_flag

a character string specifying the type of network: "dbn" for Discrete Bayesian.

Author(s)

Janhavi Moharil <janhavim@buffalo.edu>

See Also

plot.gpfit, plot.dbnfit, absorb.gnbp. For discrete Bayesian networks: fit.dbn, absorb.dbn

Examples

## Not run: 
## load the mouse kidney eQTL dataset
data(mouse)

## get genotype and phenotype data
mousegeno<-mouse[,1:5]
mousepheno<-mouse[,6:19]

## Simple example : Fit a bayesian network to genotype-phenotype data using the default values
fit.gnbp(mousegeno,mousepheno)

## Fit a bayesian network to genotype-phenotype data at a specified significance level and plot it
mouse.cgbn<-fit.gnbp(mousegeno,mousepheno,alpha = 0.1)
plot(mouse.cgbn)

## load yeast dataset
data(yeast)

## get genotype and phenotype data
yeastgeno<-yeast[,1:12]
yeastpheno<-yeast[,13:50]

## Simple example : Fit a discrete bayesian network to genotype-phenotype data
fit.dbn(yeastgeno,yeastpheno)

## Fit a discrete bayesian network by Tabu method and plot it.
yeast.dbn.tabu<-fit.dbn(yeastgeno,yeastpheno,method="tabu")
plot(yeast.dbn.tabu)

## End(Not run)

geneNetBP documentation built on May 2, 2019, 9:41 a.m.