
#' Bootstrap Learning Performed on a Gaussian Bayesian Network.
#'
#'  The 15-node network structure was simulated with Melancon's and Philippe's
#'  Uniform Random Acyclic Digraphs algorithm, which produces a highly connected DAG.
#'  See \code{?random.graph} for more information.
#'
#' \itemize{
#'   \item truth. An object of class \code{bn.fit}, a simulated structure and parameter set.
#'   \item boot.  An object of classes \code{bn.strength} and \code{data.frame}, the results of
#'   model averaging applied to data simulated from truth.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name melancon_boot
#' @usage data(melancon_boot)
#' @format A list with 2 items
NULL

#' Bootstrap Learning Performed on a Gaussian Bayesian Network.
#'
#'  The 40-node network structure was simulated with full-ordering-based generation,
#'  which produces a sparsely connected DAG.  See \code{?random.graph} for more information.
#'
#' \itemize{
#'   \item truth. An object of class \code{bn.fit}, a simulated structure and parameter set.
#'   \item boot.  An object of classes \code{bn.strength} and \code{data.frame}, the results of
#'   model averaging applied to data simulated from truth.
#' }
#'
#' @docType data
#' @keywords datasets
#' @name ordered_boot
#' @usage data(ordered_boot)
#' @format A list with 2 items
NULL

#' Dream 4 signaling data
#'
#' @docType data
#' @keywords datasets
#' @name dream_net
#' @usage data(dream_net)
#' @format An object of class \code{bn.fit}
NULL

#' T-cell signaling data
#'
#' This data consists of simultaneous measurements of 11 phosphorylated proteins and phospholipids derived from
#' thousands of individual primary immune system cells, specifically T cells.  When T cells are stimulated,
#' the signal flows across a series of physical interactions between the measured proteins.  The network of these
#' interactions forms the T cell signalling pathway.
#'
#' Causal Bayesian networks can be used to represent or learn the signalling pathway from this data.
#' Not all the proteins involved in the signalling pathway are observed.
#'
#' The literature-validated Bayesian network representation of this network is available by loading the
#' \code{tcell_examples} dataset.
#'
#' Two versions of this data are included in bninfo.  The first is the raw data from the 2005 publication,
#' a list of datasets commonly used for learning graphical models of cell signalling.  Each element of the list
#' is a dataset collected under a different perturbation.  The perturbations are intended to reveal causal
#' influences between proteins.  Each column of each dataset represents a signalling protein.  The values in each
#' column correspond to the abundance of the activated state of that protein, or more generally, the level of
#' activity of that protein.  The variable names are edited for readability.
#'
#'  \describe{
#'   \item{cd3cd28}{Stimulation on CD3 and CD28}
#'   \item{cd3cd28icam2_aktinhib}{Stimulation on CD3, CD28, and LFA-1, Akt inhibited}
#'   \item{cd3cd28icam2_g0076}{Stimulation on CD3, CD28, and LFA-1, PKC inhibited}
#'   \item{cd3cd28icam2_psit}{Stimulation on CD3, CD28, and LFA-1, PIP2 inhibited}
#'   \item{cd3cd28icam2_u0126}{Stimulation on CD3, CD28, and LFA-1, Mek inhibited}
#'   \item{cd3cd28icam2_ly}{Stimulation on CD3, CD28, and LFA-1, PI3K inhibited (not measured)}
#'   \item{cd3cd28icam2}{Stimulation on CD3, CD28, and LFA-1}
#'   \item{cd3cd28_aktinhib}{Stimulation on CD3 and CD28, Akt inhibited}
#'   \item{cd3cd28_g0076}{Stimulation on CD3 and CD28, PKC inhibited}
#'   \item{cd3cd28_psitect}{Stimulation on CD3 and CD28, PIP2 inhibited}
#'   \item{cd3cd28_u0126}{Stimulation on CD3 and CD28, Mek inhibited}
#'   \item{cd3cd28_ly}{Stimulation on CD3 and CD28, PI3K inhibited (not measured)}
#'   \item{pma}{PKC activation}
#'   \item{b2camp}{PKA activation}
#' }
#'
#'  The second version is the 2005 data discretized into 3 levels for each protein.
#'  Marco Scutari presents this processed dataset, along with workflows for it, in his books and in the
#'  documentation that accompanies his \code{bnlearn} package.
#'
#'  This is a list containing two elements.  The first element, '.data', is a data frame with 5400 rows, each corresponding to a cell.
#'  The second element, 'interventions', contains an array where each element corresponds to a cell (observation)
#'  in .data and names the protein (column) in .data that received an intervention in that cell.
#'  Subsetting by these interventions gives:
#'
#'  \describe{
#'    \item{Observational data}{1800 cells with only general stimulatory cues, so that the protein signalling paths are active}
#'    \item{PKC activation}{1200 cells with activation on PKC}
#'    \item{PKA activation}{600 cells with activation on PKA}
#'    \item{Akt, PIP2, and Mek inhibition}{600 cells each with inhibition on Akt, PIP2, and Mek, respectively}
#'  }
#'
#'  A key feature of this experiment is that the interventions may not directly change the abundance
#'  (and thus the measurement values in the raw data) of a given protein, just its ability to modify downstream
#'  proteins.  To correct for this, in cells with inhibition or activation interventions the distribution of
#'  the targeted protein has one level with probability one and the other two with probability zero, making the
#'  intervention similar to a knock-out or spiking.  Simply put, this abstracts away some of the biology, though a
#'  more considered incorporation of this information in the model may improve results.
#'
#' @examples
#' library(bnlearn)
#' library(magrittr)
#' data(tcells)
#' str(tcells)
#'
#' # Visualize the associated network.  paste0() joins the pieces without
#' # separators, so the model string contains no stray spaces.
#' factorization <- paste0("[PKC][PKA|PKC][Raf|PKC:PKA][Mek|PKC:PKA:Raf]",
#'                         "[Erk|Mek:PKA][Akt|Erk:PKA][P38|PKC:PKA]",
#'                         "[Jnk|PKC:PKA][Plcg][PIP3|Plcg][PIP2|Plcg:PIP3]")
#' net <- model2network(factorization)
#' graphviz.plot(net)
#'
#' # Replicate Scutari's network inference in R
#' df <- tcells$processed
#' # Pull out the interventions, so only the proteins remain.
#' int_array <- as.numeric(df$INT)
#' df$INT <- NULL
#' # For each protein, collect the rows (cells) in which it received an intervention.
#' int_arg <- lapply(seq_along(df), function(i) which(int_array == i)) %>%
#'   structure(names = names(df))
#' averaging_results <- random.graph(nodes = names(df), # Generate random starting graphs
#'                                   method = "melancon",
#'                                   num = 500,
#'                                   burn.in = 10^5,
#'                                   every = 100) %>%
#'   lapply(function(net) { # Run tabu search from each starting graph
#'     tabu(df, score = "mbde", exp = int_arg, iss = 10, start = net, tabu = 50)
#'   }) %>%
#'   custom.strength(nodes = names(df)) # Compute arc strengths across the learned networks
#' # Build the averaged network from the arc strengths and plot it.
#' averaging_results %>%
#'   averaged.network() %>%
#'   graphviz.plot()
#'
#' @references Sachs, Karen, et al. "Causal protein-signalling networks derived from multiparameter single-cell data." Science 308.5721 (2005): 523-529.
#' @source \url{http://www.sciencemag.org/content/308/5721/523.short}
#' @references Nagarajan, Radhakrishnan, Marco Scutari, and Sophie Lèbre. Bayesian Networks in R. Springer, 2013.
#' @references Scutari, Marco, and Jean-Baptiste Denis. Bayesian Networks: With Examples in R. CRC Press, 2014.
#'
#' @seealso \code{\link{tcell_examples}}
#'
#' @docType data
#' @keywords datasets
#' @name tcells
#' @usage data(tcells)
#' @format A list of 2 objects.  The first object is raw data consisting of a list of 14 data frames
#' each with the same 11 signalling protein variables.  The second object is a processed subset of the
#' raw data aggregated into a single data frame.  The first 11 columns correspond to signalling proteins, the
#' last column contains column numbers for the protein that received an intervention in a given cell.
NULL

#' Example networks based on T cell data
#'
#' Example networks based on the T cell data.  These are useful for inference related operations. Loading the
#' data set presents a list of the following objects:
#'
#' \describe{
#'    \item{net}{the validated network}
#'    \item{net_fit}{a fitted Bayesian network.  Fit on raw data from the cd3cd28, cd3cd28icam2, b2camp, and pma
#'    subsets.  Data was log-transformed, centered, and discretized into 3 levels prior to fitting.}
#'    \item{net_gauss}{a fitted Gaussian Bayesian network.  Fit on the same data as net_fit.  Data was first
#'    log-transformed, then standardized.}
#'    \item{averaging_tabu}{model averaging results of Scutari's workflow, which uses tabu search and models
#'    interventions.}
#'    \item{averaging_hc}{model averaging results of Scutari's workflow, which uses hill-climbing search and models
#'    interventions.}
#' }
#' @seealso \code{\link{tcells}}
#'
#' @source \url{http://www.sciencemag.org/content/308/5721/523.short}
#' @docType data
#' @keywords datasets
#' @name tcell_examples
#' @usage data(tcell_examples)
#' @format A list of modeling results
NULL