Introduction

This package implements a high-dimensional linear state space model (SSM) with a new Expectation-Regularization-Maximization (ERM) algorithm to construct dynamic gene regulatory networks (GRNs). The ERM algorithm employs the adaptive LASSO variable selection method, which preserves the sparsity of the GRN.
A state space model (SSM) is a special case of a dynamic Bayesian network (DBN). This package employs the linear SSM, also referred to as a linear dynamical system, for dynamic gene regulatory networks. For time-series gene expression data, let $y_t$ denote the p-dimensional vector of expression observations of p genes at time t. In a linear SSM, $y_t$ is assumed to be generated from a k-dimensional real-valued hidden state variable vector $x_t$, and the sequence of evolving $x_t$ follows a first-order Markov process, $t=1,\ldots,T$. A general linear SSM can be written as
$$x_t = Ax_{t-1}+w_t$$
$$y_t = Cx_t+v_t$$
where A is the $(k\times k)$ system matrix, C is the $(p\times k)$ observation matrix, and $w_t \sim N(0,Q)$ and $v_t \sim N(0,R)$ are independent system noise and measurement noise. Both $Q$ and $R$ are assumed to be diagonal in many practical applications. The initial state vector $x_0$ is usually assumed to be normally distributed with mean $\mu$ and covariance matrix $\Sigma$.
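As a quick illustration of the model, the following is a minimal sketch that simulates data from this linear SSM; the dimensions, names (x_sim, y_sim), and parameter values are arbitrary and are not part of HDSSM.

## simulate from x_t = A x_{t-1} + w_t, y_t = C x_t + v_t (illustrative only)
set.seed(1)
k <- 3; p <- 3; Tt <- 20
A <- diag(0.5, k)                          # system matrix
C <- diag(p)                               # observation matrix (identity, as used later)
Q <- diag(0.01, k); R <- diag(0.01, p)     # diagonal noise covariances
x_sim <- matrix(0, k, Tt); y_sim <- matrix(0, p, Tt)
x_prev <- rnorm(k, 0.1, sqrt(1e-5))        # x_0 ~ N(mu, Sigma)
for (t in 1:Tt) {
  x_sim[, t] <- A %*% x_prev + rnorm(k, 0, sqrt(diag(Q)))   # x_t = A x_{t-1} + w_t
  y_sim[, t] <- C %*% x_sim[, t] + rnorm(p, 0, sqrt(diag(R)))   # y_t = C x_t + v_t
  x_prev <- x_sim[, t]
}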

Method

We use a linear SSM with an ERM algorithm to efficiently construct dynamic GRNs. Setting the observation matrix C to an identity matrix, the parameters are $\theta = (A,Q,R,\mu,\Sigma)$ and $\theta^{\ast} = (Q,R,\mu,\Sigma)$.

ERM algorithm

The ERM algorithm is a three-step procedure to estimate the high-dimensional sparse system matrix A and the other parameters, as well as the state variables, of the linear SSM. The Kalman filter and smoother are used to estimate the hidden states; a minimal filtering sketch is given below, followed by the three steps.
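For reference, here is a minimal sketch of the forward filtering pass under the identity observation matrix used in this package. The function kalman_filter and its arguments are hypothetical names, not HDSSM functions, and the package additionally runs a backward smoothing pass to obtain the smoothed moments needed in the E step.

## minimal Kalman filter sketch for C = I (hypothetical helper, not the HDSSM implementation)
kalman_filter <- function(y, A, Q, R, mu0, Sigma0) {
  p <- nrow(y); Tt <- ncol(y)
  xf <- matrix(0, p, Tt)              # filtered state means
  Pf <- array(0, c(p, p, Tt))         # filtered state covariances
  xp <- mu0; Pp <- Sigma0             # prior for the first state
  for (t in 1:Tt) {
    if (t > 1) {                      # one-step-ahead prediction
      xp <- A %*% xf[, t - 1]
      Pp <- A %*% Pf[, , t - 1] %*% t(A) + Q
    }
    K <- Pp %*% solve(Pp + R)         # Kalman gain (C = I)
    xf[, t] <- xp + K %*% (y[, t] - xp)
    Pf[, , t] <- (diag(p) - K) %*% Pp
  }
  list(xf = xf, Pf = Pf)
}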
* E step: calculate the conditional expectation of the log likelihood. The sufficient statistics, i.e. the state variables and their functions required by the later steps, are obtained through the Kalman filter and smoother.
* R step: L1 regularization, the adaptive LASSO method, is employed to obtain the estimate of the sparse system matrix A. The adaptive LASSO estimate is computed with the LARS algorithm (a sketch follows this list).
* M step: the MLE of $\theta^{\ast}$ is obtained by maximizing the conditional expectation of the log likelihood.
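As a rough illustration of the R step only, the adaptive LASSO for one row of A can be obtained by rescaling the predictors with data-driven weights and running an ordinary LASSO on the rescaled design. The sketch below uses the lars package with hypothetical inputs (X stands for smoothed states at times 1..T-1, z for one gene's smoothed state at times 2..T); it is not the internal HDSSM routine.

## sketch of the R step for one row of A: adaptive LASSO via column rescaling + LARS
library(lars)
set.seed(2)
Tt <- 30; p <- 12
X <- matrix(rnorm((Tt - 1) * p), Tt - 1, p)        # e.g. smoothed states at times 1..T-1
z <- as.vector(X %*% c(0.8, -0.5, rep(0, p - 2))) + rnorm(Tt - 1, 0, 0.1)  # one gene, times 2..T
w <- 1 / abs(coef(lm(z ~ X - 1)))                  # adaptive weights from initial OLS estimates
Xw <- sweep(X, 2, w, "/")                          # adaptive LASSO = ordinary LASSO on rescaled X
fit <- lars(Xw, z, type = "lasso", intercept = FALSE, normalize = FALSE)
beta_w <- predict(fit, s = 0.5, mode = "fraction", type = "coefficients")$coefficients
beta <- as.vector(beta_w) / w                      # transform back: candidate row of A

The fraction s = 0.5 is fixed here purely for illustration; in practice the point on the LASSO path would be chosen by a criterion such as BIC, and the regression would be repeated for each of the p rows of A.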

Example

We design a simulation study to validate the methodology and the implementation procedure. Our microbiome data contain 12 bacteria measured at 29 time points. The system matrix A is $12\times 12$. The observation matrix C is an identity matrix of dimension $12\times 12$. The variance matrices Q and R are diagonal matrices of dimension $12\times 12$. The initial state is $x_0 \sim N_{12}(0.1\cdot\mathbf{1},\,10^{-5} I)$. The maximum number of iterations is set to 100.

library(HDSSM)
initial.Arow <- Ainit(subject15)            # initialize the system matrix from the data
Arow <- initial.Arow$Ainit

### run ERM algorithm
os <- ss <- nrow(Arow)                      # observation and state dimensions
C0 <- diag(os)                              # observation matrix C: identity
Q0 <- diag(ss)                              # initial system noise covariance Q
R0 <- diag(os)                              # initial measurement noise covariance R
initx0 <- matrix(rep(0.1, ss), ncol = 1)    # initial state mean mu
initV0 <- diag(1e-5, ss, ss)                # initial state covariance Sigma
max_iter <- 100                             # maximum number of iterations
diagQ <- 1                                  # treat Q as diagonal (TRUE)
diagR <- 0                                  # do not constrain R to be diagonal (FALSE)
ARmode <- 0                                 # AR mode off (FALSE)
y <- initial.Arow$norm.data                 # normalized time-series data

Here is part of the normalized data:

y[1:8,1:8]

Here is the initial system matrix.

Arow

Here are some input parameters: max_iter is set to 100, and the boolean flags diagQ, diagR, and ARmode are set to 1 (TRUE), 0 (FALSE), and 0 (FALSE), respectively.

max_iter
diagQ
diagR
ARmode

Call the main function: row-based ERM algorithm

ss <- rowbaseEMR(y, Arow, C0, Q0, R0, initx0, initV0, max_iter, diagQ, diagR, ARmode)

Some results: the estimated system matrix A, the log likelihood LL, and the BIC value bic.

ss$estA
ss$LL
ss$bic

Call the main function: matrix-based ERM algorithm

lk <- matbaseERM(y, Arow, C0, Q0, R0, initx0, initV0, max_iter, diagQ, diagR, ARmode)

Some results: the estimated system matrix A, the log likelihood LL, and the BIC value bic.

lk$estA
lk$LL
lk$bic
