lmeNB: Performs the maximum likelihood estimation for the negative...
In lmeNB: Compute the Personalized Activity Index Based on a Negative Binomial Model

Description Usage Arguments Details Value Author(s) References See Also Examples

Let Y[ij] be the response count at jth repeated measure from ith subject. The negative binomial mixed-effect independent model assumes that given the random effect G[i]=g[i], the count response Y[ij] follows the negative binomial distribution:

Y[ij] | G[i] =g[i] ~ NB(r[ij],p[i]),

where p[i], the failure probability of subject i at each time point j is parametrized as p[i]= 1/(g[i]*α+1) and α>0. The model assumes E(G[i]) = 1 so that E(Y[ij]|G[i]=g[i])=r[ij]*g[i]*α and E(Y[ij])= r[ij]*g[i]. This assumption allows the interpretation of the latent random variable G[i] as the subject i's activity rate relative to the overall cohort. The marginal mean μ[ij] = E(Y[ij]) is modeled with fixed effect coefficients, β: μ[ij] = exp(X[ij]^T β). Furthermore, let Var(G[i])=theta, then Var(Y[ij])=mu[ij]^2theta+mu[ij](1+(theta + 1)alpha).

——————————————————————————————————————

Regarding the dependence structures of Y[ij] and Y[ij'] conditional on the random effect G[i], we consider two models, namely independent and AR(1) models.

[1]: Independent model Y[ij] and Y[ij'] are independent conditionally on G[i]. This assumption leads to Cov(Y[ij],Y[ij']|G[i]=g[i])=0 and Cov(Y[ij],Y[ij'])=mu[ij] mu[ij']theta

[2]: AR(1) model Autoregressive (1) structures for Y[ij] and Y[ij'] conditionally on G[i]. That is given the random effect G[i]=g[i], Y[ij] depends on Y[i(j-1)] through the beta binomial thinning and is conditionally independent on Y[ij'] given Y[i(j-1)] for all j' < j-1 . The beta binomial thinning operator depends on a parameter δ and models to have Cov(Y[ij],Y[ij']|G[i]=g[i])=δ^{j-j'} μ[ij](α*g[i]^2+g[i]) . This means that δ measures the strength of the positive AR(1) association between repeated measures of a subject: the larger δ, the stronger the positive correlations between the repeated measures of the same subject are.

—————————————————————————————————————–

Regarding the random effect G[i] distribution, lmeNB allows three models, namely log-normal, gamma and semiparametric models. (All models assume E(G[i])=1 and Var(G[i])=θ.)

(1) The log-normal model That is regular log-normal parameters are restricted as meanlog=-log(theta+1)/2, sdlog = sqrt(log(theta+1)).

(2) The gamma model That is regular gamma parameters are restricted as shape=1/theta, scale=theta.

(3) The semiparametric model No distributional assumption and the random effect distribution is approximated by the estimated values of the quantity: γ[i] = w[i] (y[i+]/ μ[i+]) + (1 - w[i]) , where: y[i+] = ∑[j=1]^{n[i]} y[ij] , μ[i+] = ∑[j=1]^{n[i]} μ[ij] and, w[i] = sqrt( Var(G[i])/Var(Y[i+]/μ[i+]) ) . See Zhao et al. (2013) for more details. The missing count responses, if assumed to be missing at random, can be igored. Other types of missing data are currently not accepted.

lmeNB(formula,data,ID,p.ini=NULL,IPRT=FALSE,AR=FALSE,
      RE=c("G","N","NoN"),deps=1e-03,Vcode=NULL,C=FALSE,
      i.tol=1e-7,o.tol=sqrt(.Machine$double.eps),labelnp,
      maxit=100,semi.boot=0,u.low=0)

`formula`	An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The formula must contain an intercept term.
`data`	A data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. The each row must contains the data corresponding to the repeated measure j of subjects and the rows (i,j)s must be ordered in a way that measurements from a subject is clustered together as (1,1),...,(1,n[1]),(2,1),...,(2,n[2]),...,(N,n[N]). Missing values are accepted of the response variables are treated as missing at random and simply removed from the data when `AR=FALSE`. When `AR=TRUE`, then the missing values are allowed. See the reference for missing value treatments. Missing values in covariates are not currently accepted.
`ID`	A vector of length ∑[i=1]^N n[i] , containing the patient IDs that corresponds to `data`. i.e., `c(rep(ID_1,n_1),rep(ID_2,n_2),...,rep(ID_N,n_N))`. The length must be the same as the number of rows of `data`. Missing ID values are NOT accepted.

`p.ini`	The initial values of the parameters: If `AR=0`, `p.ini`= (\log(α), \log(Var(G[i])) ,beta[0], beta[1],...) . If `AR=1`, `p.ini`= (\log(α), \log(Var(G[i])),logit(δ),β[0], β[1],...) . `NULL` is accepted.
`IPRT`	A logical, passed to Iprt of function `optim`. If `TRUE` then print iterations.
`AR`	A logical, if `TRUE`, then the AR(1) model is employed. If `FALSE`, then the independent model is employed.
`RE`	The distribution of random effects G[i]. If `RE="G"`, then the conditional probability is computed by assuming the random effect is from a gamma distribution with mean 1 and variance theta (gamma model). If `RE="N"`, then the conditional probability is computed by assuming the random effect is from a log-normal distribution with mean 1 and variance theta (log-normal model). If `RE="NoN"`, then the conditional probability is computed based on the semi-parametric model with mean 1 (semiparametric model).
`deps`	In the semiparametric models, the algorithms are terminated when the maximum difference of fixed effect coefficients between the current and previous iterations is less than `deps`. Passed to `fitSemiIND` and `fitSemiAR1`.
`Vcode`	Necessary only if the AR(1) model is fit. A vector of length the total number of repeated measures, containing the indices of time point. For example, there are three subjects and the first two subjects do not have missing visits and completed five visits while the last subject missed the third visit and have four visits in total, then `Vcode=c(1,2,3,4,5,1,2,3,4,5,1,2,4,5)`.
`i.tol`	The absolute tolerance of `integrate` function, which is used to integrate out the random effect of every patient. Used only in parametric methods. `i.tol` should be about `1E-3` for `C=TRUE` option.
`o.tol`	The relative tolerance for `optim` function which is used to search for the maximum likelihood. Used only in parametric methods.
`labelnp`	See `index.batch`. Necessary only for semiparametric random effect model and `semi.boot > 1`. To account for the varying follow-up times of the patients, the bootstrap sampling is stratified according to the follow-up time.
`maxit`	The maximum number of iterations. Necessary only for semiparametric random effect model.
`C`	If `C=TRUE`, then the function uses the likelihood written in `C`. The integration of the random effect is done using Cubature (Multi-dimensional integration) package written by Steven G. Johnson. This option could make computation faster in some scenario. If `C=FALSE`, then the function the likelihood is likelihood written in R language. The integration of the random effect is done using `integrate` function in the `stats` package.
`semi.boot`	The number of bootstrap samples to construct the bootstrap empirical confidence intervals for the fixed effect coefficients. If the value is less than 1 then the bootstrap confidence intervals are not computed. Necessary only for semiparametric random effect model.
`u.low`	For the semiparametric procedures, we observed that the algorithm could behave very unstable when factor covariates are employed in the dataset, and data contains "few" repeated measures of one of the corresponding factor groups: The algorithm could yield unacceptably small estimate of mu[ij]. As Var(Y[ij]) and Cov(Y_{ij},Y_{ij'}) are both multiples of mu[ij] (see description above), small mu[ij] leads to singular Var(Y[i]) matrix. As a result, the algorithm breakdown when computing the weighted matrix of the weighted least square, W[i]=Var(Y[i])^{-1}. To prevent this issue, the current algorithm takes add-hoc treatment, and replaces small mu[ij] i.e. those with mu[ij]< `u.low` with `u.low` when calculating the weight matrix W[i]. `u.low`=0 means that there is no modification, and the smaller `u.low`, the "closer" the modified algorithm is to the original one proposed in Zhao et al. (2013). Our preliminary study indicates that `u.low> 1E-4` prevents the breakdown problem and the performance of the algorithm are similar when `1E-3 < u.low < 1E-1` in terms of the root mean square error of the conditional probability index.

fitParaIND calls optim to minimize the negative log-likelihood of the negative binomial model with respect to the model parameters: \log(α), \log(θ),β[0], β[1],... . The Nelder-Mead algorithm is employed. The log-likelihood is obtained by marginalizing out the random effects. The numerical integration is carried out using adaptive quadrature. The missing count responses, if assumed to be missing at random, can be igored. Other types of missing data are currently not accepted.

When the estimated over-dispersion parameter (α) is close to zero, the negative binomial model reduces to the poisson model, suggesting that the negative binomial mixed-effect model might not be appropriate. When AR=1 and the estimated auto-correlation parameter (δ) is close to zero, the model is suggesting that there is no AR(1) structure among the sequentially collected responses. Hence user might use AR=0 setting which assumes no AR(1) structure.

We note that the results could be sensitive to initial values.

`call`	The input of the function.
`opt`	If `RE="G"` or `RE="N"`, then `opt` contains the results directly from the `optim` function, which is used to minimize the negative of the log-likelihood. If `RE="NoN"`, then `opt` contains the results directly from the `optimize` function, which is used to minimize the negative of the psudo-profile log-likelihood with respect to the dispersion parameter alpha, `a`.
`nlk`	The value of the negative log-likelihood corresponding to `opt$par`.
`V`	If `RE="G"` or `RE="N"`, the approximated asymptotic covariance matrix of the maximum likelihood estimators, i.e. `V`=`solve(opt$hessian)`. If `opt$hessian` is not invertible, then `V` = `matrix(NA,length(p.ini),length(p.ini))`. If `RE="NoN"`, `AR=FALSE` and `semi.boot>0`, `V` contains the bootstrap covariance matrix based on `semi.boot` samples.
`est`	The matrix of the number of fixed-effect parameters (i.e. `length(p.ini)`) by 2. The first column contains the estimates of the model parameters. The second column contains the standard error of the estimators, i.e., `sqrt(diag(V))`. If `V` is not evaluated, then `est` only has one column.

Zhao, Y. and Kondo, Y.

Detection of unusual increases in MRI lesion counts in individual multiple sclerosis patients. (2013) Zhao, Y., Li, D.K.B., Petkau, A.J., Riddehough, A., Traboulsee, A., Journal of the American Statistical Association.

A flexible mixed effect negative binomial regression model for detecting unusual increases in MRI lesion counts in individual multiple sclerosis patients. Kondo, Y., Zhao, Y., Petkau, A.J.

The subroutines of this function is: fitParaIND, fitParaAR1, fitSemiIND, fitSemiAR1,

The subroutines of index.batch to compute the conditional probability index: jCP.ar1, CP1.ar1, MCCP.ar1, CP.ar1.se, CP.se, jCP,

The functions to generate simulated datasets: rNBME.R.

## Not run: 
## ================================================================================ ##
## generate a simulated dataset from the negative binomial mixed-effect 
## independent model:
## Y_ij | G_i = g_i ~ NB(r_ij,p_i) where r_ij = exp(X^T beta)/a , p_i =1/(a*g_i+1)
## with G_i ~ log-normal(E(G_i)=1,var(G_i)=th)
set.seed(1)
sn <- 5 # the number of repeated measures of each patient
n <- 80 ## the number of patients
loga <- - 0.5 
a <- exp(loga) ## dispersion parameter 
logtheta <- 1.3
th <- exp(logtheta) ## the variance of the gamma distributed random effect g_i


## No difference between the means of groups 
## The model only has an intercept term beta0 <- 0.5
b0 <- 0.5
u1 <- rep(exp(b0),sn)  ## the mean vector for group 1 at time point 1,...,sn
u2 <- rep(exp(b0),sn) ## the mean vector for group 2 at time point 1,...,sn

## DT.ind is generated from the IND model
DT.ind<- rNBME.R(gdist="N", n=n, a=a, th=th, u1=u1, u2=u2, sn=sn)
## DT.ar is generated from AR(1) model with delta=0.4
DT.ar<- rNBME.R(gdist="N", n=n, a=a, th=th, u1=u1, u2=u2, sn=sn, d=0.4)

dt.ind<-data.frame(CEL=DT.ind$y,Vcode=DT.ind$vn-2,ID=DT.ind$id)
dt.ar<-data.frame(CEL=DT.ar$y,Vcode=DT.ar$vn-2,ID=DT.ar$id)
## ================================================================================ ##
#### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ####
#### Fit various parametric independent models ####
#### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ####
tst1 <- lmeNB(CEL~1, data=dt.ind, ID=dt.ind$ID, IPRT=TRUE)            ## gamma Gi
tst2 <- lmeNB(CEL~1, data=dt.ar, ID=dt.ar$ID, IPRT=TRUE)              ## gamma Gi
tst3 <- lmeNB(CEL~1, data=dt.ind, ID=dt.ind$ID, IPRT=TRUE, RE="N")    ## log-normal Gi
tst4 <- lmeNB(CEL~1, data=dt.ar, ID=dt.ar$ID, IPRT=TRUE, RE="N")      ## log-normal Gi
tst5 <- lmeNB(CEL~Vcode, data=dt.ind, ID=dt.ind$ID, IPRT=TRUE, RE="N")## log-normal Gi
## conditional probability index with the fitted results of tst5
Psum5 <- index.batch(olmeNB=tst5, labelnp=dt.ind$Vcode >= 1, data=dt.ind, ID=dt.ind$ID)


#### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ####
#### Fit the semi-parametric independent model ####
#### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ####
tst6 <- lmeNB(CEL~1, data=dt.ind, ID=dt.ind$ID, IPRT=TRUE, RE="NoN")
tst7 <- lmeNB(CEL~Vcode, data=dt.ind, ID=dt.ind$ID, IPRT=TRUE, RE="NoN",
              semi.boot=100,labelnp=dt.ind$Vcode >= 1) 
## semi.boot = 100 option computes bootstrap SE and 95
## conditional probability index with fitting result of tst7
Psum7 <- index.batch(olmeNB=tst7,labelnp=dt.ind$Vcode >= 1, data=dt.ind, 
                     ID=dt.ind$ID, Vcode=Vcode)

#### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ####
#### Fit the parametric AR1 models ####
#### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ####
tst8 <- lmeNB(CEL~1, data=dt.ind, ID=dt.ind$ID, IPRT=TRUE,AR=TRUE,Vcode=dt.ind$Vcode)
tst9 <- lmeNB(CEL~1, data=dt.ar, ID=dt.ar$ID, IPRT=TRUE,AR=TRUE,Vcode=dt.ar$Vcode)
## conditional probability index
Psum9 <- index.batch(olmeNB=tst9, labelnp=dt.ind$Vcode >= 1, data=dt.ind, 
                     ID=dt.ind$ID,Vcode=dt.ind$Vcode)



## End(Not run)

lmeNB documentation built on May 2, 2019, 3:34 p.m.

lmeNB index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lmeNB
Compute the Personalized Activity Index Based on a Negative Binomial Model

lmeNB: Performs the maximum likelihood estimation for the negative...
In lmeNB: Compute the Personalized Activity Index Based on a Negative Binomial Model

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to lmeNB in lmeNB...

R Package Documentation

Browse R Packages

We want your feedback!

lmeNB Compute the Personalized Activity Index Based on a Negative Binomial Model

lmeNB: Performs the maximum likelihood estimation for the negative... In lmeNB: Compute the Personalized Activity Index Based on a Negative Binomial Model

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to lmeNB in lmeNB...

R Package Documentation

Browse R Packages

We want your feedback!

lmeNB
Compute the Personalized Activity Index Based on a Negative Binomial Model

lmeNB: Performs the maximum likelihood estimation for the negative...
In lmeNB: Compute the Personalized Activity Index Based on a Negative Binomial Model