#' Fitting penalized frailty models with clustered, multi-event and recurrent data using MM algorithm
#'
#' @description {This formula is used to fit the penalized regression. 3 types of the models can be fitted similar to the function
#' \code{frailtyMM}. In addition, variable selection can be done by three types of penalty, LASSO, MCP and SCAD with the following
#' objective function where \eqn{\lambda} is the tuning parameter and \eqn{q} is the dimension of \eqn{\boldsymbol{\beta}},
#' \deqn{l(\boldsymbol{\beta},\Lambda_0|Y_{obs}) - n\sum_{p=1}^{q} p(|\beta_p|, \lambda).}
#' The BIC is computed using the following equation,
#' \deqn{-2l(\hat{\boldsymbol{\beta}}, \hat{\Lambda}_0) + G_n(\hat{S}+1)\log(n),}
#' where \eqn{G_n=\max\{1, \log(\log(q+1))\}} and \eqn{\hat{S}} is the degree of freedom.
#'
#' Surrogate function is also derived for penalty part for efficient estimation of penalized regression, similar to the notation used
#' in \code{\link{frailtyMM}}, we let \eqn{\boldsymbol{\alpha}} be the collection of all parameters and baseline hazard function. Given that,
#'
#' \if{html}{\figure{fig15.png}{options: style="width:750px;max-width:75\%;"}}
#' \if{latex}{\out{\begin{center}}{\figure{fig15.png}{options: width=5in}}\out{\end{center}}}
#'
#' by local quadratic approximation,
#'
#' \if{html}{\figure{fig16.png}{options: style="width:750px;max-width:75\%;"}}
#' \if{latex}{\out{\begin{center}}{\figure{fig16.png}{options: width=5in}}\out{\end{center}}}
#'
#' And thus, the surrogate function given \eqn{k^{th}} iteration result is as follows,
#'
#' \if{html}{\figure{fig17.png}{options: style="width:750px;max-width:75\%;"}}
#' \if{latex}{\out{\begin{center}}{\figure{fig17.png}{options: width=5in}}\out{\end{center}}}
#' }
#'
#' @param formula Formula where the left hand side is an object of the type \code{Surv}
#' and the right hand side contains the variables and additional specifications.
#' \code{+cluster()} function specify the group id for clustered data or individual id for recurrent data.
#' \code{+event()} function specify the event id for multi-event data (only two events are allowed).
#' @param data The \code{data.frame} where the formula argument can be evaluated.
#' @param frailty The frailty used for model fitting. The default is "lognormal", other choices are
#' "invgauss", "gamma" and "pvf". (Note that the computation time for PVF family will be slow
#' due to the non-explicit expression of likelihood function)
#' @param power The power used if PVF frailty is applied.
#' @param penalty The penalty used for regularization, the default is "LASSO", other choices are "MCP" and "SCAD".
#' @param gam The tuning parameter for MCP and SCAD which controls the concavity of the penalty. For MCP,
#' \deqn{p^{\prime}(\beta, \lambda)=sign(\beta)(\lambda - \frac{|\beta|}{\gamma})} and for "SCAD",
#' \deqn{p^{\prime}(\beta, \lambda)=\lambda\{I(|\beta| \leq \lambda)+\frac{(\gamma \lambda-|\beta|)_{+}}{(\gamma-1) \lambda} I(|\beta|>\lambda)\}.}
#' The default value of \eqn{\gamma} for MCP is 3 and SCAD is 3.7.
#' @param tune The sequence of tuning parameters provided by user. If not provided, the default grid will be applied.
#' @param tol The tolerance level for convergence.
#' @param maxit Maximum iterations for MM algorithm.
#' @param ... additional arguments pass to the function.
#' @export
#' @importFrom Rcpp evalCpp
#' @useDynLib frailtyMMpen, .registration = TRUE
#'
#' @details Without a given \code{tune}, the default sequence of tuning parameters are used to provide the regularization path.
#' The formula is same as the input for function \code{frailtyMM}.
#'
#' @return An object of class \code{fmm} that contains the following fields:
#' \item{coef}{matrix of coefficient estimated from a specific model where each column correponds to an input tuning parameter.}
#' \item{est.tht}{vector of frailty parameters estimated from a specific model with respect to each tuning parameter.}
#' \item{lambda}{list of frailty for each observation estimated from a specific model with respect to each tuning parameter.}
#' \item{likelihood}{vector of the observed log-likelihood given estimated parameters with respect to each tuning parameter.}
#' \item{BIC}{vector of the BIC given estimated parameters with respect to each tuning parameter.}
#' \item{tune}{vector of tuning parameters used for penalized regression.}
#' \item{tune.min}{tuning parameter where minimal of BIC is obtained.}
#' \item{convergence}{convergence threshold.}
#' \item{input}{The input data re-ordered by cluster id. \code{y} is the event time, \code{X} is covariate matrix and \code{d} is the status while 0 indicates censoring.}
#' \item{y}{input stopping time.}
#' \item{X}{input covariate matrix.}
#' \item{d}{input censoring indicator.}
#' \item{formula}{formula applied as input.}
#' \item{coefname}{name of each coefficient from input.}
#' \item{id}{id for individuals or clusters, {1,2...,a}. Note that, since the original id may not be the sequence starting from 1, this output
#' id may not be identical to the original id. Also, the order of id is corresponding to the returned \code{input}.}
#' \item{N}{total number of observations.}
#' \item{a}{total number of individuals or clusters.}
#' \item{datatype}{model used for fitting.}
#'
#' @seealso \code{\link{frailtyMM}}
#'
#' @references
#' \itemize{
#' \item Huang, X., Xu, J. and Zhou, Y. (2022). Profile and Non-Profile MM Modeling of Cluster Failure Time and Analysis of ADNI Data. \emph{Mathematics}, 10(4), 538.
#' \item Huang, X., Xu, J. and Zhou, Y. (2023). Efficient algorithms for survival data with multiple outcomes using the frailty model. \emph{Statistical Methods in Medical Research}, 32(1), 118-132.
#' }
#'
#' @examples
#'
#' data(simdataCL)
#'
#' # Penalized regression under clustered frailty model
#'
#' # Clustered Gamma Frailty Model
#'
#' # Using default tuning parameter sequence
#' gam_cl1 = frailtyMMpen(Surv(time, status) ~ . + cluster(id),
#' simdataCL, frailty = "gamma")
#'
#' \donttest{
#' # Using given tuning parameter sequence
#' gam_cl2 = frailtyMMpen(Surv(time, status) ~ . + cluster(id),
#' simdataCL, frailty = "gamma", tune = 0.1)
#'
#' # Obtain the coefficient where minimum BIC is obtained
#' coef(gam_cl1)
#'
#' # Obtain the coefficient with tune = 0.2.
#' coef(gam_cl1, tune = 0.2)
#'
#' # Plot the regularization path
#' plot(gam_cl1)
#'
#' # Get the degree of freedom and BIC for the sequence of tuning parameters provided
#' print(gam_cl1)
#'
#' }
#'
frailtyMMpen <- function(formula, data, frailty = "gamma", power = NULL, penalty = "LASSO", gam = NULL, tune = NULL, tol = 1e-5, maxit = 200, ...) {
Call <- match.call()
if(!inherits(formula, "formula")) {
stop("please provide formula object for formula")
}
if(!inherits(data, "data.frame")) {
stop("please provide data.frame type for data")
}
m <- model.frame(formula, data)
mx <- model.matrix(formula, data)
lower_frailty = tolower(frailty)
frailty = switch(lower_frailty, "gamma" = "Gamma", "lognormal" = "LogN", "invgauss" = "InvGauss", "pvf" = "PVF",
stop("Invalid frailty specified, please check the frailty input"))
out_frailty = switch(frailty, "Gamma" = "Gamma", "LogN" = "Log-Normal", "InvGauss" = "Inverse Gaussian", "PVF" = "PVF")
if (ncol(m[[1]]) == 2) {
cluster_id <- grep("^cluster\\(", colnames(mx))
event_id <- grep("^event\\(", colnames(mx))
if (length(cluster_id) == 0 && length(event_id) == 0) {
type = "Cluster"
mx1 = mx[, -c(1), drop = FALSE]
coef_name = colnames(mx1)
N = nrow(mx1)
p = ncol(mx1)
newid = seq(0, N-1, 1)
if (N <= 2) {
stop("Please check the sample size of data")
}
y = m[[1]][, 1]
X = mx1
d = m[[1]][, 2]
a = N
neworder = order(y, decreasing = TRUE)
newrank = seq(1, N, 1)[order(neworder)]
y = y[neworder]
X = X[neworder, ]
d = d[neworder]
newid = newid[neworder]
}
if (length(cluster_id) == 1) {
type = "Cluster"
pb = unlist(gregexpr('\\(', colnames(mx)[cluster_id])) + 1
pe = unlist(gregexpr('\\)', colnames(mx)[cluster_id])) - 1
clsname = substr(colnames(mx)[cluster_id], pb, pe)
remove_cluster_id = c(which(colnames(mx) == clsname), cluster_id)
mx1 = mx[, -c(1, remove_cluster_id), drop = FALSE]
mxid = mx[, cluster_id]
coef_name = colnames(mx1)
nord = order(mxid)
mxid = mxid[nord]
N = length(mxid)
p = ncol(mx1)
newid = rep(0, N)
if (N <= 2) {
stop("Please check the sample size of data")
}
for (i in 2:N) {
if (mxid[i] > mxid[i-1]) {
newid[i:N] = newid[i:N] + 1
}
}
y = m[[1]][nord, 1]
X = mx1[nord, , drop = FALSE]
d = m[[1]][nord, 2]
a = max(newid) + 1
neworder = order(y, decreasing = TRUE)
newrank = seq(1, N, 1)[order(neworder)]
y = y[neworder]
X = X[neworder, ]
d = d[neworder]
newid = newid[neworder]
}
if (length(event_id) == 1) {
type = "Multiple"
pb = unlist(gregexpr('\\(', colnames(mx)[event_id])) + 1
pe = unlist(gregexpr('\\)', colnames(mx)[event_id])) - 1
evsname = substr(colnames(mx)[event_id], pb, pe)
remove_event_id = c(which(colnames(mx) == evsname), event_id)
mx1 = mx[, -c(1, remove_event_id), drop = FALSE]
mxid = mx[, event_id]
coef_name = colnames(mx1)
mxid_info = table(mxid)
n = length(mxid_info)
b = min(mxid_info)
p = ncol(mx1)
if (b != max(mxid_info)) {
stop("every subject should have same number of events")
}
nord = order(mxid)
N = length(nord)
mx1 = mx1[nord, ]
X = mx1[nord, , drop = FALSE]
y = m[[1]][nord, 1]
d = m[[1]][nord, 2]
}
}
if (ncol(m[[1]]) == 3) {
type = "Recurrent"
cluster_id <- grep("^cluster\\(", colnames(mx))
pb = unlist(gregexpr('\\(', colnames(mx)[cluster_id])) + 1
pe = unlist(gregexpr('\\)', colnames(mx)[cluster_id])) - 1
clsname = substr(colnames(mx)[cluster_id], pb, pe)
remove_cluster_id = c(which(colnames(mx) == clsname), cluster_id)
mx1 = mx[, -c(1, remove_cluster_id), drop = FALSE]
mxid = mx[, cluster_id]
coef_name = colnames(mx1)
nord = order(mxid)
mxid = mxid[nord]
N = length(mxid)
p = ncol(mx1)
newid = rep(0, N)
if (N <= 2) {
stop("Please check the sample size of data")
}
for (i in 2:N) {
if (mxid[i] > mxid[i-1]) {
newid[i:N] = newid[i:N] + 1
}
}
y = m[[1]][nord, 2]
X = mx1[nord, , drop = FALSE]
d = m[[1]][nord, 3]
a = max(newid) + 1
}
threshold = tol
if (is.null(tune)) {
tuneseq = exp(seq(-5.5, 1, 0.25))
} else {
tuneseq = tune
}
if (type == "Cluster") {
initGam = frailtyMMcal(y, X, d, N, a, newid, frailty = "Gamma", maxit = 10, threshold = threshold, type = 1)
if (sum(abs(initGam$coef) > 1e-6) == 0) {
initGam = frailtyMMcal(y, X, d, N, a, newid, frailty = frailty, maxit = 10, threshold = threshold, type = 1)
}
ini = initGam
coef0 = ini$coef
est.tht0 = ini$est.tht
lambda0 = ini$lambda
likelihood0 = ini$likelihood
coef_all = list()
est.tht_all = list()
lambda_all = list()
likelihood_all = list()
BIC_all = list()
width <- options()$width/2
len_tune = length(tuneseq)
for (z in seq_len(length(tuneseq))) {
cur = frailtyMMcal(y, X, d, N, a, newid,
coef.ini = coef0, est.tht.ini = est.tht0, lambda.ini = lambda0, safe.ini = list(coef = ini$coef, est.tht = ini$est.tht, lambda = ini$lambda),
frailty = frailty, power = power, penalty = penalty, gam.val = gam, tune = tuneseq[z], maxit = maxit, threshold = threshold, type = 1)
coef0 = cur$coef
est.tht0 = cur$est.tht
lambda0 = cur$lambda
likelihood0 = cur$likelihood
coef_all[[z]] = coef0
est.tht_all[[z]] = est.tht0
lambda_all[[z]] = lambda0
likelihood_all[[z]] = likelihood0
BIC_all[[z]] = -2*likelihood0 + max(1, log(log(p + 1)))*(sum(abs(coef0) > threshold) + 1)*log(N)
progress_width = floor(z/len_tune*width)
cat('[', paste0(c(rep('=', progress_width), '>', rep('-', width - progress_width)), collapse=''), ']', round(z/len_tune*100),'%\r')
if (sum(abs(coef0)) < threshold) {
progress_width = width
cat('[', paste0(c(rep('=', progress_width), '>', rep('-', width - progress_width)), collapse=''), ']', round(100),'%\r')
break
}
if (p > N) {
coef0 = ini$coef
est.tht0 = ini$est.tht
lambda0 = ini$lambda
}
}
coef_all = data.frame(matrix(unlist(coef_all), nrow = length(coef0)))
est.tht_all = unlist(est.tht_all)
likelihood_all = unlist(likelihood_all)
BIC_all = unlist(BIC_all)
output = list(coef = coef_all,
est.tht = est.tht_all,
lambda = lambda_all,
likelihood = likelihood_all,
BIC = BIC_all,
tune = tuneseq[seq_len(z)],
tune.min = tuneseq[which.min(BIC_all)],
Ar = ini$Ar,
input = initGam$input,
y = y,
X = X,
d = d,
formula = formula,
coefname = coef_name,
id = newid + 1,
N = N,
a = a,
datatype = "Cluster")
}
if (type == "Multiple") {
initGam = frailtyMMcal(y, X, d, N, b, NULL, frailty = "Gamma", power = NULL, penalty = NULL, maxit = 10, threshold = tol, type = 2)
if (sum(abs(initGam$coef) > 1e-6) == 0) {
initGam = frailtyMMcal(y, X, d, N, b, NULL, frailty = frailty, power = NULL, penalty = NULL, maxit = 10, threshold = tol, type = 2)
}
ini = initGam
coef0 = ini$coef
est.tht0 = ini$est.tht
lambda0 = ini$lambda
likelihood0 = ini$likelihood
coef_all = list()
est.tht_all = list()
lambda_all = list()
likelihood_all = list()
BIC_all = list()
width <- options()$width/2
len_tune = length(tuneseq)
for (z in seq_len(length(tuneseq))) {
cur = frailtyMMcal(y, X, d, N, b, NULL,
coef.ini = coef0, est.tht.ini = est.tht0, lambda.ini = lambda0, safe.ini = list(coef = ini$coef, est.tht = ini$est.tht, lambda = ini$lambda),
frailty = frailty, power = power, penalty = penalty, gam.val = gam, tune = tuneseq[z], maxit = maxit, threshold = tol, type = 2)
coef0 = cur$coef
est.tht0 = cur$est.tht
lambda0 = cur$lambda
likelihood0 = cur$likelihood
coef_all[[z]] = coef0
est.tht_all[[z]] = est.tht0
lambda_all[[z]] = lambda0
likelihood_all[[z]] = likelihood0
BIC_all[[z]] = -2*likelihood0 + max(1, log(log(p + 1)))*(sum(abs(coef0) > 1e-6) + 1)*log(b)
progress_width = floor(z/len_tune*width)
cat('[', paste0(c(rep('=', progress_width), '>', rep('-', width - progress_width)), collapse=''), ']', round(z/len_tune*100),'%\r')
if (p > N) {
coef0 = ini$coef
est.tht0 = ini$est.tht
lambda0 = ini$lambda
}
if (sum(abs(coef0)) < 1e-6) {
progress_width = width
cat('[', paste0(c(rep('=', progress_width), '>', rep('-', width - progress_width)), collapse=''), ']', round(100),'%\r')
break
}
}
coef_all = data.frame(matrix(unlist(coef_all), nrow = length(coef0)))
est.tht_all = unlist(est.tht_all)
likelihood_all = unlist(likelihood_all)
BIC_all = unlist(BIC_all)
output = list(coef = coef_all,
est.tht = est.tht_all,
lambda = lambda_all,
likelihood = likelihood_all,
BIC = BIC_all,
tune = tuneseq[seq_len(z)],
tune.min = tuneseq[which.min(BIC_all)],
Ar = ini$Ar,
input = initGam$input,
y = y,
X = X,
d = d,
formula = formula,
coefname = coef_name,
id = NULL,
N = N,
a = b,
datatype = "Multi-event")
}
if (type == "Recurrent") {
initGam = frailtyMMcal(y, X, d, N, a, newid, frailty = "Gamma", power = NULL, penalty = NULL, maxit = 10, threshold = threshold, type = 3)
if (sum(abs(initGam$coef) > 1e-6) == 0) {
initGam = frailtyMMcal(y, X, d, N, a, newid, frailty = frailty, power = NULL, penalty = NULL, maxit = 10, threshold = threshold, type = 3)
}
ini = initGam
coef0 = ini$coef
est.tht0 = ini$est.tht
lambda0 = ini$lambda
likelihood0 = ini$likelihood
coef_all = list()
est.tht_all = list()
lambda_all = list()
likelihood_all = list()
BIC_all = list()
width <- options()$width/2
len_tune = length(tuneseq)
for (z in seq_len(length(tuneseq))) {
cur = frailtyMMcal(y, X, d, N, a, newid,
coef.ini = coef0, est.tht.ini = est.tht0, lambda.ini = lambda0, safe.ini = list(coef = ini$coef, est.tht = ini$est.tht, lambda = ini$lambda),
frailty = frailty, power = power, penalty = penalty, gam.val = gam, tune = tuneseq[z], maxit = maxit, threshold = threshold, type = 3)
coef0 = cur$coef
est.tht0 = cur$est.tht
lambda0 = cur$lambda
likelihood0 = cur$likelihood
coef_all[[z]] = coef0
est.tht_all[[z]] = est.tht0
lambda_all[[z]] = lambda0
likelihood_all[[z]] = likelihood0
BIC_all[[z]] = -2*likelihood0 + max(1, log(log(p + 1)))*(sum(abs(coef0) > 1e-6) + 1)*log(a)
progress_width = floor(z/len_tune*width)
cat('[', paste0(c(rep('=', progress_width), '>', rep('-', width - progress_width)), collapse=''), ']', round(z/len_tune*100),'%\r')
if (p > N) {
coef0 = ini$coef
est.tht0 = ini$est.tht
lambda0 = ini$lambda
}
if (sum(abs(coef0)) < 1e-6) {
progress_width = width
cat('[', paste0(c(rep('=', progress_width), '>', rep('-', width - progress_width)), collapse=''), ']', round(100),'%\r')
break
}
}
coef_all = data.frame(matrix(unlist(coef_all), nrow = length(coef0)))
est.tht_all = unlist(est.tht_all)
likelihood_all = unlist(likelihood_all)
BIC_all = unlist(BIC_all)
output = list(coef = coef_all,
est.tht = est.tht_all,
lambda = lambda_all,
likelihood = likelihood_all,
BIC = BIC_all,
tune = tuneseq[seq_len(z)],
tune.min = tuneseq[which.min(BIC_all)],
Ar = ini$Ar,
input = initGam$input,
y = y,
X = X,
d = d,
formula = formula,
coefname = coef_name,
id = newid + 1,
N = N,
a = a,
datatype = "Recurrent")
}
attr(output, "call") <- Call
class(output) = "fpen"
output
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.