# logcon: Compute log-concave MLE based on censored or exact data. In logconcens: Maximum likelihood estimation of a log-concave density based on censored data

## Description

Based on independent intervals X_i = [L_i,R_i], where -Inf < L_i <= R_i <= Inf, compute the maximum likelihood estimator of a (sub)probability density phi and the remaining mass p0 at infinity (also known as cure parameter) under the assumption that the former is log-concave. Computation is based on an EM algorithm. For further information see Duembgen, Rufibach, and Schuhmacher (2013, preprint).

## Usage

 ```1 2 3 4 5 6 7``` ```logcon(x, adapt.p0=FALSE, p0=0, knot.prec=IQR(x[x

## Arguments

 `x` a two-column matrix of n >= 2 rows containing the data intervals, or a vector of length n >= 2 containing the exact data points. `adapt.p0` `logical`. Should the algorithm be allowed to adapt p0? In this case an alternating maximization procedure is used that is believed to always yield a joint maximizer (phi,p0). For the much slower (but maybe safer) profile likelihood maximization method, see the function `cure.profile`. `p0` a number from 0 to 1 specifying the mass at infinity. If the algorithm is allowed to adapt p0, this argument only specifies the starting value. Otherwise it is assumed that the true cure parameter p0 is equal to this number. In particular, for the default setting of 0, a proper probability density phi is estimated. `knot.prec` the maximal distance between two consecutive grid points, where knots (points at which the resulting log-subdensity phi may change slope) can be positioned. See details. `reduce` `logical`. Should the domain of the (sub)density be reduced whenever the mass at the left or the right boundary becomes too small? `control` a list of control parameters for the more technical aspects of the algorithm; usually the result of a call to `lc.control`.

## Details

Based on the data intervals X_i = [L_i,R_i] described above, function `logcon` computes a concave, piecewise linear function phi and a probability p0 which satisfy int exp(phi(x)) dx = 1-p0 and jointly maximize the (normalized) log-likelihood.

l(φ, p_0) = (1/n) ∑_{i=1}^n [ 1{L_i = R_i} phi(X_i) + 1{L_i < R_i} log ( int_{L_i}^{R_i} exp phi(x) \, dx + 1{R_i = Inf} p_0 ) ].

If `x` is a two-column matrix, it is assumed to contain the left and right interval endpoints in the correct order. Intervals may have length zero (both endpoints equal) or be unbounded to the right (right endpoint is `Inf`). Computation is based on an EM algorithm, where the M-step uses an active set algorithm for computing the log-concave MLE for exact data with weights. The active set algorithm was described in Duembgen, Huesler, and Rufibach (2007) and Duembgen and Rufibach (2011) and is available in the R package `logcondens`. It has been re-implemented in C for the current package because of speed requirements. The whole algorithm for censored data has been indicated in Duembgen, Huesler, and Rufibach (2007) and was elaborated in Duembgen, Schuhmacher, and Rufibach (2013, preprint).

If `x` is a vector argument, it is assumed to contain the exact data points. In this case the active set algorithm is accessed directly.

In order to obtain a finite dimensional optimization problem the (supposed) domain of phi is subdivided by a grid. Stretches between interval endpoints where for theoretical reasons no knots (points where the slope of phi changes) can lie are left out. The argument `kink.prec` gives the maximal distance we allow between consecutive grid points in stretches where knots can lie. Say `plotint(x)` to see the grid.

The EM algorithm works only for fixed dimensionality of the problem, but the domain of the function phi is not a priori known. Therefore there is an outer loop starting with the largest possible domain, given by the minimal and maximal endpoints of all the intervals, and decreasing the domain as soon as the EM steps let phi become very small towards the boundary. “Very small” means that the integral of exp o phi over the first or last stretch between interval endpoints within the current domain falls below a certain threshold `red.thresh`, which can be controlled via `lc.control`.

Domain reduction tends to be rather conservative. If the computed solution has a suspiciously steep slope at any of the domain boundaries, the recommended strategy is to enforce a smaller domain by increasing the parameters `domind1l` and/or `domind2r` via `lc.control`. The function `loglike` may be used to compare the (normalized) log-likelihoods of the results.

`logConCens` is an alias for `logcon`. It is introduced to provide unified naming with the main functions in the packages `logcondens` and `logcondiscr`.

`logconcure` is the same as `logcon` with `adapt.p0 = TRUE` fixed.

## Value

An object of class `lcdensity` for which reasonable `plot`, `print`, and `summary` methods are available.

If the argument `x` is a two-column matrix (censored data case), such an object has the following components.

 `basedOn ` the string `"censored"` for the type of data the solution is based on. `status ` currently only `0` if the algorithm converged; and `1` otherwise. Note that in most cases even with status `1` the returned solution is very close to the truth. The `1` is often due to the fact that the termination criterion is not so well balanced yet. `x ` the data entered. `tau ` the ordered vector of different interval endpoints. `domind1, domind2` the indices of the `tau`-element at which the domain of the MLE phi starts/ends. `tplus` the grid vector. `tau[domind1:domind2]` augmented by points of subdivision. `isKnot` `0`-`1` value. For the finite elements of `tplus` a `1` if phi has a knot at this position, `0` otherwise. `phi` the vector of phi-values at the finite elements of `tplus`. `phislr` if sup(dom(phi)) = Inf, the slope of phi after the last knot. Otherwise -Inf. `phislr.range` a vector of length 2 specifying a range of possible values for `phislr`. This is for the (rather rare) situations that mass may be shifted between the interval from the rightmost tau-point to infinity and the cure parameter without changing the likelihood. Otherwise `phislr.range` is `NA`. `cure` the cure parameter. Either the original argument `p0` if `adapt.p0` was `FALSE`, otheriwse the estimated cure parameter obtained by the alternating maximization procedure. `cure.range` a vector of length 2 specifying a range of possible values for `cure` or `NA`. See `phislr.range`. `Fhat` the vector of values of the distribution function F of exp o phi at the finite elements of `tplus`. `Fhatfin` the computed value of lim_{t to Inf} F(t).

## Note

If `x` is a vector, this function does the same as the function `logConDens` in the package `logcondens`. The latter package offers additional features such as grid-based computation with weights (for high numerical stability) and smoothing of the estimator, as well as nicer plotting. For exact data we recommend using `logConDens` for everyday data analysis. `logcon` with a vector argument is to be preferred if time is of the essence (for data sets with several thousands of points or repeated optimization in iterative algorithms) or if an additional slope functionality is required.

Two other helpful packages for log-concave density estimation based on exact data are `logcondiscr` for estimating a discrete distribution and `LogConcDEAD` for estimating a multivariate continuous distribution.

## Author(s)

Dominic Schuhmacher [email protected]
Kaspar Rufibach [email protected]
Lutz Duembgen [email protected]

## References

Duembgen, L., Huesler, A., and Rufibach, K. (2007). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical Report 61. IMSV, University of Bern. http://arxiv.org/abs/0707.4643

Duembgen, L. and Rufibach, K., (2011). logcondens: Computations Related to Univariate Log-Concave Density Estimation. Journal of Statistical Software, 39(6), 1-28. http://www.jstatsoft.org/v39/i06

Duembgen, L., Rufibach, K., and Schuhmacher, D. (2013, preprint). Maximum likelihood estimation of a log-concave density based on censored data. http://arxiv.org/pdf/1311.6403v2.pdf

`lc.control`, `lcdensity-methods`, `loglike`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163``` ```# A function for artificially censoring exact data censor <- function(y, timemat) { tm <- cbind(0,timemat,Inf) n <- length(y) res <- sapply(1:n, function(i){ return( c( max(tm[i,][tm[i,] < y[i]]), min(tm[i,][tm[i,] >= y[i]]) ) ) } ) return(t(res)) } # ------------------------ # interval censored data # ------------------------ set.seed(20) n <- 100 # generate exact data: y <- rgamma(n,3) # generate matrix of inspection times: itimes <- matrix(rexp(10*n),n,10) itimes <- t(apply(itimes,1,cumsum)) # transform exact data to interval data x <- censor(y, itimes) # plot both plotint(x, imarks=y) # Compute censored log-concave MLE # (assuming only the censored data is available to us) res <- logcon(x) plot(res) # Compare it to the log-concave MLE for the exact data # and to the true Gamma(3,1) log-density res.ex <- logcon(y) lines(res.ex\$x, res.ex\$phi, lwd=2.5, lty=2) xi <- seq(0,14,0.05) lines(xi,log(dgamma(xi,3,1)), col=3, lwd=2) # ------------------------- # censored data with cure # ------------------------- ## Not run: set.seed(21) n <- 100 # generate exact data: y <- rgamma(n,3) cured <- as.logical(rbinom(n,1,0.3)) y[cured] <- Inf # generate matrix of inspection times: itimes <- matrix(rexp(6*n),n,6) itimes <- t(apply(itimes,1,cumsum)) # transform exact data to interval data x <- censor(y, itimes) # plot both plotint(x, imarks=y) # Compute censored log-concave MLE including cure parameter # (assuming only the censored data is available to us) res <- logcon(x, adapt.p0=TRUE) plot(res) # There is a trade-off between right-hand slope and cure parameter here # (seen by the grey area on the right), but the margin is very small: res\$cure.range # Compare the corresponding CDF to the true CDF plot(res, type="CDF") xi <- seq(0,14,0.05) lines(xi,0.7*pgamma(xi,3,1), col=3, lwd=2) # Note that the trade-off for the right-hand slope is not visible anymore # (in terms of the CDF the effect is too small) ## End(Not run) # ------------------------------------ # real right censored data with cure # ------------------------------------ # Look at data set ovarian from package survival # Gives survival times in days for 26 patients with advanced ovarian carcinoma, # ignoring the covariates # Bring data to right format and plot it ## Not run: library(survival) data(ovarian) sobj <- Surv(ovarian\$futime, ovarian\$fustat) x <- cbind(sobj[,1], ifelse(as.logical(sobj[,2]),sobj[,1],Inf)) plotint(x) # Compute censored log-concave MLE including cure parameter res <- logcon(x, adapt.p0=TRUE) # Compare the corresponding survival function to the Kaplan-Meier estimator plot(res, type="survival") res.km <- survfit(sobj ~ 1) lines(res.km, lwd=1.5) ## End(Not run) # ---------------------- # current status data # ---------------------- ## Not run: set.seed(22) n <- 200 # generate exact data y <- rweibull(n,2,1) # generate vector of inspection times itime <- matrix(rexp(n),n,1) # transform exact data to interval data x <- censor(y, itime) # plot both plotint(x, imarks=y) # Compute censored log-concave MLE # (assuming only the censored data is available to us) res <- logcon(x) plot(res, type="CDF") # Compare it to the true Weibull(2,1) c.d.f. xi <- seq(0,3,0.05) lines(xi,pweibull(xi,2,1), col=3, lwd=2) ## End(Not run) # ---------------------- # rounded/binned data # ---------------------- ## Not run: set.seed(23) n <- 100 # generate data in [0,1] rounded to one digit y <- round(rbeta(n,2,3),1) # bring data to right format and plot it x <- cbind(y-0.05,y+0.05) plotint(x) # Compute censored log-concave MLE res <- logcon(x) plot(res, type="density", xlim=c(0,1)) # Compare it to the true Beta(2,3) density xi <- seq(0,1,0.005) lines(xi,dbeta(xi,2,3), col=3, lwd=2) # The peaks in the estimated density are often considered unsatisfactory # However, they are barely noticeable in the c.d.f. plot(res, type="CDF", xlim=c(0,1)) lines(xi,pbeta(xi,2,3), col=3, lwd=2) # To get rid of them in the density apply the smoothing # proposed in the package logcondens (to be implemented here) ## End(Not run) ```