# calculate non-parametric MLE for interval censored survival function

### Description

This function calculates the non-parametric maximum likelihood estimate (NPMLE) of the distribution from interval
censored data using the self-consistent estimator, so the associated survival distribution generalizes
the Kaplan-Meier estimate to interval censored data. Formulas using `Surv` are allowed, similar to `survfit`.

### Usage


### Arguments

| Argument | Description |
|---|---|
| `L` | numeric vector of left endpoints of censoring interval (equivalent to first element of |
| `R` | numeric vector of right endpoints of censoring interval (equivalent to second element of |
| `initfit` | an initial estimate as an object of class |
| `control` | list of arguments for controlling algorithm (see |
| `Lin` | logical vector, should L be included in the interval? (see details) |
| `Rin` | logical vector, should R be included in the interval? (see details) |
| `formula` | a formula with response a numeric vector (which assumes no censoring) or |
| `data` | an optional matrix or data frame containing the variables in the formula. By default the variables are taken from environment(formula). |
| `conf.int` | logical, estimate confidence interval? For setting conf.level, etc see |
| `...` | values passed to other functions |

### Details

The `icfit` function fits the nonparametric maximum likelihood estimate (NPMLE) of the
distribution function for interval censored data. In the default case (when Lin=Rin=NULL)
we assume there are n (n=length(L)) failure times, and the ith one is in the interval
between L[i] and R[i]. The default is not to include L[i] in the interval unless L[i]=R[i],
and to include R[i] in the interval unless R[i]=Inf. When Lin and Rin are not NULL they describe
whether to include L and R in the associated interval. If either Lin or Rin is length 1 then it is
repeated n times, otherwise they should be logicals of length n.
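The default inclusion rule can be written out in a few lines of base R. This is a minimal sketch of the rule as described above, not the package's internal code, and the vectors `L` and `R` are made-up illustration data.

```r
# Hypothetical censoring intervals (illustration only).
L <- c(0, 2, 5, 4)
R <- c(3, 2, Inf, 7)

# Default rule described in the text:
#   include L[i] only when the interval is a single point (L[i] == R[i]),
#   include R[i] unless it is infinite.
Lin <- L == R
Rin <- R != Inf

# One row per observation, showing which endpoints are included.
print(data.frame(L, R, Lin, Rin))
```

Here only the second observation (an exact time, L = R = 2) includes its left endpoint, and only the third (right censored, R = Inf) excludes its right endpoint.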

The algorithm is basically an EM algorithm applied to
interval censored data (see Turnbull, 1976); however,
we can first define a set of intervals (called the Turnbull intervals),
which are the only intervals where the NPMLE may change. The Turnbull intervals are also called the
innermost intervals, and are the result of the primary reduction (see Aragon and
Eberly, 1992). The starting distribution for the EM algorithm is given by `initfit`, which may be either
(1) NULL, in which case a very simple and quick starting distribution is used (see code), (2) a character vector
describing a function with inputs L, R, Lin, Rin, and A (see for example `initcomputeMLE`), or (3)
a list giving `pf` and `intmap` values, e.g., an `icfit` object. If option (2) is tried and results in an error, then
the starting distribution reverts to the one used with option (1).
Convergence is declared when the maximum
reduced gradient is less than epsilon (see `icfitControl`) and the
Kuhn-Tucker conditions are approximately met;
otherwise a warning results (see Gentleman and
Geyer, 1994). There are other, faster algorithms (for example, see
`EMICM` in the package `Icens`).
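The two steps above (finding the Turnbull intervals, then iterating the self-consistency equations) can be sketched in base R. This is an illustration under simplifying assumptions, not the `icfit` implementation: the data are made up, every interval is treated as (L, R] with L < R, the start is uniform, and the stopping rule is a simple change-in-estimate check rather than the reduced gradient and Kuhn-Tucker checks described above. The names `A`, `pf` and `intmap` are chosen to mirror the elements of the returned object.

```r
# Hypothetical interval censored data, each interval read as (L, R].
L <- c(0, 0, 1, 2, 4)
R <- c(3, 3, 5, 6, Inf)
n <- length(L)

# Sort all endpoints; on ties a (closed) right endpoint precedes an (open) left one.
ep <- data.frame(time = c(L, R), is_left = rep(c(TRUE, FALSE), each = n))
ep <- ep[order(ep$time, ep$is_left), ]

# Innermost (Turnbull) intervals: a left endpoint immediately followed by a right one.
m <- nrow(ep)
idx <- which(ep$is_left[-m] & !ep$is_left[-1])
intmap <- rbind(ep$time[idx], ep$time[idx + 1])
k <- ncol(intmap)

# A[i, j] = 1 when Turnbull interval j lies inside observation i's interval.
A <- outer(seq_len(n), seq_len(k),
           function(i, j) as.numeric(L[i] <= intmap[1, j] & intmap[2, j] <= R[i]))

# EM / self-consistency iterations from a uniform start.
pf <- rep(1 / k, k)
for (iter in seq_len(1000)) {
  denom <- as.vector(A %*% pf)                      # P(observation i) under current pf
  pf_new <- colMeans(sweep(A, 2, pf, "*") / denom)  # average conditional membership
  done <- max(abs(pf_new - pf)) < 1e-10
  pf <- pf_new
  if (done) break
}

print(intmap)
print(round(pf, 4))
```

For these data the likelihood is proportional to pf[1]^2 * pf[2], so the iterations settle at masses of 2/3 on (2, 3] and 1/3 on (4, 5].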

The output is of class `icfit`, which is identical to the `icsurv` class of the
`Icens` package when there is only one group for which a distribution is needed.
Following that class, there is an `intmap` element which gives the bounds
about which each drop in the NPMLE survival function can occur.
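To make the role of `pf` and `intmap` concrete, the sketch below turns a hand-made pair of such elements into survival values. The numbers are hypothetical and are not output from `icfit`; the point is only that the survival function is determined outside the `intmap` intervals and drops by `pf[j]` somewhere within interval j.

```r
# Hypothetical probability masses and their intervals (columns of intmap).
pf <- c(0.2, 0.5, 0.3)
intmap <- rbind(c(1, 4, 8),   # left bounds
                c(2, 5, 9))   # right bounds

# Survival after each interval has been passed, and just before it starts.
surv_after <- 1 - cumsum(pf)
surv_before <- c(1, surv_after[-length(pf)])

print(data.frame(left = intmap[1, ], right = intmap[2, ],
                 surv_before, surv_after))
```

Between time 2 and time 4, for example, the estimated survival is 0.8; inside (1, 2) it is only known to lie between 0.8 and 1.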

Since the classes `icfit` and `icsurv` are so closely related, one can directly
use initial (and faster) fits from the `Icens` package as input in
`initfit`. Note that when using a non-null `initfit`, the `Lin` and `Rin` values of the
initial fit are ignored. Alternatively, one may give the name of the function used to calculate the initial fit.
The function is assumed to input the transpose of the A matrix (called A in the Icens package). Options can be passed
to the initfit function as a list using the initfitOpts variable in `icfitControl`.

The advantage of the `icfit` function over those in the `Icens` package is that it allows a call similar
to that used in `survfit` of the `survival` package, so that different groups may be
plotted at the same time with similar calls.

An `icfit` object is a list (see Value below). The `print` method prints the object as a list
but suppresses printing of the A matrix. The `summary` method prints the
distribution (i.e., probabilities and the intervals where those
probability masses are known to reside) for each group in the icfit object. There is also
a plot method, see `plot.icfit`.
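A typical session using these methods might look like the following. It is a sketch that assumes the `interval` package is installed and that its `bcos` data set has columns `left`, `right` and `treatment`; the block is skipped when the package is not available.

```r
# Requires the 'interval' package; nothing is run when it is not installed.
if (requireNamespace("interval", quietly = TRUE)) {
  library(interval)
  data(bcos)  # assumed: data set shipped with the package

  # Formula interface, one NPMLE per level of 'treatment'.
  fit <- icfit(Surv(left, right, type = "interval2") ~ treatment, data = bcos)

  summary(fit)   # probabilities and their intervals, by group
  plot(fit)      # both groups on one plot
  print(fit[1])  # pick out the fit for the first group
}
```

Because the right side of the formula is a factor, `fit[1]` extracts the fit for a single group.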

For additional references and background see Fay and Shaw (2010).

The confidence interval method is a modified bootstrap. This can be very time consuming (see the Warning section). The method uses a percentile bootstrap confidence interval with default B=200
replicates (see `icfitControl`), with modifications that prevent lower intervals of 1 and upper intervals of 0. Specifically, if there are
n observations total, then at any time the largest value of the lower interval for survival is `binom.test(n,n,conf.level=control()$conf.level)$conf.int[1]`, and analogously
the upper interval is bounded using `binom.test(0,n)`. The output (CI element of returned list) gives confidence intervals just before and just after each
assessment time (as defined by icfitControl$timeEpsilon).
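The effect of the binomial bounds can be seen with a small base R sketch. It is not the package's code: the bootstrap replicates are made up, and taking the second element of the `binom.test(0, n)` interval as the bound on the upper limit is an assumed reading of "analogously".

```r
n <- 50              # hypothetical total number of observations
conf.level <- 0.95

# Bounds from exact binomial intervals, as described in the text.
max_lower <- binom.test(n, n, conf.level = conf.level)$conf.int[1]
min_upper <- binom.test(0, n, conf.level = conf.level)$conf.int[2]  # assumed reading

# Made-up bootstrap replicates of S(t) at an early time where every
# replicate estimates survival as 1.
boot_surv <- rep(1, 200)
alpha <- 1 - conf.level
ci <- quantile(boot_surv, c(alpha / 2, 1 - alpha / 2), names = FALSE)

# The plain percentile interval would be (1, 1); the bound pulls the lower limit below 1.
lower <- min(ci[1], max_lower)
upper <- max(ci[2], min_upper)
print(c(lower = lower, upper = upper))
```

Without the modification the interval would collapse to a single point whenever all bootstrap replicates agree, which is what the bounds are meant to prevent.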

### Value

An object of class `icfit` (same as icsurv class, see details).

There are 4 methods for this class: `plot.icfit`, `print.icfit`, `summary.icfit`, and `[.icfit`. The last method
pulls out individual fits when the right side of the formula of the `icfit` call was a factor.

A list with elements:

| Element | Description |
|---|---|
| `A` | the n by k matrix of indicator functions, NULL if there is more than one stratum, not printed by default |
| `strata` | a named numeric vector of numbers of observations in each stratum; if there is only one stratum, the observation is named NPMLE |
| `error` | this is max(d + u - n), see Gentleman and Geyer, 1994 |
| `numit` | number of iterations |
| `pf` | vector of estimated probabilities of the distribution |
| `intmap` | 2 by k matrix, where the ith column defines an interval corresponding to the probability, pf[i] |
| `converge` | a logical, TRUE if normal convergence |
| `message` | character text message about convergence |
| `anypzero` | logical denoting whether any of the Turnbull intervals were set to zero |
| `CI` | if conf.int=TRUE, included as a list of lists for each stratum, each one having elements time, lower, upper, confMethod, conf.level |

### Warning

The confidence interval method can be very time consuming because it uses a modified bootstrap and the NPMLE is recalculated for each replication. That is why the default only uses 200 bootstrap replications. A message gives a crude estimate of how long the confidence interval calculation will take (it calculates a per replication value by averaging the time of the first 10 replications), but that estimate can be off by 100 percent or more because the time to calculate each bootstrap replication is quite variable.

### Author(s)

Michael P. Fay

### References

Aragon, J. and Eberly, D. (1992). On convergence of convex minorant algorithms for distribution estimation with interval-censored data. J. of Computational and Graphical Statistics, 1, 129-140.

Fay, M.P. and Shaw, P.A. (2010). Exact and Asymptotic Weighted Logrank Tests for Interval Censored Data: The interval R package. Journal of Statistical Software, 36(2), 1-34. http://www.jstatsoft.org/v36/i02/.

Gentleman, R. and Geyer, C.J. (1994). Maximum likelihood for interval censored data: consistency and computation. Biometrika, 81, 618-623.

Turnbull, B.W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Statist. Soc. B, 38, 290-295.

### See Also

`ictest`, `EMICM`

### Examples
