Returns a list of “ordered” observations and associated plotting positions based on Type I leftcensored or rightcensored data. These plotting positions may be used to construct empirical cumulative distribution plots or quantilequantile plots, or to estimate distribution parameters.
1 2  ppointsCensored(x, censored, censoring.side = "left",
prob.method = "michaelschucany", plot.pos.con = 0.375)

x 
numeric vector of observations. Missing ( 
censored 
numeric or logical vector indicating which values of 
censoring.side 
character string indicating on which side the censoring occurs. The possible values are

prob.method 
character string indicating what method to use to compute the plotting positions
(empirical probabilities). Possible values are: The default value is The 
plot.pos.con 
numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is 
Methods for computing plotting positions for complete data sets
(no censored observations) are discussed in D'Agostino, R.B. (1986a) and
Cleveland (1993). For data sets with censored observations, these methods
must be modified. The function ppointsCensored
allows you to compute
plotting positions based on any of the following methods:
Productlimit method of Kaplan and Meier (1958) (prob.method="kaplanmeier"
).
Hazard plotting method of Nelson (1972) (prob.method="nelson"
).
Generalization of the productlimit method due to Michael and Schucany (1986)
(prob.method="michaelschucany"
) (the default).
Generalization of the productlimit method due to Hirsch and Stedinger (1987)
(prob.method="hirschstedinger"
).
Let \underline{x} denote a random sample of N observations from some distribution. Assume n (0 < n < N) of these observations are known and c (c=Nn) of these observations are all censored below (leftcensored) or all censored above (rightcensored) at k fixed censoring levels
T_1, T_2, …, T_K; \; K ≥ 1 \;\;\;\;\;\; (1)
For the case when K ≥ 2, the data are said to be Type I multiply censored. For the case when K=1, set T = T_1. If the data are leftcensored and all n known observations are greater than or equal to T, or if the data are rightcensored and all n known observations are less than or equal to T, then the data are said to be Type I singly censored (Nelson, 1982, p.7), otherwise they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring level T_j for j = 1, 2, …, K, so that
∑_{i=1}^K c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, …, x_{(N)} denote the “ordered” observations, where now “observation” means either the actual observation (for uncensored observations) or the censoring level (for censored observations). For rightcensored data, if a censored observation has the same value as an uncensored one, the uncensored observation should be placed first. For leftcensored data, if a censored observation has the same value as an uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent the i'th “largest” observation from the (unknown) complete sample.
Finally, let Ω (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
ProductLimit Method of Kaplan and Meier (prob.method="kaplanmeier"
)
For complete data sets (no censored observations), the
empirical probabilities estimator of the cumulative distribution
function evaluated at the i'th ordered observation is given by
(D'Agostino, 1986a, p.8):
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{\#[x_j ≤ x_{(i)}]}{n} \;\;\;\;\;\; (3)
where \#[x_j ≤ x_{(i)}] denotes the number of observations less than
or equal to x_{(i)} (see the help file for ecdfPlot
).
Kaplan and Meier (1958) extended this method of computing the empirical cdf to
the case of rightcensored data.
RightCensored Data (censoring.side="right"
)
Let S(t) denote the survival function evaluated at t, that is:
S(t) = 1  F(t) = Pr(X > t) \;\;\;\;\;\; (4)
Kaplan and Meier (1958) show that a nonparametric estimate of the survival function at the i'th ordered observation that is not censored (i.e., i \in Ω), is given by:
\hat{S}[x_{(i)}]  =  \widehat{Pr}[X > x_{(i)}] 
=  \widehat{Pr}[X > x_{(1)}]  
\;\; \widehat{Pr}[X > x_{(2)}  X > x_{(1)}] \;\; \cdots  
\;\; \widehat{Pr}[X > x_{(i)}  X > x_{(i1)}]  
=  ∏_{j \in Ω, j ≤ i} \frac{n_j  d_j}{n_j}, \;\; i \in Ω \;\;\;\;\; (5) 
where n_j is the number of observations (uncensored or censored) with values greater than or equal to x_{(j)}, and d_j denotes the number of uncensored observations exactly equal to x_{(j)} (if there are no tied uncensored observations then d_j will equal 1 for all values of j). (See also Lee and Wang, 2003, pp. 64–69; Michael and Schucany, 1986). By convention, the estimate of the survival function at a censored observation is set equal to the estimated value of the survival function at the largest uncensored observation less than or equal to that censoring level. If there are no uncensored observations less than or equal to a particular censoring level, the estimate of the survival function is set to 1 for that censoring level.
Thus the KaplanMeier plotting position at the i'th ordered observation that is not censored (i.e., i \in Ω), is given by:
\hat{p}_i = \hat{F}[x_{(i)}] = 1  ∏_{j \in Ω, j ≤ i} \frac{n_j  d_j}{n_j} \;\;\;\;\;\; (6)
The plotting position for a censored observation is set equal to the plotting position associated with the largest uncensored observation less than or equal to that censoring level. If there are no uncensored observations less than or equal to a particular censoring level, the plotting position is set to 0 for that censoring level.
As an example, consider the following rightcensored data set:
3, ≥4, ≥4, 5, 5, 6
The table below shows how the plotting positions are computed.
i  x_{(i)}  n_i  d_i  \frac{n_id_i}{n_i}  Plotting Position 
1  3  6  1  5/6  1  (5/6) = 0.167 
2  ≥4  
3  ≥4  
4  5  3  2  1/3  1  (5/6)(1/3) = 0.722 
5  5  0.722  
6  6  1  1  0/1  1  (5/6)(1/3)(0/1) = 1 
Note that for complete data sets, Equation (6) reduces to Equation (3).
LeftCensored Data (censoring.side="left"
)
Gillespie et al. (2010) give formulas for the KaplanMeier estimator for the case of
leftcesoring (censoring.side="left"
). In this case, the plotting position
for the i'th ordered observation, assuming it is not censored, is computed as:
\hat{p}_i = \hat{F}[x_{(i)}] = ∏_{j \in Ω, j > i} \frac{n_j  d_j}{n_j} \;\;\;\;\;\; (7)
where n_j is the number of observations (uncensored or censored) with values less than or equal to x_{(j)}, and d_j denotes the number of uncensored observations exactly equal to x_{(j)} (if there are no tied uncensored observations then d_j will equal 1 for all values of j). The plotting position is equal to 1 for the largest uncensored order statistic.
As an example, consider the following leftcensored data set:
3, <4, <4, 5, 5, 6
The table below shows how the plotting positions are computed.
i  x_{(i)}  n_i  d_i  \frac{n_id_i}{n_i}  Plotting Position 
1  3  1  1  0/1  1(5/6)(3/5) = 0.5 
2  <4  
3  <4  
4  5  5  2  3/5  0.833 
5  5  1(5/6) = 0.833  
6  6  6  1  5/6  1 
Note that for complete data sets, Equation (7) reduces to Equation (3).
Modified KaplanMeier Method (prob.method="modified kaplanmeier"
)
(LeftCensored Data Only.) For leftcensored data, the modified KaplanMeier
method is the same as the KaplanMeier method, except that for the largest
uncensored order statistic, the plotting position is not set to 1 but rather is
set equal to the Blom plotting position: (N  0.375)/(N + 0.25). This method
is useful, for example, when creating QuantileQuantile plots.
Hazard Plotting Method of Nelson (prob.method="nelson"
)
(RightCensored Data Only.) For rightcensored data, Equation (5) can be
rewritten as:
\hat{S}[x_{(i)}] = ∏_{j \in Ω, j ≤ i} \frac{Nj}{Nj+1}, \;\; i \in Ω \;\;\;\;\;\; (8)
Nelson (1972) proposed the following formula for plotting positions for the uncensored observations in the context of estimating the hazard function (see Michael and Schucany,1986, p.469):
\hat{p}_i = \hat{F}[x_{(i)}] = 1  ∏_{j \in Ω, j ≤ i} exp(\frac{1}{Nj+1}) \;\;\;\;\;\; (9)
See Lee and Wang (2003) for more information about the hazard function.
As for the Kaplan and Meier (1958) method, the plotting position for a censored
observation is set equal to the plotting position associated with the largest
uncensored observation less than or equal to that censoring level. If there are
no uncensored observations less than or equal to a particular censoring level,
the plotting position is set to 0 for that censoring level.
Generalization of ProductLimit Method, Michael and Schucany
(prob.method="michaelschucany"
)
For complete data sets, the disadvantage of using Equation (3) above to define
plotting positions is that it implies the largest observed value is the maximum
possible value of the distribution (the 100'th percentile). This may be
satisfactory if the underlying distribution is known to be discrete, but it is
usually not satisfactory if the underlying distribution is known to be continuous.
A more frequently used formula for plotting positions for complete data sets is given by:
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i  a}{N  2a + 1} \;\;\;\;\;\; (10)
where 0 ≤ a ≤ 1 (Cleveland, 1993, p. 18; D'Agostino, 1986a, pp. 8,25).
The value of a is usually chosen so that the plotting positions are
approximately unbiased (i.e., approximate the mean of their distribution) or else
approximate the median value of their distribution (see the help file for
ecdfPlot
). Michael and Schucany (1986) extended this method for
both left and rightcensored data sets.
RightCensored Data (censoring.side="right"
)
For rightcensored data sets, the plotting positions for the uncensored
observations are computed as:
\hat{p}_i = 1  \frac{Na+1}{N2a+1} ∏_{j \in Ω, j ≤ i} \frac{Nja+1}{Nja+2} \;\; i \in Ω \;\;\;\;\;\; (11)
Note that the plotting positions proposed by Herd (1960) and Johnson (1964) are a special case of Equation (11) with a=0. Equation (11) reduces to Equation (10) in the case of complete data sets. Note that unlike the KaplanMeier method, plotting positions associated with tied uncensored observations are not the same (just as in the case for complete data using Equation (10)).
As for the Kaplan and Meier (1958) method, for rightcensored data the plotting
position for a censored observation is set equal to the plotting position associated
with the largest uncensored observation less than or equal to that censoring level.
If there are no uncensored observations less than or equal to a particular censoring
level, the plotting position is set to 0 for that censoring level.
LeftCensored Data (censoring.side="left"
)
For leftcensored data sets the plotting positions are computed as:
\hat{p}_i = \frac{Na+1}{N2a+1} ∏_{j \in Ω, j ≥ i} \frac{ja}{ja+1} \;\; i \in Ω \;\;\;\;\;\; (12)
Equation (12) reduces to Equation (10) in the case of complete data sets. Note that unlike the KaplanMeier method, plotting positions associated with tied uncensored observations are not the same (just as in the case for complete data using Equation (10)).
For leftcensored data, the plotting position for a censored observation is set
equal to the plotting position associated with the smallest uncensored observation
greater than or equal to that censoring level. If there are no uncensored
observations greater than or equal to a particular censoring level, the plotting
position is set to 1 for that censoring level.
Generalization of ProductLimit Method, Hirsch and Stedinger
(prob.method="hirschstedinger"
)
Hirsch and Stedinger (1987) use a slightly different approach than Kaplan and Meier
(1958) and Michael and Schucany (1986) to derive a nonparametric estimate of the
survival function (probability of exceedance) in the context of leftcensored data.
First they estimate the value of the survival function at each of the censoring
levels. The value of the survival function for an uncensored observation between
two adjacent censoring levels is then computed by linear interpolation (in the form
of a plotting position). See also Helsel and Cohn (1988).
The discussion below presents an extension of the method of Hirsch and Stedinger (1987) to the case of rightcensored data, and then presents the original derivation due to Hirsch and Stedinger (1987) for leftcensored data.
RightCensored Data (censoring.side="right"
)
For rightcensored data, the survival function is estimated as follows.
For the j'th censoring level (j = 0, 1, …, K), write the
value of the survival function as:
S(T_j)  =  Pr[X > T_j] 
=  Pr[X > T_{j+1}] + Pr[T_j < X ≤ T_{j+1}]  
=  S(T_{j+1}) + Pr[T_j < X ≤ T_{j+1}  X > T_j] Pr[X > T_j]  
=  S(T_{j+1}) + Pr[T_j < X ≤ T_{j+1}  X > T_j] S(T_j) \;\;\;\;\;\; (13) 
where
T_0 = ∞, \;\;\;\;\;\; (14)
T_{K+1} = ∞ \;\;\;\;\;\; (15)
Now set
A_j  =  # uncensored observations in (T_j, T_{j+1}] \;\;\;\;\;\; (16) 
B_j  =  # observations in (T_{j+1}, ∞) \;\;\;\;\;\; (17) 
for j = 0, 1, …, K. Then the method of moments estimator of the conditional probability in Equation (13) is given by:
\widehat{Pr}[T_j < X ≤ T_{j+1}  X > T_j] = \frac{A_j}{A_j + B_j} \;\;\;\;\;\; (18)
Hence, by equations (13) and (18) we have
\hat{S}(T_j) = \hat{S}(T_{j+1}) + (\frac{A_j}{A_j + B_j}) \hat{S}(T_{j}) \;\;\;\;\;\; (19)
which can be rewritten as:
\hat{S}(T_{j+1}) = \hat{S}(T_j) [1  (\frac{A_j}{A_j + B_j})] \;\;\;\;\;\; (20)
Equation (20) can be solved interatively for j = 1, 2, …, K. Note that
\hat{S}(T_0) = \hat{S}(∞) = S(∞) = 1 \;\;\;\;\;\; (21)
\hat{S}(T_{K+1}) = \hat{S}(∞) = S(∞) = 0 \;\;\;\;\;\; (22)
Once the values of the survival function at the censoring levels are computed, the plotting positions for the A_j uncensored observations in the interval (T_J, T_{j+1}] (j = 0, 1, …, K) are computed as
\hat{p}_i = [1  \hat{S}(T_j)] + [\hat{S}(T_j)  \hat{S}(T_{j+1})] \frac{ra}{A_j  2a + 1} \;\;\;\;\;\; (23)
where a denotes the plotting position constant, 0 ≤ a ≤ 1, and r denotes the rank of the i'th observation among the A_j uncensored observations in the interval (T_J, T_{j+1}]. (Tied observations are given distinct ranks.)
For the c_j observations censored at censoring level T_j (j = 1, 2, …, K), the plotting positions are computed as:
\hat{p}_i = 1  [\hat{S}(T_j) \frac{ra}{c_j  2a + 1}] \;\;\;\;\;\; (24)
where r denotes the rank of the i'th observation among the c_j
observations censored at censoring level T_j. Note that all the
observations censored at the same censoring level are given distinct ranks,
even though there is no way to distinguish between them.
LeftCensored Data (censoring.side="left"
)
For leftcensored data, Hirsch and Stedinger (1987) modify the definition of the
survival function as follows:
S^*(t) = Pr[X ≥ t] \;\;\;\;\;\; (25)
For continuous distributions, the functions in Equations (4) and (25) are identical.
Hirsch and Stedinger (1987) show that for the j'th censoring level (j = 0, 1, …, K), the value of the survival function can be written as:
S(T_j)  =  Pr[X ≥ T_j] 
=  Pr[X ≥ T_{j+1}] + Pr[T_j ≤ X < T_{j+1}]  
=  S^*(T_{j+1}) + Pr[T_j ≤ X < T_{j+1}  X < T_{j+1}] Pr[X < T_{j+1}]  
=  S^*(T_{j+1}) + Pr[T_j ≤ X < T_{j+1}  X < T_j] [1  S^*(T_j)] \;\;\;\;\;\; (26) 
where T_0 and T_{K+1} are defined in Equations (14) and (15).
Now set
A_j  =  # uncensored observations in [T_j, T_{j+1}) \;\;\;\;\;\; (27) 
B_j  =  # observations in (∞, T_j) \;\;\;\;\;\; (28) 
for j = 0, 1, …, K. Then the method of moments estimator of the conditional probability in Equation (26) is given by:
Pr[T_j ≤ X < T_{j+1}  X < T_{j+1}] = \frac{A_j}{A_j + B_j} \;\;\;\;\;\; (29)
Hence, by Equations (26) and (29) we have
\hat{S}(T_j) = \hat{S}(T_{j+1}) + (\frac{A_j}{A_j + B_j}) \hat{S}(T_{j}) \;\;\;\;\;\; (30)
which can be solved interatively for j = 1, 2, …, K. Note that
\widehat{S^*}(T_{K+1}) = \widehat{S^*}(∞) = S^*(∞) = 0 \;\;\;\;\;\; (31)
\widehat{S^*}(T_0) = \widehat{S^*}(∞) = S^*(∞) = 1 \;\;\;\;\;\; (32)
Once the values of the survival function at the censoring levels are computed, the plotting positions for the A_j uncensored observations in the interval [T_J, T_{j+1}) (j = 0, 1, …, K) are computed as
\hat{p}_i = [1  \widehat{S^*}(T_j)] + [\widehat{S^*}(T_j)  \widehat{S^*}(T_{j+1})] \frac{ra}{A_j  2a + 1} \;\;\;\;\;\; (33)
where a denotes the plotting position constant, 0 ≤ a ≤ 0.5, and r denotes the rank of the i'th observation among the A_j uncensored observations in the interval [T_J, T_{j+1}). (Tied observations are given distinct ranks.)
For the c_j observations censored at censoring level T_j (j = 1, 2, …, K), the plotting positions are computed as:
\hat{p}_i = [1  \widehat{S^*}(T_j)] \frac{ra}{c_j  2a + 1} \;\;\;\;\;\; (34)
where r denotes the rank of the i'th observation among the c_j observations censored at censoring level T_j. Note that all the observations censored at the same censoring level are given distinct ranks, even though there is no way to distinguish between them.
ppointsCensored
returns a list with the following components:
Order.Statistics 
numeric vector of the “ordered” observations. 
Cumulative.Probabilities 
numeric vector of the associated plotting positions. 
Censored 
logical vector indicating which of the ordered observations are censored. 
Censoring.Side 
character string indicating whether the data are left or rightcensored.
This is same value as the argument 
Prob.Method 
character string indicating what method was used to compute the plotting positions.
This is the same value as the argument 
Optional Component (only present when prob.method="michaelschucany"
or
prob.method="hirschstedinger"
):
Plot.Pos.Con 
numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument 
For censored data sets, plotting positions may be used to construct empirical cumulative distribution
plots (see ecdfPlotCensored
), construct quantilequantile plots
(see qqPlotCensored
), or to estimate distribution parameters
(see FcnsByCatCensoredData
).
The function survfit
in the builtin R library
survival computes the survival function for rightcensored, leftcensored, or
intervalcensored data. Calling survfit
with
type="kaplanmeier"
will produce similar results to calling
ppointsCensored
with prob.method="kaplanmeier"
. Also, calling
survfit
with type="fh2"
will produce similar results
to calling ppointsCensored
with prob.method="nelson"
.
Helsel and Cohn (1988, p.2001) found very little effect of changing the value of the plotting position constant when using the method of Hirsch and Stedinger (1987) to compute plotting positions for multiply leftcensored data. In general, there will be very little difference between plotting positions computed by the different methods except in the case of very small samples and a large amount of censoring.
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.1116.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodnessof Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.762.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse KaplanMeier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley \& Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 19972004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457481.
Lee, E.T., and J. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley and Sons, New York.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodnessof Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R09007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet  March 2009 Unified Guidance. EPA 530/R09007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
ppoints
, ecdfPlot
, qqPlot
,
ecdfPlotCensored
, qqPlotCensored
,
survfit
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159  # Generate 20 observations from a normal distribution with mean=20 and sd=5,
# censor all observations less than 18, then compute plotting positions for
# this data set. Compare the plotting positions to the plotting positions
# for the uncensored data set. Note that the plotting positions for the
# censored data set start at the first ordered uncensored observation and
# that for values of x > 18 the plotting positions for the two data sets are
# exactly the same. This is because there is only one censoring level and
# no uncensored observations fall below the censored observations.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(333)
x < rnorm(20, mean=20, sd=5)
censored < x < 18
censored
# [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#[13] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
sum(censored)
#[1] 7
new.x < x
new.x[censored] < 18
round(sort(new.x),1)
# [1] 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.1 18.7 19.6 20.2 20.3 20.6 21.4
#[15] 21.8 21.8 23.2 26.2 26.8 29.7
p.list < ppointsCensored(new.x, censored)
p.list
#$Order.Statistics
# [1] 18.00000 18.00000 18.00000 18.00000 18.00000 18.00000 18.00000 18.09771
# [9] 18.65418 19.58594 20.21931 20.26851 20.55296 21.38869 21.76359 21.82364
#[17] 23.16804 26.16527 26.84336 29.67340
#
#$Cumulative.Probabilities
# [1] 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432
# [8] 0.3765432 0.4259259 0.4753086 0.5246914 0.5740741 0.6234568 0.6728395
#[15] 0.7222222 0.7716049 0.8209877 0.8703704 0.9197531 0.9691358
#
#$Censored
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
#[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#
#$Censoring.Side
#[1] "left"
#
#$Prob.Method
#[1] "michaelschucany"
#
#$Plot.Pos.Con
#[1] 0.375
#
# Round off plotting positions to two decimal places
# and compare to plotting positions that ignore censoring
#
round(p.list$Cum, 2)
# [1] 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.43 0.48 0.52 0.57 0.62 0.67
#[15] 0.72 0.77 0.82 0.87 0.92 0.97
round(ppoints(x, a=0.375), 2)
# [1] 0.03 0.08 0.13 0.18 0.23 0.28 0.33 0.38 0.43 0.48 0.52 0.57 0.62 0.67
#[15] 0.72 0.77 0.82 0.87 0.92 0.97
#
# Clean up
#
rm(x, censored, new.x, p.list)
#
# Reproduce the example in Appendix B of Helsel and Cohn (1988). The data
# are stored in Helsel.Cohn.88.appb.df. This data frame contains 18
# observations, of which 9 are censored below one of 2 distinct censoring
# levels.
Helsel.Cohn.88.app.b.df
# Conc.orig Conc Censored
#1 <1 1 TRUE
#2 <1 1 TRUE
#...
#17 33 33 FALSE
#18 50 50 FALSE
p.list < with(Helsel.Cohn.88.app.b.df,
ppointsCensored(Conc, Censored, prob.method="hirschstedinger", plot.pos.con=0))
lapply(p.list[1:2], round, 3)
#$Order.Statistics
# [1] 1 1 1 1 1 1 3 7 9 10 10 10 12 15 20 27 33 50
#
#$Cumulative.Probabilities
# [1] 0.063 0.127 0.190 0.254 0.317 0.381 0.500 0.556 0.611 0.167 0.333 0.500
#[13] 0.714 0.762 0.810 0.857 0.905 0.952
# Clean up
#
rm(p.list)
#
# Example 151 of USEPA (2009, page 1510) gives an example of
# computing plotting positions based on censored manganese
# concentrations (ppb) in groundwater collected at 5 monitoring
# wells. The data for this example are stored in
# EPA.09.Ex.15.1.manganese.df.
EPA.09.Ex.15.1.manganese.df
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#4 4 Well.1 21.6 21.6 FALSE
#5 5 Well.1 <2 2.0 TRUE
#...
#21 1 Well.5 17.9 17.9 FALSE
#22 2 Well.5 22.7 22.7 FALSE
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
p.list.EPA < with(EPA.09.Ex.15.1.manganese.df,
ppointsCensored(Manganese.ppb, Censored,
prob.method = "kaplanmeier"))
data.frame(Mn = p.list.EPA$Order.Statistics, Censored = p.list.EPA$Censored,
CDF = p.list.EPA$Cumulative.Probabilities)
# Mn Censored CDF
#1 2.0 TRUE 0.21
#2 2.0 TRUE 0.21
#3 2.0 TRUE 0.21
#4 3.3 FALSE 0.28
#5 5.0 TRUE 0.28
#6 5.0 TRUE 0.28
#7 5.0 TRUE 0.28
#8 5.3 FALSE 0.32
#9 6.3 FALSE 0.36
#10 7.7 FALSE 0.40
#11 8.4 FALSE 0.44
#12 9.5 FALSE 0.48
#13 10.0 FALSE 0.52
#14 11.9 FALSE 0.56
#15 12.1 FALSE 0.60
#16 12.6 FALSE 0.64
#17 16.9 FALSE 0.68
#18 17.9 FALSE 0.72
#19 21.6 FALSE 0.76
#20 22.7 FALSE 0.80
#21 34.5 FALSE 0.84
#22 45.9 FALSE 0.88
#23 53.6 FALSE 0.92
#24 77.2 FALSE 0.96
#25 106.3 FALSE 1.00
#
# Clean up
#
rm(p.list.EPA)

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.