View source: R/LNConditionalMeans.R
sub_cmeans replaces left-censored values with estimated
conditional means. The means are conditioned upon the fact that the
observation was censored. All one knows about the value of a censored
observation is that the true value lies below a detection limit or other
threshold. The estimated conditional means use information on the number
of observations below the detection limit and the distribution of values
above the detection limit to estimate censored values.
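For orientation, the quantity being estimated has a closed form in the lognormal case. The expression below is a reference formula, not a description of how the package computes its result (the method itself works by simulation). The symbols are notation introduced here: \mu and \sigma are the mean and SD of the log-transformed values, L is the detection limit, and \Phi is the standard normal cumulative distribution function.

```latex
E[X \mid X < L]
  = \exp\!\left(\mu + \tfrac{\sigma^{2}}{2}\right)
    \frac{\Phi\!\left(\dfrac{\ln L - \mu - \sigma^{2}}{\sigma}\right)}
         {\Phi\!\left(\dfrac{\ln L - \mu}{\sigma}\right)}
```

The denominator is the probability that an observation falls below the detection limit, so the ratio rescales the partial expectation over the censored region.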
sub_cmeans(cc, flg, sz = 1000, max_samp = 10^6, start = c(1, 1))
cc | A vector of data values, containing observed values where they exist and the applicable detection limits where data were censored.
flg | A vector of TRUE or FALSE values, of the same length as cc, that indicates which values are detection limits (TRUE) and which are measured values (FALSE). Detection limits for censored observations can differ.
sz | The target size of the (simulated) sample for calculating conditional means (default = 1000).
max_samp | The maximum size of the sample drawn to estimate conditional means. A large value risks lengthy computation, especially for data with a small probability of non-detects (default = 1 million).
start | A list or vector containing parameters of the underlying uncensored probability distribution (e.g., mean and SD of the related normal distribution for the default lognormal distribution). This is used to initialize the numerical search for maximum likelihood estimates (default = c(1, 1)). A good starting value may help speed convergence. If convergence is not achieved, consider providing a better starting point for the optimization.
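To make the role of start more concrete, here is a minimal sketch of a maximum likelihood fit for left-censored lognormal data. It is an illustration under stated assumptions, not the package's source: the function name censored_nll, the simulated data, and the use of stats::optim with its default Nelder-Mead method are all choices made for this example.

```r
# Hypothetical helper (not part of the package): negative log-likelihood of
# left-censored lognormal data. par[1] is the mean and par[2] the SD of log(values).
censored_nll <- function(par, cc, flg) {
  mu <- par[1]
  sigma <- par[2]
  if (sigma <= 0) return(Inf)  # reject invalid SD values during the search
  # Measured values contribute the lognormal log-density.
  ll_obs <- stats::dlnorm(cc[!flg], meanlog = mu, sdlog = sigma, log = TRUE)
  # Censored values contribute the log-probability of lying below their limit.
  ll_cens <- stats::plnorm(cc[flg], meanlog = mu, sdlog = sigma, log.p = TRUE)
  -(sum(ll_obs) + sum(ll_cens))
}

# Simulated data with a single, illustrative detection limit of 3.
set.seed(42)
x <- stats::rlnorm(200, meanlog = 2, sdlog = 1)
dl <- 3
flg <- x < dl
cc <- ifelse(flg, dl, x)

# c(1, 1) plays the role of the start argument described above.
fit <- stats::optim(par = c(1, 1), fn = censored_nll, cc = cc, flg = flg)
fit$par          # estimated mean and SD on the log scale
fit$convergence  # 0 indicates that optim reported successful convergence
```

If the search stops without converging, restarting from values closer to the mean and SD of the logged, uncensored observations is one reasonable way to choose a better starting point.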
The method assumes that all observations come from a single underlying distribution.
Thus, if the goal is to compare concentrations of contaminants from different populations, apply the correction separately to each population before conducting additional analyses.
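One way to apply a correction separately to each population is sketched below. The wrapper sub_by_group and the stand-in function half_dl are hypothetical names introduced only for this example. half_dl exists solely so the sketch runs without the package installed; in practice sub_cmeans would be passed as fun.

```r
# Hypothetical wrapper: apply a substitution function within each group.
# `fun` must accept (cc, flg) and return a vector the same length as its input.
sub_by_group <- function(cc, flg, grp, fun, ...) {
  out <- numeric(length(cc))
  # split() yields one vector of row positions per group.
  for (idx in split(seq_along(cc), grp)) {
    out[idx] <- fun(cc[idx], flg[idx], ...)
  }
  out
}

# Stand-in substitution rule, used here only to keep the example self-contained.
half_dl <- function(cc, flg) ifelse(flg, cc / 2, cc)

conc <- c(1, 2, 4, 8)
cens <- c(TRUE, FALSE, TRUE, FALSE)
site <- c("a", "a", "b", "b")
sub_by_group(conc, cens, site, fun = half_dl)
```

With the package loaded, the equivalent call would be sub_by_group(conc, cens, site, fun = sub_cmeans), so that a separate distribution is fitted for each site.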
These procedures may not be well suited for use where a covariate may alter conditional means. For example, where rainfall or river discharge has a large effect on concentrations of pollutants of interest, the assumption that all observations are drawn from a single lognormal distribution may be untenable. In practice, however, if censored observations are infrequent, the effect on further analyses is likely to be small, and these methods may still be preferable to making arbitrary choices about what value to use to replace non-detects.
The method simulates conditional means by drawing from a best fit lognormal distribution, selected based on maximum likelihood. Because the analysis is based on simulation, results will not be identical for subsequent runs.
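The simulation step can be sketched roughly as follows. This is an assumption-laden illustration rather than the package's code: the helper name sim_cond_mean is hypothetical, and the way sz and max_samp interact here (draw enough values that about sz fall below the limit, capped at max_samp) is a guess at the intent of those arguments. The sketch also shows that fixing the random seed makes a run reproducible.

```r
# Hypothetical helper: simulate the mean of a lognormal variable given that it
# falls below `limit`.
sim_cond_mean <- function(limit, meanlog, sdlog, sz = 1000, max_samp = 10^6) {
  p_below <- stats::plnorm(limit, meanlog, sdlog)
  # Draw enough values that roughly `sz` should fall below the limit,
  # but never more than `max_samp` in total.
  n_draw <- min(max_samp, ceiling(sz / p_below))
  draws <- stats::rlnorm(n_draw, meanlog, sdlog)
  mean(draws[draws < limit])
}

# Two runs without a fixed seed will generally differ slightly.
run1 <- sim_cond_mean(limit = 3, meanlog = 2, sdlog = 1)
run2 <- sim_cond_mean(limit = 3, meanlog = 2, sdlog = 1)

# Setting the same seed before each run reproduces the result exactly.
set.seed(123)
seeded1 <- sim_cond_mean(limit = 3, meanlog = 2, sdlog = 1)
set.seed(123)
seeded2 <- sim_cond_mean(limit = 3, meanlog = 2, sdlog = 1)
identical(seeded1, seeded2)
```

When the probability of a non-detect is small, n_draw grows quickly, which is why a large max_samp can make the computation slow.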
A vector containing original uncensored values, and estimates of the conditional means (expected value) of censored observations.
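# Simulate 25 sorted lognormal values and flag the five smallest as censored
df <- data.frame(sim = sort(stats::rlnorm(25, 2, 3)),
                 cens = c(rep.int(TRUE, 5), rep.int(FALSE, 20)))
# Use the fifth-smallest value as the detection limit for all censored values
df$sim[1:4] <- df$sim[5]
# Replace censored values with estimated conditional means
vals <- sub_cmeans(cc = df$sim, flg = df$cens)
library(ggplot2)
# Line: raw data (detection limits where censored); points: substituted values
ggplot(df, aes(x = 1:25)) +
  geom_line(aes(y = sim, color = cens)) +
  geom_point(aes(y = vals)) +
  scale_y_log10() +
  scale_color_discrete(name = 'Censored') +
  theme_minimal() +
  xlab('Rank Order') +
  ylab('Raw Data (Line) and Data with Substitutions (points)')
rm(vals)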