sim_cmeans: Estimate the conditional mean of unobserved left censored...
In ccb60/LCensMeans: Calculate Conditional Means of Left Censored Data


sim_cmeans estimates the conditional means of values from a specified lognormal distribution that fall below cutoff values, which may differ from one observation to the next. The means are estimated by simulation.

sim_cmeans(lmu, lsigma, cutoff, sz = 1000, max_samp = 1e+06)

`lmu`	The mean (on the log scale) of the lognormal distribution.
`lsigma`	The standard deviation (on the log scale) of the lognormal distribution.
`cutoff`	Vector of detection limits (or observed values). Must be non-negative.
`sz`	The target size of the (simulated) sample for calculating conditional means (default = 1000).
`max_samp`	The maximum size of the sample drawn to estimate conditional means. A large value risks lengthy computation, especially for data with a small probability of non-detects (default = 1 million).

This function is inefficient in the current context, as it simulates means for all values, not just censored values. That makes it slow for large data sets or high values of sz.

Users should also be aware that because the function estimates values by simulation, certain combinations of lmu, lsigma, cutoff and sz can result in very slow calculations.

Given a specific lognormal density (determined by lmu and lsigma), it is easy to estimate how many draws will be needed to obtain sz values below cutoff. Currently, the function checks item by item whether that number is large (over 500,000) and, if it is, returns NA with a warning. In principle that could generate many warnings and many NAs, but in practice it has rarely been a problem, as it arises only if the probability of an observation falling below the cutoff is very small. And if that is the case, it is unlikely you will have any non-detects.
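The draw-count estimate described above can be sketched roughly as follows. This is an illustration only, not the package's code: the helper name `draws_needed` is hypothetical, and the 500,000 threshold is simply the figure quoted in the text.

```r
# Hypothetical helper (not part of LCensMeans): expected number of lognormal
# draws needed to obtain about `sz` values below each cutoff.
draws_needed <- function(lmu, lsigma, cutoff, sz = 1000) {
  # Probability that a single draw falls below each cutoff
  p_below <- stats::plnorm(cutoff, meanlog = lmu, sdlog = lsigma)
  # Expected number of draws; Inf where p_below is zero
  sz / p_below
}

n_est <- draws_needed(lmu = 2, lsigma = 3, cutoff = c(1e-4, 1, 50))
# Flag cutoffs whose expected draw count exceeds the threshold quoted above
too_many <- n_est > 5e5
```

The smaller the cutoff relative to the distribution, the smaller the probability of falling below it and the larger the number of draws required.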

The current approach simulates a large draw from the underlying uncensored lognormal distribution, and retains only values below the detection limits. The function estimates the size of the oversample needed, but because this is based on probabilistic reasoning, the initial sample is not guaranteed to always be large enough. If the initial draw is too small, additional values are drawn until the number of values below the cutoff exceeds sz. This can be quite slow.
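A minimal sketch of this oversample-and-retain idea for a single cutoff is shown below. It is not the package's implementation: the function name `sketch_cmean` and its internals are assumptions made for illustration. The closed-form conditional mean of a truncated lognormal is included only as a sanity check on the simulated value.

```r
# Hypothetical sketch (not the package's implementation): estimate the mean of
# lognormal values below a single cutoff by oversampling and discarding.
sketch_cmean <- function(lmu, lsigma, cutoff, sz = 1000, max_samp = 1e6) {
  p_below <- stats::plnorm(cutoff, meanlog = lmu, sdlog = lsigma)
  n_draw <- ceiling(sz / p_below)
  # Give up if the expected number of draws is unreasonably large
  if (!is.finite(n_draw) || n_draw > max_samp) return(NA_real_)
  # Initial oversample from the uncensored distribution
  draws <- stats::rlnorm(n_draw, meanlog = lmu, sdlog = lsigma)
  kept <- draws[draws < cutoff]
  # Top up if the initial draw happened to fall short of sz values
  while (length(kept) < sz) {
    extra <- stats::rlnorm(n_draw, meanlog = lmu, sdlog = lsigma)
    kept <- c(kept, extra[extra < cutoff])
  }
  mean(kept)
}

set.seed(42)
est_sim <- sketch_cmean(lmu = 2, lsigma = 1, cutoff = 5)

# Closed-form mean of a lognormal truncated above at the cutoff, for comparison
est_exact <- exp(2 + 1^2 / 2) *
  stats::pnorm((log(5) - 2 - 1^2) / 1) / stats::pnorm((log(5) - 2) / 1)
```

The simulated and closed-form values should agree to within simulation error, and a cutoff far in the lower tail returns NA rather than attempting an enormous draw.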

A vector of estimated conditional means. Note that this function provides conditional means for all observations, not only the censored ones. This is wasteful and potentially confusing to users. See examples.

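# Simulate 25 sorted lognormal values and mark the five smallest as censored
df <- data.frame(sim = sort(stats::rlnorm(25, 2, 3)),
                 cens = c(rep.int(TRUE, 5), rep.int(FALSE, 20)))
# Give the censored observations a common value (the fifth-smallest draw)
df$sim[1:4] <- df$sim[5]
# Conditional means are returned for all 25 values, not only the censored ones
est <- sim_cmeans(lmu = 2, lsigma = 3, cutoff = df$sim)
library(ggplot2)
# Plot the raw data as lines and the estimated conditional means as points
ggplot(df, aes(x = 1:25)) +
  geom_line(aes(y = sim, color = cens)) +
  geom_point(aes(y = est)) +
  scale_y_log10() +
  scale_color_discrete(name = 'Censored') +
  theme_minimal() +
  xlab('Rank Order') +
  ylab('Raw Data (Line) and Conditional Means (points)')
rm(est)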