sim_cmeans
estimates conditional means of
values from a specified lognormal distribution (with given parameters) that
fall below (variable) cutoff values. The estimated means are determined via
simulation.
sim_cmeans(lmu, lsigma, cutoff, sz = 1000, max_samp = 1e+06)
lmu
    The mean (on the log scale) of the lognormal distribution.
lsigma
    The standard deviation (on the log scale) of the lognormal distribution.
cutoff
    Vector of detection limits (or observed values). Must be non-negative.
sz
    The target size of the (simulated) sample for calculating conditional means (default = 1000).
max_samp
    The maximum size of the sample drawn to estimate conditional means. A large value risks lengthy computation, especially for data with a small probability of non-detects (default = 1 million).
This function is inefficient in the current context, as it simulates means
for all values, not just censored values. That makes it slow for large
data sets or high values of sz.
Users should also be aware that because the function estimates values by
simulation, certain combinations of lmu, lsigma, cutoff, and sz can result
in very slow calculations.
Given a specific lognormal density (determined by lmu and lsigma), it is
easy to estimate how many draws will be needed to obtain sz values below
cutoff. Currently, the function checks item by item to see if that
number is large (over 500,000), and if it is, returns NA, with a warning.
In principle that could generate many warnings and many NAs, but that has
not been an issue for most practical problems, as it will arise only if the
probability of an observation falling below the cutoff is very small. And if
that is the case, it is unlikely you will have any non-detects.
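The draw-count estimate described above can be sketched in a few lines of R. This is a minimal illustration, not the package's code: the helper name expected_draws is hypothetical, and the calculation simply divides the target sample size by the probability (from stats::plnorm) that a single draw falls below each cutoff.

```r
# Hypothetical helper, not part of the package: expected number of draws from
# the uncensored lognormal needed to obtain `sz` values below each cutoff.
expected_draws <- function(lmu, lsigma, cutoff, sz = 1000) {
  # Probability that one draw falls below each cutoff
  p_below <- stats::plnorm(cutoff, meanlog = lmu, sdlog = lsigma)
  ceiling(sz / p_below)
}

cutoffs <- c(0.001, 0.1, 1, 10)
n_needed <- expected_draws(lmu = 2, lsigma = 3, cutoff = cutoffs)
n_needed

# Cutoffs whose expected draw count is above the 500,000 threshold
# mentioned in the text
too_large <- n_needed > 5e5
too_large
```

Very low cutoffs relative to the distribution drive the required number of draws up sharply, which is the situation in which the function is described as returning NA with a warning.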
The current approach simulates a large draw from the underlying uncensored lognormal distribution, and retains only values below the detection limits. The function estimates the size of the oversample needed, but because this is based on probabilistic reasoning, the initial sample is not guaranteed to always be large enough. If the initial draw is too small, additional values are drawn until the number of values below the cutoff exceeds sz. This can be quite slow.
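The oversample-and-filter idea can be illustrated for a single cutoff as follows. This is a sketch under stated assumptions rather than the function's actual implementation: the name sim_cmean_one, the 20% oversampling margin, and the choice to average exactly sz retained values are all assumptions made for the example.

```r
# Illustrative only: a hypothetical single-cutoff version of the
# oversample-and-filter approach, not the package's implementation.
sim_cmean_one <- function(lmu, lsigma, cutoff, sz = 1000) {
  p_below <- stats::plnorm(cutoff, meanlog = lmu, sdlog = lsigma)
  stopifnot(p_below > 0)  # a cutoff of 0 can never be undercut
  # Oversample by an assumed 20% margin so the first draw usually suffices
  n_draw <- ceiling(1.2 * sz / p_below)
  kept <- numeric(0)
  # Keep drawing until at least `sz` values below the cutoff are retained
  while (length(kept) < sz) {
    draws <- stats::rlnorm(n_draw, meanlog = lmu, sdlog = lsigma)
    kept <- c(kept, draws[draws < cutoff])
  }
  mean(kept[seq_len(sz)])
}

set.seed(42)
est_one <- sim_cmean_one(lmu = 2, lsigma = 3, cutoff = 5)
est_one
```

Because every draw below the cutoff is an exact sample from the truncated distribution, the only cost of a small probability below the cutoff is the number of discarded draws, which is what makes the approach slow in those cases.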
A vector of estimated conditional means. Note that this function provides conditional means for all observations, not only the censored ones. This is wasteful, and potentially confusing to users. See examples.
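Since the returned vector covers every observation, a caller will usually want to keep the conditional mean only where an observation is censored. The sketch below shows one way to do that with ifelse. The data are made up, the column name filled is hypothetical, and the est vector is a stand-in computed from the closed-form mean of a lognormal truncated from above, so that the snippet runs without the package; in practice est would come from sim_cmeans.

```r
# Made-up observations: the five lowest values share one detection limit
set.seed(1)
df <- data.frame(sim = sort(stats::rlnorm(25, 2, 3)),
                 cens = c(rep.int(TRUE, 5), rep.int(FALSE, 20)))
df$sim[1:4] <- df$sim[5]

# Stand-in for the output of sim_cmeans(): closed-form mean of a lognormal
# truncated above at each cutoff (an assumption of this sketch)
lmu <- 2
lsigma <- 3
est <- exp(lmu + lsigma^2 / 2) *
  stats::pnorm((log(df$sim) - lmu - lsigma^2) / lsigma) /
  stats::pnorm((log(df$sim) - lmu) / lsigma)

# Use the conditional mean only where the observation is censored;
# keep the observed value everywhere else
df$filled <- ifelse(df$cens, est, df$sim)
head(df, 8)
```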
df <- data.frame(sim = sort(stats::rlnorm(25, 2, 3)),
                 cens = c(rep.int(TRUE, 5), rep.int(FALSE, 20)))
# Set the five lowest values to a common detection limit
df$sim[1:4] <- df$sim[5]
est <- sim_cmeans(lmu = 2, lsigma = 3, cutoff = df$sim)

library(ggplot2)
ggplot(df, aes(x = 1:25)) +
  geom_line(aes(y = sim, color = cens)) +
  geom_point(aes(y = est)) +
  scale_y_log10() +
  scale_color_discrete(name = 'Censored') +
  theme_minimal() +
  xlab('Rank Order') +
  ylab('Raw Data (Line) and Conditional Means (points)')
rm(est)