sim_cmeans: Estimate the conditional mean of unobserved left censored...

Description

sim_cmeans estimates the conditional means of values drawn from a specified lognormal distribution that fall below (possibly varying) cutoff values. The means are estimated by simulation.
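
For a single cutoff, the idea can be illustrated in a few lines of base R: draw from the lognormal, keep the draws below the cutoff, and average them. This is a minimal sketch, not the package's implementation; the helper name `cmean_sim_one` and its fixed number of draws `n` are assumptions made for illustration.

```r
# Hypothetical helper (not part of LCensMeans): Monte Carlo estimate of
# E[X | X < cutoff] for X ~ lognormal(lmu, lsigma), for ONE cutoff.
cmean_sim_one <- function(lmu, lsigma, cutoff, n = 1e5) {
  x <- stats::rlnorm(n, meanlog = lmu, sdlog = lsigma)  # uncensored draws
  below <- x[x < cutoff]                                # keep values under the cutoff
  if (length(below) == 0) return(NA_real_)              # no draw fell below
  mean(below)
}

set.seed(1)
cmean_sim_one(lmu = 2, lsigma = 3, cutoff = 5)
```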

Usage

sim_cmeans(lmu, lsigma, cutoff, sz = 1000, max_samp = 1e+06)

Arguments

lmu

The mean (on the log scale) of the lognormal distribution.

lsigma

The standard deviation (on the log scale) of the lognormal distribution.

cutoff

Vector of detection limits (or observed values). Must be non-negative.

sz

The target number of simulated values below each cutoff used to calculate each conditional mean (default = 1000).

max_samp

The maximum size of the sample drawn to estimate the conditional means. A large value risks lengthy computation, especially for data with a small probability of non-detects (default = 1 million).

Details

This function is inefficient in the current context, as it simulates conditional means for all values, not just censored values. That makes it slow for large data sets or large values of sz.
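
One possible workaround is to pass only the censored cutoffs to the estimator and keep the observed values for detected rows. A minimal sketch; the wrapper `cmeans_censored_only` is hypothetical, and it takes the estimator as an argument (in practice `sim_cmeans`) so it works with any function that maps a vector of cutoffs to a vector of means.

```r
# Hypothetical wrapper (not part of LCensMeans): estimate conditional means
# only for censored rows; detected rows keep their observed values.
cmeans_censored_only <- function(x, cens, estimator, ...) {
  stopifnot(length(x) == length(cens), is.logical(cens))
  out <- x                                          # detected values pass through
  if (any(cens)) {
    out[cens] <- estimator(..., cutoff = x[cens])   # simulate only where needed
  }
  out
}

# Usage, assuming sim_cmeans() is available:
# est <- cmeans_censored_only(df$sim, df$cens, sim_cmeans, lmu = 2, lsigma = 3)
```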

Users should also be aware that because the function estimates values by simulation, certain combinations of lmu, lsigma, cutoff and sz can result in very slow calculations.

Given a specific lognormal density (determined by lmu and lsigma), it is easy to estimate how many draws will be needed to obtain sz values below cutoff. Currently, the function checks each value in turn to see whether that number is large (over 500,000) and, if it is, returns NA with a warning. In principle that could generate many warnings and many NAs, but that has not been a problem in most practical applications, because it arises only when the probability of an observation falling below the cutoff is very small. If that is the case, you are unlikely to have any non-detects.
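
The draw-count estimate follows from the lognormal CDF: if p = P(X < cutoff), then on average about sz / p draws are needed to collect sz values below the cutoff. A sketch in base R; the helper `expected_draws` is hypothetical, and its `limit` default simply mirrors the 500,000 figure quoted in the text rather than the package source.

```r
# Hypothetical helper: expected number of lognormal draws needed to obtain
# `sz` values below each cutoff; cutoffs needing too many draws become NA.
expected_draws <- function(lmu, lsigma, cutoff, sz = 1000, limit = 5e5) {
  p <- stats::plnorm(cutoff, meanlog = lmu, sdlog = lsigma)  # P(X < cutoff)
  n <- sz / p                                                # mean draws needed
  too_many <- !is.na(n) & n > limit
  if (any(too_many)) {
    warning(sum(too_many), " cutoff(s) would need more than ",
            format(limit, big.mark = ",", scientific = FALSE), " draws")
    n[too_many] <- NA
  }
  n
}

expected_draws(lmu = 2, lsigma = 3, cutoff = exp(2))  # p = 0.5, so about 2000
```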

The current approach simulates a large draw from the underlying uncensored lognormal distribution, and retains only values below the detection limits. The function estimates the size of the oversample needed, but because this is based on probabilistic reasoning, the initial sample is not guaranteed to always be large enough. If the initial draw is too small, additional values are drawn until the number of values below the cutoff exceeds sz. This can be quite slow.
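
The oversample-then-top-up strategy can be sketched for a single cutoff as follows. This illustrates the approach described, not the package's code: the helper name `draw_below`, the 1.2 inflation factor on the initial draw, and the choice to stop at `max_samp` total draws and return whatever was collected (possibly fewer than `sz` values) are all assumptions.

```r
# Hypothetical sketch (not the LCensMeans source): collect at least `sz`
# lognormal draws below `cutoff`, oversampling first and topping up if short.
draw_below <- function(lmu, lsigma, cutoff, sz = 1000, max_samp = 1e6,
                       inflate = 1.2) {
  p <- stats::plnorm(cutoff, meanlog = lmu, sdlog = lsigma)
  if (p == 0) return(numeric(0))          # nothing can fall below the cutoff
  batch <- ceiling(inflate * sz / p)      # initial oversample size
  kept <- numeric(0)
  drawn <- 0
  while (length(kept) < sz && drawn < max_samp) {
    n <- min(batch, max_samp - drawn)     # never exceed max_samp in total
    x <- stats::rlnorm(n, meanlog = lmu, sdlog = lsigma)
    kept <- c(kept, x[x < cutoff])        # retain only values below cutoff
    drawn <- drawn + n
  }
  kept
}

set.seed(42)
vals <- draw_below(lmu = 2, lsigma = 3, cutoff = 5)
mean(vals)   # simulated conditional mean for this cutoff
```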

Value

A vector of estimated conditional means (NA where the required sample would be too large; see Details). Note that this function provides conditional means for all observations, not only the censored ones. This is wasteful and potentially confusing to users. See examples.
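
Because the target quantity has a closed form for the lognormal, simulated estimates can be sanity-checked against it. For X ~ lognormal(mu, sigma) and cutoff c > 0, E[X | X < c] = exp(mu + sigma^2/2) * Phi((ln c - mu - sigma^2)/sigma) / Phi((ln c - mu)/sigma), where Phi is the standard normal CDF. The function below is a hypothetical helper for that check, not part of the package; it works on the log scale to avoid underflow for small cutoffs.

```r
# Hypothetical helper (not part of LCensMeans): exact E[X | X < cutoff]
# for X ~ lognormal(lmu, lsigma); requires cutoff > 0.
cmean_exact <- function(lmu, lsigma, cutoff) {
  z <- (log(cutoff) - lmu) / lsigma
  exp(lmu + lsigma^2 / 2 +
        stats::pnorm(z - lsigma, log.p = TRUE) -   # partial expectation term
        stats::pnorm(z, log.p = TRUE))             # divide by P(X < cutoff)
}

cmean_exact(lmu = 2, lsigma = 3, cutoff = c(1, 5, 50))
```

Comparing `cmean_exact()` with `sim_cmeans()` for the same lmu, lsigma and cutoff gives a rough idea of the simulation error at a given sz.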

Examples

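# Simulate 25 lognormal values, sorted so the smallest come first
df <- data.frame(sim = sort(stats::rlnorm(25, 2, 3)),
                 cens = c(rep.int(TRUE, 5), rep.int(FALSE, 20)))
# Mimic a detection limit: the four smallest values are replaced by the
# fifth, and all five are flagged as censored
df$sim[1:4] <- df$sim[5]
# Conditional means are returned for every row, not only the censored ones
est <- sim_cmeans(lmu = 2, lsigma = 3, cutoff = df$sim)
library(ggplot2)
ggplot(df, aes(x = 1:25)) +
  geom_line(aes(y = sim, color = cens)) +
  geom_point(aes(y = est)) +
  scale_y_log10() +
  scale_color_discrete(name = 'Censored') +
  theme_minimal() +
  xlab('Rank Order') +
  ylab('Raw Data (Line) and Conditional Means (points)')
rm(est)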

ccb60/LCensMeans documentation built on Oct. 30, 2020, 3:26 a.m.