sub_cmeans: Replace censored values with estimated conditional means


View source: R/LNConditionalMeans.R

Description

sub_cmeans replaces left-censored values with estimated conditional means. Each mean is conditioned on the fact that the observation was censored: all one knows about a censored observation is that its true value lies below a detection limit or other threshold. The estimates use the number of observations below the detection limit, together with the distribution of values above it, to estimate the censored values.
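To make "conditional mean" concrete, the sketch below computes the mean of a lognormal variable given that it falls below a threshold, using the standard closed-form expression. It assumes the distribution parameters are already known, and the helper name lnorm_mean_below is hypothetical. It is an illustration of the quantity being estimated, not the package's implementation.

```r
# Closed-form mean of a lognormal variable, given that it falls below `limit`.
# meanlog and sdlog are the parameters on the log scale, as in stats::plnorm().
# (Hypothetical helper for illustration; not part of the package.)
lnorm_mean_below <- function(limit, meanlog, sdlog) {
  exp(meanlog + sdlog^2 / 2) *
    pnorm((log(limit) - meanlog - sdlog^2) / sdlog) /
    pnorm((log(limit) - meanlog) / sdlog)
}

# Cross-check against numerical integration of x * f(x) over (0, limit),
# divided by the probability of falling below the limit.
limit <- 2
num   <- integrate(function(x) x * dlnorm(x, 0, 1), lower = 0, upper = limit)$value
check <- num / plnorm(limit, 0, 1)
est   <- lnorm_mean_below(limit, meanlog = 0, sdlog = 1)
```

The conditional mean is always below the threshold, which is why it is a more defensible replacement than the detection limit itself.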

Usage

sub_cmeans(cc, flg, sz = 1000, max_samp = 10^6, start = c(1, 1))

Arguments

cc

A vector of data values, containing the observed value where one exists and the applicable detection limit where the observation was censored.

flg

A vector of TRUE or FALSE values, of the same length as cc, that indicates which values are detection limits (TRUE) and which are measured values (FALSE). Detection limits for censored observations can differ.

sz

The target size of the (simulated) sample for calculating conditional means (default = 1000).

max_samp

The maximum size of the sample drawn to estimate conditional means. A large value risks lengthy computation, especially for data with a small probability of non-detects (default = 1 million).

start

A list or vector containing starting values for the parameters of the underlying uncensored probability distribution (e.g., mean and SD of the related normal distribution for the default lognormal distribution). These values initialize the numerical search for maximum likelihood estimates. A good starting value may speed convergence. If convergence is not achieved, consider providing a better starting point for the optimization.
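The role of start can be illustrated with a minimal sketch of a maximum likelihood fit to left-censored lognormal data. The function name cens_lnorm_nll and the use of stats::optim are assumptions made for illustration; the package's internal fitting code may differ.

```r
# Negative log-likelihood of left-censored lognormal data.
# par[1] = mean and par[2] = SD of log(x); cc and flg as described above.
# (Hypothetical helper for illustration; not the package's code.)
cens_lnorm_nll <- function(par, cc, flg) {
  if (par[2] <= 0) return(Inf)  # the SD must be positive
  # Measured values contribute the log density
  obs  <- dlnorm(cc[!flg], par[1], par[2], log = TRUE)
  # Non-detects contribute log P(X < detection limit)
  cens <- plnorm(cc[flg], par[1], par[2], log.p = TRUE)
  -(sum(obs) + sum(cens))
}

set.seed(42)
x   <- rlnorm(200, meanlog = 1, sdlog = 0.5)
dl  <- 2                   # a single detection limit, for simplicity
flg <- x < dl              # TRUE where the value would be reported as censored
cc  <- ifelse(flg, dl, x)  # censored entries hold the detection limit

# c(1, 1) plays the role of `start`: the initial guess for (meanlog, sdlog)
fit <- optim(c(1, 1), cens_lnorm_nll, cc = cc, flg = flg)
fit$par          # estimates of meanlog and sdlog
fit$convergence  # 0 indicates the search converged
```

A starting point far from the data (for example, a mean of log values off by several units) can slow or stall a search like this, which is the situation the start argument is meant to address.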

Details

The method assumes that all observations come from a single underlying distribution.

Thus, if the goal is to compare concentrations of contaminants from different populations, the correction should be applied separately to each population before conducting additional analyses.
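One way to apply a correction separately to each population is sketched below. The helper sub_by_group and the stand-in half_dl are hypothetical and exist only so the sketch runs without the package; half_dl is a placeholder, not a recommended substitution.

```r
# Apply a substitution function separately within each group.
# `fun` is any function taking (cc, flg) and returning a vector like cc.
# (Hypothetical helper for illustration.)
sub_by_group <- function(cc, flg, group, fun) {
  out <- cc
  for (g in unique(group)) {
    idx <- group == g
    out[idx] <- fun(cc[idx], flg[idx])
  }
  out
}

# Placeholder so the sketch is self-contained: half the detection limit.
half_dl <- function(cc, flg) ifelse(flg, cc / 2, cc)

cc   <- c(1, 5, 8, 2, 4, 9)
flg  <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
site <- c("A", "A", "A", "B", "B", "B")
res  <- sub_by_group(cc, flg, site, half_dl)
```

With the package loaded, one would pass sub_cmeans in place of half_dl, so that a separate distribution is fitted to each population.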

These procedures may not be well suited for use where a covariate may alter conditional means. For example, where rainfall or river discharge has a large effect on concentrations of pollutants of interest, the assumption that all observations are drawn from a single lognormal distribution may be untenable. In practice, however, if censored observations are infrequent, the effect on further analyses is likely to be small, and these methods may still be preferable to making arbitrary choices about what value to use to replace non-detects.

The method simulates conditional means by drawing from a best fit lognormal distribution, selected based on maximum likelihood. Because the analysis is based on simulation, results will not be identical for subsequent runs.
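The simulation step can be sketched as follows, assuming the lognormal parameters have already been estimated. The function sim_mean_below is hypothetical; its sz and max_samp arguments mirror the names above, but how the package actually uses them is an assumption here.

```r
# Estimate E[X | X < limit] by simulation from a lognormal distribution.
# (Hypothetical helper; the sample-size logic is an assumption.)
sim_mean_below <- function(limit, meanlog, sdlog, sz = 1000, max_samp = 10^6) {
  p_below <- plnorm(limit, meanlog, sdlog)         # chance a draw is censored
  n_draw  <- min(max_samp, ceiling(sz / p_below))  # draws for roughly sz hits
  draws   <- rlnorm(n_draw, meanlog, sdlog)
  mean(draws[draws < limit])                       # mean of draws below limit
}

set.seed(123)  # fixing the seed makes a run repeatable
a <- sim_mean_below(2, meanlog = 0, sdlog = 1)
set.seed(123)
b <- sim_mean_below(2, meanlog = 0, sdlog = 1)
identical(a, b)
```

When the probability of a non-detect is small, many draws are needed to obtain sz values below the limit, which is why a cap such as max_samp matters. If sub_cmeans draws from R's standard random number generator, calling set.seed() before it should likewise make its results repeatable.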

Value

A vector containing the original uncensored values and estimates of the conditional means (expected values) of the censored observations.

Examples

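library(LCensMeans)
library(ggplot2)

# Simulate 25 lognormal values, sorted so that the five smallest are censored
df <- data.frame(sim = sort(stats::rlnorm(25, 2, 3)),
                 cens = c(rep.int(TRUE, 5), rep.int(FALSE, 20)))
# Give all five censored values a common detection limit (the fifth-smallest)
df$sim[1:4] <- df$sim[5]
# Replace censored values with estimated conditional means
vals <- sub_cmeans(cc = df$sim, flg = df$cens)

# Lines show the raw data; points show the data after substitution
ggplot(df, aes(x = 1:25)) +
  geom_line(aes(y = sim, color = cens)) +
  geom_point(aes(y = vals)) +
  scale_y_log10() +
  scale_color_discrete(name = 'Censored') +
  theme_minimal() +
  xlab('Rank Order') +
  ylab('Raw Data (Line) and Data with Substitutions (Points)')
rm(vals)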

ccb60/LCensMeans documentation built on Oct. 30, 2020, 3:26 a.m.