```r
knitr::opts_chunk$set(
  fig.width = 8,
  collapse = TRUE,
  comment = "#>"
)
```
```r
library(ggplot2)
library(ggpower)
library(scales)
# library(tidyverse)
library(cowplot)
```
The log-normal distribution plays an important role in biology (and everywhere else: finance, engineering, etc.) due to the Central Limit Theorem: if many random effects are added together, the result converges to a normal distribution; if they are multiplied, it converges to a log-normal distribution.
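As a quick illustration of the multiplicative case (a minimal sketch; the choice of 50 uniform factors is arbitrary):

```r
# The product of many independent positive effects is approximately
# log-normal, so its log (a sum of many independent terms) is roughly normal.
set.seed(1)
prods <- replicate(10000, prod(runif(50, min = 0.5, max = 1.5)))
hist(log(prods), 50, freq = FALSE)
```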
The documentation says that meanlog and sdlog are the mean and standard deviation of the distribution on the log scale, and here we look at what that actually means.
If x is log-normally distributed, log(x) is normal with mean meanlog and variance $sdlog^2$:
```r
x <- rlnorm(100000, meanlog = 2, sdlog = 1.5)
hist(log(x), 100, freq = FALSE)
points(seq(-4, 8, length.out = 50),
       dnorm(seq(-4, 8, length.out = 50), mean = 2, sd = 1.5),
       type = "l")
```
This is obvious from the log-normal's definition and is helpful if log-transforming your data is an option. Let's build some intuition for the meanlog argument in the log-normal distribution itself, i.e. for the untransformed data.
I compute the mean, median and mode for a given meanlog (mu1) and sdlog (s1) and plot them on top of the density function.
```r
mu1 <- 2
s1  <- 0.5
lnorm_stats <- c(mode   = exp(mu1 - s1 * s1),        # see e.g. Wikipedia
                 median = exp(mu1),
                 mean   = exp(mu1 + s1 * s1 / 2))
# Logarithmically spaced points on the x-axis are not crucial here; they become
# useful when we do histograms, see below.
xseq <- pmax(1e-5, exp(seq(log(1), log(25), length.out = 1000)))
p <- ggplot() +
  geom_line(aes(xseq, dlnorm(xseq, mu1, s1))) +
  geom_vline(aes(xintercept = lnorm_stats, color = names(lnorm_stats)),
             linetype = "dashed") +
  xlab("x") +
  ggtitle(paste0("dlnorm(x, meanlog=", mu1, ", sdlog=", s1, ")")) +
  scale_color_hue(name = "lognorm stats",
                  labels = c("mean(x) = exp(meanlog+sdlog^2/2)",
                             "median(x) = exp(meanlog)",
                             "mode(x) = exp(meanlog-sdlog^2)")) +
  theme(legend.justification = c(1, 0), legend.position = c(.9, .5))
p
```
We see that meanlog has the most direct relationship with the median of a log-normally distributed random variable X: the median of X is exp(meanlog). Put differently, meanlog is the mean of log(X), not the mean of X itself. Or simply: meanlog is the log of X's median.
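A quick numerical sanity check (a throwaway simulation, reusing mu1 and s1 from the chunk above):

```r
# Simulated draws: the sample statistics line up with the formulas above.
x <- rlnorm(100000, meanlog = mu1, sdlog = s1)
median(x)     # ~ exp(mu1)            = exp(2)     ~ 7.39
mean(log(x))  # ~ mu1                 = 2
mean(x)       # ~ exp(mu1 + s1^2 / 2) = exp(2.125) ~ 8.37
```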
I'm repeating this on purpose because confusion arises easily when we log-transform the x-axis:
```r
p + scale_x_log10() +
  theme(legend.justification = c(1, 0), legend.position = c(.4, .5))
```
On the log scale, x looks like a normal with mean $meanlog - sdlog^2$: scale_x_log10() only transforms the axis, it does not transform the density values, so the curve stays peaked at the mode, $exp(meanlog - sdlog^2)$. Note the difference to the above histogram of $log(x)$, which follows a normal distribution with mean $meanlog$.
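A quick numerical check of this claim (parameter values picked arbitrarily): viewed as a function of $log(x)$, dlnorm(x, meanlog, sdlog) is proportional to a normal density with mean $meanlog - sdlog^2$.

```r
# dlnorm(x) evaluated at x = exp(y) divided by a normal density in y with
# mean meanlog - sdlog^2: the ratio is constant, i.e. the two curves are
# proportional on the log scale.
y <- seq(-2, 6, length.out = 200)
ratio <- dlnorm(exp(y), meanlog = 2, sdlog = 1.5) /
  dnorm(y, mean = 2 - 1.5^2, sd = 1.5)
range(ratio)  # constant, equal to exp(sdlog^2/2 - meanlog)
```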
Currently, the following visualization is the one I find the most useful (see the lhist function):
```r
# Alternating meanlog values of -6 and 0.4 give a 50/50 mixture of two
# log-normals, hence the factor 0.5 in front of each density below.
x <- rlnorm(100000, meanlog = c(-6, .4), sdlog = 1)
xseq <- pmax(1e-5, exp(seq(log(.9 * min(x)), log(1.1 * max(x)), length.out = 100)))
ggplot() +
  geom_histogram(data = data.frame(x = x), aes(x, stat(density)), breaks = xseq) +
  geom_line(aes(xseq, .5 * dlnorm(xseq, -6, 1)), color = "blue") +
  geom_line(aes(xseq, .5 * dlnorm(xseq, .4, 1)), color = "red") +
  coord_trans(x = "log", y = "sqrt")
```
Wikipedia gives rather complicated and confusing estimators for meanlog and sdlog. I find the much simpler formulas $meanlog = mean(log(x))$ and $sdlog^2 = mean((log(x) - meanlog)^2)$ in the following master thesis:
In experiments I find that this works pretty well as long as sdlog is greater than about 0.7. I'm not sure why; that's just what I observe.
```r
x <- rlnorm(1000, meanlog = 1.2, sdlog = .7)
xseq <- pmax(1e-5, exp(seq(log(.9 * min(x)), log(1.1 * max(x)), length.out = 100)))
# Estimates on the log scale: the mean of log(x) for meanlog, and the square
# root of the mean squared deviation for sdlog.
m_estim <- mean(log(x))
s_estim <- sqrt(mean((log(x) - m_estim)^2))
ggplot() +
  geom_histogram(data = data.frame(x = x), aes(x, stat(density)), breaks = xseq) +
  geom_line(aes(xseq, dlnorm(xseq, m_estim, s_estim)), color = "red") +
  coord_trans(x = "log", y = "sqrt")
```
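As a cross-check (not part of the original example, and assuming the MASS package is available), the maximum-likelihood fit from MASS::fitdistr should give essentially the same numbers:

```r
# Maximum-likelihood estimates of meanlog and sdlog from MASS;
# these should be very close to m_estim and s_estim above.
MASS::fitdistr(x, "lognormal")$estimate
```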