In FelixTheStudent/ggpower: Package for power transformations, useful e.g. in ggplot

knitr::opts_chunk$set(
  fig.width = 8,
  collapse = TRUE,
  comment = "#>"
)

library(ggplot2)
library(ggpower)
library(scales)
# library(tidyverse)
library(cowplot)

The log-normal distribution plays an important role in biology (and everywhere else: finances, engineering, etc.) due to the Central Limith Theorem: if many random effects are added together, the result converges to a normal distribution, if they are multiplied it converges to a log-normal distribution.

The meanlog argument

Docu says that meanlog and sdlog are the mean and standard deviation of the distribution on the log scale, and here we look at what that means.

The log of a normal

If x is log-normally distributed, log(x) is normal with mean meanlog and variance $sdlog^2$:

x <- rlnorm(100000, meanlog = 2, sdlog = 1.5)
hist(log(x), 100, freq = F)
points(seq(-4,8, length.out = 50), dnorm(seq(-4, 8, length.out = 50), mean = 2, sd = 1.5),
       type="l")

This is obvious from the Lognormal's definition and helpful if log-transforming your data is an option. Let's look at building intuition for the meanlog argument in Lognormal distribution itself, i.e. the untransformed data.

meanlog is the logged median value

I compute mean, median and mode for given meanlog (mu1) and sdlog (s1) and plot it into the density function.

mu1 = 2; s1  = 0.5
lnorm_stats <- c(mode   = exp(mu1 - s1*s1),  # see e.g. wikipedia
                 median = exp(mu1),
                 mean   = exp(mu1+s1*s1/2))
# logarithmic spaced points on x-axis are not crucial here, they become useful
# when we do histograms, see below
xseq <- pmax(1e-5, exp(seq(log(1), log(25), length.out = 1000)))
p <- ggplot() +
  geom_line(aes(xseq, dlnorm(xseq, mu1, s1))) +
  geom_vline(aes(xintercept = lnorm_stats,
                 color = names(lnorm_stats)), linetype = "dashed") +
  xlab("x") + ggtitle(paste0("dlnorm(x, meanlog=", mu1, ", sdlog=", s1, ")")) +
  scale_color_hue(name = "lognorm stats",
                  labels = c("mean(x) = exp(meanlog+sdlog^2/2)",
                             "median(x) = exp(meanlog)",
                             "mode(x) = exp(meanlog-sdlog^2)")) +
  theme(legend.justification = c(1, 0), legend.position = c(.9, .5))

p

We see that meanlog has the most direct relationship with the expected median of a Lognormally distributed random variable X. Put differently, meanlog is the mean of log(X), not the mean of X itself. Or simply: meanlog is X's logged median value.

I'm repeating this on purpose because confusion arises easily when we log-transform the x-axis:

p+scale_x_log10() + theme(legend.justification = c(1, 0), legend.position = c(.4, .5))

On the log-scale, x looks like a normal with mean $meanlog-sdlog^2$. Note the difference to the above histogram of $log(x)$, which follows a normal distribution with mean $meanlog$.

Histogram of two log-normals

Currently, this visualization is the one I find the most useful (see lhist function):

x-axis is log-transformed (coord_trans, !not! scale_x)
y-axis is sqrt-transformed, otherwise you don't see the second distribution as the bins become wider, so high values get extremely low densities
both multiplied with .5, because they make up half the data

x <- rlnorm(100000, meanlog = c(-6,.4), sdlog = 1)
xseq <- pmax(1e-5, exp(seq(log(.9*min(x)), log(1.1*max(x)), length.out = 100)))

ggplot()+
  geom_histogram(data=data.frame(x=x), aes(x, stat(density)), breaks=xseq) +
  geom_line(aes(xseq, .5*dlnorm(xseq, -6, 1)), color="blue") +
  geom_line(aes(xseq, .5*dlnorm(xseq, .4, 1)), color="red") +
  coord_trans(x="log", y="sqrt")

MLE estimation

Wikipedia gives rather complicated and confusing estimators. I find the much simpler formulas:

meanlog = mean(log(x))
sdlog = mean(x-meanlog^2)

In the following master thesis:

Parameter Estimation for the Lognormal Distribution
Brenda Faith Ginos
2009
https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=2927&context=etd

With experiments I find that this works pretty well, as long as sdlog is > .7. Not sure why, that's just what I observe.

x <- rlnorm(1000, meanlog = 1.2, sdlog = .7)
xseq <- pmax(1e-5, exp(seq(log(.9*min(x)), log(1.1*max(x)), length.out = 100)))

m_estim <- mean(log(x))
s_estim <- mean((log(x)-m_estim)^2)
ggplot()+
  geom_histogram(data=data.frame(x=x), aes(x, stat(density)), breaks=xseq) +
  geom_line(aes(xseq, dlnorm(xseq, m_estim, s_estim)), color="red") +
  coord_trans(x="log", y="sqrt")

FelixTheStudent/ggpower documentation built on March 5, 2020, 8:37 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

FelixTheStudent/ggpower
Package for power transformations, useful e.g. in ggplot

In FelixTheStudent/ggpower: Package for power transformations, useful e.g. in ggplot

The meanlog argument

The log of a normal

meanlog is the logged median value

Histogram of two log-normals

MLE estimation

R Package Documentation

Browse R Packages

We want your feedback!

FelixTheStudent/ggpower Package for power transformations, useful e.g. in ggplot

In FelixTheStudent/ggpower: Package for power transformations, useful e.g. in ggplot

The meanlog argument

The log of a normal

meanlog is the logged median value

Histogram of two log-normals

MLE estimation

R Package Documentation

Browse R Packages

We want your feedback!

FelixTheStudent/ggpower
Package for power transformations, useful e.g. in ggplot