---
title: "paretoconv"
author: "Mark Padgham"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    theme: flatly
    number_sections: true
vignette: >
  %\VignetteIndexEntry{paretoconv}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


devtools::load_all (".", export_all=FALSE)
library (poweRlaw)

Code to set up ggplot2 defaults

library (showtext)
font_add_google("Bungee Shade", "Bungee")
font_add_google("IBM Plex Sans", "Plex")
#font_files()
#font_families()
showtext_auto()
library (ggplot2)
theme_update (strip.background = element_rect(colour = "white", fill = "white"),
              legend.position = c (0.15, 0.9),
              text = element_text (size = 14,
                                         family = "Plex"),
              legend.title = element_blank(),
              plot.title = element_text (size = 24,
                                         family = "Bungee",
                                         hjust = 0.5))
#theme_set (theme_grey()) # reset theme
library (ggplot2)
theme_update (strip.background = element_rect(colour = "white", fill = "white"),
              legend.position = c (0.1, 0.9),
              plot.title = element_text (size = 24,
                                         hjust = 0.5))
#theme_set (theme_grey()) # reset theme

The following code analyses the American casualty data (us_american and native_american) included in the poweRlaw package.


Generating cumulative distribution functions

Cumulative Distribution Functions (CDFs) are not appropriately fitted with conventional methods, because these assume an absolute lower limit, and simply normalise to (empirical) unit integral above that limit. This section implements the method described in the paper to properly scale CDFs.

If the maximal distance between observed ($y_i$) and modelled ($z_i$) values is $(z_a - y_a)$ for some particular $a$ (where it is explicitly assumed that $z_a > y_a$), and the corresponding minimal distance is $(y_b - z_b)$ for some $b$, then there will exist some value, $k$, such that, \begin{equation} k z_a - y_a = y_b - k z_b, \end{equation} so that the observed values extend both above and below the scaled modelled values by the same amount. This amount represents the minimal possible value of the Kolmogorov-Smirnov (KS) statistic, and arises with a value of \begin{equation} k = \frac {y_a + y_b} {z_a + z_b}. \end{equation} Determining $k$ requires initially determining the values of $a$ and $b$ at which the respective extrema arise. The modelled values are then multiplied by this value of $k$, and the values of $a$ and $b$ are determined once again, along with the corresponding KS statistics. These subsequent values of $a$ and $b$ may differ from the initial values, and the procedure must be repeated until the values of $a$ and $b$ (the positions at which the extrema arise) converge.
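As a toy numerical check of that expression (with values invented purely for illustration, not drawn from any of the data used here):

# toy values at the two extremal positions a and b
za <- 0.80; ya <- 0.60 # maximal excess of modelled over observed values
zb <- 0.20; yb <- 0.35 # maximal excess of observed over modelled values
k <- (ya + yb) / (za + zb)
c (k * za - ya, yb - k * zb) # the two scaled excursions are now equal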

Straight power law

The following fits a poweRlaw discrete power law object to the US-American casualty data.

data ("us_american")
x <- us_american$Cas
m_us <- displ$new(x)
#m$setPars (estimate_pars (m_us))
m_us$setXmin (estimate_xmin (m_us))
a <- m_us$pars
x0 <- m_us$xmin

freq <- as.vector (table (sort (x)))
cum_n <- rev (cumsum (rev (freq)))
cum_n <- cum_n / max (cum_n)

x <- sort (unique (x))
y <- (x0 / x) ^ (a - 1)

The observed and modelled values are then respectively represented by cum_n and y. The iterative procedure is implemented with the following code, which stops when the value of $k$ converges to one, at which point the maximal differences arise for the given values of $z_i$ without any further multiplication.

fit_ks_model <- function (yobs, ymod, k = 10)
{
    n <- 0
    while (abs (k - 1) > 1e-10)
    {
        i1 <- which.min (yobs - ymod) # maximal excess of model over observations
        i2 <- which.max (yobs - ymod) # maximal excess of observations over model
        k <- (yobs [i1] + yobs [i2]) / (ymod [i1] + ymod [i2])
        ymod <- ymod * k # re-scale the model and repeat until k converges to one
        n <- n + 1
    }
    message ('converged in ', n, ' iterations')
    if (abs (max (yobs - ymod) - max (ymod - yobs)) > 1e-10)
        message ("Iterative fitting of KS statistic did not yield equal
                 estimates")
    list (fit = ymod, ks = max (yobs - ymod))
}
ft <- fit_ks_model (cum_n, y, k = 10)
ft$ks

The difference between this approach and the conventional approach used in poweRlaw can be seen in the following plot:

plot (m_us)
lines (m_us, col="red", lwd=1)
lines (x, ft$fit, col="blue", lty=2)

Convoluted power law

Follow the same procedure for the Native American casualty data, which reflect the convolution of multiple power-law processes.

data ("native_american")
x <- native_american$Cas
m_na <- displ$new(x)
m_na$setXmin (estimate_xmin (m_na))
a <- m_na$pars
x0 <- m_na$xmin

freq <- as.vector (table (sort (x)))
cum_n <- rev (cumsum (rev (freq)))
cum_n <- cum_n / max (cum_n)

x <- sort (unique (x))
y <- paretoconv (x, a=a, x0 = 3, n = 2, cdf=TRUE)
ft <- fit_ks_model (cum_n, y, k = 10)
ft$ks

And then the actual plot

plot (m_na)
lines (m_na, col="red", lwd=1)
lines (x, ft$fit, col="blue", lty=2)

Optimal paretoconv models

Optimal paretoconv models are defined simply by the values of $n$ and $x_0$. These have to be found using a form of discrete optimisation, currently implemented in the following function:

pareto_optimise (m_na, x0 = 1, n = 1, quiet = FALSE)
message ("checking whether non-convoluted version is optimal: NO\n",
         "calculating initial KS statistics ...\n",
         "iteration#1 -> (x0, n) = (2, 2)\n",
         "iteration#2 -> (x0, n) = (3, 2)\n")
res <- c (3, 2, 0.02782716)
names (res) <- c ("x0", "n", "KS")
print (res)

This reveals optimal values of $x_0 = 3$ and $n = 2$, with a KS distance of 0.02782716. What is needed is a discrete optimisation algorithm, which doesn't actually exist in any R package.

Discrete Optimisation

Because the optimum in most discrete cases is likely to arise for fairly small values of n and x0, the optimisation problem is best solved by brute force. (Alternatives such as hard-coding a Newton-type algorithm are not really practical, because they all generate non-integer step sizes which cannot readily be applied here.) A brute-force gradient approach is implemented as pareto_optimise(), which repeatedly examines the $\pm1$ values around $(x_0, n)$ and selects the values yielding the minimal KS statistic, stopping on convergence; an illustrative sketch is given below.
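The following is only an illustrative sketch of such a $\pm1$ neighbourhood search, and not the actual code of pareto_optimise(); ks_of is a hypothetical scoring function returning the KS statistic for a single $(x_0, n)$ pair.

# Sketch of a +/-1 neighbourhood descent over (x0, n); `ks_of` is a
# hypothetical function returning the KS statistic for one (x0, n) pair.
neighbour_descent <- function (ks_of, x0 = 1, n = 1)
{
    best_ks <- ks_of (x0, n)
    repeat {
        nbrs <- expand.grid (x0 = x0 + (-1:1), n = n + (-1:1))
        nbrs <- nbrs [nbrs$x0 >= 1 & nbrs$n >= 0, ] # restrict to valid values
        ks <- mapply (ks_of, nbrs$x0, nbrs$n)
        i <- which.min (ks)
        if (ks [i] >= best_ks)
            break # no neighbour improves the KS statistic, so stop
        x0 <- nbrs$x0 [i]; n <- nbrs$n [i]; best_ks <- ks [i]
    }
    c (x0 = x0, n = n, KS = best_ks)
}

The actual routine, applied to the Native American data, yields the following results: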

system.time (
             x <- pareto_optimise (m_na, x0 = 1, n = 1, quiet = FALSE)
             )
x
x <- c (3, 2, 0.02782716)
names (x) <- c ('x0', 'n', 'KS')
tt <- c (17.583, 0.000, 17.602)
names (tt) <- c ("user", "system", "elapsed")
message ("checking whether non-convoluted version is optimal: NO\n",
         "calculating initial KS statistics ...\n",
         "iteration#1 -> (x0, n) = (2, 2)\n",
         "iteration#2 -> (x0, n) = (3, 2)\n")
tt; x

And the equivalent values for the US casualties:

system.time (
             x <- pareto_optimise (m_us, x0 = 1, n = 1, quiet = FALSE)
             )
x
x [1:3] <- c (1, 1, 0.05824103)
tt [1:3] <- c (4.354, 0.000, 4.357)
message ("checking whether non-convoluted version is optimal: NO\n",
         "calculating initial KS statistics ...")
tt; x

Testing best model

The preceding KS distance metrics may be used to establish the best model, but that model still needs to be statistically tested against the observed data. This is done using synthetic data generated with rplconv(), which is simply poweRlaw::rpldis modified to generate random deviates across the full range, including 0 < x < xmin.

a <- m_na$getPars ()
n <- 2
nx <- 1000
xt <- rep (0, nx)
set.seed (0)
for (i in 1:n)
    xt <- xt + rplconv (n = nx, x0 = 3, alpha=a)
xt <- floor (xt / n)

A poweRlaw model then has to be generated for these data by first estimating the value of Xmin:

m_sy <- displ$new (xt)
m_sy$setXmin (poweRlaw::estimate_xmin (m_sy))
plot (m_sy)
lines (m_sy, col="red", lwd=2)

Again, the plot method of poweRlaw just does this:

x0 <- as.vector (table (sort (xt)))
y0 <- rev (cumsum (rev (x0)))
y0 <- y0 / max (y0)
plot (unique (sort (xt)), y0, pch=1, log="xy")

Proper testing requires values of x0 and n to be estimated for each set of simulated data:

x <- pareto_optimise (m_sy, x0 = 3, n = 2, quiet = FALSE)
message ("checking whether non-convoluted version is optimal: NO\n",
         "calculating initial KS statistics ...\n",
         "iteration#1 -> (x0, n) = (2, 1)\n",
         "iteration#2 -> (x0, n) = (1, 2)\n")
x
junk <- c (1, 2, 0.0326965)
names (junk) <- c ('x0', 'n', 'KS')
junk

That demonstrates the principle of testing an observed model, as well as the fact that synthetic models often correspond to different values of x0 and n than the original models, and thus that optimal models need to be fitted to the synthetic data. This is time-consuming, yet is currently incorporated within the paretoconv function pparetoconv(), which generates a number of synthetic models (values of a, x0, and n), uses those to generate synthetic series (using rplconv()), and derives the probability of observing a more extreme KS statistic than the observed value.
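That probability is simply the proportion of synthetic KS statistics at least as large as the observed value, as in the following toy illustration (numbers invented purely to show the calculation):

# toy values only: ks_obs is an observed KS statistic, ks_syn a set of KS
# statistics from re-fitted synthetic series
ks_obs <- 0.0278
ks_syn <- c (0.021, 0.035, 0.029, 0.044, 0.019, 0.031)
mean (ks_syn >= ks_obs) # proportion of synthetic statistics at least as extreme

The full run for the Native American data then proceeds as follows: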

st <- Sys.time ()
pparetoconv (m_na, x0 = 3, n = 2, quiet = FALSE)
Sys.time () - st
message ("Generating simulated models\n",
         "Generating models until same model appears 4 times\n\n",
         "model#01: (x0, n) = (3, 1)\n",
         "model#02: (x0, n) = (3, 1)\n",
         "model#03: (x0, n) = (3, 1)\n",
         "model#04: (x0, n) = (1, 2)\n",
         "model#05: (x0, n) = (1, 3)\n",
         "model#05: (x0, n) = (1, 2)\n",
         "model#05: (x0, n) = (1, 2)\n",
         "model#06: (x0, n) = (1, 2)\n\n",
         "Generating synthetic series from 3 simulated models\n",
         "mod [3, 1] ..............................\n",
         "mod [1, 2] ........................................\n",
         "mod [1, 3] ..........\n",
         "Generated 80 synthetic KS statistics")
message ("[1] 0.9656372")
message ("Time difference of 3.479736 mins")

And that really does take quite a long time, yet finally confirms that the observed KS statistic can be generated by neutral models, which suggests that the fitted model is indeed an accurate description of the underlying data.


Probability for non-convoluted model

The following code demonstrates that the procedure works equally well for non-convoluted models of the type analysed with poweRlaw.

data ('moby', package='poweRlaw')
m <- poweRlaw::displ$new (moby)
m$setXmin (poweRlaw::estimate_xmin (m))
pareto_optimise (m, x0 = 1, n = 0, quiet = FALSE)
pparetoconv (m, x0 = 1, n = 0, quiet = FALSE)
message ("checking whether non-convoluted version is optimal: YES")
res <- c (1, 0, 0.0036373)
names (res) <- c ("x0", "n", "KS")
print (res)
message ("Generating simulated models\n",
         "Generating models until same model appears 4 times\n\n",
         "model#01: (x0, n) = (1, 0)\n",
         "model#02: (x0, n) = (1, 0)\n",
         "model#03: (x0, n) = (1, 0)\n",
         "model#04: (x0, n) = (1, 0)\n\n",
         "Generating synthetic series from 1 simulated models\n",
         "mod [1, 0] ........................................\n",
         "Generated 40 synthetic KS statistics")
res <- 1
res

Convolutions of heterogeneous distributions

Heterogeneous distributions can be modelled using the poweRlaw::dpldis function (or dplcon for continuous distributions).
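For example (illustration only), the probability mass of a single discrete power law at the first few integer values can be generated directly:

library (poweRlaw)
dpldis (1:5, xmin = 1, alpha = 2.5) # discrete power-law probability mass at x = 1..5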

Pareto optimisation of poweRlaw simulated values

Note first that values simulated with rpldis yield models which can be recreated using pareto_optimise. The following code generates a power-law distribution by adding three separate rpldis series with delta = (-1:1) / 2. Because the addition of three series results in values of 3 having the same distributional density as values of 1 in the original series, the final additive result has to have 2 subtracted to render it equivalent to the non-convoluted series.

library (poweRlaw)
alpha <- 3
xmin <- 1
len <- 1e4
delta <- (-1:1) / 2 # hard-coded values
x <- lapply (delta, function (i) rpldis (len, xmin = xmin, alpha = alpha + i))
x <- Reduce (`+`, x) - length (delta) + 1
m <- displ$new(x)
#m$setPars (estimate_pars (m))
m$setXmin (estimate_xmin (m))
plot (m)
lines (m, col = "red")
estimate_pars(m)$pars

This code produces realistic models, but the estimated parameter does not reproduce the expected value of alpha, which should be alpha - max(delta) = 2.5 (corresponding to a slope of -2.5, since the tail of the sum is dominated by the most slowly decaying component). This is actually just because of the behaviour at both extrema. Removing a small number of points at the upper end, along with the lowest values of 1, transforms the estimate to the following:

for (i in 1:5)
    x <- x [x != max (x)] # remove the five largest (distinct) values
x <- x [x > 1] # remove the lowest values of 1
m <- displ$new(x)
m$setPars (estimate_pars (m))
message (alpha, " - ", max (delta), " = ", alpha - max (delta),
         " estimated as ", estimate_pars(m)$pars)

Wrap this in a function to estimate the asymptotic slope as a function of delta, and apply it to sequences of both 2 and 3 heterogeneous values (that is, O(2): [a - d, a + d] and O(3): [a - d, a, a + d]). Slopes are estimated within this function via poweRlaw::estimate_pars().

library (magrittr)
get_slope1 <- function (xmin = 1, alpha = 3, delta = 0.5, len = 1e4)
{
    alpha0 <- alpha
    alpha <- c (alpha0 - delta, alpha0 + delta)
    get_plmod <- function (alpha, xmin, len)
    {
        x <- lapply (alpha, function (i)
                     rpldis (len, xmin = xmin, alpha = i))
        x <- Reduce (`+`, x) - length (alpha) + 1
        for (i in 1:5)
            x <- x [x != max (x)]
        x <- x [x > 1]
        m <- displ$new(x)
        -estimate_pars (m)$pars # negative value
    }
    res <- c (get_plmod (alpha, xmin, len),
              get_plmod (c (alpha0, alpha), xmin, len))
    names (res) <- c ("O2", "O3")
    return (res)
}
delta <- 1:10 / 10
s <- do.call (rbind, lapply (delta, function (i) get_slope1(delta = i)))
s <- data.frame (delta = delta,
                 theoretical = -3 + delta,
                 O2 = s [, 1],
                 O3 = s [, 2]) %>%
    tidyr::gather (key = "type", value = "slope", -delta)

ggplot(s, aes (x = delta, y = slope, colour = type)) +
    geom_line() +
    ggtitle ("Power law slopes")

The observed lines provide good estimates of the expected ("theoretical") values.

Function to generate heterogeneous convoluted power law distributions

The convolution can be generated with dpldis, using the integration approach described at https://stackoverflow.com/questions/23569133/adding-two-random-variables-via-convolution-in-r#23569980. The CRAN version of poweRlaw currently removes all values below xmin from analysis, but this is fixed by my PR (25/01/18) on the current development version.

f <- function(x, xmin = 1, alpha = 3)
    dpldis(x, xmin = xmin, alpha = alpha, log = FALSE)
fz0 <- Vectorize (function(z) integrate (function (x, z) f (z - x) * f (x),
                              lower = -Inf, upper = Inf, z)$value)
fz2 <- Vectorize (function(z) integrate (function (x, z)
                                         f (z - x, alpha = 2.5) * f (x, alpha = 3.5),
                              lower = -Inf, upper = Inf, z)$value)
z <- 10:1000 / 10
y0 <- f (z)

yz0 <- fz0 (z)
yz0 <- yz0 * min (y0) / min (yz0 [yz0 > 0])
indx <- which (yz0 != 0)
yz2 <- fz2 (z)
yz2 <- yz2 * min (y0) / min (yz2 [yz2 > 0])
indx <- which (yz2 != 0)

yp <- paretoconv (x=z, a=3, n=1, x0 = 1)
yp <- yp * min (y0) / min (yp)
yz0 [yz0 == 0] <- NA
yz2 [yz2 == 0] <- NA
dat <- data.frame (z = z,
                   "noconv" = y0,
                   "delta00" = yz0,
                   "delta05" = yz2,
                   "ramsay" = yp) %>%
    tidyr::gather (key = "type", value = "y", -z)
ggplot (dat, aes (x = z, y = y, colour = type)) +
    scale_x_log10() +
    scale_y_log10() +
    geom_line() +
    scale_colour_discrete (breaks = c ("noconv", "delta00", "delta05", "ramsay"),
                           labels = c ("No conv",
                                       "conv (Delta = 0)",
                                       "conv (Delta = 0.5)",
                                       "conv (Ramsay)")) +
    theme (legend.position = c (0.8, 0.9)) +
    ggtitle ("Convoluted power-law distributions")
# reset NA values to 0
yz0 [is.na (yz0)] <- 0
yz2 [is.na (yz2)] <- 0

Although the heads of the dpldis curves do not really accord with the formal result of Colin Ramsay, the tails do seem to converge to the expected values. The following sub-section demonstrates how to estimate values of alpha from these tails.

Fitting tails of simulated values (1)

As explained by both Clauset and Gillespie, estimates of slopes based on linear regressions are statistically invalid. Nevertheless, for demonstration purposes only, we can see that,

indx <- which (z > 20)
summary (lm (log10 (yz0 [indx]) ~ log10 (z [indx])))$coefficients [2]
summary (lm (log10 (yp [indx]) ~ log10 (z [indx])))$coefficients [2]
summary (lm (log10 (yz2 [indx]) ~ log10 (z [indx])))$coefficients [2]

And the latter is, as expected, almost precisely 0.5 less than the other estimates of around 3. Wrap this in a function to estimate the slopes only:

get_slope <- function (xmin = 1, alpha = 3, delta = 0.5)
{
    f <- function(x, xmin = 1, alpha = 3)
        dpldis(x, xmin = xmin, alpha = alpha, log = FALSE)
    fz <- Vectorize (function(z) integrate (function (x, z)
                                             f (z - x, alpha = alpha - delta) *
                                                 f (x, alpha = alpha + delta),
                                             lower = -Inf, upper = Inf, z)$value)
    z <- 200:1000 / 10
    y0 <- f (z)

    yz <- fz (z)
    yz <- yz * min (y0) / min (yz [yz > 0])
    summary (lm (log10 (yz) ~ log10 (z)))$coefficients [2]
}
delta <- 0:10 / 10
s <- vapply (delta, function (i) get_slope (delta = i), numeric (1))
dat <- data.frame (delta = delta, observed = s, theoretical = -3 + delta) %>%
    tidyr::gather (key = "type", value = "slope", -delta)
ggplot (dat, aes (x = delta, y = slope, colour = type)) +
    geom_line() +
    ggtitle ("Power law slopes")
delta <- 0:10 / 10
s <-  c (-2.958965, -2.874606, -2.784791, -2.691580, -2.590871, -2.490464,
       -2.388093, -2.293019, -2.224518, -2.156698, -2.044593)
dat <- data.frame (delta = delta, observed = s, theoretical = -3 + delta) %>%
    tidyr::gather (key = "type", value = "slope", -delta)
ggplot (dat, aes (x = delta, y = slope, colour = type)) +
    geom_line() +
    ggtitle ("Power law slopes")

The following code implements a third-order convolution, but does not run because integrate() fails with either "maximum number of subdivisions reached" or "roundoff error was detected".

xmin <- 1
alpha <- 3
delta <- 0.5
f <- function(x, xmin = 1, alpha = 3)
    dpldis(x, xmin = xmin, alpha = alpha, log = FALSE)
fz2 <- Vectorize (function(z) integrate (function (x, z)
                                         f (z - x, alpha = alpha - delta) *
                                             f (x, alpha = alpha + delta),
                                         lower = -Inf, upper = Inf, z)$value)
fz3 <- Vectorize (function (z) integrate (function (x, z)
                                          fz2 (z - x) * f (x, alpha = alpha),
                                          lower = -Inf, upper = Inf, z)$value)
z <- 10:100 / 10
z <- 1:10
yz <- fz3 (z)
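One possible workaround (an untested sketch only, not part of the original analysis) relies on the fact that the discrete power-law densities are non-zero only for values of at least xmin, so the integrands have finite support. Restricting the integration limits accordingly, and relaxing error handling, may avoid the subdivision and roundoff failures:

# untested sketch: restrict integration to the region of non-zero support
fz2b <- Vectorize (function (z) integrate (function (x, z)
                                           f (z - x, alpha = alpha - delta) *
                                               f (x, alpha = alpha + delta),
                                           lower = xmin, upper = z - xmin, z,
                                           stop.on.error = FALSE)$value)
fz3b <- Vectorize (function (z) integrate (function (x, z)
                                           fz2b (z - x) * f (x, alpha = alpha),
                                           lower = xmin, upper = z - 2 * xmin, z,
                                           stop.on.error = FALSE)$value)
z3 <- 4:10 # the third-order convolution only has support for z >= 3 * xmin
yz3 <- fz3b (z3)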

The O(2) results nevertheless provide the desired demonstration that second-order convolution of power-law distributions accords with the theoretical expectations from exponential functions, in that observed slopes are biased by precisely the value of $\Delta$.

Fitting tails of simulated values (2)

The unbiased ML estimator of the slope for discrete power laws is \begin{equation} \hat{\alpha} = 1 + n \left[\sum_{i = 1}^n \ln \frac{x_i}{x_{\rm min} - \frac{1}{2}} \right]^{-1}. \end{equation}
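As a quick check (a sketch only, using simulated values rather than any of the data above), applying this estimator to raw rpldis samples with a known exponent should approximately recover that exponent:

# sketch: apply the estimator to raw simulated samples with known alpha = 3
library (poweRlaw)
set.seed (1)
xs <- rpldis (1e4, xmin = 10, alpha = 3)
1 + length (xs) / sum (log (xs / (10 - 0.5))) # should be close to 3

The code used to generate the probability density series is then repeated, now with xmin = 10: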

xmin <- 10
f <- function(x, xmin = 10, alpha = 3)
    dpldis(x, xmin = xmin, alpha = alpha, log = FALSE)
fz0 <- Vectorize (function(z) integrate (function (x, z) f (z - x) * f (x),
                              lower = -Inf, upper = Inf, z)$value)
fz2 <- Vectorize (function(z) integrate (function (x, z)
                                         f (z - x, alpha = 2.5) * f (x, alpha = 3.5),
                              lower = -Inf, upper = Inf, z)$value)
z <- 1:1000
y0 <- f (z)
yz0 <- fz0 (z)
yz2 <- fz2 (z)
yp <- paretoconv (x=z, a=3, n=1, x0 = xmin)

Then the actual estimator function, here applied with the density values y as weights rather than to raw samples:

est_a <- function (z, y, xmin = 10)
{
    #m <- conpl$new (y [y > 0])
    #xmin <- estimate_xmin (m)$pars
    1 + sum (y) / sum (y * log (z / (xmin - 0.5)))
}
est_a (z, y0)
est_a (z, yz0)
est_a (z, yz2)
est_a (z, yp)

Comparing Distributions (old stuff)

Clauset et al. (2009) present a justification for using log-likelihood ratios to compare model distributions, following Vuong (1989). Such calculations presume that candidate models have already been demonstrated to provide appropriate descriptions of observed data. These log-likelihood tests are thus not the appropriate tool for comparing different forms of convoluted power-law distributions.

In particular, log-likelihood ratios will always be maximised for the model with the highest mean (log-)probabilities, and without implementing an appropriate initial selection, this will always be the model with the lowest x0. In the present context, models can therefore only be compared using some objective metric of model performance, which Clauset et al. convincingly argue should be a simple maximal ('KS') distance between observed and modelled distributions (not log-scaled).
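For reference, the test statistic computed in the code reproduced below (following Vuong) is \begin{equation} v = \sqrt{n} \, \frac{\bar{\mathcal{R}}}{s_{\mathcal{R}}}, \qquad \mathcal{R}_i = \ln p_1 (x_i) - \ln p_2 (x_i), \end{equation} where $\bar{\mathcal{R}}$ and $s_{\mathcal{R}}$ are the mean and standard deviation of the $n$ pointwise log-likelihood ratios, and $v$ is referred to a standard normal distribution to yield the quoted $p$-values.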

Although therefore not required, the following code applies Gillespie's compare_distributions function, which requires an alternative distribution to be fitted to these data:

#library (poweRlaw)
data ("native_american", package="poweRlaw")
x <- native_american$Cas
m1 <- poweRlaw::displ$new(x)
m1$setPars (poweRlaw::estimate_pars (m1))
m1$setXmin (poweRlaw::estimate_xmin (m1))
m2 <- poweRlaw::disexp$new (x)
m2$setPars (poweRlaw::estimate_pars (m2))
m2$setXmin (m1$getXmin ())
plot (m1)
lines (m1, col="red", lwd=2)
lines (m2, col="orange", lwd=2, lty=2)
comp <- poweRlaw::compare_distributions (m1, m2)
message ("test_statistic = ", comp$test_statistic, " (p = ", comp$p_two_sided, ")")

These values are then reproduced here using a stripped-down version of compare_distributions, hard-coded so that d1 is a discrete power law (dpl) and d2 is an exponential. The function dist_pdf for a dpl is defined with parameters (m, q=NULL, log=FALSE), which are passed to dpldis (q[q >= m$xmin], m$xmin, m$pars, TRUE). The equivalent definition for an exponential is adapted straight from def_disexp.R. These versions are also hard-coded for log=TRUE, the value required by compare_distributions.

dpldis2 <- function(x, xmin, alpha) {
    x <- x[round(x) >= round(xmin)]
    xmin <- floor(xmin)
    constant <- VGAM::zeta(alpha)
    if(xmin > 1) 
        constant <- constant - sum ((1:(xmin - 1)) ^ (-alpha))

    -alpha * log (x) - log (constant)
}
expdis <- function (m, q=NULL) 
{
    xmin <- m$getXmin()
    pars <- m$getPars()
    l1 <- pexp (q - 0.5, pars, lower.tail=FALSE, log.p=TRUE)
    l2 <- pexp (q + 0.5, pars, lower.tail=FALSE, log.p=TRUE)

    l1 + log (1 - exp (l2 - l1)) - 
        pexp (xmin - 0.5, pars, lower.tail=FALSE, log.p=TRUE)
}
compare_distributions2 <- function(d1, d2) 
{
    xmin <- d1$getXmin()
    q <- d1$getDat()
    q <- q[q >= xmin]
    ll_ratio_pts <- dpldis2 (q, xmin, d1$getPars()) - expdis(d2, q)

    m <- mean(ll_ratio_pts); s <- sd(ll_ratio_pts)
    v <- sqrt(length(ll_ratio_pts))*m/s
    p1 <- pnorm(v)

    if (p1 < 0.5) {p2 <- 2*p1} else {p2 <- 2*(1-p1)}

    l <- list(test_statistic = v, 
              p_one_sided = 1 - p1, p_two_sided=p2, 
              ratio = data.frame(x=q, ratio=ll_ratio_pts))
    class(l) <- "compare_distributions"
    return(l)
}

Then demonstrate that the above values are reproduced using these definitions:

comp <- compare_distributions2 (m1, m2)
message ("test_statistic = ", comp$test_statistic, " (p = ", comp$p_two_sided, ")")

