knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Application of the exact statistical inference frequently leads to a non-standard probability distributions of the considered estimators or test statistics. Frequently, evaluation of the PDF, CDF, and/or the quantile function (QF) is possible from the CF. In many important situations, derivation of the CFs is more simple than derivation of the PDFs and/or CDFs.

In particular, CF of a linear combination, $Y = c_1X_1+\ldots+c_nX_n$, where $X_j$ are independent RVs with known CFs, say $\mathrm{cf}{X_j}(t)$, and coefficients $c_j$, is given by $$\mathrm{cf}{Y}(t)=\prod\limits_{j=1}^{n}\mathrm{cf}{X_j}(c_jt).$$ CF of a stochastic convolution (compound distribution), $Y = X_1+\ldots+X_N$, where $X_j$ are i.i.d. RVs with common $\mathrm{cf}_X(t)$ and $N$ is a discrete RV defined on non-negative integers with $\mathrm{cf}_N(t)$, is given by $$\mathrm{cf}{Y}(t)=\mathrm{cf}N(-i\mathrm{log}(\mathrm{cf}_X(t))).$$ CF of a weighted mixture of distributions, say $F=\sum\limits{j=1}^{n}w_jF_{X_j}$ with $\sum\limits_{j=1}^{n}w_j=1$ is $$\mathrm{cf}F(t)=\sum\limits{j=1}^{n}w_j\mathrm{cf}{X_j}(t).$$ The empirical characteristic function (ECF), based on the random sample $X_1,\ldots,X_n$, where $X_j\sim F$, is defined as $$\mathrm{cf}{\hat{F}n}(t)=\sum\limits{j=1}^{n}w_j\mathrm{cf}{X_j}(t)=\dfrac{1}{n}\sum\limits{j=1}^{n}e^{itX_j},$$ i.e. equally weighted mixture of the CFs of the Dirac random variables, $\mathrm{cf}_{X_j}(t)=e^{itX_j}$.

However, analytical derivation of the PDF and/or CDF by using the inverse Fourier transform is available only in special cases. Thus, in most practical situations, a numerical derivation of the PDF/CDF from the CF is an indispensable tool.

The Characteristic Functions Package (CharFunToolR) consists of a set of algorithms for evaluating selected characteristic funcions (CF) and algorithms for numerical inversion of the (combined and/or compound) characteristic functions, used to evaluate the probability density function (PDF) and the cumulative distribution function (CDF).

The package includes inversion algorithm based on simple trapezoidal rule for computing the integrals defined by the Gil-Pelaez formulae [@gil-pelaez_note_1951].

Numerical inversion of characteristic function

Let $Y$ be a random variable with known CF $\mathrm{cf}Y(t)$. Then, by the Fourier inversion theorem, the PDF of the random variable $Y$ is given by $$\mathrm{pdf}_Y(y)=\dfrac{1}{2\pi}\int{-\infty}^{\infty}e^{-ity}\mathrm{cf}Y(t)dt,\,\,\,y\in\mathbb{R}.$$ This exact inverse Fourier transform can be naturally approximated by the truncated integral form $$\mathrm{pdf}_Y(y)=\dfrac{1}{2\pi}\int{-T}^{T}e^{-ity}\mathrm{cf}_Y(t)dt,$$ where $T$ is a sufficiently large (real) value, and the integrand is a complex (oscillatory) function, which is assumed to decay to zero reasonably fast for $|t|\rightarrow\infty$ (as is typical for continuous distributions used in metrological applications). Note that the selection of the integration limit and the speed of integrand function decay contribute significantly to the truncation error of this approximation.

In general, the required integral can be evaluated by any suitable numerical quadrature method. Frequently, a simple trapezoidal rule gives fast and satisfactory results. However, caution is necessary if the integrand is a highly oscillatory function, what is the case if $|y|$ is a large value from the tailarea of the distribution, as e.g. the $99 \%$ quantile. In such situations a more advanced quadrature method should be used, typically in combination with efficient root-finding algorithms and algorithms for accelerated computing of the limit values of the sums of alternating series (not considered here), see e.g., [@sidi_user-friendly_1988], [@cohen_convergence_2000]. Note that the selection of the integration method significantly contributes to the integration error of this approximation.

Let us briefly describe one approach [see @witkovsky_numerical_2016 for more details] for numerical inversion of CF implemented in our package. More precisely it is a method based on Gil-Pelaez formulae [@gil-pelaez_note_1951].

The Gil-Pelaez inversion formulae

In [@gil-pelaez_note_1951], Gil-Pelaez derived the inversion formulae, suitable for numerical evaluation of the PDF and/or the CDF, which require integration of a real-valued function, only. The PDF is given by $$\mathrm{pdf}Y(y)=\dfrac{1}{\pi}\int{0}^{\infty}\mathcal{R}\left(e^{-ity}\mathrm{cf}Y(t)\right)dt.$$ Further, if $y$ is a continuity point of the cumulative distribution function of $Y$, the CDF is given by $$\mathrm{cdf}_Y(y)=\dfrac{1}{2}-\dfrac{1}{\pi}\int{0}^{\infty}\mathcal{I}\left(\dfrac{e^{-ity}\mathrm{cf}_Y(t)}{t}\right)dt.$$ By $\mathcal{R}\left(f(t)\right)$ and $\mathcal{I}\left(f(t)\right)$ we denote the real and imaginary part of the complex function $f(t)$, respectively.

In general, the integrals in formulas introduced above can be computed by any numerical quadrature method, possibly in combination with efficient root-finding algorithms and accelerated computing of limits of the alternating series, as considered, e.g., in [@sidi_numerical_1982] and [@waller_obtaining_1995]. For more details about numerical computations, efficient approximations of integrals, precision and successful previous implementations of Gil-Pelaez inversion formulae, see [@witkovsky_numerical_2016] or documentation of function cf2DistGP.

In our package function cf2DistGP implements this approach for numerical inversion of CF. You can see examples how to use the function in the following section.

The CharFunToolR package

In this section we present several classes of R functions from our package. We choose some functions as representatives, we give examples how to use particular functions in R and the other functions in the package from the same families of functions work similalrly.

Functions & examples

The first important class of functions in our package is CF Inversion Algorithm. Currently it involves one R function: cf2DistGP mentioned in introductory part of this manual. We have already explained shortly the main idea of algorithm which this function implements. So let us take a look at particular example in R about how to use cf2DistGP.

Imagine we would like to calculate CDF/PDF of $N(0,1)$ by inverting its CF. At least the cf input argument of cf2DistGP need to be specified. We connect package CharFunToolR and create a characteristic function for $N(0,1)$.

library(CharFunToolR)
cf <- function(t) exp(-t ^ 2 / 2)

The final step is to call cf2DistGP to calculate CDF/PDF.

result <- cf2DistGP(cf)
# to see the structure of the whole output you can run the following command 
# str(result)

Note that argument x (vector of values where the CDF/PDF is computed) of cf2DistGP(cf, x, prob, options) is set by default. Similarly options object (list) is created with some default values. As prob has not been specified, the quantiles are not calculated and therefore options$qf = NULL.

The next family of R functions in our package is Discrete Probability Distribution. It includes characteristic functions of chosen discrete probability distributions. We give three examples to illustrate what can be done in CharFunToolR with discrete probability distributions. We demonstrate ideas with function cfN_Binomial. In first example we want to compute values of CF of the Binomial distribution with the following parameters: the number of trials n = 25, probability of success p = 0.3. We choose $1001$ equally spaced points in interval $[-15,15]$, in that we want to know the values of CF.

n <- 25
p <- 0.3
t <- seq(-15, 15, length.out = 1001)

Input arguments of cfN_Binomial have been set so we can execute the code below. You can see that we use an auxiliary function plotReIm from CharFunToolR to plot the real and imaginary part of CF.

plotReIm(function(t)
        cfN_Binomial(t, n, p),
        t,
        title = "CF of the Binomial distribution with n = 25, p = 0.3")

The next example is about calculation of values of CF of the compound Binomial-Exponential distribution. As in the previous example we set values of input parameters n and p for Binomial distribution. We also set $\lambda$ parameter for Exponential distribution, define cfX as the characteristic function of Exponential distribution and choose $501$ equally spaced points in $[-10,10]$ in which we want to calculate values of CF of the compound Binomial-Exponential distribution. Note that we use cfX_Exponential function from CharFunToolR.

n <- 25
p <- 0.3
lambda <- 10
cfX <- function(t) cfX_Exponential(t, lambda)
t <- seq(-10, 10, length.out = 501)

All the input parameters have been set so we can run the final command to get the values of CF and to plot the CF.

plotReIm(function(t)
        cfN_Binomial(t, n, p, cfX),
        t,
        title = "CF of the compound Binomial-Exponential distribution")

In the third example we want to compute PDF/CDF of the compound Binomial-Exponential distribution using cf2DistGP function. We set the input parameters similarly as in the previous examples.

n <- 25
p <- 0.3
lambda <- 5
cfX <- function(t)
        cfX_Exponential(t, lambda)
cf <- function(t)
        cfN_Binomial(t, n, p, cfX)

We choose $101$ equally spaced points (x) in $[0,5]$ to calculate PDF/CDF. Probabilities (prob) are specified to calculate corresponding quantiles (Quantile Function - QF). It is necessary to specify for cf2DistGP that considered distribution is compound.

x <- seq(0, 5, length.out = 101)
prob <- c(0.9, 0.95, 0.99)
options <- list()
options$isCompound <- TRUE

Finally we execute cf2DistGP(cf, x, prob, options) to calculate PDF/CDF/QF of the compound Binomial-Exponential distribution. As you can see, the plots for PDF and CDF are created by default.

result <- cf2DistGP(cf, x, prob, options)
# str(result)

Another family of R functions in our package is Continuous Probability Distribution. It includes characteristic functions of chosen continuous probability distributions. We give two examples to illustrate some applications of mentioned class of R functions in CharFunToolR. We demonstrate ideas with function cf_Gamma. In first example we would like to calculate values of CF of a linear combination of independent Gamma RVs. So we need to specify coefficients of particular linear combination. We can also plot the coefficients.

coef <- 1 / (((1:50) - 0.5) * pi) ^ 2
plot(
        1:50,
        coef,
        xlab = "",
        ylab = "",
        type = "p",
        pch = 20,
        col = "blue",
        cex = 1,
        main = expression('Coefficients of the linear combination of GAMMA RVs')
)
lines(1:50, coef, col = "blue")

In the following step we specify parameters of Gamma RVs. Note that all RVS of considered linear combination have the same shape (alpha) and scale (beta) parameter. We also specify points (t) to calculate the values of CF.

alpha <- 5 / 2
beta <- 1 / 2
t <- seq(from = -10,
         to = 10,
         length.out = 201)

Now we can calculate the values of CF, plot the real and imaginary part of CF.

plotReIm(function(t)
        cf_Gamma(t, alpha, beta, coef),
        t,
        title = "Characteristic function of the linear combination of GAMMA RVs")

In the second example we want to compute PDF/CDF from the CF by cf2DistGP. To this purpose we consider the same linear combination of independent Gamma RVs as in the previous example.

alpha <- 5 / 2
beta <- 1 / 2
coef <- 1 / (((1:50) - 0.5) * pi) ^ 2
cf <- function(t)
        cf_Gamma(t, alpha, beta, coef)

We can specify some advanced options. Realizing that support of PDF/CDF is a non-negative part of real line we set options$xMin to $0$. We can controll the precision of numerical integration by setting options$N.

options <- list()
options$N <- 2 ^ 10
options$xMin <- 0

Finally we execute cf2DistGP(cf = cf, options = options) to get the numerical and graphical result of PDF/CDF.

result <- cf2DistGP(cf = cf, options = options)

Note that funtion cf_Gamma has one more optional input argument denoted niid. It stands for scalar convolution coeficient. It is possible to calculate CF of convolution. See documentation of cf_Gamma for more details.

In this last part of tutorial we present examples with function cfE_Empirical from family denoted Empirical Probability Distribution. cfE_Empirical is based on the following principle. Let $X_1, \ldots, X_n$ be i.i.d. random variables with the common distribution function $F_X$. The empirical distribution based on the random sample $X_1, \ldots, X_n$ is a mixture distribution of equally weighted degenerate Dirac distributions, concentrated at $X_1, \ldots, X_n$. Hence, the observed empirical characteristic function (ECF) is equally weighted mixture of the characteristic functions of the Dirac random variables concentrated at the observed values $x_j$ if $X_j$, i.e. mixture of CFs given by $cf_{x_j}(t)=e^{itx_j}$, $$cf_{\hat{F}{X}}(t)=\dfrac{1}{n}\sum\limits{j=1}^{n}e^{itx_j}.$$

Let us first introduce an example, in which we calculate the ECF of weighted mixture of independent Dirac variables from particular Normal, Student's t and Chi-square distribution.

set.seed(101)
n <- 1000
data <- c(rnorm(3 * n, 5, 0.2), rt(n, 3), rchisq(n, 1))
t <- seq(-50, 50, length.out = 2 ^ 10)
plotReIm(function(t)
        cfE_Empirical(t, data),
        t,
        title = "Empirical CF - CF of the mixture of Dirac random variables")

It is also possible to use cfE_Empirical to get the values of PDF/CDF of the compound empirical distribution. In the following example we compose the Binomial and Log Normal distibution. At first we calculate and plot the ECF of the compound empirical distribution.

set.seed(101)
lambda <- 25
nN <- 10
Ndata <- rpois(nN, lambda)

mu <- 0.1
sigma <- 2
nX <- 1500
Xdata <- rlnorm(nX, mu, sigma)
cfX <- function(t)
        cfE_Empirical(t, Xdata)
cf  <- function(t)
        cfE_Empirical(t, Ndata, cfX)
t <- seq(-0.2, 0.2, length.out = 2 ^ 10)
plotReIm(cf, t, title = "Compound Empirical CF")

In the final step we compute and plot PDF/CDF of the compound empirical distribution. We also get the values for chosen percentiles of that distribution.

x <- seq(0, 1000, length.out = 501)
prob <- c(0.9, 0.95)
options <- list()
options$N <- 2 ^ 10
options$SixSigmaRule <- 10
result <- cf2DistGP(cf, x, prob, options)

There some other families of R functions in CharFunToolR. For example Symmetric Probability Distribution, Circular Probability Distribution, but such functions belong to Continuous Probability Distribution family as well. You can find some auxiliar functions in our package. These are represented by families Special Function or Graphic Function. Such functions serve a special purpose. For example they are used in case of interpolation or when some specific (hypergeometric) function needs to be computed or they are used to create plots. We do not provide details of using such functions here, as they are beyond the main idea of CharFunToolR package. You can find some details in documentation of particular functions.

References



gajdosandrej/CharFunToolR documentation built on June 3, 2024, 7:46 p.m.