View source: R/estimate_xmin.R
get_bootstrap_sims | R Documentation |
When fitting heavy-tailed distributions, it is sometimes necessary to estimate the lower threshold, xmin. The lower bound is estimated by minimising the Kolmogorov-Smirnov statistic (as described in Clauset, Shalizi, Newman (2009)).
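The minimisation described above can be sketched in base R. This is only an illustration for a continuous power law, not the package's implementation: the helper name ks_for_xmin, the synthetic data and the choice to keep at least 100 points in the tail are all assumptions made for the example.

```r
# Sketch only: continuous power law, base R, hypothetical helper names.
ks_for_xmin = function(x, xmin) {
  tail = sort(x[x >= xmin])
  n = length(tail)
  # Maximum-likelihood estimate of the exponent for a continuous power law
  alpha = 1 + n / sum(log(tail / xmin))
  # Fitted CDF and empirical CDF, both evaluated at the tail observations
  fit_cdf = 1 - (tail / xmin)^(1 - alpha)
  data_cdf = (0:(n - 1)) / n
  max(abs(data_cdf - fit_cdf))
}

set.seed(1)
# Synthetic data: inverse-transform samples with xmin = 1 and alpha = 2.5
x = (1 - runif(1000))^(-1 / 1.5)

# Assumption: only try candidates that leave at least 100 points in the tail
candidates = sort(unique(x))[1:900]
distances = vapply(candidates, function(xm) ks_for_xmin(x, xm), numeric(1))

# The estimate is the candidate with the smallest KS distance
xmin_hat = candidates[which.min(distances)]
```

Every candidate threshold requires a fresh parameter fit, which is why the search becomes expensive when there are many candidates.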
get_KS_statistic
Calculates the KS statistic for a particular value of xmin.
estimate_xmin
Estimates the optimal lower cutoff using a goodness-of-fit based approach. This function may issue warnings when fitting lognormal, Poisson or exponential distributions. The warnings occur for large values of xmin. Essentially, we are discarding the bulk of the distribution and cannot calculate the tails accurately enough.
bootstrap
Estimates the uncertainty in the xmin and parameter values via bootstrapping.
bootstrap_p
Performs a bootstrapping hypothesis test to determine whether a suggested (typically power law) distribution is plausible. This is only available for distributions that have dist_rand methods available.
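The idea behind the bootstrapping hypothesis test can be sketched in base R. The sketch makes a strong simplification: xmin is held fixed at 1 in every simulation, whereas the functions documented here take xmins as an argument. The helpers fit_ks and r_powerlaw are hypothetical names introduced for the example. The need to simulate from the fitted model is why a random number generator must be available for the distribution.

```r
# Sketch only: continuous power law with xmin held fixed (a simplification).
fit_ks = function(x, xmin) {
  tail = sort(x[x >= xmin])
  n = length(tail)
  alpha = 1 + n / sum(log(tail / xmin))        # MLE of the exponent
  fit_cdf = 1 - (tail / xmin)^(1 - alpha)      # fitted CDF at the tail points
  list(alpha = alpha, ks = max(abs((0:(n - 1)) / n - fit_cdf)))
}

# Inverse-transform sampler for a continuous power law
r_powerlaw = function(n, xmin, alpha) xmin * (1 - runif(n))^(-1 / (alpha - 1))

set.seed(1)
x = r_powerlaw(500, xmin = 1, alpha = 2.5)     # synthetic "observed" data
observed = fit_ks(x, xmin = 1)

# Simulate from the fitted model, refit, and record each KS distance
sim_ks = replicate(
  100,
  fit_ks(r_powerlaw(length(x), 1, observed$alpha), 1)$ks
)

# Proportion of simulated distances at least as large as the observed one
p_value = mean(sim_ks >= observed$ks)
```

A small p_value suggests the fitted distribution is not a plausible description of the data; a large one means it cannot be ruled out.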
get_bootstrap_sims(m, no_of_sims, seed, threads = 1)
bootstrap(
m,
xmins = NULL,
pars = NULL,
xmax = 1e+05,
no_of_sims = 100,
threads = 1,
seed = NULL,
distance = "ks"
)
get_bootstrap_p_sims(m, no_of_sims, seed, threads = 1)
bootstrap_p(
m,
xmins = NULL,
pars = NULL,
xmax = 1e+05,
no_of_sims = 100,
threads = 1,
seed = NULL,
distance = "ks"
)
get_distance_statistic(m, xmax = 1e+05, distance = "ks")
estimate_xmin(m, xmins = NULL, pars = NULL, xmax = 1e+05, distance = "ks")
m |
A reference class object that contains the data. |
no_of_sims |
number of bootstrap simulations. When |
seed |
default |
threads |
number of concurrent threads used during the bootstrap. |
xmins |
default |
pars |
default |
xmax |
default |
distance |
A string containing the distance measure (or measures) to calculate.
Possible values are |
When estimating xmin for discrete distributions, the search space when comparing the data-cdf (empirical cdf) and the distribution_cdf runs from xmin to max(x), where x is the data set. This can often be computationally expensive. In particular, when bootstrapping we generate random numbers from the power law distribution, which has a long tail.
To speed up computations for discrete distributions it is sensible to put an upper bound on the search, i.e. xmax, and/or to give explicit values of where to search, i.e. xmins.
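To give a rough feel for the saving, the sketch below counts how many cdf comparison points a search would visit, assuming each candidate xmin is compared on every integer from xmin up to the upper limit. The helper search_cost and the data are hypothetical and exist only for this illustration; the actual cost inside the package also depends on the parameter fitting at each step.

```r
# Hypothetical helper: number of CDF comparison points visited by a search.
# By default every unique data value is a candidate and the upper limit is max(x).
search_cost = function(x, xmins = sort(unique(x)), xmax = max(x)) {
  upper = min(max(x), xmax)
  xmins = xmins[xmins <= upper]   # candidates above the limit are dropped
  sum(upper - xmins + 1)          # points from each candidate up to the limit
}

# Made-up discrete data with a single very large observation
x = c(1, 1, 2, 2, 3, 5, 8, 13, 40, 2500)

full = search_cost(x)                             # unrestricted search
capped = search_cost(x, xmins = 1:5, xmax = 100)  # restricted search
```

One extreme value is enough to inflate the unrestricted count, and the simulated data sets generated during bootstrapping tend to contain such values.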
Occasionally bootstrapping can generate strange situations. For example, all values in the simulated data set may be less than xmin. In this case, the estimated distance measure will be Inf and the parameter values will be NA.
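The degenerate case can be illustrated with a small guard. This is a sketch under the assumption of a continuous power law; fit_replicate is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper: fit one simulated data set at a given xmin,
# returning Inf / NA when no observations reach the threshold.
fit_replicate = function(x_sim, xmin) {
  tail = sort(x_sim[x_sim >= xmin])
  n = length(tail)
  if (n == 0) {
    # Nothing to fit: mirror the behaviour described above
    return(list(distance = Inf, pars = NA_real_))
  }
  alpha = 1 + n / sum(log(tail / xmin))      # MLE of the exponent
  fit_cdf = 1 - (tail / xmin)^(1 - alpha)    # fitted CDF at the tail points
  list(distance = max(abs((0:(n - 1)) / n - fit_cdf)), pars = alpha)
}

degenerate = fit_replicate(c(1.2, 1.5, 2.0), xmin = 5)     # all values below xmin
usable = fit_replicate(c(5.5, 6, 8, 12, 30), xmin = 5)     # tail is non-empty
```

When summarising bootstrap output, such replicates are easy to spot with is.infinite() on the distance or is.na() on the parameters.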
There are other possible distance measures that can be calculated. The default is the Kolmogorov-Smirnov statistic (ks). This is equation 3.9 in the CSN paper. The other measure currently available is reweight, which is equation 3.11.
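The two measures can be compared on a toy example. The sketch assumes equation 3.9 is the maximum absolute gap between the empirical and fitted cdfs, and equation 3.11 is the same gap divided by sqrt(P(1 - P)), where P is the fitted cdf. The helper distance_measures, the input vectors and the decision to skip points where the weight is undefined are assumptions made for this illustration.

```r
# Hypothetical helper: both distance measures from two CDF vectors
# evaluated at the same points.
distance_measures = function(data_cdf, fit_cdf) {
  gap = abs(data_cdf - fit_cdf)
  # Assumption: skip points where the weight sqrt(P * (1 - P)) is zero
  ok = fit_cdf > 0 & fit_cdf < 1
  c(
    ks = max(gap),
    reweight = max(gap[ok] / sqrt(fit_cdf[ok] * (1 - fit_cdf[ok])))
  )
}

# Made-up CDF values, for illustration only
data_cdf = c(0.00, 0.25, 0.50, 0.75)
fit_cdf = c(0.00, 0.30, 0.60, 0.80)

d = distance_measures(data_cdf, fit_cdf)
```

The reweighted measure scales the gap up where the fitted cdf is close to 0 or 1, so it is more sensitive to discrepancies in the extremes of the fitted range.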
Adapted from Laurent Dubroca's code
###################################################
# Load the data set and create distribution object#
###################################################
library("poweRlaw")
x = 1:10
m = displ$new(x)
###################################################
# Estimate xmin and pars #
###################################################
est = estimate_xmin(m)
m$setXmin(est)
###################################################
# Bootstrap examples #
###################################################
## Not run:
bootstrap(m, no_of_sims=1, threads=1)
bootstrap_p(m, no_of_sims=1, threads=1)
## End(Not run)