Examples

The pinbasic package is equipped with four synthetic datasets of daily buys and sells: BSinfrequent, BSfrequent, BSfrequent2015 and BSheavy. They represent infrequently (BSinfrequent), frequently (BSfrequent and BSfrequent2015) and heavily (BSheavy) traded equities. The datasets BSinfrequent, BSfrequent and BSheavy cover 60 trading days each, whereas BSfrequent2015 contains simulated daily buys and sells for the business days in 2015. The datasets can be loaded with the data function.

data("BSinfrequent")
data("BSfrequent")
data("BSfrequent2015")
data("BSheavy")

summary(BSinfrequent)
summary(BSfrequent)
summary(BSfrequent2015)
summary(BSheavy)

The probability of informed trading $\pintext$ can be estimated with the pin_est function. As an example, $\pintext$ is estimated for the BSheavy dataset.

# using default values for lower and upper bounds
# confidence interval computation enabled
pin_bsheavy <- pin_est(numbuys = BSheavy[,"Buys"],
                       numsells = BSheavy[,"Sells"], 
                       confint = TRUE, ci_control = list(n = 1000, seed = 123), 
                       posterior = TRUE)
# structure of returned list
str(pin_bsheavy)

# convert matrix to data.frame for prettier output in the vignette
as.data.frame(pin_bsheavy$Results)

If model parameter estimates, and therefore estimates of $\pintext$, are required on a quarterly basis, the qpin function, which automatically splits the dataset into quarters, is an appropriate choice.
The BSfrequent2015 dataset covers four quarters and a total of r nrow(BSfrequent2015) trading days. The dates of the trading days are stored in its rownames, so they can be passed to the dates argument of qpin.

# dates stored in rownames of dataset
head(rownames(BSfrequent2015))
# quarterly PIN estimates
# confidence interval computation enabled:
#   * using only 1000 simulated datasets
#   * confidence level set to 0.9
#   * seed set to 287

qpin2015 <- qpin(numbuys = BSfrequent2015[,"Buys"],
                 numsells = BSfrequent2015[,"Sells"],
                 dates = as.Date(rownames(BSfrequent2015), format = "%Y-%m-%d"),
                 confint = TRUE, ci_control = list(n = 1000, level = 0.9, seed = 287))
# list of length 4 is returned
names(qpin2015[["res"]])

# confidence intervals for all four quarters
ci_quarters <- lapply(qpin2015[["res"]], function(x) x$confint)
ci_quarters

# each list element has the same structure as results from pin_est function
# convert matrices to data.frames for prettier output in the vignette
qpin2015_res <- lapply(qpin2015[["res"]], function(x) as.data.frame(x$Results))

qpin2015_res[[1]]
qpin2015_res[[4]]

Results returned by qpin can be visualized with ggplot.

library(ggplot2)
ggplot(qpin2015[["res"]])

Datasets of daily buys and sells can be simulated with the simulateBS function, which offers three arguments: values of the model parameters can be set via the param argument, a seed should be specified to ensure reproducibility, and ndays determines the number of trading days to simulate. The probability parameters $\probinfevent$ and $\probbadnews$ are used to sample the trading days' conditions. Once the sequence of states is computed, the numbers of buys and sells for each trading day are drawn from Poisson distributions with intensities according to the scenario tree presented in the EHO model section.
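
This sampling scheme can be sketched in a few lines of base R. The sketch below is illustrative only, not pinbasic's implementation; the parameter names follow the EHO notation and the numeric values are arbitrary, not taken from any of the package datasets.

```r
# Illustrative sketch of the simulation scheme (not pinbasic's implementation):
# alpha = prob. of an information event, delta = prob. of bad news,
# eps_b / eps_s = uninformed buy/sell intensities, mu = informed intensity
sim_sketch <- function(alpha, delta, eps_b, eps_s, mu, ndays, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  event <- rbinom(ndays, 1, alpha)          # information event? (1 = yes)
  bad   <- rbinom(ndays, 1, delta) * event  # bad news, given an event
  good  <- event * (1 - bad)                # good news, given an event
  buys  <- rpois(ndays, eps_b + mu * good)  # informed traders buy on good news
  sells <- rpois(ndays, eps_s + mu * bad)   # ... and sell on bad news
  cbind(Buys = buys, Sells = sells)
}

sim <- sim_sketch(alpha = 0.4, delta = 0.5, eps_b = 1800,
                  eps_s = 1700, mu = 800, ndays = 100, seed = 123)
head(sim)
```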

We use the estimated parameters of pin_bsheavy to simulate data for 100 trading days.

# getting the estimates
heavy_est <- pin_bsheavy$Results[,"Estimate"]

# simulate buys and sells data
set.seed(123)
sim_heavy <- simulateBS(param = heavy_est, ndays = 100)

# summary of simulated data
summary(sim_heavy)

Computation of confidence intervals for $\pintext$ can either be enabled via the confint argument of the optimization routines (confint = TRUE in pin_est_core, pin_est and qpin) or performed directly with pin_confint, which calls simulateBS to simulate n datasets with the given parameter vector param. Maximum likelihood estimation is performed for each simulated dataset, yielding a total of n $\pintext$ estimates. The quantiles of this series implied by the level argument are calculated with the quantile function from the stats package.
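
The interval construction itself is straightforward to sketch: given the series of n simulated $\pintext$ estimates, the bounds are the central quantiles implied by level. In the sketch below the rbeta draws merely stand in for the n MLE-based estimates that pin_confint computes.

```r
# Sketch of the interval construction (the rbeta draws are a stand-in for
# the n PIN estimates that pin_confint obtains by MLE on simulated datasets)
set.seed(321)
pin_estimates <- rbeta(1000, 20, 80)  # stand-in: 1000 estimates around 0.2

level <- 0.95
ci <- quantile(pin_estimates, probs = c((1 - level) / 2, (1 + level) / 2))
ci
```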

We use the simulated sim_heavy data together with the corresponding parameter estimates to calculate a confidence interval for the probability of informed trading. In addition, we compare the execution times of single-core and parallel computation.^[All computations were done on an Intel Core i5-4590 with four physical cores.] The higher the number of simulation runs n, the more the computation benefits from parallel execution.

# n = 10000 simulation runs, 
# level = 0.95 (confidence level)

system.time(heavy_ci <- pin_confint(param = heavy_est, 
                                    numbuys = sim_heavy[,"Buys"],
                                    numsells = sim_heavy[,"Sells"],
                                    seed = 321, ncores = 1))
# same setting but 4 cpu cores
system.time(heavy_ci4 <- pin_confint(param = heavy_est, 
                                     numbuys = sim_heavy[,"Buys"],
                                     numsells = sim_heavy[,"Sells"],
                                     seed = 321, ncores = 4))
heavy_ci
heavy_ci4

Posterior probabilities of the trading days' conditions are returned by posterior and can be displayed with ggplot. As an example, we compute posteriors for the BSheavy dataset, using the corresponding parameter estimates stored in heavy_est.
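
How such posteriors arise can be sketched with Bayes' rule over the three conditions of the EHO model (no news, good news, bad news). The function below is an illustrative sketch, not pinbasic's implementation, and the numeric values in the example call are arbitrary.

```r
# Illustrative sketch (not pinbasic's implementation): posterior probabilities
# of a single trading day's condition via Bayes' rule in the EHO model.
# alpha = prob. of an information event, delta = prob. of bad news,
# eps_b / eps_s = uninformed buy/sell intensities, mu = informed intensity
posterior_sketch <- function(buys, sells, alpha, delta, eps_b, eps_s, mu) {
  # log prior + log likelihood of the observed (buys, sells) per condition
  logw <- c(
    no_news = log(1 - alpha) +
      dpois(buys, eps_b, log = TRUE) + dpois(sells, eps_s, log = TRUE),
    good_news = log(alpha) + log(1 - delta) +
      dpois(buys, eps_b + mu, log = TRUE) + dpois(sells, eps_s, log = TRUE),
    bad_news = log(alpha) + log(delta) +
      dpois(buys, eps_b, log = TRUE) + dpois(sells, eps_s + mu, log = TRUE)
  )
  w <- exp(logw - max(logw))  # stabilized exponentiation
  w / sum(w)                  # normalize to posterior probabilities
}

# A buy surplus far above the uninformed intensity points to good news:
posterior_sketch(buys = 2500, sells = 1750,
                 alpha = 0.4, delta = 0.5, eps_b = 1800, eps_s = 1700, mu = 800)
```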

# Calculating posterior probabilities
post_heavy <- posterior(param = heavy_est,
                        numbuys = BSheavy[,"Buys"], numsells = BSheavy[,"Sells"])

# Plotting                        
ggplot(post_heavy)

If the x axis should display dates, the names of numbuys and numsells need to be in either "%Y-%m-%d" or "%Y/%m/%d" format; character dates can be converted with as.Date. The following code chunk shows how posterior probabilities for BSfrequent2015 in the third quarter can be visualized.

# Corresponding parameter estimates
freq_2015.3 <- qpin2015[["res"]]$'2015.3'$Results[,"Estimate"]

# Subsetting data
third_quarter <- subset(BSfrequent2015,
                        subset = lubridate::quarter(as.Date(rownames(BSfrequent2015))) == 3)

# Calculating posterior probabilities
post_third <- posterior(param = freq_2015.3, 
                        numbuys = third_quarter[,"Buys"], numsells = third_quarter[,"Sells"])

# Plotting
ggplot(post_third)



pinbasic documentation built on May 2, 2019, 2:07 a.m.