mpin_ecm (R Documentation)
Source file: R/model_mpin.ecm.R
Estimates the multilayer probability of informed trading (MPIN) using an Expectation Conditional Maximization (ECM) algorithm, as in Ghachem and Ersan (2022).
mpin_ecm(data, layers = NULL, xtraclusters = 4, initialsets = NULL,
..., verbose = TRUE)
data | A dataframe with 2 variables: the first corresponds to buyer-initiated trades (buys), and the second corresponds to seller-initiated trades (sells).
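layers | An integer referring to the assumed number of information layers in the data. If the argument layers is omitted (the default, NULL), the number of layers is optimized together with the model parameters.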
xtraclusters | An integer used to divide trading days into … The default value is 4.
initialsets | A dataframe containing initial parameter sets for estimation of the … The default value is NULL.
... | Additional arguments passed on to the function …; in particular, the list of ECM hyperparameters, hyperparams, is supplied through this argument.
verbose | (logical) Whether information messages are displayed during the estimation. The default value is TRUE.
The argument data should be a numeric dataframe containing at least two variables. Only the first two variables are considered: the first is assumed to contain the total number of buyer-initiated trades, and the second the total number of seller-initiated trades. Each row (observation) corresponds to a trading day. NA values are ignored.
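To make the input rules concrete, here is a minimal base-R sketch of how a raw table might be brought into the expected shape before calling mpin_ecm(). The helper prepare_trades() is hypothetical and not part of PINstimation; dropping incomplete days with complete.cases() is an assumption about how NA values are ignored, not a description of the package's internal handling.

```r
# Hypothetical helper (not part of PINstimation): mimics the documented
# input rules for 'data' by keeping only the first two columns
# (buys, sells) and dropping trading days that contain NA values.
prepare_trades <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  x <- df[, 1:2]
  names(x) <- c("B", "S")
  x <- x[stats::complete.cases(x), , drop = FALSE]
  # both retained columns must be numeric trade counts
  stopifnot(all(vapply(x, is.numeric, logical(1))))
  x
}

# Toy data (made up): 4 trading days, an extra column that is ignored,
# and one day with a missing sell count.
raw <- data.frame(
  buys  = c(350, 410, 290, 505),
  sells = c(320, NA, 300, 480),
  extra = c("a", "b", "c", "d")
)
trades <- prepare_trades(raw)
print(trades)
```

The extra column is discarded and the second day is dropped, leaving three complete trading days.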
The initial parameters for the ECM algorithm are computed using the function initials_mpin() with default settings. The factorization of the MPIN likelihood function used here was developed by Ersan (2016) and is implemented in fact_mpin().
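For readers who want to see what is being maximized, the sketch below evaluates an MPIN log-likelihood in base R. It assumes the standard mixture-of-Poissons structure: on a no-information day (probability 1 - sum(alpha)) buys and sells arrive at rates eps.b and eps.s; on a day in layer j (probability alpha[j]), bad news (probability delta[j]) adds mu[j] to the sell rate, and good news adds mu[j] to the buy rate. The parameter names are assumptions chosen for readability. The sketch deliberately uses a generic log-sum-exp trick instead of the Ersan (2016) factorization implemented in fact_mpin(), so treat it as an illustration, not a replacement.

```r
# Sketch (assumption): MPIN log-likelihood under the standard
# mixture-of-Poissons structure, computed with log-sum-exp.
# This is NOT the Ersan (2016) factorization used by fact_mpin().
mpin_loglik <- function(B, S, alpha, delta, mu, eps.b, eps.s) {
  J <- length(alpha)
  stopifnot(length(delta) == J, length(mu) == J, sum(alpha) <= 1)
  lb0 <- dpois(B, eps.b, log = TRUE)  # buys at the uninformed rate
  ls0 <- dpois(S, eps.s, log = TRUE)  # sells at the uninformed rate
  # one column per mixture component, no-information day first
  comps <- cbind(log(1 - sum(alpha)) + lb0 + ls0)
  for (j in seq_len(J)) {
    bad  <- log(alpha[j]) + log(delta[j]) + lb0 +
      dpois(S, eps.s + mu[j], log = TRUE)      # informed sellers
    good <- log(alpha[j]) + log(1 - delta[j]) + ls0 +
      dpois(B, eps.b + mu[j], log = TRUE)      # informed buyers
    comps <- cbind(comps, bad, good)
  }
  # log-sum-exp over components for each day, then sum over days
  m <- apply(comps, 1, max)
  sum(m + log(rowSums(exp(comps - m))))
}

# Toy counts for four trading days (made up for illustration)
B <- c(12, 30, 9, 15)
S <- c(10, 8, 28, 14)
ll <- mpin_loglik(B, S, alpha = c(0.2, 0.1), delta = c(0.5, 0.4),
                  mu = c(15, 25), eps.b = 10, eps.s = 10)
print(ll)
```

Working on the log scale matters in practice: with realistic daily trade counts, the raw Poisson densities can underflow to zero, whereas the log-sum-exp form stays finite.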
The argument hyperparams contains the hyperparameters of the ECM algorithm. It is either empty or contains one or more of the following elements:
minalpha (numeric): the minimum share of days belonging to a given layer. Layers falling below this threshold are removed during the iteration, and the model is estimated with a lower number of layers. When missing, minalpha takes the default value of 0.001.
maxeval (integer): the maximum number of iterations of the ECM algorithm for each initial parameter set. When missing, maxeval takes the default value of 100.
tolerance (numeric): the ECM algorithm stops when the (relative) change of the log-likelihood is smaller than tolerance. When missing, tolerance takes the default value of 0.001.
criterion (character): the model selection criterion used to find the optimal estimate of the MPIN model. It takes one of the values "BIC", "AIC" and "AWE", which stand for Bayesian Information Criterion, Akaike Information Criterion and Approximate Weight of Evidence, respectively (Akogul and Erisoglu, 2016). When missing, criterion takes the default value of "BIC".
maxlayers (integer): the upper limit of the number of layers used for estimation in the ECM algorithm. If the argument layers is missing, the ECM algorithm estimates MPIN models for every number of layers from 1 to maxlayers. When missing, maxlayers takes the default value of 8.
maxinit (integer): the maximum number of initial sets used for each individual estimation in the ECM algorithm. When missing, maxinit takes the default value of 100.
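The hyperparameters are passed as a plain named list, and any missing element falls back to the defaults listed above. The sketch below mimics that behaviour with utils::modifyList() and adds two small predicates that illustrate the tolerance and minalpha rules. All helper names (complete_hyperparams(), has_converged(), layers_to_keep()) are hypothetical. In particular, measuring the relative change as |(ll_new - ll_old) / ll_old| is an assumption about how the package defines it.

```r
# Documented defaults of the ECM hyperparameters
ecm_defaults <- list(minalpha = 0.001, maxeval = 100, tolerance = 0.001,
                     criterion = "BIC", maxlayers = 8, maxinit = 100)

# Hypothetical helper (not part of PINstimation): complete a user-supplied
# 'hyperparams' list with the documented defaults and validate it.
complete_hyperparams <- function(hyperparams = list()) {
  unknown <- setdiff(names(hyperparams), names(ecm_defaults))
  if (length(unknown) > 0)
    stop("unknown hyperparameter(s): ", paste(unknown, collapse = ", "))
  hp <- utils::modifyList(ecm_defaults, hyperparams)
  hp$criterion <- match.arg(hp$criterion, c("BIC", "AIC", "AWE"))
  hp
}

# Hypothetical: stop once the relative change of the log-likelihood
# falls below 'tolerance' (assumed definition of "relative change").
has_converged <- function(ll_old, ll_new, tolerance) {
  abs((ll_new - ll_old) / ll_old) < tolerance
}

# Hypothetical: flag the layers whose share of days (alpha) reaches 'minalpha';
# layers flagged FALSE would be removed and the model refitted with fewer layers.
layers_to_keep <- function(alpha, minalpha) alpha >= minalpha

hp <- complete_hyperparams(list(tolerance = 1e-7, criterion = "AWE"))
str(hp)
```

Only tolerance and criterion are overridden here; the other four elements keep their documented defaults.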
If the argument layers is given, the ECM algorithm uses the number of layers provided. If layers is omitted, the function mpin_ecm() simultaneously optimizes the number of layers and the parameters of the MPIN model.
In practice, mpin_ecm() uses the ECM algorithm to optimize the MPIN model parameters for each number of layers from 1 to 8 (or up to maxlayers, if specified in the argument hyperparams), and returns the optimal model: the one with the lowest Bayesian Information Criterion (BIC), or with the lowest value of the information criterion given in criterion, if specified in the argument hyperparams.
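The selection step can be sketched as follows: compute each candidate model's information criterion from its maximized log-likelihood and keep the number of layers with the lowest value. The sketch assumes the textbook definitions AIC = -2 logL + 2k, BIC = -2 logL + k log(n) and AWE = -2 logL + 2k(3/2 + log(n)), and it assumes that a J-layer model has 3J + 2 free parameters. The helper names and the log-likelihood values are made up for illustration.

```r
# Assumed textbook definitions: ll = maximized log-likelihood,
# k = number of free parameters, n = number of trading days.
info_criteria <- function(ll, k, n) {
  c(AIC = -2 * ll + 2 * k,
    BIC = -2 * ll + k * log(n),
    AWE = -2 * ll + 2 * k * (1.5 + log(n)))
}

# Hypothetical helper: loglik[J] is the maximized log-likelihood of the
# J-layer model. Assumption: a J-layer MPIN model has 3J + 2 parameters
# (alpha, delta, mu for each layer, plus eps.b and eps.s).
select_layers <- function(loglik, n, criterion = c("BIC", "AIC", "AWE")) {
  criterion <- match.arg(criterion)
  layers <- seq_along(loglik)
  scores <- vapply(layers, function(J) {
    info_criteria(loglik[J], 3 * J + 2, n)[[criterion]]
  }, numeric(1))
  layers[which.min(scores)]  # number of layers with the lowest criterion
}

# Made-up maximized log-likelihoods for models with 1 to 4 layers,
# with n = 60 trading days (illustration only)
loglik_by_layers <- c(-1500, -1420, -1412, -1410)
select_layers(loglik_by_layers, n = 60, criterion = "BIC")
```

With these made-up inputs, BIC and AIC both pick three layers, while the heavier AWE penalty picks two. This is why switching the criterion, for instance with selectModel(), can change which model is reported as optimal.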
Returns an object of class estimate.mpin.ecm.
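The returned object exposes the global MPIN (slot mpin) and its per-layer breakdown (slot mpinJ), as shown in the examples. Assume the usual definition, in which each layer contributes alpha_j * mu_j expected informed trades per day out of an expected total of sum(alpha * mu) + eps.b + eps.s trades. Under that assumption, the two quantities are related as in the sketch below. mpin_from_params() is a hypothetical helper, not a package function.

```r
# Hypothetical helper, assuming the usual MPIN definition: each layer's
# share is alpha_j * mu_j divided by the expected total number of trades,
# sum(alpha * mu) + eps.b + eps.s; the global MPIN is the sum of the shares.
mpin_from_params <- function(alpha, mu, eps.b, eps.s) {
  stopifnot(length(alpha) == length(mu))
  informed <- alpha * mu                       # expected informed trades per layer
  mpinJ <- informed / (sum(informed) + eps.b + eps.s)
  list(mpin = sum(mpinJ), mpinJ = mpinJ)
}

# Made-up parameters for a 2-layer model (illustration only)
res <- mpin_from_params(alpha = c(0.2, 0.1), mu = c(15, 25),
                        eps.b = 10, eps.s = 10)
str(res)
```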
```r
library(PINstimation)

# There is a preloaded quarterly dataset called 'dailytrades' with 60
# observations. Each observation corresponds to a day and contains the
# total number of buyer-initiated trades ('B') and seller-initiated
# trades ('S') on that day. To know more, type ?dailytrades
xdata <- dailytrades

# Estimate the MPIN model using the expectation-conditional maximization
# (ECM) algorithm.

# ------------------------------------------------------------------------ #
# Estimate the MPIN model, assuming that there exist 2 information layers #
# in the dataset #
# ------------------------------------------------------------------------ #
estimate <- mpin_ecm(xdata, layers = 2, verbose = FALSE)

# Show the estimation output
show(estimate)

# Display the optimal parameters from the Expectation Conditional
# Maximization algorithm
show(estimate@parameters)

# Display the global multilayer probability of informed trading
show(estimate@mpin)

# Display the multilayer probability of informed trading per layer
show(estimate@mpinJ)

# Display the first five rows of the initial parameter sets used in the
# expectation-conditional maximization estimation
show(round(head(estimate@initialsets, 5), 4))

# ------------------------------------------------------------------------ #
# Omit the argument 'layers', so the ECM algorithm optimizes both the #
# number of layers and the MPIN model parameters. #
# ------------------------------------------------------------------------ #
estimate <- mpin_ecm(xdata, verbose = FALSE)

# Show the estimation output
show(estimate)

# Display the optimal parameters from the estimation of the MPIN model using
# the expectation-conditional maximization (ECM) algorithm
show(estimate@parameters)

# Display the multilayer probability of informed trading
show(estimate@mpin)

# Display the multilayer probability of informed trading per layer
show(estimate@mpinJ)

# Display the first five rows of the initial parameter sets used in the
# expectation-conditional maximization estimation
show(round(head(estimate@initialsets, 5), 4))

# ------------------------------------------------------------------------ #
# Tweak the hyperparameters of the ECM algorithm #
# ------------------------------------------------------------------------ #

# Create a variable ecm.params containing the hyperparameters of the ECM
# algorithm. A much smaller tolerance will likely make the ECM algorithm
# take longer to return results.
ecm.params <- list(tolerance = 0.0000001)

# If we suspect that the data contain more than eight information layers,
# we can raise the number of models to be estimated, e.g., to 10
# (maxlayers = 10).
ecm.params$maxlayers <- 10

# We can also choose the Approximate Weight of Evidence (AWE) for model
# selection instead of the default Bayesian Information Criterion (BIC).
ecm.params$criterion <- 'AWE'

# We can also increase the maximum number of initial sets to 200, in order
# to obtain a higher level of accuracy for models with a high number of
# layers. We set the sub-argument 'maxinit' to `200`; its default value
# is `100`.
ecm.params$maxinit <- 200

estimate <- mpin_ecm(xdata, xtraclusters = 2, hyperparams = ecm.params,
                     verbose = FALSE)

# We can change the model selection criterion by calling selectModel()
estimate <- selectModel(estimate, "AIC")

# The slot 'models' holds the mpin_ecm estimation results for each number
# of layers. Retrieve the results for the MPIN model with 2 layers and show
# the first five rows of its slot 'details'.
models <- estimate@models
show(round(head(models[[2]]@details, 5), 4))

# The function getSummary() gives an idea of how the estimation results
# change as a function of the number of layers in the MPIN model. It
# returns a dataframe that contains, among others, the number of layers of
# the model, the number of layers in the optimal model, the MPIN value, and
# the values of the information criteria AIC, BIC and AWE.
ecm.summary <- getSummary(estimate)

# Plot the MPIN value and the number of layers at the optimal model as a
# function of the number of layers, to see whether additional layers
# actually improve the precision of the probability of informed trading.
# Remember that the hyperparameter 'minalpha' is responsible for dropping
# layers with a "frequency" lower than 'minalpha'.
plot(ecm.summary$layers, ecm.summary$MPIN,
  type = "o", col = "red",
  xlab = "MPIN model layers", ylab = "MPIN value"
)
plot(ecm.summary$layers, ecm.summary$em.layers,
  type = "o", col = "blue",
  xlab = "MPIN model layers", ylab = "layers at the optimal model"
)
```