tradesCleanup: Cleans trade data


Description

This is a wrapper function for cleaning the trade data of all stocks in "ticker" over the interval [from,to]. The cleaned data is saved in the folder datadestination. The function returns a vector indicating how many trades remained after each cleaning step.

If you supply the argument "tdataraw", the on-disk functionality is ignored and the function returns a list containing the cleaned trades as an xts object (see examples).

The following cleaning functions are performed sequentially: noZeroPrices, selectExchange, salesCondition, mergeTradesSameTimestamp.

Since the function rmTradeOutliers also requires cleaned quote data as input, it is not incorporated here; a separate wrapper called tradesCleanupFinal handles that step.
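The four sequential steps listed above can be illustrated on a toy data frame in base R. This is only a sketch of the idea, not the package's implementation: the sample values are made up, and the rule used for the sales condition step (keep trades whose COND is empty, "E" or "F") is an assumption that this page does not state.

```r
# Toy trades for one stock on one day (hypothetical values, not package data).
trades <- data.frame(
  time  = c("09:30:26", "09:30:27", "09:30:27", "09:30:28", "09:30:29", "09:30:29"),
  EX    = c("N", "N", "N", "Q", "N", "N"),
  PRICE = c(0, 193.82, 193.80, 193.90, 193.85, 193.87),
  SIZE  = c(100, 100, 400, 50, 50, 50),
  COND  = c("E", "E", "E", "E", "Z", "F"),
  stringsAsFactors = FALSE
)

# Number of trades before any cleaning.
report <- c("initial number" = nrow(trades))

# Step 1: drop trades with a zero price.
trades <- trades[trades$PRICE != 0, ]
report["no zero prices"] <- nrow(trades)

# Step 2: keep only the chosen exchange(s), here NYSE ("N").
trades <- trades[trades$EX %in% "N", ]
report["select exchange"] <- nrow(trades)

# Step 3: keep regular sale conditions (ASSUMED rule: "", "E" or "F").
trades <- trades[trades$COND %in% c("", "E", "F"), ]
report["sales condition"] <- nrow(trades)

# Step 4: one observation per timestamp, using the median price.
merged <- aggregate(PRICE ~ time, data = trades, FUN = median)
report["merge same timestamp"] <- nrow(merged)

print(report)
```

Each entry of `report` is the number of trades remaining after the corresponding step, which mirrors the layout of the report vector shown in the example output.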

Usage

tradesCleanup(from,to,datasource,datadestination,ticker,exchanges,
                tdataraw,report,selection,...)

Arguments

from

character indicating first date to clean, e.g. "2008-01-30".

to

character indicating last date to clean, e.g. "2008-01-31".

datasource

character indicating the folder in which the original data is stored.

datadestination

character indicating the folder in which the cleaned data is stored.

ticker

vector of tickers for which the data should be cleaned, e.g. ticker = c("AAPL","AIG")

exchanges

list of vectors of stock exchange(s), one list element per ticker in the vector "ticker". It should therefore have the same length as the vector ticker, e.g. for two stocks: exchanges = list("N", c("Q","T")). The possible exchange symbols are:

  • A: AMEX

  • N: NYSE

  • B: Boston

  • P: Arca

  • C: NSX

  • T/Q: NASDAQ

  • D: NASD ADF and TRF

  • X: Philadelphia

  • I: ISE

  • M: Chicago

  • W: CBOE

  • Z: BATS
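As a minimal sketch of how the exchanges argument lines up with ticker, the snippet below builds both and checks them before any cleaning call. The pairing of tickers with exchanges is illustrative only, and the names `known_exchanges` and `lookup` are hypothetical helpers, not part of the package.

```r
# Exchange symbols taken from the list above.
known_exchanges <- c("A", "N", "B", "P", "C", "T", "Q", "D", "X", "I", "M", "W", "Z")

ticker    <- c("AAPL", "AIG")
exchanges <- list(c("Q", "T"), "N")  # illustrative pairing: one element per ticker

# The list must have exactly one element per ticker.
stopifnot(length(exchanges) == length(ticker))

# Every requested symbol should be one of the documented codes.
stopifnot(all(unlist(exchanges) %in% known_exchanges))

# A named copy, purely for inspection (hypothetical helper).
lookup <- setNames(exchanges, ticker)
print(lookup)
```

Running such checks up front catches a mismatched list length before a long on-disk cleaning run starts.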

tdataraw

xts object containing raw trade data for ONE day and ONE stock only. This argument is NULL by default. If it is supplied, the arguments from, to, datasource and datadestination are ignored. This is only advisable for small chunks of data.

report

boolean, TRUE by default. If TRUE, the function also returns a vector indicating how many trades remained after each cleaning step.

selection

argument to be passed on to the cleaning routine mergeTradesSameTimestamp. The default is "median".
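To illustrate what a "median" selection means for trades sharing a timestamp, here is a small base R sketch. It is not the code of mergeTradesSameTimestamp; the data are made up, and summing the sizes of the merged trades is an assumption on my part.

```r
# Toy trades: three share the timestamp 09:30:27 (hypothetical values).
trades <- data.frame(
  time  = c("09:30:27", "09:30:27", "09:30:27", "09:30:28"),
  PRICE = c(193.82, 193.80, 193.90, 193.85),
  SIZE  = c(100, 400, 50, 50),
  stringsAsFactors = FALSE
)

# One row per timestamp: median price, and (ASSUMED) total size.
merged <- merge(
  aggregate(PRICE ~ time, data = trades, FUN = median),
  aggregate(SIZE ~ time, data = trades, FUN = sum),
  by = "time"
)

print(merged)
```

The three trades at 09:30:27 collapse to a single row with price 193.82, the median of 193.80, 193.82 and 193.90.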

...

additional arguments.

Value

For each day an xts object is saved into the folder of that date, containing the cleaned data. This procedure is performed for each stock in "ticker". The function returns a vector indicating how many trades remained after each cleaning step.

If you supply the argument "tdataraw", the on-disk functionality is ignored and the function returns a list containing the cleaned trades as an xts object (see examples).
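For the on-disk mode, the days covered by [from,to] can be enumerated as below. This is a hedged sketch: the folder name "cleaned" is hypothetical, and the layout of one sub-folder per date directly under datadestination is an assumption based on the phrase "the folder of that date". Whether non-trading days are skipped is not stated here.

```r
from <- "2008-01-30"
to   <- "2008-01-31"
datadestination <- "cleaned"  # hypothetical folder name

# All calendar days in the interval [from, to].
days <- seq(as.Date(from), as.Date(to), by = "day")

# ASSUMED layout: one sub-folder per date under datadestination.
folders <- file.path(datadestination, format(days))
print(folders)
```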

Author(s)

Jonathan Cornelissen and Kris Boudt

References

Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A. and Shephard, N. (2009). Realized kernels in practice: Trades and quotes. Econometrics Journal, 12, pages C1-C32.

Brownlees, C.T. and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, pages 2232-2245.

Falkenberry, T.N. (2002). High frequency data filtering. Unpublished technical report.

Examples

# Suppose you have raw trade data for one stock on one day
library(highfrequency)
data("sample_tdataraw")
head(sample_tdataraw)
dim(sample_tdataraw)

# Clean the trades, keeping only NYSE ("N") observations
tdata_afterfirstcleaning <- tradesCleanup(tdataraw = sample_tdataraw, exchanges = list("N"))
tdata_afterfirstcleaning$report   # trades remaining after each cleaning step
barplot(tdata_afterfirstcleaning$report)
dim(tdata_afterfirstcleaning$tdata)

# For larger data sets, use the on-disk functionality
# via the "from", "to", "datasource", etc. arguments

Example output

Loading required package: xts
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

                    SYMBOL EX  PRICE    SIZE     COND CR  G127
2008-01-04 09:30:26 "XXX"  "N" "193.76" "345050" "O"  "0" "0" 
2008-01-04 09:30:27 "XXX"  "N" "193.82" "100"    "E"  "0" "0" 
2008-01-04 09:30:27 "XXX"  "N" "193.82" "400"    "E"  "0" "0" 
2008-01-04 09:30:27 "XXX"  "N" "193.82" "50"     "E"  "0" "0" 
2008-01-04 09:30:27 "XXX"  "N" "193.82" "50"     "E"  "0" "0" 
2008-01-04 09:30:27 "XXX"  "N" "193.82" "50"     "E"  "0" "0" 
Warning message:
timezone of object (GMT) is different than current timezone (). 
[1] 48484     7
      initial number       no zero prices      select exchange 
               48484                48479                20795 
     sales condition merge same timestamp 
               20135                 9105 
[1] 9105    7
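The report printed above counts the trades remaining after each step. As a small worked example, the number removed by each step follows from successive differences. The vector is retyped by hand from the output above, and `removed` and `share_kept` are illustrative names.

```r
# Report values copied from the example output above.
report <- c("initial number"       = 48484,
            "no zero prices"       = 48479,
            "select exchange"      = 20795,
            "sales condition"      = 20135,
            "merge same timestamp" = 9105)

# Trades removed by each step = drop between consecutive counts.
removed <- -diff(report)
print(removed)

# Share of the initial trades that survive all four steps, in percent.
share_kept <- report[["merge same timestamp"]] / report[["initial number"]]
print(round(100 * share_kept, 1))
```

For this sample, selecting the exchange removes 27684 trades and merging identical timestamps removes 11030, so about 18.8% of the initial 48484 observations remain.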

highfrequency documentation built on May 2, 2019, 6:09 p.m.