data_filtering: Filtering a data set for further price index calculations

View source: R/f_data_processing.R

data_filtering    R Documentation

Description

This function returns a filtered data set, i.e. the user's data frame with the same columns but with rows restricted by the criteria of the selected filters.

Usage

data_filtering(
  data,
  start,
  end,
  filters = c(),
  plimits = c(),
  pquantiles = c(),
  dplimits = c(),
  lambda = 1.25,
  interval = FALSE,
  retailers = FALSE
)

Arguments

data

The user's data frame with information about products to be filtered. It must contain columns: time (as Date in format: year-month-day, e.g. '2020-12-01'), prices (as positive numeric) and quantities (as positive numeric).
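
As an illustration of the expected input, the sketch below builds a small toy data frame with the three required columns. The values and the prodID column (a product identifier) are assumptions made for this example only; they are not taken from the package.

```r
# Hypothetical toy data: two products observed in two months.
# prodID is an assumed product identifier column, added for illustration.
toy <- data.frame(
  time       = as.Date(c("2020-03-01", "2020-03-01", "2020-04-01", "2020-04-01")),
  prices     = c(2.5, 4.0, 2.7, 3.9),
  quantities = c(100, 40, 90, 55),
  prodID     = c(1, 2, 1, 2)
)

# Check the documented requirements: time is a Date, prices and quantities are positive.
stopifnot(
  inherits(toy$time, "Date"),
  is.numeric(toy$prices), all(toy$prices > 0),
  is.numeric(toy$quantities), all(toy$quantities > 0)
)
```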

start

The base period (as character) limited to the year and month, e.g. "2020-03".

end

The research period (as character) limited to the year and month, e.g. "2020-04".

filters

A character vector of filter names. Available options are "extremeprices", "dumpprices" and "lowsales"; any combination may be given.

plimits

A two-element vector of thresholds for the minimum and maximum price change (used only if extremeprices is one of the chosen filters).

pquantiles

A two-element vector of quantile levels for the minimum and maximum price change (used only if extremeprices is one of the chosen filters).
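
The difference between plimits and pquantiles can be sketched in base R as follows. This is only an illustration and not the package's implementation: it assumes that "price change" means the ratio of the end-period price to the start-period price and that the bounds are inclusive. The ratio values are hypothetical.

```r
# Hypothetical price ratios (end price / start price) for five products.
ratio <- c(A = 0.10, B = 0.80, C = 1.05, D = 1.60, E = 3.00)

# Fixed thresholds, analogous to plimits = c(0.25, 2).
plimits <- c(0.25, 2)
keep_limits <- ratio >= plimits[1] & ratio <= plimits[2]

# Data-driven thresholds, analogous to pquantiles = c(0.01, 0.99):
# the bounds are the empirical quantiles of the observed ratios.
pquantiles <- c(0.01, 0.99)
q <- quantile(ratio, probs = pquantiles)
keep_quant <- ratio >= q[1] & ratio <= q[2]

names(ratio)[keep_limits]  # products kept by the fixed thresholds
names(ratio)[keep_quant]   # products kept by the quantile thresholds
```

With fixed thresholds the bounds are the same for every data set, whereas quantile levels adapt the bounds to the observed distribution of price changes.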

dplimits

A two-element vector of thresholds for the maximum price drop and the maximum expenditure drop (used only if dumpprices is one of the chosen filters).

lambda

The lambda parameter for the lowsales filter (see References below).

interval

A logical value indicating the scope of the filtering. If FALSE, the filtering process concerns only the two periods defined by the start and end parameters. If TRUE, the function filters products sold during the whole time interval <start, end>, i.e. any subsequent months are compared.
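
The months covered by the interval <start, end> can be listed with base R as in the sketch below. The pairing of consecutive months shown at the end is one assumed reading of "subsequent months are compared"; the package's own comparison scheme is not specified here.

```r
start <- "2018-12"
end   <- "2019-03"

# All months in the closed interval <start, end>, formatted as "YYYY-MM".
months <- format(
  seq(as.Date(paste0(start, "-01")), as.Date(paste0(end, "-01")), by = "month"),
  "%Y-%m"
)

# interval = FALSE: a single comparison of start against end.
pair_two_periods <- data.frame(from = start, to = end)

# interval = TRUE (assumed reading): each month is compared with the next one.
pairs_interval <- data.frame(from = head(months, -1), to = tail(months, -1))

months
pairs_interval
```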

retailers

A logical parameter indicating whether filtering should be done for each outlet (retID) separately. If it is set to FALSE, then there is no need to consider the retID column.

Value

This function returns a filtered data set (a reduced version of the user's data frame). If the set of filters is empty, the function returns the original data frame (defined by the data parameter) limited to the considered months. If all filters are chosen, i.e. filters=c("extremeprices","dumpprices","lowsales"), then the filters work independently and a summary result is returned. Please note that both variants of the extremeprices filter, i.e. plimits and pquantiles, can be chosen at the same time; they also work independently.
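
One way to picture filters that work independently and are then summarised is sketched below. The product IDs are hypothetical, and the combination rule (keep only products that pass every selected filter) is an assumption made for illustration, not a statement about the package internals.

```r
# Hypothetical product IDs that pass each filter when it is applied on its own.
passed <- list(
  extremeprices = c(1, 2, 3, 5),
  dumpprices    = c(1, 2, 4, 5),
  lowsales      = c(2, 3, 5)
)

# Assumed summary rule: a product is kept only if it passes every filter.
kept <- Reduce(intersect, passed)
kept
```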

References

Van Loon, K., Roels, D. (2018) Integrating big data in Belgian CPI. Meeting of the Group of Experts on Consumer Price Indices, Geneva.

Examples

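library(PriceIndices)
# Quantile-based extremeprices filter applied over the whole interval
data_filtering(milk, start = "2018-12", end = "2019-03",
               filters = c("extremeprices"), pquantiles = c(0.01, 0.99), interval = TRUE)
# Threshold-based extremeprices filter combined with the lowsales filter
data_filtering(milk, start = "2018-12", end = "2019-03",
               filters = c("extremeprices", "lowsales"), plimits = c(0.25, 2))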

PriceIndices documentation built on July 9, 2023, 6:20 p.m.