README.md

GEKS

This package is a modification of the FEWS package. The individual fixed-effects calculation has been replaced by a GEKS calculation from IndexNumR: A Package for Index Number Calculation, modified here to also support the impute-tornqvist and jevons index methods.

From the IndexNumR package it uses:

Installation

GEKS is still in development. For now it can be installed from GitHub using the following code:


devtools::install_github("MjStansfi/GEKS_package")

#Other suggestions
# devtools::install_github("MjStansfi/FEWS_package")
# devtools::install_github("MjStansfi/TPDdecomp")
# devtools::install_github("grahamjwhite/IndexNumR")

# Once installed, the package can be loaded as usual
library(GEKS)
library(dplyr)
library(ggplot2)

Usage

The primary function provided by the GEKS package is GEKS(). Running ?GEKS should give all the information required to use the function. An example of running GEKS() is shown below.

Example

As part of the package, a couple of datasets are provided, including the Turvey dataset as found in the Consumer Price Index Manual.

ggplot(turvey, aes(x = month, y = price)) + 
  geom_line(aes(color = commodity)) + 
  geom_point(aes(color = commodity))+
  theme_bw() +
  ggtitle("Artificial prices of seasonal products Data created by R. Turvey")

The GEKS index is calculated below with a mean splice and a window length of 13 months. Note one difference between the two functions: FEWS takes ‘weights’, while GEKS takes quantities.

turvey_GEKS <- GEKS(times = turvey$month,
                    price = turvey$price,
                    quantity = turvey$quantity,
                    id = turvey$commodity,
                    window_length = 13,
                    splice_pos = "mean",
                    index_method = "tornqvist",
                    num_cores = NULL)

The resulting index is displayed below


ggplot(turvey_GEKS$geks, aes(x = price_date, y = fe_indexes)) + 
  geom_line() + 
  theme_bw() +
  ggtitle("GEKS with mean splice for Turvey data")+
  ylab("Price Index") + 
  xlab("Date")
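
For intuition about what a GEKS calculation with a Törnqvist index does inside a single window, here is a minimal base-R sketch on a tiny made-up dataset. The helper names `tornqvist` and `geks_sketch` and the toy data are hypothetical, and this is not the package's implementation: it assumes every product is observed in every period and does no windowing or splicing.

```r
# Toy data: 2 products observed in 3 periods (all values are made up)
toy <- data.frame(
  period   = rep(1:3, each = 2),
  id       = rep(c("a", "b"), times = 3),
  price    = c(1.0, 2.0, 1.1, 2.0, 1.2, 2.4),
  quantity = c(10, 5, 9, 6, 8, 5)
)

# Bilateral Tornqvist index from period `base` to period `cur`,
# matching products by id (hypothetical helper)
tornqvist <- function(df, base, cur) {
  m <- merge(df[df$period == base, ], df[df$period == cur, ],
             by = "id", suffixes = c("_0", "_1"))
  # Expenditure shares in each of the two periods
  s0 <- m$price_0 * m$quantity_0 / sum(m$price_0 * m$quantity_0)
  s1 <- m$price_1 * m$quantity_1 / sum(m$price_1 * m$quantity_1)
  # Geometric mean of price relatives, weighted by the average share
  exp(sum(0.5 * (s0 + s1) * log(m$price_1 / m$price_0)))
}

# GEKS index relative to the first period: for each period t, the geometric
# mean over all link periods l of P(first -> l) * P(l -> t)
geks_sketch <- function(df) {
  periods <- sort(unique(df$period))
  sapply(periods, function(t) {
    links <- sapply(periods, function(l) {
      tornqvist(df, periods[1], l) * tornqvist(df, l, t)
    })
    exp(mean(log(links)))
  })
}

geks_index <- geks_sketch(toy)
```

Because the sketch works from prices and quantities directly, it also shows why a quantity vector is enough to form the expenditure shares used in each bilateral comparison.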

We can compare the FEWS and GEKS indexes as follows

turvey_FEWS <-FEWS(times = turvey$month,
                    logprice = log(turvey$price),
                    id = turvey$commodity,
                    window_length = 13,
                    weight = turvey$price * turvey$quantity,
                    splice_pos = "geomean",
                    num_cores = NULL)

The resulting index is displayed below

ggplot(turvey_FEWS$fews, aes(x = price_date, y = fe_indexes, colour = "FEWS")) +
  geom_line() +
  geom_line(aes(x =turvey_GEKS$geks$price_date, y= turvey_GEKS$geks$fe_indexes, colour = "GEKS-tornqvist"))+
  theme_bw() +
  ggtitle("Different multilateral, with mean splice for Turvey data")+
  ylab("Price Index") +
  xlab("Date")
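
For intuition about the `weight` argument passed to FEWS above (price times quantity, i.e. expenditure), the following base-R sketch fits a weighted fixed-effects regression of log price on period and product dummies for one window. It is only an illustration under stated assumptions: the toy data and the name `fe_sketch` are made up, and the FEWS package may compute its window estimates differently.

```r
# Toy data: 2 products observed in 3 periods (all values are made up)
toy <- data.frame(
  period   = rep(1:3, each = 2),
  id       = rep(c("a", "b"), times = 3),
  price    = c(1.0, 2.0, 1.1, 2.0, 1.2, 2.4),
  quantity = c(10, 5, 9, 6, 8, 5)
)

# Expenditure weights, built the same way as `weight` in the FEWS call above
toy$expenditure <- toy$price * toy$quantity

# Weighted regression of log price on period and product dummies
fit <- lm(log(price) ~ factor(period) + factor(id),
          data = toy, weights = expenditure)

# The period coefficients are log price levels relative to the first period
cf <- coef(fit)
period_coefs <- cf[grepl("^factor\\(period\\)", names(cf))]
fe_sketch <- exp(c(0, unname(period_coefs)))
```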

Imputation Törnqvist rolling year GEKS (ITRYGEKS)

Key functions in this package have been modified to calculate an Imputation Törnqvist rolling year GEKS (ITRYGEKS) index.

For this to work you need to employ the ‘features’ parameter of the GEKS function. Below we use this feature to calculate an index based on synthetic GfK data.

str(synthetic_gfk)
#> 'data.frame':    5509 obs. of  15 variables:
#>  $ month_num : int  0 1 2 3 4 5 6 7 8 9 ...
#>  $ char11    : Factor w/ 10 levels "brand_a","brand_b",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ char1     : num  10.6 10.6 10.6 10.6 10.6 10.6 10.6 10.6 10.6 10.6 ...
#>  $ char2     : int  16006 16006 16006 16006 16006 16006 16006 16006 16006 16006 ...
#>  $ char3     : Factor w/ 19 levels "val_a","val_b",..: 18 18 18 18 18 18 18 18 18 18 ...
#>  $ char4     : Factor w/ 4 levels "val_a","val_b",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ char5     : Factor w/ 5 levels "val_a","val_b",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ char6     : Factor w/ 88 levels "FBOK6552","GR3000",..: 49 49 49 49 49 49 49 49 49 49 ...
#>  $ char7     : Factor w/ 3 levels "AAA","BBB","CCC": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ char8     : Factor w/ 2 levels "100M","150D": 2 2 2 2 2 2 2 2 2 2 ...
#>  $ char9     : Factor w/ 4 levels "A100","A450",..: 3 3 3 3 3 3 3 3 3 3 ...
#>  $ char10    : Factor w/ 5 levels "abb","bbb","bhy",..: 5 5 5 5 5 5 5 5 5 5 ...
#>  $ prodid_num: int  3 3 3 3 3 3 3 3 3 3 ...
#>  $ quantity  : int  280 126 148 56 69 43 22 21 9 17 ...
#>  $ value     : int  196420 85312 95920 38552 47397 28303 14812 13701 6304 10651 ...

Now we wrangle this into the necessary format

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#Ensure that for each period there is only 1 observation of a given product
#ie sum all quantity and value for given month/product
synthetic_gfk <- synthetic_gfk%>%
  group_by(month_num,prodid_num)%>%
  mutate(quantity = sum(quantity),
         value = sum(value))%>%
  ungroup()%>%
  unique

#Calculate the unit value (price)
synthetic_gfk$uv <- synthetic_gfk$value/synthetic_gfk$quantity

#Extract data.frame containing features of interest
features <-synthetic_gfk[,grepl("char",colnames(synthetic_gfk))]
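
The aggregation and unit value steps above can also be sketched in base R. The small `sales` data frame below is made up for illustration and only borrows the column names from `synthetic_gfk`; it carries no characteristic columns.

```r
# Made-up sales records: product 3 appears twice in month 0
sales <- data.frame(
  month_num  = c(0, 0, 0, 1, 1),
  prodid_num = c(3, 3, 7, 3, 7),
  quantity   = c(2, 3, 4, 5, 1),
  value      = c(20, 36, 40, 55, 12)
)

# Sum quantity and value so each month/product pair appears exactly once
agg <- aggregate(cbind(quantity, value) ~ month_num + prodid_num,
                 data = sales, FUN = sum)

# Unit value (price) = total value / total quantity
agg$uv <- agg$value / agg$quantity
```

Summing first and dividing afterwards gives a quantity-weighted average price for each product and month, rather than a simple average of the recorded prices.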

Once the dataframe is in the correct format we can run it through the main function with index_method set to ‘impute-tornqvist’.

ITRYGEKS_index <- GEKS(times = synthetic_gfk$month_num,
                    price = synthetic_gfk$uv,
                    quantity = synthetic_gfk$quantity,
                    id = synthetic_gfk$prodid_num,
                    features = features,
                    window_length = 13,
                    splice_pos = "mean",
                    index_method = "impute-tornqvist",
                    num_cores = NULL)
#> Number of windows: 14 
#> 
#> FE model complete. Splicing results together
#> 
#> Finished. It took 34.79 seconds
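
One way to picture why product features are needed: an imputation Törnqvist index requires a price for a product in a period where it was not sold, and a hedonic regression on product characteristics can supply one. The sketch below is a rough illustration only. The data, the single characteristic `size`, and the model form are assumptions made for this example, not the package's actual imputation code.

```r
# Made-up observations: product "c" is only sold in period 2
obs <- data.frame(
  period = c(1, 1, 2, 2, 2),
  id     = c("a", "b", "a", "b", "c"),
  size   = c(10, 20, 10, 20, 30),   # one numeric characteristic (hypothetical)
  price  = c(100, 180, 105, 190, 280)
)

# Log-linear hedonic model with a period dummy
fit <- lm(log(price) ~ factor(period) + size, data = obs)

# Impute the period 1 price of product "c" from its characteristic
missing_obs <- data.frame(period = 1, id = "c", size = 30)
imputed_price <- unname(exp(predict(fit, newdata = missing_obs)))
```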


including_first_window <- c(ITRYGEKS_index$fixed_effects$fe_indexes[1:12], #Take first 12 observations
                            ITRYGEKS_index$geks$fe_indexes*ITRYGEKS_index$fixed_effects$fe_indexes[13]) #Manually splice on at position 13(window length)
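
The manual splice above can be illustrated with short made-up vectors. The sketch assumes, as the code above implies, that the spliced series starts at the last period of the first window with a value of 1; `first_window`, `spliced` and the window length of 4 are hypothetical stand-ins for the package output.

```r
window_length <- 4

# Index for the first full window (made-up values, based at period 1)
first_window <- c(1.00, 1.02, 1.05, 1.04)

# Spliced index, assumed to start at period `window_length` with value 1
spliced <- c(1.00, 1.01, 1.03)

# Keep the first window up to the period before the splice point, then
# rescale the spliced series to the first window's level at that point
full <- c(first_window[1:(window_length - 1)],
          spliced * first_window[window_length])
```

The combined series has one value per period and is continuous at the splice point, because the first spliced value is 1.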

Let's run a standard Törnqvist GEKS and FEWS to compare

GEKS_index <- GEKS(times = synthetic_gfk$month_num,
                    price = synthetic_gfk$uv,
                    quantity = synthetic_gfk$quantity,
                    id = synthetic_gfk$prodid_num,
                    features = NULL,
                    window_length = 13,
                    splice_pos = "geomean",
                    index_method = "tornqvist",
                    num_cores = NULL)
#> Number of windows: 14 
#> 
#> FE model complete. Splicing results together
#> 
#> Finished. It took 0.8 seconds

FEWS_index <- FEWS(times = synthetic_gfk$month_num,
                    logprice = log(synthetic_gfk$uv),
                    id = synthetic_gfk$prodid_num,
                    window_length = 13,
                    weight =  synthetic_gfk$value,
                    splice_pos = "geomean",
                    num_cores = NULL)
#> Number of windows: 14 
#> 
#> FE model complete. Splicing results together
#> 
#> Finished. It took 0.65 seconds

ggplot(ITRYGEKS_index$geks, aes(x =ITRYGEKS_index$geks$price_date, y = ITRYGEKS_index$geks$fe_indexes, colour = "ITRYGEKS")) +
  geom_line() +
  geom_line(aes(x =GEKS_index$geks$price_date, y= GEKS_index$geks$fe_indexes, colour = "GEKS-tornqvist"))+
  geom_line(aes(x =FEWS_index$fews$price_date, y= FEWS_index$fews$fe_indexes, colour = "FEWS"))+
  theme_bw() +
  ggtitle("Different multilateral, with mean splice for synthetic Gfk data")+
  ylab("Price Index") +
  xlab("Date")



MjStansfi/GEKS_package documentation built on May 12, 2021, 8:44 p.m.