website/docs/GettingStarted/Walkthrough.md

GeoLift Walkthrough

1. Data and R Environment

Set-up of the R Environment

After installing the GeoLift package, it's important to load it into our R session.

    library(GeoLift)

Data

To show an end-to-end implementation of GeoLift we will use simulated data of 40 US cities across 90 days to: design a test, select test markets, run power calculations, and finally calculate the Lift caused by the campaign. As with every GeoLift test, we start by analyzing pre-test historical information. We will use the data included in the GeoLift package.

    data(GeoLift_PreTest)

The GeoLift_PreTest data set contains three variables:

  1. location (city)
  2. date (in “yyyy-mm-dd” format)
  3. Y (number of conversions/KPI in each day/location).

Every GeoLift experiment should contain at least these three variables, which reflect when, where, and how much of the KPI was measured. If you have more data available, you can pass covariates to GeoLift through the X parameter of all GeoLift functions to improve the results.

The first step to run a GeoLift test is to read the data into the proper format using the GeoDataRead function.

    GeoTestData_PreTest <- GeoDataRead(data = GeoLift_PreTest,
                                        date_id = "date",
                                        location_id = "location",
                                        Y_id = "Y",
                                        X = c(), #empty list as we have no covariates
                                        format = "yyyy-mm-dd",
                                        summary = TRUE)
    #> ##################################
    #> #####       Summary       #####
    #> ##################################
    #>
    #> * Raw Number of Locations: 40
    #> * Time Periods: 90
    #> * Final Number of Locations (Complete): 40
    head(GeoTestData_PreTest)
    #>   location time    Y
    #> 1  atlanta    1 3384
    #> 2  atlanta    2 3904
    #> 3  atlanta    3 5734
    #> 4  atlanta    4 4311
    #> 5  atlanta    5 3686
    #> 6  atlanta    6 3374

This function analyzes the data set, handles locations with missing data, and returns a data frame with time-stamps instead of dates. In this case, since we’re inputting daily data, each time unit represents a day.

Note: Before reading the data into this format, always make sure that there are no missing variables, NAs, or locations with missing time-stamps as those will be dropped by the GeoDataRead() function.
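
The check described in the note can be sketched in base R before calling GeoDataRead(). This is only an illustration on a hypothetical toy panel (the `panel` data frame and its values are made up for the example); it assumes the same `location`, `date`, and `Y` columns as GeoLift_PreTest.

```r
# Toy panel: 3 locations x 4 days (hypothetical values, not GeoLift data).
panel <- expand.grid(
  location = c("atlanta", "chicago", "portland"),
  date = as.character(seq(as.Date("2021-01-01"), by = "day", length.out = 4)),
  stringsAsFactors = FALSE
)
panel$Y <- seq_len(nrow(panel)) * 100

# Inject the two problems the note warns about:
# a missing time-stamp for portland and an NA value for chicago.
panel <- panel[!(panel$location == "portland" & panel$date == "2021-01-03"), ]
panel$Y[panel$location == "chicago" & panel$date == "2021-01-02"] <- NA

# Rows with a missing KPI value.
na_rows <- panel[is.na(panel$Y), ]

# Locations with fewer non-missing observations than there are dates.
n_dates <- length(unique(panel$date))
obs_per_location <- tapply(!is.na(panel$Y), panel$location, sum)
incomplete <- names(obs_per_location)[obs_per_location < n_dates]

print(na_rows)
print(incomplete)
```

Any location listed in `incomplete` is a candidate for being dropped when the data is read, so it is worth fixing or consciously excluding it beforehand.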

A good next step is to plot the panel data with GeoPlot to observe its trend and the contribution of each location, and to detect any data anomalies before moving on to the data analysis.

    GeoPlot(GeoTestData_PreTest,
            Y_id = "Y",
            time_id = "time",
            location_id = "location")

GeoPlot

In this case we see a similar pattern that’s shared across all locations. These structural similarities between regions are the key to a successful test!

2. Power Analysis

Running a prospective power analysis is fundamental prior to executing a test. It is only through a thorough statistical analysis of our data that we can set it up for success. In general, the power analysis helps us decide how many and which markets to test, how long the test should run, and how much budget it requires.

Assessing the power and selecting the test markets for a GeoLift test can be accomplished through the GeoLiftMarketSelection() function. Through a series of simulations, this algorithm will find which are the best combinations of test and control locations for the experiment. Moreover, for each of these test market selections, the function will display the Minimum Detectable Effect, minimum investment needed to run a successful test, and other important model-fit metrics that will help us select the test that best matches our goals and resources.

The key parameters needed to run this function appear in the example call that follows.

Continuing with the example and in order to explore the function’s capabilities let’s assume we have two restrictions: Chicago must be part of the test markets and we have up to $100,000 to run the test. We can include these constraints into GeoLiftMarketSelection() with the include_markets and budget parameters and proceed with the market selection. Moreover, after observing that the historical KPI values in GeoPlot() have been stable across time we will proceed with a model with Fixed Effects. Finally, given a CPIC = $7.50 obtained from a previous Lift test, a range between two to five test markets, and a duration between 10 and 15 days we obtain:

    MarketSelections <- GeoLiftMarketSelection(data = GeoTestData_PreTest,
                                              treatment_periods = c(10,15),
                                              N = c(2,3,4,5),
                                              Y_id = "Y",
                                              location_id = "location",
                                              time_id = "time",
                                              effect_size = seq(0, 0.5, 0.05),
                                              lookback_window = 1,
                                              include_markets = c("chicago"),
                                              exclude_markets = c("honolulu"),
                                              holdout = c(0.5, 1),
                                              cpic = 7.50,
                                              budget = 100000,
                                              alpha = 0.1,
                                              Correlations = TRUE,
                                              fixed_effects = TRUE,
                                              side_of_test = "two_sided")
    ## Setting up cluster.
    ## Importing functions into cluster.
    ##
    ## Deterministic setup with 2 locations in treatment.
    ##
    ## Deterministic setup with 3 locations in treatment.
    ##
    ## Deterministic setup with 4 locations in treatment.
    ##
    ## Deterministic setup with 5 locations in treatment.
    ##   ID                                           location duration EffectSize
    ## 1  1             chicago, cincinnati, houston, portland       15       0.05
    ## 2  2                                  chicago, portland       15       0.10
    ## 3  3             chicago, cincinnati, houston, portland       10       0.10
    ## 4  4                                  chicago, portland       10       0.10
    ## 5  5                         chicago, houston, portland       10       0.10
    ## 6  6 chicago, cincinnati, houston, nashville, san diego       15       0.05
    ##   Power AvgScaledL2Imbalance Investment   AvgATT Average_MDE ProportionTotal_Y
    ## 1     1            0.1971864   74118.38 159.3627  0.04829913        0.07576405
    ## 2     1            0.1738778   64563.75 290.0071  0.10117316        0.03306537
    ## 3     1            0.1966996   99027.75 316.6204  0.09552879        0.07576405
    ## 4     1            0.1682310   43646.25 300.9401  0.10378013        0.03306537
    ## 5     1            0.2305628   75389.25 350.3142  0.10502968        0.05797087
    ## 6     1            0.2699167   95755.50 146.7975  0.04282215        0.09801138
    ##   abs_lift_in_zero   Holdout rank correlation
    ## 1            0.002 0.9242359    1   0.9144814
    ## 2            0.001 0.9669346    1   0.9321104
    ## 3            0.004 0.9242359    3   0.9144814
    ## 4            0.004 0.9669346    3   0.9321104
    ## 5            0.005 0.9420291    5   0.9139549
    ## 6            0.007 0.9019886    6   0.8992280
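
As a rough sanity check on the Investment column, the values printed above can be compared against the budget and CPIC passed to the function. The sketch below assumes that Investment equals CPIC times the simulated incremental conversions; that relationship is an assumption made for illustration, not something stated in this walkthrough.

```r
cpic <- 7.50
budget <- 100000

# Investment column copied from the six rows printed above.
investment <- c(74118.38, 64563.75, 99027.75, 43646.25, 75389.25, 95755.50)

# Most incremental conversions the budget can pay for at this CPIC.
max_incremental <- budget / cpic

# Incremental conversions implied by each row, under the assumption above.
implied_incremental <- investment / cpic

# Every returned market selection should fit within the budget.
within_budget <- all(investment <= budget)

print(within_budget)
print(round(max_incremental))
print(round(implied_incremental, 1))
```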

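The results of the power analysis and market selection provide us with several key metrics that we can use to select our test market. These appear as columns in the output above: EffectSize, Power, AvgScaledL2Imbalance, Investment, AvgATT, Average_MDE, ProportionTotal_Y, abs_lift_in_zero, Holdout, rank, and correlation.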

The results in MarketSelections show that the test markets with the best ranks are (chicago, cincinnati, houston, portland) and (chicago, portland), both tied at rank 1. We can plot() both of these results to inspect them further. This plot shows what the results of the GeoLift() model would look like with the latest possible test period, as well as the test’s power curve across all simulations.

    # Plot for chicago, cincinnati, houston, portland for a 15 day test
    plot(MarketSelections, market_ID = 1, print_summary = FALSE)

GeoPlot

    # Plot for chicago, portland for a 15 day test
    plot(MarketSelections, market_ID = 2, print_summary = FALSE)

GeoPlot

Power output - deep dive into power curves

To check that power is consistent over time for these locations, we can run more than one simulation for each of the top contenders that came out of GeoLiftMarketSelection.

We do this by running GeoLiftPower() with lookback_window expanded to 7 days for a single treatment combination, and then plotting the results.

NOTE: You could repeat this process for the top 5 treatment combinations that come out of GeoLiftMarketSelection, with increased lookback windows, and compare their power curves. We will do it only for Chicago and Portland here.

    # Select the row for chicago, portland (ID 2) from the market selection results
    market_id <- 2
    market_row <- dplyr::filter(MarketSelections$BestMarkets, ID == market_id)
    treatment_locations <- stringr::str_split(market_row$location, ", ")[[1]]
    treatment_duration <- market_row$duration
    lookback_window <- 7

    power_data <- GeoLiftPower(
      data = GeoTestData_PreTest,
      locations = treatment_locations,
      effect_size = seq(-0.25, 0.25, 0.01),
      lookback_window = lookback_window,
      treatment_periods = treatment_duration,
      cpic = 7.5,
      side_of_test = "two_sided"
    )
    #> Setting up cluster.
    #> Importing functions into cluster.

    # labs() comes from ggplot2, so it is called with its namespace
    plot(power_data, show_mde = TRUE, smoothed_values = FALSE, breaks_x_axis = 5) +
        ggplot2::labs(caption = unique(power_data$location))

GeoLiftPowerTopContenders

While both market selections perform very well on all metrics, we will move forward with the latter, (chicago, portland), since it allows us to run a successful test with a smaller budget. Finally, changing the print_summary parameter of plot() to TRUE can provide us with additional information about this market selection.

    # Plot for chicago, portland for a 15 day test
    plot(MarketSelections, market_ID = 2, print_summary = TRUE)
    ## ##################################
    ## #####   GeoLift Simulation   #####
    ## ####  Simulating: 10% Lift  ####
    ## ##################################
    ##
    ## GeoLift Results Summary
    ## ##################################
    ## #####     Test Statistics    #####
    ## ##################################
    ##
    ## * Average ATT: 290.007
    ## * Percent Lift: 10.1%
    ## * Incremental Y: 8700
    ## * P-value: 0
    ##
    ## ##################################
    ## #####   Balance Statistics   #####
    ## ##################################
    ##
    ## * L2 Imbalance: 868.598
    ## * Scaled L2 Imbalance: 0.1739
    ## * Percent improvement from naive model: 82.61%
    ## * Average Estimated Bias: NA
    ##
    ## ##################################
    ## #####     Model Weights      #####
    ## ##################################
    ##
    ## * Prognostic Function: NONE
    ##
    ## * Model Weights:
    ##  * cincinnati: 0.2429
    ##  * miami: 0.2056
    ##  * baton rouge: 0.1511
    ##  * honolulu: 0.0669
    ##  * dallas: 0.0644
    ##  * nashville: 0.0641
    ##  * minneapolis: 0.0619
    ##  * san diego: 0.0394
    ##  * houston: 0.0292
    ##  * austin: 0.0237
    ##  * new york: 0.0216
    ##  * los angeles: 0.0179
    ##  * reno: 0.0113

GeoPlot

Note: Given that we are not using the complete pre-treatment data to calculate the weights in our power analysis simulations, the ones displayed by the plotting function above are not the final values. However, you can easily obtain them with the GetWeights() function.

    weights <- GetWeights(Y_id = "Y",
                          location_id = "location",
                          time_id = "time",
                          data = GeoTestData_PreTest,
                          locations = c("chicago", "portland"),
                          pretreatment_end_time = 90,
                          fixed_effects = TRUE)
    #> One outcome and one treatment time found. Running single_augsynth.

    # Top weights
    head(dplyr::arrange(weights, dplyr::desc(weight)))
    #>      location     weight
    #> 1  cincinnati 0.22717797
    #> 2       miami 0.20276981
    #> 3 baton rouge 0.13353743
    #> 4 minneapolis 0.08997973
    #> 5      dallas 0.07392298
    #> 6   nashville 0.06853184

3. Analyzing the Test Results

Based on the results of the Power Calculations, a test is set up in which a 15-day marketing campaign will be executed in the cities of Chicago and Portland while the rest of the locations are put on holdout. Following the completion of this marketing campaign, we receive sales data that reflects these results. This new data set has the same format and information as the pre-test one but crucially includes results for the duration of the campaign. Depending on the vertical and product, adding a post-campaign cooldown period might be useful.

Test Data

Data for the campaign results can be accessed at GeoLift_Test.

    data(GeoLift_Test)

As at the beginning of the Power Analysis phase, we read the data into GeoLift’s format using the GeoDataRead function. The summary output shows that the new GeoLift data object contains an additional 15 periods.

    GeoTestData_Test <- GeoDataRead(data = GeoLift_Test,
                                        date_id = "date",
                                        location_id = "location",
                                        Y_id = "Y",
                                        X = c(), #empty list as we have no covariates
                                        format = "yyyy-mm-dd",
                                        summary = TRUE)
    #> ##################################
    #> #####       Summary       #####
    #> ##################################
    #>
    #> * Raw Number of Locations: 40
    #> * Time Periods: 105
    #> * Final Number of Locations (Complete): 40
    head(GeoTestData_Test)
    #>   location time    Y
    #> 1  atlanta    1 3384
    #> 2  atlanta    2 3904
    #> 3  atlanta    3 5734
    #> 4  atlanta    4 4311
    #> 5  atlanta    5 3686
    #> 6  atlanta    6 3374

The results can also be plotted using the GeoPlot function. However, for post-campaign data we can include the time-stamp at which the campaign started through the treatment_start parameter to clearly separate the two periods. Plotting the time-series is always useful to detect any anomalies with the data and to start noticing patterns with the test.

    GeoPlot(GeoTestData_Test,
            Y_id = "Y",
            time_id = "time",
            location_id = "location",
            treatment_start = 91)

GeoPlotTestData

GeoLift Inference

The next step in the process is to calculate the actual Lift caused by the marketing campaigns on our test locations. To do so we use the GeoLift() function, which takes as input the GeoLift data frame as well as information about the test: which cities were in the treatment group, when the test started, and when it ended, through the locations, treatment_start_time, and treatment_end_time parameters respectively.

    GeoTest <- GeoLift(Y_id = "Y",
                        data = GeoTestData_Test,
                        locations = c("chicago", "portland"),
                        treatment_start_time = 91,
                        treatment_end_time = 105)
    GeoTest
    #> One outcome and one treatment time found. Running single_augsynth.
    #>
    #> GeoLift Output
    #>
    #> Test results for 15 treatment periods, from time-stamp 91 to 105 for test markets:
    #> 1 CHICAGO
    #> 2 PORTLAND
    #> ##################################
    #> #####     Test Statistics    #####
    #> ##################################
    #>
    #> Percent Lift: 5.4%
    #>
    #> Incremental Y: 4667
    #>
    #> Average Estimated Treatment Effect (ATT): 155.556
    #>
    #> The results are significant at a 95% level. (TOTAL)
    #>
    #> There is a 0.6% chance of observing an effect this large or larger assuming treatment effect is zero.

The results show that the campaigns led to a 5.4% lift in units sold, corresponding to 4667 incremental units over this 15-day test. Moreover, the Average Estimated Treatment Effect (ATT) is 155.556 units per day of the test. Most importantly, we observe that these results are statistically significant at a 95% level. In fact, there’s only a 0.6% chance of observing an effect of this magnitude or larger if the actual treatment effect were zero. In other words, it is extremely unlikely that these results are just due to chance. To dig deeper into the results, we can run summary() on our GeoLift object.
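
The reported figures can be cross-checked with a little arithmetic. The printed numbers are consistent with reading the ATT as an average per test location per period, so that Incremental Y is the ATT multiplied by the number of treatment periods and test locations. This reading is inferred from the output above rather than taken from the package documentation, so treat it as an assumption.

```r
att <- 155.556               # Average ATT reported by GeoLift()
n_periods <- 105 - 91 + 1    # treatment periods, time-stamps 91 to 105
n_locations <- 2             # chicago and portland

# Incremental Y implied by the ATT under the assumption stated above.
incremental_y <- att * n_periods * n_locations

print(n_periods)
print(round(incremental_y))
```

The rounded result agrees with the Incremental Y of 4667 shown in the GeoLift output.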

    summary(GeoTest)
    #>
    #> GeoLift Results Summary
    #> ##################################
    #> #####     Test Statistics    #####
    #> ##################################
    #>
    #> * Average ATT: 155.556
    #> * Percent Lift: 5.4%
    #> * Incremental Y: 4667
    #> * P-value: 0.01
    #>
    #> ##################################
    #> #####   Balance Statistics   #####
    #> ##################################
    #>
    #> * L2 Imbalance: 909.489
    #> * Scaled L2 Imbalance: 0.1636
    #> * Percent improvement from naive model: 83.64%
    #> * Average Estimated Bias: NA
    #>
    #> ##################################
    #> #####     Model Weights      #####
    #> ##################################
    #>
    #> * Prognostic Function: NONE
    #>
    #> * Model Weights:
    #>  * austin: 0.0465
    #>  * baton rouge: 0.1335
    #>  * cincinnati: 0.2272
    #>  * dallas: 0.0739
    #>  * honolulu: 0.0673
    #>  * houston: 0.0046
    #>  * miami: 0.2028
    #>  * minneapolis: 0.09
    #>  * nashville: 0.0685
    #>  * new york: 0.0046
    #>  * reno: 0.0306
    #>  * san antonio: 0.0054
    #>  * san diego: 0.0451

The summary shows additional test statistics such as the p-value, which was equal to 0.01, confirming the high statistical significance of these results. Moreover, the summary function provides Balance Statistics, which display data about our model’s fit. The main metric of model fit used in GeoLift is the L2 Imbalance, which represents how far our synthetic control was from the actual observed values in the pre-treatment period. That is, how similar the synthetic Chicago + Portland unit we created is to the observed values of these cities in the period before the intervention. A small L2 Imbalance score means that our model did a great job replicating our test locations, while a large one would indicate a poor fit. However, the L2 Imbalance metric is scale-dependent, meaning that it can’t be compared between models with different KPIs or numbers of testing periods. For instance, the L2 Imbalance of a model run on grams of product sold will be significantly larger than that of a model run on tons of product sold, even if they represent the same underlying metric.

Therefore, given that it’s hard to tell whether the model had a good or poor fit by simply looking at the value of the L2 Imbalance metric, we also include the Scaled L2 Imbalance stat, which is easier to interpret as it’s bounded in the range between 0 and 1. A value close to zero represents a good model fit while values nearing 1 indicate poor performance by the Synthetic Control Model. This scaling is accomplished by comparing the L2 Imbalance of our Synthetic Control Method with the L2 Imbalance obtained by a baseline/naive model (instead of carefully calculating the optimal weighting scheme for the Synthetic Control, we assign equal weights to each unit in the donor pool). The latter provides an upper bound of L2 Imbalance; therefore, the Scaled L2 Imbalance shows us how much better our GeoLift model is than the baseline.

In fact, another way to look at the Scaled L2 Imbalance is the percent improvement from a naive model which can be obtained by subtracting our model’s Scaled L2 Imbalance from 100%. In this case, an improvement close to 100% (which corresponds to a Scaled L2 Imbalance close to zero) represents a good model fit. Finally, we also include the weights that generate our Synthetic Control. In this test we note that the locations that contribute the most to our GeoLift model are Cincinnati, Miami, and Baton Rouge.
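
The relationship between these three balance statistics can be sketched on toy numbers. The code below follows the verbal description above (Euclidean distance between the treated series and the weighted donors, scaled by the same distance under equal weights); the data, the weights, and the helper `l2_imbalance()` are hypothetical, and this is not the package's internal implementation.

```r
# Toy pre-treatment data: 5 periods, 1 treated series, 3 donor locations.
treated <- c(100, 110, 105, 120, 115)
donors <- cbind(
  a = c(98, 108, 104, 118, 113),
  b = c(150, 140, 160, 155, 145),
  c = c(60, 75, 65, 80, 70)
)

# Hypothetical helper: distance between the treated series and a
# weighted combination of the donor series.
l2_imbalance <- function(y, donor_matrix, w) {
  sqrt(sum((y - as.vector(donor_matrix %*% w))^2))
}

w_model <- c(0.9, 0.05, 0.05)                    # made-up "fitted" weights
w_naive <- rep(1 / ncol(donors), ncol(donors))   # equal weights baseline

l2_model <- l2_imbalance(treated, donors, w_model)
l2_naive <- l2_imbalance(treated, donors, w_naive)

scaled_l2 <- l2_model / l2_naive
pct_improvement <- 100 * (1 - scaled_l2)

print(c(l2_model = l2_model, l2_naive = l2_naive))
print(c(scaled_l2 = scaled_l2, pct_improvement = pct_improvement))
```

The same identity holds in the summary above: a Scaled L2 Imbalance of 0.1636 corresponds to a percent improvement of 83.64%.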

    plot(GeoTest, type = "Lift")
    #> You can include dates in your chart if you supply the end date of the treatment. Just specify the treatment_end_date parameter.

LiftPlot

Plotting the results is a great way to assess the model’s fit and how effective the campaign was. Taking a close look at the pre-treatment period (period before the dotted vertical line) provides insight into how well our Synthetic Control Model fitted our data. In this specific example, we see that the observed values of the Chicago + Portland test represented in the solid black line were closely replicated by our SCM model shown as the dashed red line. Furthermore, looking at the test period, we can notice the campaign’s incrementality shown as the difference between the sales observed in the test markets and our counterfactual synthetic location. This marked difference between an almost-exact match in pre-treatment periods and gap in test time-stamps provides strong evidence of a successful campaign.
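
The gap described above can be illustrated numerically. In this sketch the observed and synthetic series are made-up daily totals, and the percent lift is computed as the incremental units divided by the counterfactual total in the test period; that definition is an assumption for illustration and may differ from how GeoLift computes it internally.

```r
# Hypothetical daily totals for the test markets:
# 4 pre-treatment days followed by 3 test days.
observed  <- c(1000, 1020, 990, 1010, 1100, 1120, 1090)
synthetic <- c(1002, 1015, 994, 1008, 1040, 1050, 1035)
treatment_start <- 5

post <- seq_along(observed) >= treatment_start
gap <- observed - synthetic

# Pre-treatment gaps should be small if the synthetic control fits well.
pre_gap <- gap[!post]

# Incrementality: the gap accumulated over the test period.
incremental <- sum(gap[post])
pct_lift <- 100 * incremental / sum(synthetic[post])

print(pre_gap)
print(incremental)
print(round(pct_lift, 2))
```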

    plot(GeoTest, type = "ATT")
    #> You can include dates in your chart if you supply the end date of the treatment. Just specify the treatment_end_date parameter.

PlotATT

Looking at the Average Estimated Treatment Effect’s plot can also be extremely useful. The ATT metric shows us the magnitude of the Average Treatment Effect on a daily basis, in contrast with the previous (Lift) plot, which focused on aggregated effects. This is a good example of a well-fitting GeoLift model, as it has very small ATT values in the pre-treatment period and large ones when the treatment is administered to our test locations. In addition, point-wise confidence intervals are included in this chart, which help us gauge how significant each day’s Lift has been.

Improving The Model

While the results obtained from the test are robust and highly significant, a useful feature of GeoLift is its ability to improve the model fit even further and reduce bias through augmentation by a prognostic function. There are several options for augmenting the standard GeoLift model, such as regularization (specifically Ridge) and an application of the Generalized Synthetic Control Model (GSC). Each of these approaches provides its own set of advantages: Ridge regularization usually performs well when the number of units and time-periods isn’t large, while GSC helps improve fit in situations with many pre-treatment periods. GeoLift also offers the option to let the model decide which approach is best by setting the model parameter to “best”.

    GeoTestBest <- GeoLift(Y_id = "Y",
                            data = GeoTestData_Test,
                            locations = c("chicago", "portland"),
                            treatment_start_time = 91,
                            treatment_end_time = 105,
                            model = "best")
    #> One outcome and one treatment time found. Running single_augsynth.
    #> One outcome and one treatment time found. Running single_augsynth.
    #> One outcome and one treatment time found. Running single_augsynth.
    #> One outcome and one treatment time found. Running single_augsynth.
    #>
    #> GeoLift Output
    #>
    #> Test results for 15 treatment periods, from time-stamp 91 to 105 for test markets:
    #> 1 CHICAGO
    #> 2 PORTLAND
    #> ##################################
    #> #####     Test Statistics    #####
    #> ##################################
    #>
    #> Percent Lift: 5.5%
    #>
    #> Incremental Y: 4704
    #>
    #> Average Estimated Treatment Effect (ATT): 156.805
    #>
    #> The results are significant at a 95% level. (TOTAL)
    #>
    #> There is a 1.4% chance of observing an effect this large or larger assuming treatment effect is zero.
    summary(GeoTestBest)
    #>
    #> GeoLift Results Summary
    #> ##################################
    #> #####     Test Statistics    #####
    #> ##################################
    #>
    #> * Average ATT: 156.805
    #> * Percent Lift: 5.5%
    #> * Incremental Y: 4704
    #> * P-value: 0.01
    #>
    #> ##################################
    #> #####   Balance Statistics   #####
    #> ##################################
    #>
    #> * L2 Imbalance: 903.525
    #> * Scaled L2 Imbalance: 0.1626
    #> * Percent improvement from naive model: 83.74%
    #> * Average Estimated Bias: -1.249
    #>
    #> ##################################
    #> #####     Model Weights      #####
    #> ##################################
    #>
    #> * Prognostic Function: RIDGE
    #>
    #> * Model Weights:
    #>  * atlanta: 3e-04
    #>  * austin: 0.0467
    #>  * baltimore: 1e-04
    #>  * baton rouge: 0.1337
    #>  * boston: -4e-04
    #>  * cincinnati: 0.2273
    #>  * columbus: 1e-04
    #>  * dallas: 0.0741
    #>  * denver: 1e-04
    #>  * detroit: 1e-04
    #>  * honolulu: 0.0674
    #>  * houston: 0.0048
    #>  * indianapolis: 1e-04
    #>  * jacksonville: -1e-04
    #>  * kansas city: -1e-04
    #>  * los angeles: 2e-04
    #>  * memphis: -2e-04
    #>  * miami: 0.2029
    #>  * milwaukee: -2e-04
    #>  * minneapolis: 0.0901
    #>  * nashville: 0.0687
    #>  * new orleans: -2e-04
    #>  * new york: 0.0048
    #>  * oakland: -0.001
    #>  * oklahoma city: -7e-04
    #>  * orlando: 1e-04
    #>  * philadelphia: -4e-04
    #>  * reno: 0.0308
    #>  * saint paul: 2e-04
    #>  * salt lake city: -3e-04
    #>  * san antonio: 0.0056
    #>  * san diego: 0.0452
    #>  * san francisco: 1e-04
    #>  * tucson: -1e-04
    plot(GeoTestBest, type = "Lift")
    #> You can include dates in your chart if you supply the end date of the treatment. Just specify the treatment_end_date parameter.

PlotLiftBest

    plot(GeoTestBest, type = "ATT")
    #> You can include dates in your chart if you supply the end date of the treatment. Just specify the treatment_end_date parameter.

PlotATTBest

The new results augment the GeoLift model with a Ridge prognostic function, which improves the model fit as seen in the new L2 Imbalance metrics. This additional robustness translates into a small increase in the Percent Lift. Furthermore, by augmenting the model with a prognostic function, we obtain an estimate of the bias that was removed by the Augmented Synthetic Control Model.



facebookincubator/GeoLift documentation built on May 31, 2024, 10:09 a.m.