MultiCellMarketSelection: Multi-Cell Sampling Method.

View source: R/MultiCell.R

Description

[Experimental]

MultiCellMarketSelection performs a power-analysis-driven Market Selection for a Multi-Cell GeoLift test, using the specified sampling method to split the data set into k similar partitions (cells).
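As a rough illustration of what a systematic split can look like, the sketch below ranks locations by total outcome and deals them out to the k cells in rotation, so each cell receives a similar mix of large and small markets. This is one common form of systematic sampling, offered as an assumption: systematic_cells() is a hypothetical helper, and GeoLift's own partitioning in R/MultiCell.R may differ.

```r
# Hypothetical helper (not a GeoLift function): assign locations to k cells
# by systematic sampling over locations ranked by their total outcome.
systematic_cells <- function(data, k = 2, Y_id = "Y", location_id = "location") {
  # Total outcome per location over the whole panel
  totals <- tapply(data[[Y_id]], data[[location_id]], sum)
  # Location names ordered from largest to smallest total
  ranked <- names(totals)[order(totals, decreasing = TRUE)]
  # Deal ranked locations to cells 1, 2, ..., k, 1, 2, ... in rotation
  cells <- ((seq_along(ranked) - 1) %% k) + 1
  split(ranked, cells)
}

# Toy panel: six locations with made-up outcomes over two periods
toy <- data.frame(
  location = rep(c("a", "b", "c", "d", "e", "f"), each = 2),
  time = rep(1:2, times = 6),
  Y = c(10, 12, 50, 55, 30, 28, 5, 6, 40, 42, 20, 19)
)
systematic_cells(toy, k = 2)
```

With this toy panel, cell 1 receives locations b, c and a, and cell 2 receives e, f and d, so both cells mix large and small markets.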

[Experimental]

GetMultiCellWeights returns the synthetic control weights as a data frame for a given test set-up.

Usage

MultiCellMarketSelection(
  data,
  k = 2,
  sampling_method = "systematic",
  top_choices = 10,
  N = c(),
  X = c(),
  Y_id = "Y",
  location_id = "location",
  time_id = "time",
  effect_size = seq(-0.5, 0.5, 0.05),
  treatment_periods,
  lookback_window = 1,
  cpic = 1,
  alpha = 0.1,
  normalize = FALSE,
  model = "none",
  fixed_effects = TRUE,
  dtw = 0,
  Correlations = FALSE,
  run_stochastic_process = FALSE,
  parallel = TRUE,
  parallel_setup = "sequential",
  side_of_test = "two_sided",
  conformal_type = "iid",
  ns = 1000
)

## S3 method for class 'MultiCellMarketSelection'
print(x, ...)

GetMultiCellWeights(x, test_markets = list())

## S3 method for class 'MultiCellMarketSelection'
plot(
  x,
  test_markets = list(),
  type = "Lift",
  treatment_end_date = NULL,
  frequency = "daily",
  plot_start_date = NULL,
  post_treatment_periods = 0,
  title = "",
  stacked = TRUE,
  ...
)

Arguments

data

A data.frame containing the historical conversions by geographic unit. It requires a location column with the geo name (named by location_id, "location" by default), an outcome column with the conversions in units (named by Y_id, "Y" by default), a time column indicating the time period, starting at 1 (named by time_id, "time" by default), and any covariates listed in X.
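For reference, a minimal panel in the expected long format looks like the toy data frame below (the values are made up). check_panel() is a hypothetical helper, not a GeoLift function; it only checks the column requirements stated above.

```r
# Toy long-format panel: one row per location and time period (made-up values)
panel <- data.frame(
  location = rep(c("chicago", "portland", "austin"), each = 3),
  time = rep(1:3, times = 3),   # time index starts at 1
  Y = c(120, 130, 125, 80, 85, 90, 60, 58, 65)
)

# Hypothetical sanity check for the columns named by Y_id, location_id, time_id
check_panel <- function(data, Y_id = "Y", location_id = "location", time_id = "time") {
  required <- c(Y_id, location_id, time_id)
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (min(data[[time_id]]) != 1) stop("The time index must start at 1.")
  if (!is.numeric(data[[Y_id]])) stop("The outcome column must be numeric.")
  invisible(TRUE)
}

check_panel(panel)
```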

k

Number of partitions or cells. k = 2 by default.

sampling_method

Sampling Method used to create the k partitions. Set to "systematic" by default.

top_choices

Number of top Market Selection choices to print for each cell. 10 by default.

N

Vector with the numbers of test markets to calculate power for. If left empty (the default), it is populated with the deciles of the total number of locations.

X

List of names of covariates.

Y_id

Name of the outcome variable (String).

location_id

Name of the location variable (String).

time_id

Name of the time variable (String).

effect_size

A vector of effect sizes to test. By default, a sequence from -50 to 50 percent in 5 percent increments: seq(-0.5, 0.5, 0.05). Make sure that the sequence includes zero.
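A quick way to confirm that an effect_size grid contains zero is to look for a value within a small tolerance, since seq() works in floating point. includes_zero() is a hypothetical helper for illustration only.

```r
# Default grid from the Usage section: -50% to +50% in 5% steps
effect_size <- seq(-0.5, 0.5, 0.05)

# Hypothetical check (not a GeoLift function): does the grid contain zero?
# The tolerance guards against floating-point error in seq().
includes_zero <- function(x, tol = 1e-9) any(abs(x) < tol)

includes_zero(effect_size)             # TRUE
includes_zero(seq(0.05, 0.25, 0.05))   # FALSE: zero is missing
```

A grid that fails the check can be repaired with sort(unique(c(0, x))).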

treatment_periods

List of treatment periods to calculate power for. It is recommended to specify a single treatment length for multi-cell Market Selections.

lookback_window

A number indicating how far back in time the power analysis simulations should go. For instance, a value of 5 simulates power for the last five possible tests. By default, lookback_window = 1, which only runs the most recent test in the data.
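The documentation does not state exactly where each simulated test starts. Assuming the i-th simulation shifts the test window back by i - 1 periods from the end of the data, the windows can be listed as below; simulated_windows() and this placement rule are assumptions for illustration, not GeoLift's code.

```r
# Hypothetical sketch (window placement is an assumption): simulated test
# windows for a panel with n_periods time stamps.
simulated_windows <- function(n_periods, treatment_periods, lookback_window = 1) {
  sims <- seq_len(lookback_window)
  end <- n_periods - sims + 1            # simulation 1 ends at the last period
  start <- end - treatment_periods + 1   # each window spans treatment_periods
  data.frame(sim = sims, start = start, end = end)
}

w <- simulated_windows(n_periods = 90, treatment_periods = 15, lookback_window = 3)
w
```

Under this assumption, a 90-period panel with 15-period tests and lookback_window = 3 gives the windows 76-90, 75-89 and 74-88, and lookback_window = 1 keeps only the most recent one.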

cpic

Number indicating the Cost Per Incremental Conversion.
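As a worked example of how the CPIC links an effect size to a budget, the sketch assumes the required investment is the CPIC multiplied by the incremental conversions, which in turn are the effect size times the baseline conversions in the test markets over the treatment period. The formula and estimate_investment() are assumptions for illustration, not GeoLift's documented computation.

```r
# Hypothetical budget arithmetic (formula assumed for illustration):
#   incremental conversions = effect size x baseline conversions
#   investment              = cpic x incremental conversions
estimate_investment <- function(baseline_conversions, effect_size, cpic) {
  incremental <- effect_size * baseline_conversions
  cpic * incremental
}

# 20,000 baseline conversions, a 10% lift, and a CPIC of 7.5
estimate_investment(baseline_conversions = 20000, effect_size = 0.10, cpic = 7.5)
```

Here a 10% lift on 20,000 baseline conversions implies 2,000 incremental conversions and an estimated investment of 15,000.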

alpha

Significance Level. By default 0.1.

normalize

A logical flag indicating whether to scale the outcome, which is useful to speed up computation when the magnitude of the data is large. The default is FALSE.

model

A string indicating the outcome model used to augment the Augmented Synthetic Control Method. Augmentation through a prognostic function can improve fit and reduce L2 imbalance metrics.

  • "None": ASCM is not augmented by a prognostic function. Default.

  • "Ridge": Augments with a Ridge regression. Recommended to improve fit for smaller panels (fewer than 40 locations and 100 time stamps).

  • "GSYN": Augments with a Generalized Synthetic Control Method. Recommended to improve fit for larger panels (more than 40 locations and 100 time stamps).

fixed_effects

A logical flag indicating whether to include unit fixed effects in the model. Set to TRUE by default.

dtw

Emphasis on Dynamic Time Warping (DTW). A value of dtw = 1 focuses exclusively on this metric, while dtw = 0 (the default) relies on correlations only.
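To make the dtw trade-off concrete, the sketch below computes a classic dynamic time warping distance and blends a DTW ranking with a correlation ranking, using dtw as the weight. The rank-blending rule and the names dtw_distance() and blend_rank() are assumptions for illustration; GeoLift's own similarity computation may combine the two metrics differently.

```r
# Classic dynamic time warping distance between two numeric series
dtw_distance <- function(a, b) {
  n <- length(a); m <- length(b)
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) {
    for (j in 1:m) {
      cost <- abs(a[i] - b[j])
      # Extend the cheapest of the three admissible predecessor paths
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[n + 1, m + 1]
}

# Hypothetical blended ranking: rank candidates by correlation (high is good)
# and by DTW distance (low is good), then mix the ranks with weight dtw.
blend_rank <- function(target, candidates, dtw = 0) {
  cor_rank <- rank(-sapply(candidates, function(x) cor(target, x)))
  dtw_rank <- rank(sapply(candidates, function(x) dtw_distance(target, x)))
  score <- dtw * dtw_rank + (1 - dtw) * cor_rank
  names(sort(score))
}

target <- 1:5
cands <- list(scaled = c(10, 20, 30, 40, 50), close = c(1, 2, 3, 5, 4))
blend_rank(target, cands, dtw = 0)   # correlation only: "scaled" first
blend_rank(target, cands, dtw = 1)   # DTW only: "close" first
```

The "scaled" series is perfectly correlated with the target but far away in level, while "close" tracks the level but is slightly reordered, so the two extremes of dtw rank them in opposite orders.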

Correlations

A logical flag indicating whether to include an additional column in the final output with the correlations between the test regions and the total control markets. Set to FALSE by default.

run_stochastic_process

A logical flag indicating whether to select test markets through random sampling of the similarity matrix. Interpolation bias can arise when the synthetic control matches the characteristics of the test unit by averaging away large discrepancies between the test unit and the units in the synthetic control, so random sampling is recommended only after confirming that all units are similar. Set to FALSE by default.

parallel

A logical flag indicating whether to use parallel computing to speed up calculations. Set to TRUE by default.

parallel_setup

A string indicating parallel workers set-up. Set to "sequential" by default.

side_of_test

A string indicating whether confidence will be determined using a one-sided or a two-sided test.

  • "two_sided": The test statistic is the sum of the absolute values of all treatment effects, i.e. sum(abs(x)). Default.

  • "one_sided": One-sided test against positive or negative effects. If the applied effect is negative, the statistic defaults to -sum(x), with H0: ES >= 0 and HA: ES < 0. If the applied effect is positive, it defaults to sum(x), with H0: ES <= 0 and HA: ES > 0.
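The statistics listed above can be written out directly. In this sketch, x stands for the vector of estimated treatment effects; test_statistic() and its effect_sign argument are hypothetical names used only for illustration.

```r
# Test statistics as described for side_of_test (hypothetical helper)
test_statistic <- function(x, side_of_test = "two_sided", effect_sign = 1) {
  if (side_of_test == "two_sided") {
    sum(abs(x))
  } else if (side_of_test == "one_sided") {
    # Negative applied effect -> -sum(x); positive applied effect -> sum(x)
    if (effect_sign < 0) -sum(x) else sum(x)
  } else {
    stop("side_of_test must be 'two_sided' or 'one_sided'")
  }
}

effects <- c(-2, 1, -3)
test_statistic(effects, "two_sided")                    # 6
test_statistic(effects, "one_sided", effect_sign = -1)  # 4
test_statistic(effects, "one_sided", effect_sign = 1)   # -4
```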

conformal_type

Type of conformal inference used. Can be either "iid" for Independent and identically distributed or "block" for moving block permutations. Set to "iid" by default.

ns

Number of resamples for "iid" permutations if conformal_type = "iid". Set to 1000 by default.
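The final step of iid conformal inference can be sketched as a permutation test on residuals (outcome minus synthetic control): shuffle the residuals across all periods ns times and count how often the post-period statistic is at least as large as the observed one. This is a generic sketch under that assumption, using the two-sided statistic; iid_permutation_pvalue() is hypothetical and is not GeoLift's implementation.

```r
# Generic sketch of iid permutation inference (not GeoLift's implementation).
# resid: residuals for all periods; the last n_post entries are treatment periods.
iid_permutation_pvalue <- function(resid, n_post, ns = 1000, seed = 1) {
  set.seed(seed)                          # reproducible resampling
  n <- length(resid)
  post <- (n - n_post + 1):n              # indices of the treatment periods
  stat <- function(u) sum(abs(u[post]))   # two-sided statistic on post periods
  observed <- stat(resid)
  # Shuffle residuals across all periods ns times and recompute the statistic
  permuted <- replicate(ns, stat(sample(resid)))
  # Share of permutations (plus the observed one) at least as extreme
  (1 + sum(permuted >= observed)) / (ns + 1)
}

# Toy residuals: small pre-period noise followed by three large post-period values
resid <- c(rep(c(-1, 1), 10), 10, 12, 11)
iid_permutation_pvalue(resid, n_post = 3)
```

With conformal_type = "block", the resampling would instead use moving-block (cyclic) shifts of the time index rather than arbitrary shuffles, which preserves serial dependence in the residuals.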

x

A MultiCellMarketSelection object, as returned by MultiCellMarketSelection().

...

additional arguments

test_markets

List of market IDs per cell. The list must contain exactly k numeric values, one per cell, corresponding to the Market Selections from the power analysis. The recommended layout is list(cell_1 = 1, cell_2 = 1, cell_3 = 1, ...).
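A small validation step can catch a malformed test_markets list before it is passed to plot() or GetMultiCellWeights(). check_test_markets() is a hypothetical helper that only encodes the rule stated above (a list of exactly k single numeric IDs), and the IDs below are placeholders.

```r
# Hypothetical helper (not part of GeoLift): one numeric market ID per cell
check_test_markets <- function(test_markets, k) {
  is.list(test_markets) &&
    length(test_markets) == k &&
    all(vapply(test_markets,
               function(id) is.numeric(id) && length(id) == 1,
               logical(1)))
}

# Recommended layout for k = 3 (placeholder IDs)
test_markets <- list(cell_1 = 1, cell_2 = 4, cell_3 = 2)
check_test_markets(test_markets, k = 3)         # TRUE
check_test_markets(list(cell_1 = 1), k = 3)     # FALSE: only one cell given
```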

type

Type of plot. By default "Lift" which plots the incrementality on the outcome variable. If type is set to "ATT", the average ATT is plotted. If type is set to "Incrementality", daily incremental values are plotted.

treatment_end_date

Character that represents a date in year-month-day format.

frequency

Character that represents the periodicity of time stamps. Can be either "weekly" or "daily". Defaults to "daily".

plot_start_date

Character that represents initial date of plot in year-month-day format.

post_treatment_periods

Number of post-treatment periods. Zero by default.

title

String for the title of the plot. Empty by default.

stacked

Logical flag indicating whether to stack all the Multi-Cell plots together vertically or to output each one of them separately. Set to TRUE by default.

Value

MultiCellMarketSelection returns a 'MultiCellMarketSelection' object containing four elements:

  • "TopChoices": Data frame with the top choices by cell.

  • "Models": The complete list of all Market Selections for each cell.

  • "data": The input data.

  • "test_details": The test details.

GetMultiCellWeights returns a data frame with the locations and the synthetic control weights for each cell.


facebookincubator/GeoLift documentation built on May 31, 2024, 10:09 a.m.