best_matches: For each market, find the best matching control market

View source: R/functions.R

best_matches    R Documentation

Description

best_matches finds the best matching control markets for each market in the dataset using dynamic time warping (dtw package). The algorithm loops through all viable candidates for each market, optionally in parallel, and then ranks the candidates by distance, by correlation, or by a weighted mix of the two (see dtw_emphasis).

Usage

best_matches(data=NULL,
             markets_to_be_matched=NULL,
             id_variable=NULL,
             date_variable=NULL,
             matching_variable=NULL,
             parallel=TRUE,
             warping_limit=1,
             start_match_period=NULL,
             end_match_period=NULL,
             matches=NULL,
             dtw_emphasis=1, 
             suggest_market_splits=FALSE,
             splitbins=10,
             log_for_splitting=FALSE)

Arguments

data

input data.frame for analysis. The dataset should be structured as "stacked" time series (i.e., a panel dataset). In other words, markets are rows and not columns – we have a unique row for each area/time combination.
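
To make the "stacked" layout concrete, here is a minimal sketch that turns a wide table (one column per market) into the long form described above. The column names Area, Date and Mean_TemperatureF are borrowed from the Examples section; the wide table and its values are made up purely for illustration.

```r
# Hypothetical wide table: one column per market (illustrative values only)
wide <- data.frame(
  Date = as.Date(c("2014-01-01", "2014-01-02", "2014-01-03")),
  CPH  = c(35, 37, 36),
  SFO  = c(55, 54, 56)
)

# Stack the market columns so that each row is one market/date combination
long <- data.frame(
  Area = rep(c("CPH", "SFO"), each = nrow(wide)),
  Date = rep(wide$Date, times = 2),
  Mean_TemperatureF = c(wide$CPH, wide$SFO),
  stringsAsFactors = FALSE
)

# Sanity check: every market/date pair appears exactly once
stopifnot(!any(duplicated(long[c("Area", "Date")])))
```

A data.frame shaped like long is what would be passed as data, with id_variable="Area", date_variable="Date" and matching_variable="Mean_TemperatureF".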

markets_to_be_matched

Use this parameter if you only want to get control matches for a subset of markets or a single market. The default is NULL, which means that all markets will be paired with matching markets.

id_variable

the name of the variable that identifies the markets

date_variable

the time stamp variable

matching_variable

the variable (metric) used to match the markets. For example, this could be sales or new customers

parallel

set to TRUE for parallel processing. Default is TRUE

warping_limit

the warping limit used for matching. Default is 1, which means that a single query value can be mapped to at most 2 reference values.

start_match_period

the start date of the matching period (pre period). Must be a character of format "YYYY-MM-DD" – e.g., "2015-01-01"

end_match_period

the end date of the matching period (pre period). Must be a character of format "YYYY-MM-DD" – e.g., "2015-10-01"
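
Since both period arguments must be character strings in "YYYY-MM-DD" format, it can help to validate them before calling best_matches. The helper below, is_iso_date, is a hypothetical convenience function and not part of MarketMatching; it is only a sketch of such a check in base R.

```r
# Hypothetical helper (not part of MarketMatching): TRUE only for a single
# character string that looks like YYYY-MM-DD and parses to a real date
is_iso_date <- function(x) {
  is.character(x) && length(x) == 1 &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

start_match_period <- "2015-01-01"
end_match_period   <- "2015-10-01"

# Both dates must be well formed, and the period must not be reversed
stopifnot(is_iso_date(start_match_period), is_iso_date(end_match_period))
stopifnot(as.Date(start_match_period) <= as.Date(end_match_period))
```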

matches

Number of matching markets to keep in the output (to use fewer markets for inference, use the control_matches parameter when calling inference). Default is to keep all matches.

dtw_emphasis

Number from 0 to 1. The amount of emphasis placed on dtw distances, versus correlation, when ranking markets. Default is 1 (all emphasis on dtw). If emphasis is set to 0, all emphasis would be put on correlation, which is recommended when optimal splits are requested. An emphasis of 0.5 would yield equal weighting.
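
The sketch below illustrates how an emphasis weight between 0 and 1 could trade off distance against correlation when ranking candidates. The blending rule (a weighted average of the two rank orders) is an assumption made for illustration; this page does not spell out the exact formula best_matches uses. The function rank_candidates and the candidate values are hypothetical.

```r
# Hypothetical candidate controls for one test market (values are made up)
candidates <- data.frame(
  market      = c("A", "B", "C"),
  distance    = c(0.10, 0.25, 0.15),  # smaller is better
  correlation = c(0.80, 0.95, 0.90),  # larger is better
  stringsAsFactors = FALSE
)

# Assumed rule: weighted average of the distance rank and the correlation rank
rank_candidates <- function(df, dtw_emphasis = 1) {
  stopifnot(dtw_emphasis >= 0, dtw_emphasis <= 1)
  dist_rank <- rank(df$distance)      # rank 1 = shortest distance
  corr_rank <- rank(-df$correlation)  # rank 1 = highest correlation
  combined  <- dtw_emphasis * dist_rank + (1 - dtw_emphasis) * corr_rank
  df$market[order(combined)]
}

rank_candidates(candidates, dtw_emphasis = 1)  # ordered by distance only
rank_candidates(candidates, dtw_emphasis = 0)  # ordered by correlation only
```

With these made-up numbers, the two extremes produce different orderings, which is the behaviour the parameter description implies.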

suggest_market_splits

if set to TRUE, best_matches will return suggested test/control splits based on correlation and market sizes. Default is FALSE. For this option to be invoked, markets_to_be_matched must be NULL (i.e., you must run a full match). Note that the algorithm will force test and control to have the same number of markets. So if the total number of markets is odd, one market will be left out.

splitbins

Number of size-based bins used to stratify when splitting markets into test and control. Only markets inside the same bin can be matched. More bins means more emphasis on market size when splitting. Fewer bins means more emphasis on correlation. Default is 10.
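
As a rough illustration of size-based stratification, the sketch below assigns markets to equal-count bins by ranked volume. The binning rule and the function assign_bins are assumptions made for this example; the page does not describe how best_matches constructs its bins.

```r
# Hypothetical total volume per market (values are made up)
volume <- c(M1 = 10, M2 = 12, M3 = 50, M4 = 55, M5 = 200, M6 = 210)

# Assumed rule: equal-count bins on ranked size, never more bins than markets
assign_bins <- function(volume, splitbins = 10) {
  n_bins <- min(splitbins, length(volume))
  ceiling(rank(volume, ties.method = "first") * n_bins / length(volume))
}

bins <- assign_bins(volume, splitbins = 3)

# Markets that share a bin are the only ones eligible to be paired
split(names(volume), bins)
```

With three bins, each bin here holds two markets of similar size, so size dominates the pairing; with a single bin, every market could be paired with any other and correlation alone would decide.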

log_for_splitting

This parameter determines if optimal splitting is based on correlations of the raw matching metric values or the correlations of log(matching metric). Only relevant if suggest_market_splits is TRUE. Default is FALSE.
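
To show what the two options compare, the sketch below computes the correlation of two series on the raw scale and on the log scale. The series are made up, and the code only demonstrates the distinction; it is not the package's own implementation. Note that log() requires strictly positive values.

```r
# Hypothetical metric values for two markets over five periods (made up)
test    <- c(100, 110, 150, 400, 120)
control <- c(10, 11, 14, 30, 12)

# Correlation on the raw scale (the log_for_splitting=FALSE case)
cor_raw <- cor(test, control)

# Correlation on the log scale (the log_for_splitting=TRUE case);
# taking logs dampens the influence of large spikes such as the value 400
stopifnot(all(test > 0), all(control > 0))
cor_log <- cor(log(test), log(control))

c(raw = cor_raw, log = cor_log)
```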

Value

Returns an object of type market_matching. The object has the following elements:

BestMatches

A data.frame that contains the best matches for each market. All stats reflect data after the market pairs have been joined by date. Thus SUMTEST and SUMCNTL can have smaller values than what you see in the Bins output table.

Data

The raw data used to do the matching

MarketID

The name of the market identifier

MatchingMetric

The name of the matching variable

DateVariable

The name of the date variable

SuggestedTestControlSplits

Suggested test/control splits. SUMTEST and SUMCNTL are the total market volumes, not the volumes after joining with other markets. They are greater than or equal to the corresponding values in BestMatches.

Bins

Bins used for splitting and corresponding volumes

Examples

## Not run: 
##-----------------------------------------------------------------------
## Find the best matches for the CPH and SFO airport time series
##-----------------------------------------------------------------------
library(MarketMatching)
data(weather, package="MarketMatching")
mm <- best_matches(data=weather, 
                   id_variable="Area",
                   markets_to_be_matched=c("CPH", "SFO"),
                   date_variable="Date",
                   matching_variable="Mean_TemperatureF",
                   parallel=FALSE,
                   start_match_period="2014-01-01",
                   end_match_period="2014-10-01")
head(mm$BestMatches)

## End(Not run)


MarketMatching documentation built on May 29, 2024, 6:33 a.m.