
View source: R/computingOT.R

diftrans {diftrans}    R Documentation

Obtain Transport Costs and Differences-in-Transports Estimator

Description

Given the pre- and post-treatment probability mass functions and a vector of bandwidths, this function returns the transport cost associated with each bandwidth. If pre- and post-treatment probability mass functions are also supplied for a control group, the differences-in-transports estimator is returned.
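For intuition, the transport cost at a single bandwidth can be sketched without a linear-programming solver. The helper `bandwidth_transport_cost()` below is hypothetical and not part of diftrans. It assumes that both probability mass functions are defined on the same sorted support and that the cost of moving mass is 1 when the distance exceeds the bandwidth and 0 otherwise (the default cost described under costm_main). Instead of the general solver in transport::transport, it uses a greedy left-to-right matching, which is exact only for this one-dimensional threshold cost.

```r
# Hypothetical helper (not part of diftrans): transport cost between two
# probability mass functions p and q on the same sorted support, where moving
# mass farther than `bandwidth` costs 1 and moving it within `bandwidth` costs 0.
bandwidth_transport_cost <- function(support, p, q, bandwidth) {
  stopifnot(!is.unsorted(support), length(p) == length(support),
            length(q) == length(support))
  p <- p / sum(p)  # normalise counts to probabilities
  q <- q / sum(q)
  n <- length(support)
  matched <- 0     # total mass moved at zero cost
  j <- 1           # first destination still within reach of the current origin
  for (i in seq_len(n)) {
    remaining <- p[i]
    # destinations left of support[i] - bandwidth are out of reach for this
    # and every later origin, so the pointer only moves forward
    while (j <= n && support[j] < support[i] - bandwidth) j <- j + 1
    k <- j
    # greedily fill the leftmost reachable destinations first
    while (remaining > 0 && k <= n && support[k] <= support[i] + bandwidth) {
      take <- min(remaining, q[k])
      q[k] <- q[k] - take
      remaining <- remaining - take
      matched <- matched + take
      k <- k + 1
    }
  }
  1 - matched      # mass that must travel farther than the bandwidth
}

support <- c(1, 2, 3)
pre  <- c(0.5, 0.5, 0)
post <- c(0, 0.5, 0.5)
bandwidth_transport_cost(support, pre, post, bandwidth = 0)  # 0.5
bandwidth_transport_cost(support, pre, post, bandwidth = 1)  # 0
```

With bandwidth 0 this cost reduces to the total variation distance between the two distributions; widening the bandwidth can only lower it.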

Usage

diftrans(
  pre_main = NULL,
  post_main = NULL,
  pre_control = NULL,
  post_control = NULL,
  var = MSRP,
  bandwidth_seq = seq(0, 40000, 1000),
  estimator = ifelse(!is.null(pre_control) & !is.null(post_control), "dit", "tc"),
  conservative = F,
  quietly = F,
  suppress_progress_bar = F,
  save_dit = F,
  costm_main = NULL,
  costm_ref_main = NULL,
  costm_control = NULL,
  costm_ref_control = NULL
)

Arguments

pre_main

probability mass function (see "Details") for var of the treated group before treatment occurs

post_main

probability mass function (see "Details") for var of the treated group after treatment occurs

pre_control

probability mass function (see "Details") for var of the control group before treatment occurs; only required for computing the differences-in-transports estimator

post_control

probability mass function (see "Details") for var of the control group after treatment occurs; only required for computing the differences-in-transports estimator

var

the name of the first column of pre_main, post_main, pre_control, and post_control; default is MSRP (see Daljord et al. (2021))

bandwidth_seq

a vector of bandwidth values to try; default is seq(0, 40000, 1000)

estimator

a string: "dit" for the differences-in-transports estimator or "tc" for the transport cost; if both pre_control and post_control are specified, the default is "dit"; otherwise, the default is "tc"

conservative

if TRUE, the bandwidth sequence is also multiplied by 2 to provide a conservative estimate of the transport costs or the differences-in-transports estimator (see main2d and diff2d under "Value"); default is FALSE

quietly

if TRUE, some results will not be printed; default is FALSE

suppress_progress_bar

if TRUE, the progress bar will be suppressed; default is FALSE

save_dit

if TRUE, the differences-in-transports estimator and its associated bandwidth are also returned (see "Value"); default is FALSE

costm_main

cost matrix for the main distributions on their common support; if NULL, the cost of moving mass between two support points is 1 if their distance exceeds the value of bandwidth_seq under consideration and 0 otherwise

costm_ref_main

reference cost matrix for the main distributions used by transport::transport; if NULL, it is constructed on the minimal support of the main distributions

costm_control

cost matrix for the control distributions on their common support; if NULL, the cost of moving mass between two support points is 1 if their distance exceeds the value of bandwidth_seq under consideration and 0 otherwise

costm_ref_control

reference cost matrix for the control distributions used by transport::transport; if NULL, it is constructed on the minimal support of the control distributions

Details

The pre_main, post_main, pre_control, and post_control arguments are all probability mass functions; that is, each is a tibble with two columns:

  • column 1 contains the full support of var, and

  • column 2, which should be titled "count", contains the corresponding mass of each value in the support.

Because column 1 contains the full support of var, and each of these is a distribution of var, column 1 must be identical across all distributions.
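As a base-R alternative to the dplyr pipeline in the examples, the following sketch builds a probability mass function in this two-column layout. The helper `make_pmf()` and its `var_name` argument are hypothetical; the sketch returns a plain data.frame, so wrap it with tibble::as_tibble() if a tibble is required.

```r
# Hypothetical helper (not part of diftrans): aggregate raw observations into
# a two-column probability mass function on a given full support, giving
# support points with no observations a count of 0.
make_pmf <- function(values, weights, support, var_name = "MSRP") {
  counts <- tapply(weights, factor(values, levels = support), sum)
  counts[is.na(counts)] <- 0  # tapply returns NA for empty support points
  pmf <- data.frame(support, as.numeric(counts))
  names(pmf) <- c(var_name, "count")
  pmf
}

# 10000 appears twice (3 + 2 sales), 12000 never, 15000 once (1 sale)
make_pmf(values = c(10000, 15000, 10000), weights = c(3, 1, 2),
         support = c(10000, 12000, 15000))
```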

The cost matrices supplied through costm_main and costm_control should be defined on the common support of the respective distributions, whereas the costm_ref_main and costm_ref_control matrices should be defined on the minimal support of the respective pre and post distributions.
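A minimal sketch of the default 0/1 cost matrix described under costm_main and costm_control, assuming a numeric support; `build_bandwidth_costm()` is a hypothetical name, and whether diftrans stores its matrices in exactly this form is an assumption.

```r
# Hypothetical helper (not part of diftrans): 0/1 cost matrix on a support,
# where entry [i, j] is 1 if |support[i] - support[j]| exceeds the bandwidth
# and 0 otherwise.
build_bandwidth_costm <- function(support, bandwidth) {
  distance <- abs(outer(support, support, "-"))
  (distance > bandwidth) * 1  # convert the logical matrix to 0/1 numeric
}

# 10000 and 12000 lie within 2000 of each other (cost 0); 15000 does not (cost 1)
build_bandwidth_costm(c(10000, 12000, 15000), bandwidth = 2000)
```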

Value

a data.frame with the transport costs associated with each value of bandwidth_seq, containing the following columns:

  • bandwidth: same as bandwidth_seq

  • main: transport costs associated with main distributions

  • main2d: transport costs associated with main distributions using twice the bandwidth; appears only if conservative = TRUE

  • control: transport costs associated with the control distributions; appears only if pre_control and post_control are specified

  • diff: main - control; appears only if pre_control and post_control are specified

  • diff2d: main2d - control; appears only if conservative = TRUE and pre_control and post_control are specified
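The relationship between these columns can be sketched as follows. The helper `assemble_output()` is hypothetical, the column order is an assumption, and the numbers in the call are toy inputs rather than estimates.

```r
# Hypothetical helper (not part of diftrans): lay out per-bandwidth transport
# costs in the shape described above, adding the difference columns only when
# their inputs are available.
assemble_output <- function(bandwidth, main, control = NULL, main2d = NULL) {
  out <- data.frame(bandwidth = bandwidth, main = main)
  if (!is.null(main2d)) out$main2d <- main2d
  if (!is.null(control)) {
    out$control <- control
    out$diff <- main - control                            # main - control
    if (!is.null(main2d)) out$diff2d <- main2d - control  # main2d - control
  }
  out
}

# toy inputs for two bandwidths, not real transport costs
assemble_output(bandwidth = c(0, 1000), main = c(0.6, 0.4),
                control = c(0.2, 0.1), main2d = c(0.3, 0.2))
```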

If save_dit = TRUE, then a list is returned, with the first element (labeled out) being the data.frame described above. The second element (labeled dit) is the differences-in-transports estimator, and the third and final element (labeled optimal_bandwidth) is the bandwidth associated with the estimator.

Examples

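library(diftrans)  # provides diftrans() and the Beijing_sample / Tianjin_sample data
library(magrittr)  # provides the %>% pipe and the `.` placeholder used below

# Find the conservative transport cost of MSRP in Beijing between 2010 and 2011 using bandwidth = 0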
# # step 1: find support
support_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < as.Date("2012-01-01")) %>%
  dplyr::select(MSRP) %>%
  dplyr::distinct() %>%
  dplyr::arrange(MSRP) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  unlist()
temp <- data.frame(MSRP = support_Beijing)
# # step 2: prepare probability mass functions
pre_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < "2011-01-01") %>%
  dplyr::group_by(MSRP) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, ., by = "MSRP") %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
post_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2011-01-01") & ym < "2012-01-01") %>%
  dplyr::group_by(MSRP) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, ., by = "MSRP") %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
# # step 3: compute results
tc <- diftrans(pre_Beijing, post_Beijing, conservative = TRUE, bandwidth_seq = 0)
tc$main2d

# Find the transport cost of MSRP in Beijing between 2010 and 2011 using bandwidth = 10000
# tc_10000 <- diftrans(pre_Beijing, post_Beijing, bandwidth_seq = 10000)
# tc_10000$main
# Find the conservative differences-in-transports estimator using Tianjin as a control
# # step 1: find support
support_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < as.Date("2012-01-01")) %>%
  dplyr::select(MSRP) %>%
  dplyr::distinct() %>%
  dplyr::arrange(MSRP) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  unlist()
temp <- data.frame(MSRP = support_Tianjin)
# # step 2: prepare probability mass functions
pre_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < "2011-01-01") %>%
  dplyr::group_by(MSRP) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, ., by = "MSRP") %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
post_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2011-01-01") & ym < "2012-01-01") %>%
  dplyr::group_by(MSRP) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, ., by = "MSRP") %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
# # step 3: compute results
dit <- diftrans(pre_Beijing, post_Beijing, pre_Tianjin, post_Tianjin,
                conservative = TRUE, bandwidth_seq = seq(0, 40000, 1000),
                save_dit = TRUE)
dit$optimal_bandwidth
dit$dit

omkarakatta/diftrans documentation built on Feb. 24, 2023, 9:06 p.m.