diftrans: Obtain Transport Costs and Differences-in-Transports...
In omkarakatta/diftrans: Compute the Differences-in-Transports Estimator

View source: R/computingOT.R

diftrans

R Documentation

Obtain Transport Costs and Differences-in-Transports Estimator

Description

Given the pre- and post-treatment probability mass functions and a vector of bandwidths, this function returns the associated transport costs. If a second set of pre- and post-treatment probability mass functions is given for the control group, the differences-in-transports estimator is returned.

Usage

diftrans(
  pre_main = NULL,
  post_main = NULL,
  pre_control = NULL,
  post_control = NULL,
  var = MSRP,
  bandwidth_seq = seq(0, 40000, 1000),
  estimator = ifelse(!is.null(pre_control) & !is.null(post_control), "dit", "tc"),
  conservative = F,
  quietly = F,
  suppress_progress_bar = F,
  save_dit = F,
  costm_main = NULL,
  costm_ref_main = NULL,
  costm_control = NULL,
  costm_ref_control = NULL
)

Arguments

`pre_main`	probability mass function (see "Details") for `var` of the treated group before treatment occurs
`post_main`	probability mass function (see "Details") for `var` of the treated group after treatment occurs
`pre_control`	probability mass function (see "Details") for `var` of the control group before treatment occurs; only required for computing the differences-in-transports estimator
`post_control`	probability mass function (see "Details") for `var` of the control group after treatment occurs; only required for computing the differences-in-transports estimator
`var`	the title of the first column of `pre_main`, `post_main`, `pre_control`, and `post_control`; default is `MSRP` (see Daljord et al. (2021))
`bandwidth_seq`	a vector of bandwidth values to try; default is `seq(0, 40000, 1000)`
`estimator`	a string that takes on the value of "dit" for differences-in-transports estimator or "tc" for the transport cost; if `pre_control` and `post_control` are specified, default is "dit"; otherwise, default is "tc"
`conservative`	if `TRUE`, then the bandwidth sequence will be multiplied by 2 to provide a conservative estimate of the transport costs or the differences-in-transports estimator; default is `FALSE`
`quietly`	if `TRUE`, some results will be suppressed from printing; default is `FALSE`
`suppress_progress_bar`	if `TRUE`, the progress bar will be suppressed; default is `FALSE`
`save_dit`	if `TRUE`, the differences-in-transports estimator as well as the associated bandwidth will be returned (see "Value"); default is `FALSE`
`costm_main`	cost matrix for the main distributions on their common support; if `NULL`, it is constructed so that the cost is 1 when the transport distance exceeds the bandwidth taken from `bandwidth_seq` and 0 otherwise
`costm_ref_main`	cost matrix referenced by `transport::transport`; if `NULL`, it is built on the minimal support of the main distributions
`costm_control`	cost matrix for the control distributions on their common support; if `NULL`, it is constructed so that the cost is 1 when the transport distance exceeds the bandwidth taken from `bandwidth_seq` and 0 otherwise
`costm_ref_control`	cost matrix referenced by `transport::transport`; if `NULL`, it is built on the minimal support of the control distributions

Details

The pre_main, post_main, pre_control, and post_control arguments are all probability mass functions. That is, each is a tibble with two columns:

column 1 contains the full support of var, and
column 2, which should be titled "count", contains the corresponding mass of each value in the support.

Since column 1 contains the full support of var and all these distributions are of var, column 1 must be the same for all distributions.
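
The layout described above can be sketched in base R. This is only an illustration under stated assumptions: the toy data, the `sales` column, and the helper name `make_pmf` are hypothetical and not part of the package, and the result is a plain data.frame that would still need `tibble::as_tibble()` before being passed to `diftrans()`.

```r
# Toy data (hypothetical): one row per record, with a price and units sold.
pre_raw  <- data.frame(MSRP = c(100, 100, 200), sales = c(5, 3, 2))
post_raw <- data.frame(MSRP = c(200, 300),      sales = c(4, 6))

# Full support: every distinct value of var observed in either period.
support <- sort(unique(c(pre_raw$MSRP, post_raw$MSRP)))

# Hypothetical helper: total mass at each support point, 0 where unobserved.
make_pmf <- function(raw, support) {
  totals <- aggregate(sales ~ MSRP, data = raw, FUN = sum)
  count <- totals$sales[match(support, totals$MSRP)]
  count[is.na(count)] <- 0
  data.frame(MSRP = support, count = as.numeric(count))
}

pre_pmf  <- make_pmf(pre_raw, support)
post_pmf <- make_pmf(post_raw, support)

# Column 1 is the same in both; column 2 is titled "count".
same_support <- identical(pre_pmf$MSRP, post_pmf$MSRP)
```

Building the support once and filling unobserved values with zero mass is what keeps column 1 identical across the distributions being compared.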

The cost matrices specified by costm should use a common support of the respective distributions. However, costm_ref matrices should use the minimal support of the respective pre and post distributions.
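
As a rough sketch of the default rule stated for `costm_main` and `costm_control`, a 0/1 cost matrix on a common support could be built as follows. It assumes that "transport distance" means the absolute difference between two support points; the support values and the bandwidth `d` are made up for illustration, and this is not the package's internal code.

```r
support <- c(10000, 15000, 30000)  # hypothetical common support of var
d <- 8000                          # one hypothetical value from bandwidth_seq

# Pairwise distances between support points (assumed: absolute difference).
dist <- abs(outer(support, support, "-"))

# Cost is 1 when mass moves farther than the bandwidth, 0 otherwise.
costm <- (dist > d) * 1

# With conservative = TRUE the bandwidth is multiplied by 2 first.
costm2d <- (dist > 2 * d) * 1
```

Here the move from 15000 to 30000 costs 1 at bandwidth 8000 but 0 at the doubled bandwidth 16000, which is how doubling the bandwidth changes the resulting transport cost.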

Value

a data.frame with the transport costs associated with each value of bandwidth_seq.

bandwidth: same as bandwidth_seq
main: transport costs associated with main distributions
main2d: transport costs associated with main distributions using twice the bandwidth; appears only if conservative = TRUE
control: transport costs associated with the control distributions; appears only if pre_control and post_control are specified
diff: main - control
diff2d: main2d - control

If save_dit = TRUE, then a list is returned, with the first element (labeled out) being the data.frame described above. The second element (labeled dit) is the differences-in-transports estimator, and the third and final element (labeled optimal_bandwidth) is the bandwidth associated with the estimator.
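
To make the column definitions concrete, the following sketch assembles a data.frame with the same column names from hypothetical numbers. The values are invented purely to show how `diff` and `diff2d` are derived; they are not output from `diftrans()`.

```r
# Hypothetical transport costs, for illustration only (not package output).
out <- data.frame(
  bandwidth = c(0, 1000, 2000),
  main      = c(0.30, 0.20, 0.10),
  main2d    = c(0.30, 0.10, 0.05),  # main costs at twice the bandwidth
  control   = c(0.25, 0.05, 0.02)
)

# The difference columns follow directly from the definitions above.
out$diff   <- out$main - out$control
out$diff2d <- out$main2d - out$control
```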

Examples

# Find conservative transport cost of MSRP in Beijing between 2010 and 2011 using bandwidth = 0
library(magrittr)  # provides the %>% pipe used below
# # step 1: find support
support_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < "2012-01-01") %>%
  dplyr::select(MSRP) %>%
  dplyr::distinct() %>%
  dplyr::arrange(MSRP) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  unlist()
temp <- data.frame(MSRP = support_Beijing)
# # step 2: prepare probability mass functions
pre_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < "2011-01-01") %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
post_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2011-01-01") & ym < "2012-01-01") %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
# # step 3: compute results
tc <- diftrans(pre_Beijing, post_Beijing, conservative = TRUE, bandwidth_seq = 0)
tc$main2d

# Find transport cost of MSRP in Beijing between 2010 and 2011 using bandwidth = 10000
# tc_10000 <- diftrans(pre_Beijing, post_Beijing, bandwidth_seq = 10000)
# tc_10000$main

# Find conservative differences-in-transports estimator using Tianjin as a control
# # step 1: find support
support_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < "2012-01-01") %>%
  dplyr::select(MSRP) %>%
  dplyr::distinct() %>%
  dplyr::arrange(MSRP) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  unlist()
temp <- data.frame(MSRP = support_Tianjin)
# # step 2: prepare probability mass functions
pre_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < "2011-01-01") %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
post_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2011-01-01") & ym < "2012-01-01") %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
# # step 3: compute results
dit <- diftrans(pre_Beijing, post_Beijing, pre_Tianjin, post_Tianjin,
                conservative = TRUE, bandwidth_seq = seq(0, 40000, 1000),
                save_dit = TRUE)
dit$optimal_bandwidth
dit$dit

omkarakatta/diftrans documentation built on Feb. 24, 2023, 9:06 p.m.