diftrans | R Documentation |
Given the pre and post probability mass functions as well as a vector of bandwidths, this function returns the associated transport costs. If another set of pre and post probability mass functions is given for the control group, then the differences-in-transports estimator is returned.
diftrans(
  pre_main = NULL,
  post_main = NULL,
  pre_control = NULL,
  post_control = NULL,
  var = MSRP,
  bandwidth_seq = seq(0, 40000, 1000),
  estimator = ifelse(!is.null(pre_control) & !is.null(post_control), "dit", "tc"),
  conservative = F,
  quietly = F,
  suppress_progress_bar = F,
  save_dit = F,
  costm_main = NULL,
  costm_ref_main = NULL,
  costm_control = NULL,
  costm_ref_control = NULL
)
pre_main |
probability mass function (see "Details") for |
post_main |
probability mass function (see "Details") for |
pre_control |
probability mass function (see "Details") for |
post_control |
probability mass function (see "Details") for |
var |
the title of the first column of |
bandwidth_seq |
a vector of bandwidth values to try; default is |
estimator |
a string that takes on the value of "dit" for
differences-in-transports estimator or "tc" for the transport cost;
if |
conservative |
if |
quietly |
if |
suppress_progress_bar |
if |
save_dit |
if |
costm_main |
if |
costm_ref_main |
if |
costm_control |
if |
costm_ref_control |
if |
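The default for estimator in the usage signature depends on whether both control distributions are supplied. The sketch below copies that default expression into a stand-alone helper to show how it resolves; default_estimator and placeholder are hypothetical names used only for illustration and are not part of the package.

```r
# Hypothetical helper that mirrors the default expression for `estimator`
# shown in the usage signature; it is not part of the package.
default_estimator <- function(pre_control = NULL, post_control = NULL) {
  ifelse(!is.null(pre_control) & !is.null(post_control), "dit", "tc")
}

# Placeholder standing in for a probability mass function (illustrative values).
placeholder <- data.frame(MSRP = c(1, 2), count = c(3, 4))

default_estimator()                          # "tc": no control distributions
default_estimator(pre_control = placeholder) # "tc": only one control distribution
default_estimator(placeholder, placeholder)  # "dit": both control distributions
```

Under this default, supplying only one of the two control distributions still yields "tc".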
The pre_main, post_main, pre_control, and post_control variables are all probability mass functions. That is, each is a tibble with two columns: column 1 contains the full support of var, and column 2, which should be titled "count", contains the corresponding mass of each value in the support. Since column 1 contains the full support of var and all these distributions are of var, column 1 must be the same for all distributions.
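As a minimal sketch of the input shape described above, the following base R code builds two probability mass functions over a common support, filling in zero mass where a value was not observed. The raw data, its column names (period, sales), and the helper make_pmf are made up for illustration; only the two-column layout with a "count" column comes from the description above.

```r
# Hypothetical raw data: sales by price (MSRP) and period. Values are illustrative.
raw <- data.frame(
  MSRP   = c(10000, 20000, 20000, 30000, 40000),
  period = c("pre", "pre", "post", "post", "post"),
  sales  = c(5, 3, 4, 6, 2)
)

# Full support of the variable across both periods.
support <- sort(unique(raw$MSRP))

# Hypothetical helper: total sales per support value, with zeros for unobserved values.
make_pmf <- function(data, support) {
  totals <- tapply(data$sales, factor(data$MSRP, levels = support), sum)
  totals[is.na(totals)] <- 0  # empty cells come back as NA
  data.frame(MSRP = support, count = as.numeric(totals))
}

pre_pmf  <- make_pmf(raw[raw$period == "pre", ], support)
post_pmf <- make_pmf(raw[raw$period == "post", ], support)

# Convert to tibbles if the tibble package is available.
if (requireNamespace("tibble", quietly = TRUE)) {
  pre_pmf  <- tibble::as_tibble(pre_pmf)
  post_pmf <- tibble::as_tibble(post_pmf)
}

# Column 1 is identical across the two distributions.
identical(pre_pmf$MSRP, post_pmf$MSRP)
```

Computing the support once and reusing it for every distribution is what keeps column 1 the same across all of them.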
The cost matrices specified by costm should use a common support of the respective distributions. However, costm_ref matrices should use the minimal support of the respective pre and post distributions.
A data.frame with the transport costs associated with each value of bandwidth_seq.

bandwidth: same as bandwidth_seq
main: transport costs associated with main distributions
main2d: transport costs associated with main distributions using twice the bandwidth; appears only if conservative = TRUE
control: transport costs associated with the control distributions; appears only if pre_control and post_control are specified
diff: main - control
diff2d: main2d - control

If save_dit = TRUE, then a list is returned, with the first element (labeled out) being the data.frame described above. The second element (labeled dit) is the differences-in-transports estimator, and the third and final element (labeled optimal_bandwidth) is the bandwidth associated with the estimator.
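Because the return value is either a data.frame or a list containing one, downstream code may want to handle both shapes. The sketch below shows one way to do that; extract_costs is a hypothetical helper, and the mock objects use placeholder numbers that stand in for real output and are not results from the package.

```r
# Hypothetical helper: return the transport-cost data.frame regardless of
# whether the result came from save_dit = FALSE or save_dit = TRUE.
extract_costs <- function(result) {
  # Check for a data.frame first, since a data.frame is also a list.
  if (is.data.frame(result)) {
    return(result)
  }
  if (is.list(result) && is.data.frame(result$out)) {
    return(result$out)
  }
  stop("`result` is neither a data.frame nor a list with an `out` element")
}

# Mock objects with the documented column and element names (placeholder values).
mock_out <- data.frame(
  bandwidth = c(0, 1000),
  main      = c(0.30, 0.20),
  control   = c(0.10, 0.05)
)
mock_out$diff <- mock_out$main - mock_out$control  # diff: main - control

mock_list <- list(out = mock_out, dit = NA_real_, optimal_bandwidth = NA_real_)

# Both shapes yield the same data.frame.
identical(extract_costs(mock_out), extract_costs(mock_list))
```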
library(diftrans)
library(magrittr)  # provides the %>% pipe used below

# Find conservative transport cost of MSRP in Beijing between 2010 and 2011 using bandwidth = 0
#
# step 1: find support
support_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < as.Date("2012-01-01")) %>%
  dplyr::select(MSRP) %>%
  dplyr::distinct() %>%
  dplyr::arrange(MSRP) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  unlist()
temp <- data.frame(MSRP = support_Beijing)
#
# step 2: prepare probability mass functions
pre_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < as.Date("2011-01-01")) %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%  # keep every value in the support
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%  # unobserved values get zero mass
  tibble::as_tibble()
post_Beijing <- Beijing_sample %>%
  dplyr::filter(ym >= as.Date("2011-01-01") & ym < as.Date("2012-01-01")) %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
#
# step 3: compute results
tc <- diftrans(pre_Beijing, post_Beijing, conservative = TRUE, bandwidth_seq = 0)
tc$main2d

# Find transport cost of MSRP in Beijing between 2010 and 2011 using bandwidth = 10000
#
tc_10000 <- diftrans(pre_Beijing, post_Beijing, bandwidth_seq = 10000)
tc_10000$main

# Find conservative differences-in-transport estimator using Tianjin as a control
#
# step 1: find support
support_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < as.Date("2012-01-01")) %>%
  dplyr::select(MSRP) %>%
  dplyr::distinct() %>%
  dplyr::arrange(MSRP) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  unlist()
temp <- data.frame(MSRP = support_Tianjin)
#
# step 2: prepare probability mass functions
pre_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2010-01-01") & ym < as.Date("2011-01-01")) %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
post_Tianjin <- Tianjin_sample %>%
  dplyr::filter(ym >= as.Date("2011-01-01") & ym < as.Date("2012-01-01")) %>%
  dplyr::group_by(dplyr::across(c(MSRP))) %>%
  dplyr::summarise(count = sum(sales)) %>%
  dplyr::filter(!is.na(MSRP)) %>%
  dplyr::left_join(temp, .) %>%
  dplyr::select(MSRP, count) %>%
  tidyr::replace_na(list(count = 0)) %>%
  tibble::as_tibble()
#
# step 3: compute results
dit <- diftrans(pre_Beijing, post_Beijing, pre_Tianjin, post_Tianjin,
                conservative = TRUE,
                bandwidth_seq = seq(0, 40000, 1000),
                save_dit = TRUE)
dit$optimal_bandwidth
dit$dit