Adaption Costs in Public Procurement Auctions

Author: Frederik Collin

< ignore

library(restorepoint)
# facilitates error detection
# set.restore.point.options(display.restore.point=TRUE)

library(RTutor)
library(yaml)
#library(restorepoint)
setwd("C:/Users/Freddy/Desktop/MasterarbeitGitHub/RTutorProcurementAuction/inst/ps/RTutorProcurementAuction")
ps.name = "RTutorProcurementAuction"; sol.file = paste0(ps.name,"_sol.Rmd")
libs = c("foreign","dplyr","lfe","stargazer","sandwich","XML","leaflet","regtools","AER") # character vector of all packages you load in the problem set
#name.rmd.chunks(sol.file) # set auto chunk names in this file
create.ps(sol.file=sol.file, ps.name=ps.name, user.name=NULL,libs=libs, stop.when.finished=FALSE,extra.code.file="f1.R",addons="quiz", use.memoise=TRUE,var.txt.file = "variables.txt", rps.has.sol=TRUE)


show.shiny.ps(ps.name, load.sav=FALSE,  sample.solution=FALSE, is.solved=FALSE, catch.errors=TRUE, launch.browser=TRUE)
stop.without.error()

>

Welcome to this problem set, which is part of my master's thesis at the University of Ulm. It analyses adaption costs that arise from the incompleteness of contracts in public procurement auctions. It is based on the paper "Bidding for Incomplete Contracts: An Empirical Analysis of Adaptation Costs" by Patrick Bajari, Stephanie Houghton and Steven Tadelis, published in 2014 in the American Economic Review 104 (4), to which I will refer as BHT throughout this problem set. You can download the paper from aeaweb.org/articles.php?doi=10.1257/aer.104.4.1288 to get more detailed information. The Stata and Matlab code can be downloaded from the same page.

You do not need to solve the exercises in the given order, but it is recommended since later exercises build on knowledge from earlier ones. Within one chapter you need to solve the tasks in the given order, except for the optional ones (such as all quizzes and some additional code blocks). How the problem set works will be explained starting with the first task. How to solve this problem set on your own computer is described at github.com/Fcolli/RTutorProcurementAuction.

Exercise Content

  1. Overview of Public Procurement Auctions

  2. Skewed Bids

2.1 Skewed Bids Regression

  3. Characteristics Influencing the Bids

3.1 The Markup Measure

3.2 Cost Measures

3.3 Measures of Market Power

3.4 Reduced Form Estimates of the Bids

  4. Adaptions and Adaption Costs

  5. A Model of Empirical Bidding Behavior

  6. Reduced Form Estimates of Adaption Costs

6.1 Examine Adaption Costs using Project Fixed Effects

6.2 Examine Adaption Costs while Accounting for Endogeneity of Ex Post Changes

  7. Conclusion

  8. References

Exercise 1 -- Overview of Public Procurement Auctions

In this problem set we are going to analyse adaption costs in public procurement auctions. This chapter will introduce you to the main aspects of public procurement auctions as they are conducted by California's Department of Transportation (Caltrans). For the later analysis we need to know how the auctions in our case work and who auctions what to whom.
Caltrans manages the state highway system in California and thus needs companies to do the roadwork for it. Auctions are the system of choice to allocate a project to a company. In a first step Caltrans' engineers prepare a list of items that describe the tasks and materials needed to complete the job. In a second step they estimate quantities for every work item. Then the job with all its items and materials as well as the estimated quantities is publicly advertised, along with a set of plans and specifications that describe how the project is to be completed.

To get a better insight into how this works, I will illustrate it with an example. We have data containing a wealth of information about contracts from 1999 to 2005. The first data set we want to use is biditems.dta, which contains bid-item-level data, meaning that there is one observation for each item of each contract, for each bidder. Let us read in our first data set. To do so we want to use the function read.dta() from the foreign package.

< info "read.dta()"

The command read.dta() from the foreign package reads a file data.dta in Stata version 5-12 binary format into a data frame. If you set the working directory correctly and saved the data there, the file name suffices.

library(foreign)
read.dta("data.dat")

You can also set the full path if you like to store the data not in the working directory:

library(foreign)
read.dta("C:/mypath/data.dat")

To store your results in a variable mydata proceed as follows:

library(foreign)
mydata=read.dta("data.dat")

If you want to know more about the read.dta() command you can take a look at stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html.

>

Since this is your first task, the basic structure of the command is already given. Before you start entering your code you need to press the edit button. This needs to be done at the first task of every chapter.

Task: Use the command read.dta() to read in the downloaded data set biditems.dta. Store it in the variable dat. If you need help with how to use read.dta(), check the info box above. When you are finished, click the check button.
If you need further advice, click the hint button, which contains more detailed information. If the hint does not help you, you can always access the solution with the solution button. Here you just need to uncomment the code (remove the # in front of it) and fill the ... with the right commands.

#< task
# ...=read.dta("...")
#>
dat = read.dta("biditems.dta")
#< hint
display("Just write: dat=read.dta(\"biditems.dta\") and press check afterwards.")
#>

< award "Starter"

Welcome to this problem set. I hope you will enjoy solving it. During the problem set you will earn awards for complicated tasks or quizzes.

>

To illustrate what I explained above we want to use a specific contract with the name 02-356604. This contract will serve as an example throughout the whole problem set. To create it you can use the filter() function of the dplyr package. If you are not familiar with it, take a look at the info box below.

< info "filter()"

The function filter() contained in the dplyr package is used to generate a subset of a data frame. If you have a data set dat that contains a column year ranging from 2000 to 2015 and you want to generate a new data frame dat_2010 that only contains observations from 2010, you can use the following:

library(dplyr)
dat_2010 = filter(dat, year == 2010)

If you want to know more about filter() you can take a look at cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html.

>

Task: Use filter() to generate a data set example that only contains the contract 02-356604. The column contract is a string column in our data set, so if you want to filter for a specific contract, you need to put quotation marks around its name. Here you need to filter for contract == "02-356604".

#< task
# ... = filter(..., ...)
#>
example = filter(dat, contract == "02-356604")
#< hint
display("Just write: example = filter(dat, contract == \"02-356604\") and press check afterwards.")
#>

If you want to take a look at the example data, press data; this will take you to the Data Explorer section, where you have to select example in the upper left corner. If you press Description in the Data Explorer, you will get more detailed information about the content of each column.
As already mentioned above, Caltrans prepares a list of items and materials needed to do the job. These items are stored in the variable itemcode, where each unique item has a different number. The column description tells us what kind of item we are looking at.

Task: Find out what itemcode 120090 stands for. All commands are already entered so you just need to press check here.

#< task
distinct(select(example, itemcode, description))
#>

Notice that if you move your mouse over the header of a column, you will get additional information describing what this column stands for. The select() function, which is also from dplyr, allows us to select specific columns of a data frame. For more information check the info box below. The command distinct() from dplyr, which is wrapped around the select() command, prints only unique rows. We need it here since we would otherwise get the same entry three times.

< info "select()"

The function select() contained in the dplyr package is used to select specific columns of a data frame. If you have a data set dat that contains columns year, name, country and income and you only want to access year and income, you can do so with the following command:

library(dplyr)
select(dat, year, income)

If you want to know more about the select() function you can take a look at cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html.

>

< quiz "Item Codes and what they Stand for"

question: What does the item code 120090 stand for?
sc:
- THERMOPLASTIC PAVEMENT MARKING
- ASPHALT CONCRETE (TYPE A)
- TRAFFIC CONTROL SYSTEM
- CONSTRUCTION AREA SIGNS*
success: Great, all answers are correct!
failure: Not all answers correct. Try again.

>

Now we know that 120090 stands for construction area signs. This tells us that the item with itemcode 120090 is a material and not a task. Caltrans also provides estimated quantities for every item, which are stored as estq in our data set. In addition Caltrans estimates a fair price for every item based on a collection of past bids and market prices. These prices are stored as CCDBprice, where CCDB is the Contract Cost Data Book containing past bids and market prices. If you want to take a look at it, you can do so here: dot.ca.gov/hq/esc/oe/awards/. The price estimates will play an important role later.

Task: Find out the estimated price and quantity of the item THERMOPLASTIC PAVEMENT MARKING with item code 840515. This time there is no specific way to do so. When you are done you can answer the quiz below.

#< task_notest
# You can enter your commands here
#>

#< hint
display("With the following command you can access all information needed: 
distinct(select(example, itemcode, description, estq,CCDBprice)).")
#>

< quiz "estq1"

parts:
- question: 1. What is the estimated quantity for THERMOPLASTIC PAVEMENT MARKING in our example contract?
  answer: 350
  roundto: 1

>

< award "Quizmaster Lv. 1"

You solved the first quiz without my help.

>

Now we know how Caltrans slices the whole project into items and estimates quantities and prices. But how does the auction itself work?
A potential bidder needs to submit prices for every item of an auction by some fixed date. This list of bids must be sealed. At the fixed time Caltrans opens the sealed bids of all competitors (more or less simultaneously). For every bidder a total bid is computed as follows: the item bid times the estimated quantity, summed over all items. The per-item bid is stored as unitbid, the total bid as bidtotal_est. To see how it is built up we will do it for one bidder in our example. The name of a bidder is stored in bidder. Additionally there is a unique bidder identification number bidderid in our data set. We will show the example for BALDWIN CONTRACTING COMPANY INC, with number 23.
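In formula terms (my own notation, not column names from the data set), the estimated total bid of bidder $i$ is: $$\textrm{bidtotal_est}_i = \sum_{j \in \textrm{items}} \textrm{unitbid}_{i,j} \cdot \textrm{estq}_j$$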

Task: Just press check to create a new data frame containing only information about the bidder BALDWIN CONTRACTING COMPANY INC, whose bidderid equals 23, on our example contract.

#< task
# Select only the bidder with number 23
baldwin = filter(example, bidderid == 23)
# Show the itemcode, unitbid, estimated quantities and the total bid.
select(baldwin, itemcode, unitbid, estq, bidtotal_est)
#>

Task: Compute the total bid of BALDWIN CONTRACTING COMPANY INC. To do so use sum() from base R. If you want to access only the column unitbid of the data frame baldwin, which contains only the information of bidder number 23 on our example contract, you can do so by typing baldwin$unitbid. Recall that the estimated quantities are stored as estq. If you are not sure what to do, you can press the hint button. If even that does not help you, you can always access the solution with the solution button.

#< task
# Sum over the item bids times the estimated quantities.

# Enter your command here

#>
sum(baldwin$unitbid*baldwin$estq)
#< hint
display("Your command should look like sum(baldwin$... *baldwin$...).")
#>
#< task
# To see if you have done the right calculation I'll print out bidtotal_est.
# Since the total bid is stored in every row for one bidder, we just use the first entry.
baldwin$bidtotal_est[1]
#>

We have now seen how to get from the item bids via the estimated quantities to the total bid. One problem in public procurement auctions is that estimated and actual quantities never match perfectly. Actual quantities are those that are actually used while completing the job. The actual total bid is made up by multiplying the item bids with the actual quantities and summing over all items. It is stored as bidtotal_act in our data set. At this point we do not yet speak of a final payment, since payments are altered by some other factors that I will explain in more detail later.

Task: Just press check to see the estimated and actual total bid of BALDWIN CONTRACTING COMPANY INC. The distinct() command prints only unique rows of a data frame. We need it here since otherwise we would get the same row seven times.

#< task
# The estimated and actual total bid
distinct(select(baldwin,bidtotal_est,bidtotal_act))
#>

Knowing how the auction works, it is time to move on. Estimated quantities more or less never match the actual quantities, due to changes in the external environment or inadequate designs and specifications. This causes the contracts to be incomplete. We will come back to this later since it is the main focus of this work. For now just notice that the estimated and actual quantities are not the same. The actual quantities are stored as actq. The variable pctover_q stores the percentage difference between actq and estq, that is $\textrm{pctover_q} = \frac{\textrm{actq}}{\textrm{estq}} - 1$. It measures the overrun or underrun of an item. Let's now return to our example data set of the company BALDWIN CONTRACTING COMPANY INC, which was called baldwin.
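If you want to convince yourself of this definition, you can recompute it from actq and estq for our example bidder (a quick sketch using the baldwin data frame created above):

# pctover_q should equal actq/estq - 1
baldwin$actq / baldwin$estq - 1
baldwin$pctover_q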

Task: Use the select() command to display the variables itemcode, description, estq, actq and pctover_q in order to solve the following quiz.

#< task
# Enter your command here.
#>
select(baldwin, itemcode, description, estq, actq, pctover_q)
#< hint
display("Your command should look as follows:
        select(baldwin, itemcode, description, estq, actq, pctover_q).")
#>

< quiz "What overrun?"

question: Which item ran over in this auction? Type in the right item code.
answer: 198007
roundto: 1

>

The fact that some items may over- or underrun, together with how the total bid is constructed, offers bidders the opportunity to increase their expected profit. To make this point clear I will give a short example.
Assume that BALDWIN CONTRACTING COMPANY INC has perfect foresight about the actual quantities and knows that it wins the auction with a total bid of 832084.5 dollars, which is the bid it entered in the auction seen before. As pointed out above, one item ran over while others ran under or matched perfectly. The idea is to bid zero on items that run under or match perfectly and spend all the money (here the 832084.5 dollars) on the items that run over. While the estimated total bid, used as the score in the auction, stays the same, we can increase the actual total bid, which is part of our final payment.
How can we do so in our example? We bid zero on all items except the one that runs over, which has the item code 198007. Since the estimated quantity of item 198007 is 1480 and we bid zero on all others, we can make a unit bid of at most $\frac{832084.5}{1480} = 562.2193$ dollars. We choose $562.21929 < 562.2193$ since this solves a rounding issue.
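The maximal unit bid is simple arithmetic that you can check directly:

# Total bid divided by the estimated quantity of item 198007
832084.5 / 1480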

Task: Press check to see how the new unit bid changes the actual total bid while the estimated total bid stays the same.

#< task
# Create new item bids based on above argumentation
baldwin$new_unitbid=c(0,0,562.21929,0,0,0,0)
# Present old and new item bids
select(baldwin,unitbid,new_unitbid)
# Calculate the estimated total bid with the old item bids
sum(baldwin$unitbid*baldwin$estq)
# Calculate the estimated total bid with the new item bids
sum(baldwin$new_unitbid*baldwin$estq)
# We see that they match perfectly
# Now lets look at the actual total bid:
# With the old item bids we get
sum(baldwin$unitbid*baldwin$actq)
# With the new item bids we get
sum(baldwin$new_unitbid*baldwin$actq)
# How much did we increase our actual total bid?
sum(baldwin$new_unitbid*baldwin$actq)/sum(baldwin$unitbid*baldwin$actq) -1
# We increased our actual total bid by about 56% while keeping the estimated total bid the same
#>

We increased our actual total bid by about 56% while the estimated total bid stayed the same. This procedure is called skewing up bids. If you want to know more about it, you can take a look at Athey and Levin (2001). They show that bidders in U.S. Forest Service timber auctions can raise profits by skewing their bids upwards on items that are expected to overrun or downwards on items that are expected to underrun. Bidding zero on items as done above may alert the auctioneer. Caltrans has many rules to prevent bidders from systematically skewing up their bids. It does not need to accept a low bid if it seems irregular, and if an item ran over by more than 25%, Caltrans has the opportunity to renegotiate this item price. Renegotiations can even take place if the item price differs markedly from the estimates.
For our later analysis systematic skewing of bids would pose serious problems, so we would like to show that this is not the case in our observed data. This will be done in the next chapter.

Exercise 2 -- Skewed Bids

In this chapter we analyse whether bidders systematically skew their bids. To do so, we will run a regression of a measure of the unit price (the bid per item) on a measure of the over-/underrun of items. As the measure of the unit price we choose the actual bid on an item, unitbid, divided by the CCDB unit cost estimate CCDBprice. This variable is called NCunitbid in our data set dat. For the right-hand side of our regression we choose the over-/underrun of actual quantities, measured in percent above/below the estimated quantities. Recall that this is just the variable pctover_q. This amounts to a regression with pctover_q as the independent variable and NCunitbid as the dependent variable. The mathematical formulation looks as follows: $$\textrm{NCunitbid}^{(n)}_i = \beta_0 + \beta_1 \cdot \textrm{pctover_q}^{(n)}_i + \varepsilon^{(n)}_i$$ The superscript $n$ indicates the contract and the subscript $i$ the bidder. $\beta_1$ then captures the skewing of an average bidder on an average contract item. Before we can start with our analysis we need to load the data again.

Task: Read in the data biditems.dta as you have done in the chapter before. Store it as dat.

#< task
# Enter your command here
#>
dat = read.dta("biditems.dta")
#< hint
display("Just write: dat=read.dta(\"biditems.dta\") and press check afterwards.")
#>

First we'll use standard OLS. We could use the lm() function from base R, which you might know, but instead we want to use the felm() function from the lfe package. We do so since we can run all the regressions we need with this one function. To see how you can do linear regressions with felm(), check the info box below.

< info "Linear Regressions with felm()"

The felm() function from the lfe package can be used to run linear regressions. If you want to regress y on x1 and x2, all stored in the data set dat, you can use the following code. This is just a standard linear regression like the one you could run with lm().

library(lfe)
felm(y~x1+x2, data=dat)

If you want to know more about the felm() method you can check here rdocumentation.org/packages/lfe/functions/felm.

>

Task: Run a regression with NCunitbid as the dependent variable and pctover_q as the independent variable. Store the result in a variable OLS. Since this is the first time we perform a regression, just uncomment the line and fill in the ... with the right variables.

#< task
# OLS=felm(... ~ ..., data=dat)
#>
OLS=felm(NCunitbid ~ pctover_q, data=dat)
#< hint
display("Your command should look the following way: 
        OLS = felm(NCunitbid ~ pctover_q, data=dat)")
#>

To show summary statistics of regressions we will make use of the function stargazer() from the stargazer package. The next task shows what a call to this function looks like.

Task: To show the summary statistics of OLS press check.

#< task
stargazer(OLS, 
            type = "html", 
            style = "aer",  
            digits = 4,
            df = FALSE,
            report = "vct*",
            star.cutoffs = c(0.05, 0.01, 0.001),
            object.names = TRUE,
            model.numbers = FALSE,
            omit.stat = c("adj.rsq", "f", "ser"))
#>


Let's interpret our first regression. By the definitions of NCunitbid, which is the actual item bid divided by an estimate, and pctover_q, which is the percentage overrun of an item, both are measured in percentage points (above/below the estimate). This implies that our regression coefficient is also measured in percentage points. So, what does the regression tell us? The answer is that if a bidder expects some item to run over by one percentage point, he will shade up his bid on that item by about $0.05$ percentage points.
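You can translate the coefficient into predicted shading with plain arithmetic. For instance, for an expected overrun of 10 percentage points, using the rounded slope of about 0.05:

# Expected shading in percentage points for a 10 percentage point overrun
0.05 * 10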

< quiz "Regression output"

question: By how many percentage points will a bidder shade up his bid if he expects an item to run over by 20 percentage points?
sc:
- 0.5
- 1*
- 2
success: Great, your answer is correct!
failure: Your answer is incorrect. Try again.

>

Next we want to differentiate between winners and losers of an auction, since one could presume that the incentives to skew bids are stronger among winners. Thus we need a variable indicating whether a bidder has won the auction. In our data set we have the variable winner, which equals one if a bidder has won the auction and zero otherwise. To differentiate between winners and losers we can use the following regression model:
$$\textrm{NCunitbid}^{(n)}_i = \beta_0 + \beta_1 \cdot \textrm{pctover_q}^{(n)}_i + \beta_2 \cdot \textrm{winner}^{(n)}_i + \beta_3 \cdot ( \textrm{pctover_q}^{(n)}_i \cdot \textrm{winner}^{(n)}_i) + \varepsilon^{(n)}_i$$ The skewing of an average bidder who lost the auction on an average item is now $\beta_1$. The skewing of an average bidder who won the auction on an average item is now $\beta_1 + \beta_3$.

Task: Perform a regression as described above and store the result as OLS_winner. If you need help you can press the hint button.

#< task
# Enter your command here
#>
OLS_winner=felm(NCunitbid ~ pctover_q+winner+pctover_q*winner, data=dat)
#< hint
display("Your command should look like: 
        OLS_winner=felm(NCunitbid ~ pctover_q+winner+pctover_q*winner, data=dat)")
#>

< award "Regessionmaster Lv. 1"

You performed your first regression on your own.

>

As seen above, the stargazer() function has a lot of options. To keep the code simple I wrote a function reg.summary() for you. Just pass your regression object(s) to this function.

Task: Give a summary of the regression above with the reg.summary() command. If you need help you can always click on the hint button.

#< task
# Enter your command here
#>
reg.summary(OLS_winner)
#< hint
display("Your command should look like: 
        reg.summary(OLS_winner)")
#>


How did the results change? A bidder who lost the auction shaded up his bid by about $0.05$ percentage points for every expected one percentage point overrun of an item. The effect among winners is captured by $\beta_1 + \beta_3 = 0.0505 - 0.0140 = 0.0365$, so a winning bidder shaded up his bid by about $0.04$ percentage points for every expected one percentage point overrun of an item. This indicates that the problem of skewed bids is even smaller among winners.
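If you want to compute $\beta_1 + \beta_3$ directly from the regression object, you can extract the coefficient matrix (a sketch; the row and column names are the ones summary() usually assigns to felm objects, so check them with rownames() and colnames() first):

cf = summary(OLS_winner)$coefficients
cf["pctover_q", "Estimate"] + cf["pctover_q:winner", "Estimate"]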

As you may have noticed, stargazer() and also reg.summary() print the t-values of the regression coefficients in addition to the estimates. The t-value of pctover_q in OLS is about $1.9$, and the absence of a star behind it tells us that this result is not significant at standard levels. Most econometricians call a result significant if its p-value is below $5$%. reg.summary() prints one star if the result is significant at the $5$% level, two stars if it is significant at the $1$% level and three stars if it is significant at the $0.1$% level. To analyse this problem we have to take a closer look at the assumptions of linear regression. This will be done in the next chapter.

Exercise 2.1 -- Skewed Bids Regression

In this chapter we will take a closer look at regression theory. To do so we set up an example regression model: $$y_i = \beta_0 + \beta_1 \cdot x_i + \varepsilon_i, \; \; i \in \{1, ..., n\}$$ There are five principal assumptions which justify the use of such a linear regression model for the purposes of inference or prediction; they can be found in Kennedy (2008, p. 42):

A1: The dependent variable can be written as a linear function of a specific set of independent variables, plus a disturbance term.
A2: The expected value of the disturbance term is zero ($E[\varepsilon_i] = 0$).
A3: Disturbances have uniform variance and are uncorrelated ($Var(\varepsilon_i) = \sigma^2 \; \forall i$ and $Cov(\varepsilon_i , \varepsilon_j) = 0 \; \forall i \neq j$).
A4: Observations on independent variables can be considered fixed in repeated samples.
A5: No exact linear relationship between independent variables and more observations than independent variables.

If you want to know more about those assumptions you can check Kennedy (2008, p. 42), where it is explained in more detail.
Let us now go back to the regression from chapter 2. We assume that our regression satisfies A1, at least in some sense, but what about the others? Checking A2 mathematically is difficult since we do not observe the disturbance term $\varepsilon_i$. If we use the residuals of the regression instead, their mean will always be zero due to the mathematics behind the least squares minimization. A2 is fulfilled if we assume that items run over or under independently of whether the bidder wins the auction. This assumption does not seem too strong, and thus we will take A2 as fulfilled. A5 is fulfilled as we have only one independent variable but more than $109000$ observations, and they do not have an exact linear relationship. A4 also seems to be fulfilled, but A3 may not. A3 consists of two parts: uniform variance (homoscedasticity) of the error term and uncorrelated errors (no autocorrelation). We exclude the problem of autocorrelation and assume that our errors are not correlated (at least not strongly). To check homoscedasticity we can plot the independent variable against the regression residuals. If the residuals are distributed around zero in the same way for all values of the independent variable, we have homoscedastic errors. To do so we need to load the data again and perform the same regression as in chapter 2.

Task: To read in the data biditems.dta and perform the OLS regression as in the chapter before press edit and check afterwards.

#< task
dat = read.dta("biditems.dta")
OLS=felm(NCunitbid ~ pctover_q, data=dat)
#>

The code below plots the variable pctover_q against the residuals of our regression OLS.

Task: Press check to get such a plot.

#< task
# Construct a vector containing all entries of NCunitbid that are not NA.
# This is necessary since we do not have price estimates for all items.
a=which(is.na(dat$NCunitbid)==FALSE)
plot(dat$pctover_q[a],OLS$residuals,xlim=c(-1.5,8.5),ylim=c(-8,8))
#>

The plot makes clear that the residuals are not equally distributed around zero (especially for $x \in [-1,2]$). This indicates that they do not have uniform variance, meaning that A3 is not fulfilled. Thus we may have a problem with heteroskedasticity (that is, the absence of homoscedasticity) and need to find a way to fix it. With heteroskedasticity OLS is still unbiased, but it is no longer efficient. Now one could suspect that Caltrans systematically over- or underestimates some items. If so, we would have a variable contained in our error term that differs across items. This would mean that the distribution of the error term differs at least for some items, and assumption A3 is violated. Just think of a variable indicating systematic over- or underestimation that right now is part of our error term. To account for that we can use a fixed effects regression, where the fixed effect is within a specific item code. To make this clear I will explain it for our regression. Recall that we want to estimate the following model: $$\textrm{NCunitbid}^{(n)}_i = \beta_0 + \beta_1 \cdot \textrm{pctover_q}^{(n)}_i + \varepsilon^{(n)}_i$$ But we think that there is a term that differs among items. To describe this systematic over- or underestimation, one can use dummy variables $\delta^{(n)}_{i,j}$ that equal one if the observation of bidder $i$ in contract $n$ concerns item $j$ and zero otherwise; the coefficient $\alpha_j$ then captures the systematic over- or underestimation of item $j$. The regression would then look as follows: $$\textrm{NCunitbid}^{(n)}_i = \beta_0 + \beta_1 \cdot \textrm{pctover_q}^{(n)}_i + \sum_{j \in \textrm{items}} \alpha_j \cdot \delta^{(n)}_{i,j} + \varepsilon^{(n)}_i$$ If we performed this regression we would get a lot of coefficients (to be precise, two plus one for each unique item), and as we have many different items this would not look nice. Another concern is that for every coefficient we lose one degree of freedom. An alternative way to perform such a regression is the fixed effects regression. If we use fixed effects for items we get the same result for $\beta_1$ but none of the other coefficients. Since we are only interested in $\beta_1$ we will perform such a regression. If you want to know more about how fixed effects regressions work in detail, you may take a look at Kennedy (2008, chapter 18). A fixed effects regression in R can be done with the felm() function from the lfe package and is explained in the info box below.

< info "felm()"

The felm() function is used for estimating linear models with group fixed effects. It basically works like the standard lm() function but offers some additional features. I will only explain the functionalities we use here; there are many more, so if you want to learn more, you can check the description of the lfe package at cran.r-project.org/web/packages/lfe/lfe.pdf. If you want to regress y on x1 and x2 while projecting out the factor fixed_eff, with the standard errors clustered by cluster_var, all stored in the data set dat, you can use the following code. The 0 stands for no instrumental variable; you can read more about instrumental variables at the link above. If you do not want to compute clustered standard errors, put a $0$ in place of cluster_var.

felm(y~x1+x2 | fixed_eff | 0 | cluster_var , data=dat)

If you want to perform a regression just with fixed effects, you can simplify the formula to:

felm(y~x1+x2 | fixed_eff, data=dat)

To find out more about the felm() method you can check here rdocumentation.org/packages/lfe/functions/felm

>
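Before you apply felm() to our data, the following self-contained toy example (simulated data, purely for illustration) shows that the dummy-variable regression and the fixed effects regression deliver the same slope:

library(lfe)
set.seed(1)
# 5 items with 20 observations each, true slope of 2 plus an item-specific level
d = data.frame(item = factor(rep(1:5, each = 20)), x = rnorm(100))
d$y = 2 * d$x + rep(rnorm(5), each = 20) + rnorm(100)
# Slope from the regression with explicit item dummies
coef(lm(y ~ x + item, data = d))["x"]
# Slope from the fixed effects regression: the same value
felm(y ~ x | item, data = d)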

Task: Perform a fixed effects regression with the same regression model as in OLS, using itemcode as the fixed effect. Store your result as FELM.

#< task
# Enter your command here
#>
FELM=felm(NCunitbid ~ pctover_q | itemcode, data=dat)
#< hint
display("Your command should look like: 
        FELM=felm(NCunitbid ~ pctover_q | itemcode, data=dat)")
#>

< award "Regessionmaster Lv. 2"

You performed your first fixed effects regression on your own.

>

Task: Plot again the independent variable versus the regression residuals but this time for the regression FELM. Press check to see the plot.

#< task
# Construct a vector containing all entries of NCunitbid that are not NA.
# This is necessary since we do not have price estimates for all items.
a=which(is.na(dat$NCunitbid)==FALSE)
plot(dat$pctover_q[a],FELM$residuals,xlim=c(-1.5,8.5),ylim=c(-8,8))
#>

The residuals are now more equally distributed around zero, indicating that we have solved one problem concerning the errors. But as you can see, the distribution still changes with pctover_q, so there are still some influences on the error term that we did not account for. Let us now compare OLS with FELM. This can be done with the reg.summary() command. Just pass both regression objects to this function, separated by a comma.

Task: Print a summary of the regressions OLS and FELM.

#< task
# Enter your command here
#>
reg.summary(OLS, FELM)
#< hint
display("Your command should look like: 
        reg.summary(OLS, FELM)")
#>


The coefficient on pctover_q stayed nearly the same but the t-value has increased slightly. We now have results that are significant at the $5$% level. But there is one last concern. It seems reasonable to assume that the pricing strategy a bidder uses for one specific item is related to all other items within one contract (auction). This would make the standard errors inaccurate, so we need clustered standard errors where we cluster by contract and bidder. This grouping variable is called grp and is stored in our data frame dat. We thus have a group for every bidder in every auction, so one group contains the item bids of one specific bidder in one auction.

Task: Run a regression similar to FELM. Store the result in a variable FELM1. Use the felm() function with itemcode as fixed effect and clustered standard errors as described above. If you do not remember how to do so, just take a look at the felm() info box above.

#< task
# Enter your command here.
#>
FELM1=felm(NCunitbid ~ pctover_q | itemcode | 0 | grp,data=dat)
#< hint
display("Your command should look like: 
        FELM1=felm(NCunitbid ~ pctover_q | itemcode | 0 | grp,data=dat)")
#>

< award "Regessionmaster Lv. 3"

Now you know how to properly use clustered standard errors.

>

Now we can compare the three regressions OLS, FELM and FELM1.

Task: Show the summary statistics of OLS, FELM and FELM1. Use the reg.summary() command.

#< task
# Enter your command here.
#>
reg.summary(OLS, FELM, FELM1)
#< hint
display("Your command should have the following form: 
        reg.summary(regreesion1, regression2, regression3)")
#>


Finally we can analyse the regression outputs together. Recalling the construction of NCunitbid and pctover_q, both are measured in percentage points, which means that our results are measured in percentage points as well. The intercept has no useful meaning here, so we'll just interpret the coefficient on pctover_q for all regressions. Standard OLS (OLS) yields a coefficient of $0.0465$ on pctover_q with a t-value of about $1.9$, meaning that the result is not significant. If we control for item-specific effects with the regression FELM, the coefficient on pctover_q is $0.0535$ with a t-value of about $2.2$. This is significant at the $5$% level. Both estimates are close, so the fixed effects do not add much explanatory power to the regression. For FELM1 we get the same coefficient as for FELM, but the t-value is now $3.9$, meaning that the result is significant at the $0.1$% level.
What can we infer from this? Since all results are around $0.05$, we can say that if a bidder expects a $1$ percentage point overrun on some item, he will shade up his bid by about $0.05$ percentage points. To see this more clearly: if a bidder expects a $100$ percentage point overrun on some item, that is, the quantity of this item actually doubles, he will shade up his bid by $5$ percentage points. This seems like a small amount, keeping in mind the example from above where the item 198007 ran over by about $56$ percentage points and we increased our bid on that item by $2811$ percentage points. Together with the $R^2$ (coefficient of determination) below $0.05$, this suggests that incentives to skew are a minor determinant of the observed bids and thus we do not need to worry about skewed bids.

< info "Coefficient of Determination"

The coefficient of determination is a statistical measure of how close the data are to the fitted regression line. It is the percentage of the response variable variation that is explained by a linear model (thus $R^2 \in [0,1]$). For the regression model from the beginning $$y_i = \beta_0 + \beta_1 \cdot x_i + \varepsilon_i, \; \; i \in \{1, ..., n\}$$ the $R^2$ is defined in the following way: $$R^2 = \frac{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \bar{y})^2} = \frac{\textrm{explained variation}}{\textrm{total variation}}$$ where $\hat{y}_i$ are the predicted values of the regression and $\bar{y} = \frac{1}{n} \cdot \sum_{i=1}^n y_i$. If a regression yields a low $R^2$, we talk about a poor model fit; if it yields a high $R^2$, we talk about a good model fit. But one should not rely on the $R^2$ alone since it does not indicate whether a regression model is adequate. You can have a low $R^2$ for a good model, or a high $R^2$ for a model that does not fit the data.
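The following small sketch computes the $R^2$ by hand for a toy lm() fit and compares it with the value R reports:

x = c(1, 2, 3, 4, 5)
y = c(2.1, 3.9, 6.2, 7.8, 10.1)
fit = lm(y ~ x)
# Explained variation divided by total variation
sum((fitted(fit) - mean(y))^2) / sum((y - mean(y))^2)
# The same value as reported by R
summary(fit)$r.squared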

If you want to find out more about the $R^2$, I suggest you to take a look at Kennedy (2008, p. 13-14 and p. 26-28).

>

Now let's run one last regression with the same options as in FELM1, but this time we want to get the effects for winners and losers separately.

Task: Recalling what we did in chapter 2 for OLS_winner, add fixed effects and clustered standard errors as in FELM1 to perform the regression described above. Store the result as FELM1_winner.

#< task
OLS_winner=felm(NCunitbid ~ pctover_q+winner+pctover_q*winner, data=dat)
# Enter your command here.
#>
OLS_winner=felm(NCunitbid ~ pctover_q+winner+pctover_q*winner, data=dat)
FELM1_winner = felm(NCunitbid ~ pctover_q+winner+pctover_q*winner | itemcode | 0 | grp, data=dat)
#< hint
display("Your command should look like: 
        FELM1_winner=felm(NCunitbid ~ pctover_q+winner+pctover_q*winner | itemcode | 0 | grp,data=dat)")
#>

As a last step we want to compare OLS_winner and FELM1_winner.

Task: Show the summary statistics of OLS_winner and FELM1_winner.

#< task
# Enter your command here.
#>
reg.summary(OLS_winner, FELM1_winner)
#< hint
display("Your command should have the following form: 
        reg.summary(regression1, regression2)")
#>


The coefficients stayed nearly the same, giving the same interpretation as for OLS_winner in chapter 2, but the significance has increased. The coefficient on pctover_q is now significant at the $0.1$% level. The significance of the coefficient on pctover_q times winner increased, but it is still not significant. In the end we can say that we are able to differentiate between winners and losers, but the insignificant results suggest that we do not need to.

Before we move on to the next chapter, let us recall what we have learned so far. We are looking at public procurement auctions in California where Caltrans auctions highway construction projects. Caltrans splits each project into items, and potential bidders submit bids per item. The estimated total bid, which is the sum over item bids times estimated quantities, is the score in the auction. We have learned that overruns and underruns on items are prevalent and thus the itemized bids are a good tool to avoid renegotiation ex post. The incentives to skew up bids do not seem to be a major determinant of the bids. If we distinguish between winners and losers of an auction, we find that the incentives to skew up bids are smaller for winners, but the results for winners are not statistically significant.

Exercise 3 -- Characteristics Influencing the Bids

In this chapter we will briefly go over characteristics influencing a firm's bid and explain what we will do in the next chapters. The main goal of this problem set is to examine adaption costs. To perform such an analysis we need as many characteristics influencing the bids as possible. Chapter 3.1 will explain how we can transform the bids in such a way that we get a (normalized) measure of a bidder's markup. If we can measure a bidder's markup, we can construct additional measures to capture characteristics that influence it. We do not stick to the bid itself because, if we use the markup over costs, we do not need to think about variables capturing the actual costs of completing the project. As in any auction model, markups are a function of private information and local market power.
Recalling that we talk about firms doing roadwork, the distance to the project may be a good measure of a firm's costs, which influence the markup. This and other measures of costs will be introduced in chapter 3.2. Note that a firm's costs are private information. Market power will also influence the markup. One measure of market power will be the number of bidders in an auction. In chapter 3.3 we will introduce this and other measures of market power.
If you want to go through this problem set faster, you can skip chapters 3.1, 3.2 and 3.3. Chapter 3.4 contains a brief review of these chapters.

Exercise 3.1 -- The Markup Measure

This chapter will present a good measure of a bidder's markup. Along the way we'll introduce a second data set bidders.dta containing bid-level data. This means that we now have one observation for each bidder on each contract. Thus it does not contain the item bids like the data set from chapters 1, 2 and 2.1. The time of observation is exactly the same, so we have the same auctions, bidders and so on.

Task: Read in the data bidders.dta. To do so just press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
#>

As in the chapters before we want to create an example to explain what is in the data. We use the same contract as before.

Task: Press check to get the data about the same auction as in the previous chapter.

#< task
example = filter(dat, contract == "02-356604")
#>

Since this data set contains additional information, we use slightly different notation here. The total bid, that is the sum over all items of the item bid times the estimated quantity, which was called bidtotal_est, is now called bidtotal. The actual total bid, previously bidtotal_act, keeps its name. contract, bidrank, winner, bidderid and bidder are as before. The estimated total bid, that is the sum over all items of the item price estimate times the estimated quantity, is called engestimate. This variable is new, but it is constructed as follows from the variables discussed in chapter 1: $$\textrm{engestimate} = \sum_{i \in \textrm{items}} \textrm{CCDBprice}_i \cdot \textrm{estq}_i$$
If we had information about the costs a bidder incurs for his total bid, we could compute the markup by subtracting those costs from the total bid. Since we do not have exact information about these costs, we need to construct the markup in a different way. If we are willing to assume that Caltrans' estimate of the total bid reflects the fair value of the whole contract, we can compute a bidder's markup as $\textrm{markup} = \frac{\textrm{bidtotal}}{\textrm{engestimate}}$. In our case the fair value can be interpreted as an estimate of the costs underlying a bidder's total bid. This calculation has already been done; the result is stored in the variable normalized_bid. The name comes from the fact that, in addition to being a markup over estimated costs, it can be used as the normalized total bid.
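As a quick consistency check (a sketch), you can recompute the measure yourself and compare it with the stored column:

# normalized_bid should equal bidtotal / engestimate
example$bidtotal / example$engestimate
example$normalized_bid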

Task: Use the select() command, as done in chapter 1, to show the columns bidder, bidtotal, engestimate and normalized_bid of our example data example.

#< task
# Enter your command here
#>
select(example, bidder, bidtotal, engestimate, normalized_bid)
#< hint
display("Your command should look as follows: 
        select(example, ..., ..., ..., ...)")
#>

How should we interpret normalized_bid? It is constructed as the total bid divided by an estimate of the fair value for a whole contract, so a bidder's markup over the estimated fair value in percent is $(\textrm{normalized_bid} - 1) \cdot 100$.

< quiz "markup1"

question: By how many percentage points is the markup of ARCADIAN ENTERPRISES above the estimated fair value?
answer: 25.6953
roundto: 0.01

>

To show that this actually captures the markup as described, we can compute the mean over all contracts and bidders.

Task: Compute the mean of normalized_bid over all contracts and bidders. Use the mean() function of base R.

#< task
# Enter your command here
#>
mean(dat$normalized_bid)
#< hint
display("Your command should look as follows: 
        mean(dat$...)")
#>

The markup over the estimated fair value across all bidders and contracts is about $4.4$ percent. As stated by BHT, the highway construction industry is competitive in nature, and the publicly traded firms within our data set report profit margins of less than $3$ percent. Thus our implicit average markup of about $4.4$ percent seems plausible. With these results we can conclude that we have found a good measure of a bidder's markup. The variable normalized_bid contains this measure and can be interpreted as the markup over the estimated fair value, or over the costs underlying a bidder's total bid.

Exercise 3.2 -- Cost Measures

In this chapter we will discuss measures of a firm's costs. You may ask yourself why we need such measures. The costs of a firm are typically not observable by anyone other than the firm itself. Assume there are two firms competing for one job. If one firm knew the costs of its rival, it could use them to better forecast its rival's bid. Then, if its own costs were below its rival's bid, it could bid one dollar below that bid. In our data set the measures of costs are the distance of a bidder to the project, its utilization rate and its size.
As in the chapters before, we need to load the data first.

Task: Read in the data bidders.dta and create the example data of contract "02-356604". Again you just need to press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
example = filter(dat, contract == "02-356604")
#>

The Distance to the Project as a Measure of a Firm's Costs

Since we are talking about highway construction firms, one part of their costs is transportation costs: they need to transport material, employees and machinery to the project. The information about where a company is located, as well as where the job is to be done, is publicly observable, and we can construct a measure of costs from the distance of a bidder to the project. The location of a bidder is stored in address, the location of the project in location. The measure constructed from this information as described above is called distance. It measures the distance in miles.

Task: Use the select() command to display the variables bidder, address, location and distance for our example data set.

#< task
# Enter your command here
#>
select(example, bidder, address, location, distance)
#< hint
display("Your command should look as follows: 
        select(example, ..., ..., ..., ...)")
#>

This is the first output of a data frame that is wider than the output window. In order to see all the information, use your mouse to move the bar below the output.

Task: You can enter whatever you think is needed to solve the following quiz.

#< task_notest
# Enter your command here
#>
#< hint
display("You should make us of the data set dat and the function max() to get the maximal distance.  
        One way would be to type max(dat$distance). If you want it to be more elegant
        you may use max(select(dat, distance)).")
#>

< quiz "distance"

question: What was the maximal distance in miles a bidder had to the job site over all bidders and contracts?
answer: 2857
roundto: 1

>

Reading the location variable is not that easy, so I will explain it using the example of BALDWIN CONTRACTING COMPANY INC. Inside the location variable we find IN PLUMAS COUNTY AT VARIOUS LOCATIONS - 02-PLU-89-18.8/42.2; 02-PLU-36-0.0/4.0. It tells us that the job was in Plumas county at different locations. The first location was highway 89, mile marker 18.8 to 42.2 (02-PLU-89-18.8/42.2; the 02 up front is the district within California). The second was highway 36, mile marker 0 to 4 (02-PLU-36-0.0/4.0).
The construction of distance, which measures the distance of a bidder to the job site, is not as easy as one might think, especially if there is more than one location. But this question is not our main focus here, so it is enough to keep these data in the back of our minds, knowing that they exist. If you would like to know how it is constructed, take a look at BHT.

< quiz "distance2"

question: Which impact on the bids do you think the distance to the project has?
sc:
- With increasing distance to the project a bidder will lower his bid
- With increasing distance to the project a bidder will increase his bid*
success: Great, your answer is correct!
failure: Try again.

>

< award "Quizmaster Lv. 2"

You solved the quizzes, finding out that, first, some bidder was nearly $3000$ miles away from the project and, second, the distance to the project increases the bid.

>

Task: If you are interested in what the example mentioned above looks like, press check. I have added the other two bidders as well. Note that you can click an icon on the map to get information about what it stands for. The red marker stands for the location of the project, the blue markers for the three bidders.

#< task_notest
present.map()
#>


Our assumption is that a bidder who is far away from the project has higher transportation costs and thus submits a higher bid. To test this assumption we can regress the normalized bid normalized_bid from chapter 3.1 on the distance to the project. Since the impact of a single mile is assumed to be small, we have another variable dist100 that measures the distance of a bidder to the project in units of 100 miles. This one will be used for our regression.

Task: Perform a linear regression as described above with the felm() function. Store the result as OLS_dist.

#< task
# Enter your command here
#>
OLS_dist = felm(normalized_bid ~ dist100, data = dat)
#< hint
display("Your command should look as follows: 
        OLS_dist = felm(... ~ ..., data = dat)")
#>

Task: Show the summary statistics of OLS_dist.

#< task
# Enter your command here.
#>
reg.summary(OLS_dist)
#< hint
display("Your command should have the following form: 
        reg.summary(OLS_dist)")
#>


The interpretation is quite easy. A bidder who is adjacent to the project submits a bid that is about $103.55$% of the estimated fair value. For every $100$ miles a bidder is further away, he increases his bid by about $0.95$% of the estimated fair value. The results are significant at the $1$% level (the intercept at the $0.1$% level). We can thus assume that dist100 or distance captures transportation costs and hence affects the total bid.
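With the rounded coefficients you can, for example, compute the predicted normalized bid of a bidder who is 500 miles (five units of dist100) away from the project (plain arithmetic with the estimates from above):

# Intercept plus slope times the distance in units of 100 miles
1.0355 + 0.0095 * 5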

The Utilization Rate of a Firm as a Measure of its Costs

Another measure of costs, related to the free capacity of a firm, is its utilization rate util. To create it, BHT used two other measures, backlog_allyears and capacity. We assume that work proceeds at a constant pace over the whole project duration. Then backlog_allyears is the remaining dollar value of all projects won but not yet completed by the time a new bid is submitted. capacity is defined as the maximum backlog observed for a firm over all observations. Thus we assume that at some point during our time of observation each firm worked at its capacity constraint. With those two variables we can define a firm's utilization rate as follows: $$\textrm{util} = \frac{\textrm{backlog_allyears}}{\textrm{capacity}}$$ If a firm never won an auction, all three variables are set to zero. If you are interested in how these variables are computed, you can find out more in chapter three of BHT.
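You can verify this construction for our example bidders (a sketch; the ifelse() guards against division by zero for firms that never won an auction):

# util should equal backlog_allyears / capacity (zero if capacity is zero)
with(example, ifelse(capacity > 0, backlog_allyears / capacity, 0))
example$util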

Task: For our example data, display bidder, bidtotal, backlog_allyears, capacity and util.

#< task
# Enter your command here
#>
select(example, bidder, bidtotal, backlog_allyears, capacity, util)
#< hint
display("Your command should look as follows: 
        select(example, ..., ..., ..., ..., ...)")
#>

At the time of the bid, the utilization rate of BALDWIN CONTRACTING COMPANY INC was about $10$%. This indicates that they won at least one auction during our time of observation.
We assume that a company with a lower utilization rate has more free capacity and thus lowers its bid. We want to check with a regression whether util has this impact on the bids.

Task: Show with a regression whether the utilization rate of a bidder has an impact on his normalized bid. Store the resulting regression in OLS_util and use reg.summary() to print the summary statistics. Since you are asked to enter two commands here, there are two hints. The second hint can only be accessed after typing in the first command correctly and checking it.

#< task
# Enter your command here
#>
OLS_util = felm(normalized_bid ~ util, data = dat)
#< hint
display("Your first command should look the following: 
        OLS_util = felm(normalized_bid ~ util, data = dat) ")
#>
reg.summary(OLS_util)
#< hint
display("After your first command you shoul call the following:
        reg.summary(OLS_util)")
#>


A bidder with a utilization rate of zero will submit a bid that is about $104.4$% of the estimated fair value. With an increase of one percentage point in his utilization rate, he will increase his bid by about $0.0046$ percentage points. This is exactly what we expected, but the result is not statistically significant. We may be able to improve, however. In the regression before, where we used the distance of a bidder to the project, we had no such problem. This was because the impact of distance or dist100 on the bids is, plausibly, the same for every bidder. The impact of util, in contrast, may differ among bidders: think of companies whose structure allows them to adjust the number of employees to demand more easily than others. This would imply that they are not all equally affected by their utilization rate. Thus, if we want to get more precise results, we should account for that. As in chapter 2.1 we can do so by using fixed effects, where this time the fixed effect is within each bidder.

Task: Perform a fixed effects regression as described above and store the result as FELM_util. If you want to use fixed effects for the bidders, you can use their identification number bidderid. If you do not remember how to do so, you may take a look at the hint.

#< task
# Enter your command here
#>
FELM_util = felm(normalized_bid ~ util | bidderid, data = dat)
#< hint
display("Your command should look as follows: 
        FELM_util = felm(... ~ ... | bidderid, data = dat)")
#>

Task: Present summary statistics for the regressions OLS_util and FELM_util.

#< task
# Enter your command here
#>
reg.summary(OLS_util, FELM_util)
#< hint
display("Your command should look as follows: 
          reg.summary(OLS_util, FELM_util)")
#>


As you can see, the coefficient on util is now $0.041$, which is about ten times as much as before, and our results are now significant at the $5$% level. Summarizing, the two regressions establish util as a measure of costs. Whether it is a good measure or not is left to the reader, but BHT use it and thus we will do the same.

The Size of a Firm as a Measure of its Costs

The last measure in this chapter captures the size of a firm. We define fringe as a variable that equals one if a company has a market share below one percent and zero otherwise. You may ask yourself why we do not use the market share of a firm directly. The market share of a firm is, in our case, just the dollar value of all contracts won by the firm within our time of observation divided by the total dollar value of all contracts. Thus it does not represent the real-world market share. With this variable we can differentiate between fringe and non-fringe firms. Our assumption is that a bigger company has lower costs than a small one, and thus that a fringe firm submits a higher bid. To test this we perform a regression.
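The following toy illustration (hypothetical market shares, not from our data) shows how the fringe indicator works:

# Four firms with market shares of 0.4%, 3%, 0.09% and 12%
mktshare = c(0.004, 0.03, 0.0009, 0.12)
# fringe equals one for a market share below one percent
as.numeric(mktshare < 0.01)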

Task: Perform a regression of the normalized bid on fringe. Store the result as OLS_fringe.

#< task
# Enter your command here
#>
OLS_fringe = felm(normalized_bid ~ fringe, data = dat)
#< hint
display("Your command should look as follows: 
        OLS_fringe = felm(... ~ ..., data = dat)")
#>

Task: Present summary statistics for the regression from above.

#< task
# Enter your command here
#>
reg.summary(OLS_fringe)
#< hint
display("Your command should look as follows: 
          reg.summary(...)")
#>


We find that the coefficient on fringe is about $2.4$%. This implies that a fringe firm's bid is about $2.4$% of the estimated fair value higher than that of a non-fringe firm. The result is significant at the $1$% level, supporting our assumption that a bigger company has lower costs and can thus submit a lower bid.
In this chapter we have found three measures of costs that we can use for our estimation of adaption costs later.
Now there is one last quiz in this chapter. To solve it you may use the code block below.

Task: You can enter whatever you think is needed to solve the following quiz. If you get stuck, you can always get a hint with the hint button.

#< task_notest
# Enter your command here
#>
#< hint
display("All information needed can be extracted with the following command:
        select(example, bidder, fringe, util, dist100, winner).
        Recall that util was set to zero if a company never won an auction during our time of observation")
#>

< quiz "last1.5"

question: Check all right statements
mc:
- BALDWIN CONTRACTING COMPANY INC is a fringe firm
- BALDWIN CONTRACTING COMPANY INC is a non-fringe firm
- The company who won the auction had the shortest distance to the project
- The company who won the auction did not have the shortest distance to the project
- The utilization rate of the winning firm was below 20%*
success: Great, all answers are correct!
failure: Not all answers correct. Try again.

>

Exercise 3.3 -- Measures of Market Power

In this chapter we are going to discuss measures reflecting the market power of a firm. In chapter 3.2 we saw the variables dist100 and util, which capture a bidder's distance to the project and his utilization rate. We will construct similar measures, but this time for market power, reflecting attributes of a bidder's rivals. First we need to load the data.

Task: To load the data set bidders.dta and create our example data press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
example = filter(dat, contract == "02-356604")
#>

Rivals Distance to the Project as a Measure of Market Power

With dist100 we can construct another measure rivaldist100 that captures, for each bidder, the minimal distance to the project among all his competitors. Like dist100, it measures distance in units of $100$ miles. To make this clear, take a look at our example data set.

Task: Print the variables bidder, dist100 and rivaldist100 for our example data set.

#< task
# Enter your command here
#>
select(example, bidder, dist100, rivaldist100)
#< hint
display("Your should use the select() fuction for this task.")
#>

Note that for all bidders except the closest one, rivaldist100 is just the dist100 of the closest bidder; for the closest bidder itself it is the distance of the second-closest rival. We assume that a company gains market power if rivaldist100 increases and can thus increase its own bid. As in chapter 3.2 we are going to test this with a regression. A sketch of how such a measure could be computed follows below.
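The following sketch is only an illustration of the construction, not the code used to build our data set. For each bidder within a contract it takes the minimum of all rivals' distances; for a contract with a single bidder the minimum over an empty set would be returned as Inf (with a warning).

# For every row, the smallest dist100 among all other bidders
# in the same contract (a sketch)
dat2 = mutate(group_by(dat, contract),
              rivaldist100_check = sapply(seq_len(n()),
                                          function(i) min(dist100[-i])))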

Task: Perform a regression to show the impact of rivaldist100 on the normalized bid. Store your result as OLS_rivaldist and print the summary statistics. Note that as you are asked to enter two commands here, you can access the second hint only after you have entered the first command correctly and pressed check. The first hint works as normal. Keep this in mind for further tasks.

#< task
# Enter your commands here
#>
OLS_rivaldist = felm(normalized_bid ~ rivaldist100, data=dat)
#< hint
display("Your command should look as follows: 
          OLS_rivaldist = felm(... ~ ..., data = dat)")
#>
reg.summary(OLS_rivaldist)
#< hint
display("Your commands should look the following:
          reg.summary(OLS_rivaldist)")
#>


We find a coefficient of $3.7$% on rivaldist100, telling us that a bidder will increase his own bid by $3.7$% of the estimate if his closest rival's distance increases by $100$ miles. The result is significant at the $0.1$% level. Thus we have found our first measure of market power, and our hypothesis about its impact is confirmed.

Rivals Utilization Rate as a Measure of Market Power

In the same fashion we can use the information from util to construct a measure that captures the minimal utilization rate among all rivals. It is called rivalutil in our data set dat. Our assumption is that an increasing rival utilization rate increases a bidder's market power and hence allows him to increase his own bid. We will run a regression again to test this hypothesis. As we have seen in chapter 3.2, the impact of the utilization rate may differ among firms, so we will use a fixed effects regression as in chapter 3.2. We assume that the effect of rivalutil is constant across contracts for every bidder and thus use firm fixed effects as before.

Task: Recalling that bidderid gives us a unique number for every firm, perform a regression as described above and store the result in FELM_rivalutil. Show the summary statistics of this regression afterwards.

#< task
# Enter your commands here
#>
FELM_rivalutil = felm(normalized_bid ~ rivalutil | bidderid, data=dat)
#< hint
display("Your first commands should look the following: 
          FELM_rivalutil = felm(... ~ ... | ..., data = dat))")
#>
reg.summary(FELM_rivalutil)
#< hint
display("Your second commands should look the following:
        reg.summary(FELM_rivalutil")
#>


We find a coefficient on rivalutil of about $0.014$, meaning that a bidder will increase his bid by $0.014$ percentage points of the estimate if the minimal utilization rate among his rivals increases by one percentage point. The result is not significant, and we need to talk about that. There is probably nothing wrong with our regression or the way we calculated the standard errors. One possible explanation for the insignificant coefficient is the following: there are many companies with a utilization rate of zero, and hence rivalutil is zero for many bidders in many auctions. To be precise, $3249$ of the $3661$ observations of rivalutil are zero. The question that remains is whether we still trust this measure of market power. Even if we are not sure, we can still use it since BHT do so.
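If you want to verify these counts yourself, a quick check is (a sketch, assuming dat is loaded and rivalutil has no missing values):

# Number of observations where rivalutil is exactly zero ...
sum(dat$rivalutil == 0)
# ... out of the total number of observations
nrow(dat)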

The Number of Bidders as a Measure of Market Power

There is one last measure of market power we want to use. It seems natural to assume that a bidder, who is aware of how many firms submit a bid in an auction, has more market power if there are fewer bidders. We have the number of bidders stored in the variable nbidders and can thus use it as a measure of market power. Our assumption is that a bidder needs to decrease his own bid when facing more competitors.

Task: Press check to get a first insight into whether our hypothesis is true. The functions used here are all from the dplyr package. If you would like to know more about them you can check here: cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html.

#< task
arrange(summarize(group_by(dat, nbidders), mean(normalized_bid)), nbidders)
#>

We present the average normalized bid for all different numbers of bidders per auction. To do so we use the group_by() command to group our data frame by the number of bidders. The summarize() command computes the mean of the column normalized_bid for every group of our grouped data frame and keeps the grouping column nbidders in the output. arrange() orders the output such that the number of bidders increases from row to row.
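The nested call above is somewhat hard to read. The same computation can be written as a pipeline with dplyr's pipe operator %>%, which passes the result of each step on as the first argument of the next:

# Group by the number of bidders, average the normalized bids, sort
dat %>%
  group_by(nbidders) %>%
  summarize(mean_bid = mean(normalized_bid)) %>%
  arrange(nbidders)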

< quiz "nbidders"

question: By roughly how much percent do you think, based on the table above, a bidder decreases his own bid if the number of bidders increases by one?
mc:
- 1%*
- 10%
- 100%
success: Great, all answers are correct!
failure: Not all answers correct. Try again.

>

We now perform a regression to confirm our intuition about how the normalized bid is influenced by the number of bidders.

Task: Perform a regression to show the impact of the number of bidders on the normalized bids. Store the result as OLS_nbidders and print the summary statistics.

#< task
# Enter your commands here
#>
OLS_nbidders = felm(normalized_bid ~ nbidders, data=dat)
#< hint
display("Your command should look as follows: 
          OLS_nbidders = felm(... ~ ..., data = dat)")
#>
reg.summary(OLS_nbidders)
#< hint
display("Your commands should look the following: 
          reg.summary(...)")
#>


We find that a bidder will lower his bid by about $1.3$% for every additional bidder. This time the result is significant at the $0.1$% level, which gives us confidence in our finding. In the next chapter we will perform reduced form estimates to find out what explains the total bid best. We will use all measures from chapters 3.2 and 3.3 to explain the markup from chapter 3.1.

Exercise 3.4 -- Reduced Form Estimates of the Bids

In this section we'll perform reduced form regressions to figure out what explains the total bid best. In a first step we will show that the normalization of the total bid, which we carried out in chapter 3.1 to find a measure of a bidder's markup, also controls for a bidder's costs of installing the project. Then we will use the measures from chapters 3.2 and 3.3 to explain this normalized bid/markup.
As in the chapters before we need to load the data first.

Task: To load the data set bidders.dta press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
#>

The following section briefly presents the measures we have seen in chapters 3.2 and 3.3. There are four terms we use to control for firm $i$'s costs:

1) The engineers cost estimate engestimate, which is the product of blue book prices (estimated prices of every item) and estimated quantities. To confirm that it is a good control we perform a regression of the total bid on engestimate. The total bid is stored in the variable bidtotal. In later chapters we will refer to it as a bidder's score during the auction.

Task: Perform a regression of bidtotal on engestimate and store it in the variable OLS_bid. Use the function felm() and print the summary statistics with the reg.summary() command. As you are asked to enter two commands, keep in mind that there are two hints. You can only access the second one after you have entered the first command correctly and pressed check.

#< task
# Enter your commands here.
#>
OLS_bid=felm(bidtotal ~ engestimate, data = dat)
#< hint
display("Your first commands should look the following:
        OLS_bid=felm(... ~ ..., data = dat)")
#>
reg.summary(OLS_bid)
#< hint
display("The second commands is just:
        reg.summary(OLS_bid)")
#>


This regression yields a coefficient of $1.039$ and an $R^2$ of $0.982$, which is high, and the whole regression is significant at the $0.1$% level, meaning that we have an excellent control variable. In chapter 3.1 we discussed the normalized total bid as a measure of a bidder's markup. Recalling that $\textrm{normalized_bid} = \frac{\textrm{bidtotal}}{\textrm{engestimate}}$ and what we showed in chapter 3.1, we can use the normalized bid as a measure of a bidder's markup while controlling for one part of his costs.
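If you want to convince yourself that the stored variable really is this ratio, you can run a quick consistency check (a sketch, assuming dat is loaded):

# Should return TRUE if normalized_bid equals bidtotal / engestimate
all.equal(dat$normalized_bid, dat$bidtotal / dat$engestimate)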

2) Firm $i$'s distance to the job site $\textrm{dist100}^{(n)}_i$, which in our data set is called dist100, will influence transportation costs and thus is used as a control for costs. The $i$ stands for the firm and the $n$ for the auction/project.

3) A good measure of firm $i$'s free capacity is its utilization rate $\textrm{util}^{(n)}_i$, which is the ratio of backlog to capacity and thus influences costs: with higher utilization a bidder has less free capacity and faces higher costs to complete the job. Recall that this measure was stored as util in our data set.

4) Fringe and non-fringe firms differ, so we allow firms to differ by size by including $\textrm{fringe}_i$, which indicates whether a firm is a fringe firm. This was stored as fringe. Note that fringe has no superscript $n$ since a firm is either fringe or non-fringe across all auctions in our data set.

In chapter 3.2 we showed that 2), 3) and 4) are good measures of the costs of a company. In chapter 3.1 we showed that the normalized bid is a good measure of a bidder's markup since the estimate reflects a fair value that can be interpreted as a bidder's costs. You may now ask yourself why we need 1). This will become clear after we have discussed the three controls for market power that we have seen in chapter 3.3:

1) $\textrm{rivaldist100}^{(n)}_i$ measures the distance of $i$'s closest competitor to the job site. Bidder $i$'s market power is assumed to increase if $\textrm{rivaldist100}^{(n)}_i$ increases. Note that this was stored as rivaldist100.

2) $\textrm{rivalutil}^{(n)}_i$ is the minimal utilization rate among $i$'s rivals. If $\textrm{rivalutil}^{(n)}_i$ increases, bidder $i$'s market power will increase. This was stored as rivalutil.

3) $\textrm{nbidders}^{(n)}$ measures the number of bidders for contract $n$. If it increases, we assume that $i$'s market power goes down. The number of bidders was stored as nbidders. Note that this variable is the same for all bidders in one auction, thus it has no subscript $i$.

Recall that in chapter 3.3 we showed that 1), 2) and 3) are good measures of the market power of a firm. Now we have six covariates that we want to include in our regression. Notice that the effect of all of them on the total bid is proportional to the engineers cost estimate engestimate: one would expect that increasing $i$'s distance by $100$ miles raises his bid more on a five million dollar contract than on one worth just $50$ thousand dollars. A natural assumption is that the standard deviation of the error term of our regression model is also proportional to the engineers cost estimate. With these two arguments we can increase the efficiency of our estimation by dividing the total bid by the engineers cost estimate, i.e. by using the normalized bid on the left-hand side of the regression. Note that this is exactly the measure of a bidder's markup that we discussed earlier, and the normalization controls for heteroskedasticity related to project size. With all this information we can build up our regression model to find out the impact of our covariates on the bids. The regression model then sums up to the following:

$$\begin{eqnarray} \textrm{normalized_bid}^{(n)}_i = \alpha_0 & + & \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i + \alpha_4 \cdot \textrm{fringe}_i \\ & + & \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i + \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i \\ & + & \alpha_7 \cdot \textrm{nbidders}^{(n)} + \varepsilon^{(n)}_i \end{eqnarray}$$

We use the subscript $i$ to refer to a specific firm and the superscript $n$ to refer to a specific auction/project. Let us now start by estimating the model above with standard OLS. You could use the felm() function as before, but this time please use the lm() function from base R; for a standard OLS regression it works exactly like felm().

Task: Perform a regression of the above model with the lm() function. Store your result as OLS. If you do not know how to do so you may take a look at the hint.

#< task
# Enter your command here.
#>
OLS=lm(normalized_bid ~ dist100 + rivaldist100 + util + rivalutil + fringe + nbidders, data = dat)
#< hint
    display("Your regression should look like OLS = lm(... ~ ... + ... + ... + ... + ... + ..., data=dat), where the ... stands for the variable you need to fill in.")
#>

Before showing summary statistics of this regression we want to make use of another function that presents the results in a different form. There is a package regtools that contains a function effectplot() which can be used to show the effects of the different explanatory variables of a regression as a bar plot. The info box below explains how you can use this function.

< info "effectplot"

The function effectplot() is part of the regtools package. It can be used to show the influence of the different independent variables of a regression. The first parameter is a regression object, such as one returned by lm(). The second input is the underlying data set, which should be in the form of a data frame. The last parameter is a string specifying which quantiles of the explanatory variables are compared; it helps to compare the magnitudes of the influence of different explanatory variables. The default effect is "10-90", i.e. the effect of changing an explanatory variable from its $10$% quantile to its $90$% quantile.

If you have performed a regression of y on x1, x2 and x3, stored as OLS, and you want to compare the impact of the three explanatory variables when changing from the $10$% quantile to the $90$% quantile you can do so with the following command:

library(regtools)
effectplot(OLS)

If you want to see the effect for the change from the $25$% quantile to the $75$% quantile and print out confidence intervals, you can use the following code:

library(regtools)
effectplot(OLS, numeric.effect = "25-75", show.ci=TRUE) 

>

Task: Use effectplot() to show the impact of the explanatory variables in OLS for a change of each explanatory variable from its $10$% quantile to its $90$% quantile (this is the default, so you do not need to enter different quantiles). In addition we would like to see the confidence intervals.

#< task
# Enter your command here.
#>
effectplot(OLS, show.ci = TRUE)
#< hint
    display("Your command should look as follows: 
            effectplot(OLS, show.ci = TRUE).")
#>

< award "Effect Plotter"

You found out how big the different effects on the normalized total bid are.

>

You may ask yourself what the advantage of such a plot is over the standard summary statistics of a regression. We compare the impact of each explanatory variable on the response variable for a change from the $10$% quantile of the explanatory variable to its $90$% quantile. Take the number of bidders nbidders as an example: the plot shows how the normalized bid normalized_bid changes if the number of bidders in an auction increases from $3$ (the $10$% quantile of nbidders) to $9$ (the $90$% quantile of nbidders). Here the effect is that a bidder lowers his bid by about $9$% to account for the increased number of bidders. Notice that below nbidders you find the values of the $10$%, $50$% and $90$% quantiles (here $3$, $5$ and $9$). This makes the effects of different explanatory variables easy to compare. In our plot the biggest impact results from nbidders and the smallest from the rivals' utilization rate. Notice also that all confidence intervals look fine except the one on util, which spans zero; as we will see below, this corresponds to util not being statistically significant. You can verify the reported quantiles with the short check below.
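A quick way to compute such quantiles yourself (a sketch, assuming dat is loaded):

# The 10%, 50% and 90% quantiles of the number of bidders
quantile(dat$nbidders, probs = c(0.1, 0.5, 0.9))

Now that we have seen such a plot we are ready to present summary statistics.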

Task: Compute summary statistics of OLS.

#< task
# Enter your commands here.
#>
reg.summary(OLS)
#< hint
display("Your commands should look the following:
        reg.summary(...)")
#>


Let us discuss the measures of costs first. dist100, util and fringe have the expected positive signs, indicating that a bidder will increase his bid if he is further away, if his utilization rate goes up and if he is a fringe firm. dist100 is significant at the $1$% level, util is not significant and fringe is significant at the $0.1$% level. If a bidder's distance to the project increases by $100$ miles he will increase his bid by about $0.9$ percentage points of the estimate. If his utilization rate increases by one percentage point he will increase his bid by about $0.015$ percentage points, but keep in mind that this result is not significant. A fringe firm will bid about $4.8$% more than a non-fringe one.

Now we take a look at the measures of market power. rivaldist100 has the expected positive sign and nbidders the expected negative one, while rivalutil does not show the expected sign. We keep it nevertheless, as BHT do; and you will see at the end of this chapter that once we allow for firm fixed effects the sign turns positive as expected. rivaldist100 and rivalutil are significant at the $1$% level, nbidders is significant at the $0.1$% level. If the rival's distance increases by $100$ miles, a bidder will increase his own bid by about $2.2$% of the estimate. We postpone the interpretation of the rivals' utilization rate until we account for firm fixed effects. If the number of bidders in an auction increases by one, a bidder will lower his own bid by about $1.5$% of the estimate.

In the next two subsections we will estimate variants of the following model:

$$\begin{eqnarray} \textrm{normalized_bid}^{(n)}_i = \alpha_0 & + & \alpha_n + \alpha_i + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i \\ & + & \alpha_4 \cdot \textrm{fringe}_i + \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i \\ & + & \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i + \alpha_7 \cdot \textrm{nbidders}^{(n)} + \varepsilon^{(n)}_i \end{eqnarray}$$

With the variable $\alpha_n$ we include project fixed effects to control for information that is publicly observable by all firms but not by us. With $\alpha_i$ we include firm fixed effects to control for omitted cost shifters of firms that are constant across auctions. The reason to include them is fairly simple. It is natural to assume that there is information about the project that is observed by all bidders but not by us. The second natural assumption is that there are effects specific to a firm but constant across auctions; just think of the structure of a firm that allows it to adjust its number of employees quickly, making it more flexible and resulting in a different impact of util. Effects like these are captured in $\alpha_i$. Regressions of this kind are common: Porter et al. (1993), for example, used them to determine whether bid rigging occurred in public procurement auctions on Long Island. Note that the first regression of this chapter was a variant of the model above in which we included neither project nor firm fixed effects.
First we will use firm fixed effects and afterwards project fixed effects.

Firm Fixed Effects

In this section we will allow for firm fixed effects. The regression model then looks as follows:

$$\begin{eqnarray} \textrm{normalized_bid}^{(n)}_i = \alpha_0 & + & \alpha_i + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i \\ & + & \alpha_4 \cdot \textrm{fringe}_i + \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i \\ & + & \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i + \alpha_7 \cdot \textrm{nbidders}^{(n)} + \varepsilon^{(n)}_i \end{eqnarray}$$

If we use firm fixed effects we do not get a result on fringe. This is due to the mathematics behind the fixed effects regression, but you can think of it the following way: if we perform a standard OLS regression with dummies for every company, then for every company fringe is constant, since a company cannot be fringe and non-fringe at the same time and this status does not change across auctions. Hence fringe is perfectly collinear with the firm dummies. If we are not aware of this and estimate the model above with felm(), it will still work but give a warning (the quick check below the next model illustrates why). To avoid the warning we exclude fringe from our model and thus estimate the following:

$$\begin{eqnarray} \textrm{normalized_bid}^{(n)}_i = \alpha_0 & + & \alpha_i + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i \\ & + & \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i + \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i \\ & + & \alpha_7 \cdot \textrm{nbidders}^{(n)} + \varepsilon^{(n)}_i \end{eqnarray}$$
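Before running the regression, you can convince yourself that fringe indeed has no within-firm variation (a quick sketch, assuming dat is loaded and dplyr is attached):

# The within-firm variance of fringe is zero for every bidder,
# so the firm dummies absorb fringe completely
within_var = summarize(group_by(dat, bidderid), v = var(fringe))
max(within_var$v, na.rm = TRUE)  # should be 0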

Task: Perform a fixed effects regression of the model above. Here the fixed effect is among bidders, so use bidderid for the fixed effects option. Store the result as FELM_firm. In addition use clustered standard errors, clustering by bidders (bidderid). If you do not know exactly how to do so, take a look at the hint.

#< task
# Enter your commands here.
#>
FELM_firm = felm(normalized_bid ~ dist100 + rivaldist100 + util + rivalutil  + nbidders | bidderid | 0 | bidderid, data = dat)
#< hint
display("Your command should look as follows:
        FELM_firm = felm(... ~ ... + ... + ... + ...  + ... | bidderid | 0 | bidderid, data = dat)")
#>

We do not present summary statistics here but in the next section after we have done a project fixed effects regression of the same model.

Project Fixed Effects

In this section we will allow for project fixed effects. The regression model then looks as follows:

$$\begin{eqnarray} \textrm{normalized_bid}^{(n)}_i = \alpha_0 & + & \alpha_n + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i \\ & + & \alpha_4 \cdot \textrm{fringe}_i + \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i \\ & + & \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i + \alpha_7 \cdot \textrm{nbidders}^{(n)} + \varepsilon^{(n)}_i \end{eqnarray}$$

If we use project fixed effects we do not get a result on nbidders, as the number of bidders is constant within one specific project. Thus we exclude nbidders and the regression model looks as follows:

$$\begin{eqnarray} \textrm{normalized_bid}^{(n)}_i = \alpha_0 & + & \alpha_n + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i \\ & + & \alpha_4 \cdot \textrm{fringe}_i + \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i \\ & + & \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i + \varepsilon^{(n)}_i \end{eqnarray}$$

Task: Perform a project fixed effects regression of the above model. Notice that each project has a specific identification number stored in c. Use clustered standard errors, clustering by projects. Store your result in FELM_project.

#< task
# Enter your commands here.
#>
FELM_project = felm(normalized_bid ~ dist100 + rivaldist100 + util + fringe + rivalutil | c | 0 | c, data = dat)
#< hint
display("Your commands should look the following:
        FELM_firm = felm(... ~ ... + ... + ... + ...  + ... | c | 0 | c, data = dat)")
#>

As the final part of this chapter we would like to present summary statistics for OLS, FELM_firm and FELM_project to compare the three different settings.

Task: Present summary statistics for the three regressions in one table. You should use the reg.summary() command for that.

#< task
# Enter your commands here.
#>
reg.summary(OLS, FELM_firm, FELM_project)
#< hint
display("Your commands should look the following:
        reg.summary(..., ..., ...)")
#>


< award "Regessionmaster Lv. 4"

You are now able to perform difficult fixed effects regressions on your own.

>

For this summary we only make use of the significant results. The distance of a bidder to the project dist100 is significant at the $1$% level except in the firm fixed effects model. The significant coefficients lie between $0.0076$ and $0.0093$, so we can say that a bidder increases his bid by nearly one percent of the estimate for an increase of $100$ miles in his distance to the project.
rivaldist100 is significant at the $1$% level in all but the project fixed effects model. From the significant results we conclude that a bidder will increase his own bid by about $2$% of the estimate if his closest rival's distance to the project increases by $100$ miles.
util is significant, at the $5$% level, only in the firm fixed effects model. Note that we saw in chapter 3.2 that firm fixed effects are needed to examine the impact of util on the normalized bid. If the utilization rate of a bidder increases by one percentage point he will increase his own bid by about $0.04$ percentage points of the estimate.
rivalutil does not fit our expectation, since the significant results suggest that a bidder will lower his own bid by about $0.13$ percentage points of the estimate if his rivals' minimal utilization rate increases by one percentage point. The only result with the expected positive sign is not significant.
The impact of the size of a company on its bids, captured in fringe, is significant at the $0.1$% level: a fringe firm will bid about $4$% more than a non-fringe one. The number of bidders is significant at the $0.1$% level, and for each additional bidder a company will lower its bid by about $1.5$% of the estimate.

We have seen that each of our covariates (except rivalutil) influences the bids in the way we assumed. Now we can discuss the explanatory power of our three models. OLS yields an $R^2$ of about $0.04$, whereas including firm fixed effects raises the $R^2$ to $0.22$, telling us that these effects increase the goodness of fit. The project fixed effects in FELM_project give an $R^2$ of $0.73$ and thus add considerably to the goodness of fit. We conclude that firm and project fixed effects improve the goodness of fit, especially the project fixed effects, which capture characteristics such as the difficulty of the task, the condition of the job site and anticipated changes.
BHT also present random effects models, but the results do not change enough to present them here. This finishes the first part of our reduced form estimates. In the next chapter we will discuss adaptions made to the contract after the time of bidding.

One last note: if you are interested in why our t-values are not the same as in the original paper of BHT, take a look at the info box below.

< info "t-Values of the Regression"

If you have also read the paper of BHT you may have noticed that the t-values of our regressions in this section are different. For the OLS regression we did not use robust standard errors as is done in the paper. For the two fixed effects regressions we get slightly different t-values, which is due to the different computation used to obtain them. We could replicate Stata's t-values, but as we like to keep the code simple we just use the standard clustered ones from felm(). If you want to know how to replicate them, take a look at the sandwich package; here is a link with more information: cran.r-project.org/web/packages/sandwich/sandwich.pdf. If you want to reproduce Stata's robust standard errors for the OLS model you can do so with the following code:

library(sandwich)
sqrt(diag(vcovHC(OLS,type = "HC1")))

This prints out the robust standard errors as computed by Stata.

>

Exercise 4 -- Adaptions and Adaption Costs

In this chapter we will talk about the information on adaptions to the original contract contained in our data. Note that the first adaption, the change in the quantity of an item, was already discussed in chapters 1 and 2. Adaptions as we discuss them in this chapter are called ex post changes in most of the economic literature. Due to the following practices of Caltrans we have information about three different kinds of adaptions.

(I) Adjustment of Compensation

If the difference between estimated and actual quantities is large (in our case if the quantities vary by $25$% or more), or if it is thought to be due to negligence by one party, both sides (Caltrans and the winning bidder) will renegotiate an adjustment of compensation. In particular this means that if some item ran over by more than $25$%, Caltrans will not pay the item bid; instead they will negotiate an adjustment to lower the item price and hence the total bill. Our data contains adjustments as a lump sum change and records them as adjustments. Note that adjustments can also work the other way around: if an item ran under, negotiations to increase the item price can be made, and thus the total bill is increased.

(II) Change of Scope

There may be a change in scope. Assume the original scope was to resurface two miles of highway, but the subsurface is not stable and needs to be excavated and have gravel added. This activity was not described in the original contract. In most cases both parties will negotiate a change order that amends the scope of the contract as well as the final payment. If negotiations fail, this may result in a lawsuit. Payments from changes will appear in two ways: changes in the actual ex post quantities of specific items will be compensated through unit prices, and extra payments will be made to cover the use of unanticipated materials and other adjustment costs. Our data records them as extrawork.

(III) Deductions

The payment may also be changed because of deductions. If the work is not completed on time or if it fails the specifications, Caltrans may deduct liquidated damages. This type of adaption often leads to disputes between Caltrans and the contractor, so the final deductions may be the outcome of negotiations or even lawsuits and arbitration between the parties. Our data records them as deductions.
All three variables are measured in dollars.

Adaptions in more Detail

Now that you know the three new types of adaptions, it is time for a short quiz to see an example. As in the chapters before we need to load the data first. Again we want to make use of our example contract.

Task: To load the data set bidders.dta and extract the information of our example contract "02-356604" just press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
example = filter(dat, contract == "02-356604")
#>

Task: To solve the following quiz you can enter whatever you think is needed here.

#< task_notest
# Enter your code here
#>
#< hint
display("The following command prints out all information needed to solve the quiz: 
          select(example,bidder, adjustments, extrawork, deductions)")
#>

< quiz "adaptions"

parts:
- question: 1. What kind of adaption was made in our example contract?
  sc:
  - adjustments, extra work and deductions
  - adjustments
  - extra work*
  - deductions
  - extra work and deductions
  - adjustments and deductions
  success: Great, your answer is correct!
  failure: Try again.
- question: 2. How much was the total sum of the three adaptions in dollars for our example contract?
  answer: 13814.33
  roundto: 1
  success: Great, your answer is correct!
  failure: You may think about which adaptions occurred in this specific contract.

>

To get a better insight into those adaptions we want to compute summary statistics for all three. The function stargazer(), which we used to present summary statistics of regressions, can also be used to print a summary of a data frame. As with the regressions, I wrote a function that automatically calls stargazer() with the options used here, to keep the code more elegant. This function is called tab.summary(). The info block below explains how you can make use of it.

< info "tab.summary()"

If you have a data set data containing three columns col_a, col_b and col_c and you want to present a summary of the entries contained in all three you can do so with the following code:

tab.summary(data)

If you only want the summary of col_a you can make use of the select() command that you have seen several times. The code below shows you how:

tab.summary(select(data, col_a))

The output will contain the number of observations "N", the mean "Mean", the standard deviation "St. Dev.", the minimum "Min" and the maximum "Max".

>

Task: To get a first understanding of the adaptions from above, present a summary of adjustments, extrawork and deductions across all contracts. To do so you need to generate a new data set dat_c that contains only one observation per contract for those three variables. If you require a hint, just make use of the hint button.

#< task
dat_c = filter(dat, winner == 1)
# Enter your code here
#>
tab.summary(select(dat_c, adjustments, extrawork, deductions))
#< hint
display("Your command should look as follows:
        tab.summary(select(dat_c, ..., ..., ...))")
#>

Now we see that an average contract contains about $142$ thousand dollars of adjustments, $176$ thousand dollars of extra work and $-8615$ dollars of deductions. The dollar value of adjustments ranges from $-196$ thousand to over $15$ million. The maximum dollar value of extra work was nearly $15$ million and the biggest deduction about $-2.5$ million. If we would like to see how big these adaptions are relative to the estimated total value of the contract, we can divide all three by the engineers estimate engestimate. To achieve this we generate three new columns of dat_c containing the normalized adjustments, extra work and deductions, named normalized_adjustments, normalized_extrawork and normalized_deductions.

Task: Print the summary statistics of the three normalized adaptions as above.

#< task
dat_c$normalized_adjustments = dat_c$adjustments / dat_c$engestimate
dat_c$normalized_extrawork = dat_c$extrawork / dat_c$engestimate
dat_c$normalized_deductions = dat_c$deductions / dat_c$engestimate
# Enter your code here
#>
tab.summary(select(dat_c, normalized_adjustments, normalized_extrawork, normalized_deductions))
#< hint
display("Your command should look as follows:
        tab.summary(select(dat_c, ..., ..., ...))")
#>

With this table we can now say that adjustments were on average $2.1$% of the estimate, extra work about $6.1$% and deductions about $0.2$%. This indicates that ex post changes are sizable in our data, especially adjustments and extra work.
In the next chapter we are going to build an empirical framework for our analysis of adaption costs. From then on we will denote adjustments by $A$, extra work by $X$ and deductions by $D$.

Let us now take a look at how the final payment changed relative to the winning bid. The final payment in our case is just the following: $$\textrm{final payment} = \sum_{i \in \textrm{items}} \textrm{unit bid}_i \cdot \textrm{actual quantity}_i + A + X + D$$ where the sum over all items of the unit bid times the actual quantity is just the actual total bid bidtotal_act.

Task: Present summary statistics of the final payment minus the winning bid diff and the normalized one normalized_diff.

#< task
dat_c$diff = dat_c$bidtotal_act + dat_c$adjustments + dat_c$extrawork + dat_c$deductions - dat_c$winning_bid
dat_c$normalized_diff = dat_c$diff / dat_c$engestimate
# Enter your code here
#>
tab.summary(select(dat_c, diff, normalized_diff))
#< hint
display("Your command should look as follows:
        tab.summary(select(dat_c, ..., ...))")
#>

We see that the final payment exceeds the winning bid by about $190$ thousand dollars on average, which is about $5.8$% of the estimate. But as you can see there were contracts where the difference was more than $65$% of the estimate in both directions; the snippet below shows how you could inspect those extreme contracts.
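A short sketch to look at these outliers (using the columns of dat_c created above):

# Contracts where the final payment deviates from the winning bid
# by more than 65% of the engineers estimate
select(filter(dat_c, abs(normalized_diff) > 0.65),
       contract, diff, normalized_diff)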

The last adaption we would like to mention here is the one discussed in chapters 1 and 2: the change of quantities. As we have seen, the estimated and actual quantities almost never match. The column sum_ccdbover in our data set dat records those changes in the following way: $$\textrm{sum_ccdbover} = \sum_{i \in \textrm{items}} (\textrm{actq}_i - \textrm{estq}_i) \cdot \textrm{CCDBprice}_i$$ This means that sum_ccdbover contains the dollar value of the quantity changes of a specific contract, where the estimated fair price is used to value each change. We will use it later, when we examine adaption costs, as a control variable to purge the quantity changes from the adaption costs estimate.
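For intuition, consider a made-up contract with two items: item 1 overruns by $10$ units at a CCDB price of $50$ dollars, while item 2 underruns by $20$ units at a price of $25$ dollars. Then $$\textrm{sum_ccdbover} = 10 \cdot 50 - 20 \cdot 25 = 0$$ even though both quantities were estimated badly. Over- and underruns can cancel out in the sum, a point we will come back to right after the next table.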

Task: Just press check to get summary statistics for sum_ccdbover and the normalized version normalized_sum_ccdbover.

#< task
dat_c$normalized_sum_ccdbover = dat_c$sum_ccdbover / dat_c$engestimate
tab.summary(select(dat_c, sum_ccdbover, normalized_sum_ccdbover))
#>

The average dollar value of misestimated quantities is $-62$ thousand dollars, which is about $-0.02$% of the estimate. This does not seem like a lot, but keep in mind that much of the variation cancels out, since items over- and underrun but we record them as one sum here.

Adaption Costs

This section introduces adaption costs. First recall that the adaptions we have seen in the last section are just a transfer of funds from Caltrans to the firms (if negative just the other way around). Adaption costs, as we denote them, are costs related to those adaptions. BHT distinguish between direct adaption costs and indirect adaption costs.
- Direct adaption costs are costs that arise from the disruption of the originally planned work. An example would be additional costs that are not captured in the payment for the extra work itself.
- Indirect adaption costs are costs that arise from resources devoted to contract renegotiation and dispute resolution. Examples would be lawyer costs and legal expenses incurred while haggling over the dollar value of deductions or adjustments.

We assume that these extra costs (the adaption costs) are proportional to the size of adjustments $A$, extra work $X$ and deductions $D$. It is useful to distinguish between positive and negative adjustments to the revenues of a bidder. Extra work adds compensation for the bidder, thus $X > 0$. Deductions reduce the bidder's compensation, thus $D < 0$. Adjustments can be positive or negative as stated in the section above, so we distinguish between positive adjustments $A_+$ and negative adjustments $A_-$; then $A = A_+ + A_-$ with $A_+ > 0$ and $A_- < 0$. Adaption costs cause the bidder to suffer a loss greater than the actual loss imposed by $A_-$ and $D$. For positive ex post income ($A_+$ and $X$) adaption costs cause part of the surplus to be dissipated.
Adaption costs will be noted as $\tau$'s in the next chapter. $\tau_{a_+}$ and $\tau_{a_-}$ stand for the imposed adaption costs from positive and negative adjustments, $\tau_x$ measures the adaption costs from extra work and $\tau_d$ the adaption costs from deductions. Thus we can write the total adaption costs $K$ as $K = \tau_{a_+} \cdot |A_+| + \tau_{a_-} \cdot |A_-| + \tau_x \cdot |X| + \tau_d \cdot |D|$, where the absolute value is needed for the adaptions that reduce the compensation of a bidder ($A_-$ and $D$). Another way of writing the total adaption costs is $K = \tau_{a_+} \cdot A_+ - \tau_{a_-} \cdot A_- + \tau_x \cdot X - \tau_d \cdot D$, since we know that $A_+ > 0$, $A_- < 0$, $X > 0$ and $D < 0$.
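To make the formula concrete, here is a small helper function (purely illustrative, not part of this problem set's toolbox) that computes the total adaption costs $K$ from the four adaptions and the four $\tau$'s; you can use it to check your answers to the quizzes below:

# K = tau_ap*|A+| + tau_am*|A-| + tau_x*|X| + tau_d*|D|
adaption.costs = function(A_plus, A_minus, X, D,
                          tau_ap, tau_am, tau_x, tau_d) {
  tau_ap * abs(A_plus) + tau_am * abs(A_minus) +
    tau_x * abs(X) + tau_d * abs(D)
}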
As a final part of this chapter, assume that we have a contract/auction with positive adjustments of $500$ dollars, negative adjustments of $-300$ dollars, extra work of $2400$ dollars and deductions of $-1100$ dollars. Additionally assume that we found $\tau_{a_+} = 0.6$, $\tau_{a_-} = 1.3$, $\tau_x = 0.4$ and $\tau_d = 1.6$. Use the code chunk below to answer the following quiz.

Task: You can enter whatever you think is needed to solve the quiz here.

#< task_notest
# Enter your command here
#>

#< hint
display("Your command should look as follows:
        500*0.6 + 300*1.3 + 2400*0.4 + 1100*1.6")
#>

< quiz "adaption costs2"

question: What is the value of all adaption costs in the above example?
answer: 3410
roundto: 0.1

>

Let us additionally assume that this auction had a winning actual total bid of $40$ thousand dollars. To solve the quiz below, find out how many dollars the winning bidder received after completing the project (this is exactly his final payment). In addition, find out what percentage of the final payment can be attributed to adaption costs.

Task: You can enter whatever you think is needed to solve the quiz here.

#< task_notest
# Enter your command here
#>
#< hint
display("Your first command should look the following:
        40000 + 500 - 300 + 2400 - 1100, 
        your second command should look the following:
        3410/41500")
#>

< quiz "adaption costs3"

question: What was the final payment?
answer: 41500
roundto: 0.1

>

< quiz "adaption costs4"

question: What percentage of the final payment were adaption costs?
answer: 8.216867
roundto: 0.1

>

< award "Quizmaster Lv. 3"

You computed adaption costs and the final payment for the first time. Keep in mind that they are the main focus of this problem set.

>

Now we are finished with this chapter and can move on to the next one where we construct an empirical framework for our analysis of adaption costs.

Exercise 5 -- A Model of Empirical Bidding Behaviour

Now that we have gained some insights into the auction, the bids, the characteristics influencing the bids and the adaptions, we would like to push our analysis further. To do so we need a general setup to work with. This chapter is only about this setup and where it comes from. If you are not interested in the theory, you can skip this exercise and continue with the next one, where we start with reduced form estimates of the adaption costs.

We look at procurement auctions used for highway construction in California, where unit-price contracts are used. The organisation in charge is Caltrans (California's Department of Transportation). Engineers from Caltrans first prepare a list of items that specify the tasks and materials needed to complete the project. For each of those items, engineers estimate the quantity needed to complete the job. Then this itemized list is publicly advertised, together with plans and specifications that describe how the job is to be completed. Afterwards interested firms bid a unit price for each item on the list; the bids must be sealed and submitted before a set date. When the bids are opened, the bidder with the lowest estimated total bid wins. The estimated total bid is just the sum over all items of estimated quantity times item bid. Since actual quantities most likely do not match the estimated quantities, final payments made to the contractors almost never equal the original bid. Because this happens most of the time, Caltrans has come up with rules that determine the final payment differently than just the sum of actual quantities times item bids. Recall the three main reasons from chapter 4:
- Adjustments of Compensation: Those adjustments $A$ are stored as adjustments in our data set and we distinguish between positive $A_+$ and negative adjustments $A_-$.
- Changes in Scope: They lead to extra work $X$ which is stored as extrawork.
- Deductions: They are noted as $D$ and stored as deductions in our data set.

It is widely believed in the industry that some firms strategically manipulate their bids in anticipation of changes in payment. In our case, though, we have already seen in chapter 1 that this is not a big issue. This may be due to the fact that Caltrans is not required to accept the lowest bid if it seems irregular, and that prices on items that over- or underrun by more than $25$% can be renegotiated. Zero unit bids are therefore unlikely.

The Basic Setup

Now that we have the structure of the contracts, we will create a basic setup for our empirical model. The model we will use is a simple variant of a standard private values auction model as explained in Krishna (2008, p. 13-17).

For every project we have tasks $t=1,...,T$, referring to the items we have seen in chapter one, and a vector of estimated quantities $q^e=(q^e_1,...,q^e_T)$ for every task. The actual (ex post) quantities are given by $q^a=(q^a_1,...,q^a_T)$ and are independent of the bidder selected to perform the work. We assume that each bidder has perfect foresight about $q^a$, while Caltrans is unaware of $q^a$ and only uses $q^e$. In the case of risk neutral contractors, this extreme form of asymmetric information can be interpreted as the contractors having no exact information about $q^a$ but instead symmetric uncertainty about it, which results in common rational expectations over $q^a$. We will use this interpretation for our empirical work since it generates a source of noise that is not specific to the contractors' information or the observable project characteristics.

Contractors also have different private information about their costs of production. We denote the per unit costs of bidder $i$ for completing task $t$ by $c^i_t$, and $c^i=(c^i_1,...,c^i_T) \in \mathbb{R}^T_+$ denotes the vector of all per unit costs. With this notation we can write the total cost of $i$ for installing $q^a$ as $c^i \cdot q^a$. Bidder $i$'s cost type is drawn from a well behaved joint density $f_i(c^i)$ with support on a compact subset of $\mathbb{R}^T_+$. The distribution is common knowledge but only bidder $i$ knows $c^i$. We assume private values for the road work industry, i.e. costs are independently distributed conditional on publicly observable information.
Summing up, we assume bidders to have symmetric rational expectations about what needs to be done but asymmetric private information about the costs of production.
We denote by $b^i=(b^i_1,...,b^i_T)$ the unit price vector submitted by bidder $i$, where $b^i_t$ is the unit bid of $i$ on task $t$. For our model we need a score reflecting the total bid; we denote bidder $i$'s score by $s^i=b^i \cdot q^e$. Recall that this is just the total bid bidtotal of a bidder. Bidder $i$ wins the auction if $s^i < s^j$ $\forall$ $j \neq i$. This is a simple linear scoring rule that transforms each bid vector into a score. Since bidder $i$'s total costs of producing $q^a$, which we will from now on refer to as his type, are random (drawn from $f_i(c^i)$), we denote them by $\theta^i=c^i \cdot q^a$. If we denote the gross revenue that $i$ expects if he wins with his bid $b^i$ by $R(b^i)$, then we can write his expected profits as follows: $$\pi_i(b^i,\theta^i)=(R(b^i)-\theta^i) \cdot (P[s^i < s^j \; \forall \; j \neq i])$$

Introducing Revenues and Adaption Costs

In this section we will explain and specify the revenues $R(b^i)$ bidder $i$ expects. As mentioned before, revenues do not only consist of $b^i \cdot q^a$, due to adjustments $(A)$, extra work $(X)$ and deductions $(D)$. Those three components are assumed to be independent of the bidder who wins the auction (like the actual quantities), and bidders do not have control over them; this assumption does not seem too hard to make. Each of these three components enters as an expected value, since contractors are risk neutral and have symmetric rational expectations about adaption costs. We include them additively in the revenue function, so our new revenue function looks as follows: $$R(b^i)=\sum^T_{t=1} b^i_t \cdot q^a_t + A + X + D$$ This formula does not yet include adaption costs, but since our empirical work will focus mainly on them we need to include them. They lower the impact on profits of every dollar transferred by $A + X + D$ from the buyer to the contractor. We assume these extra costs to be proportional to the size of adjustments, extra work and deductions. This assumption comes from the fact that the contractual incompleteness that leads to adjustments, extra work and deductions is positively correlated with the direct costs of disrupting the normal flow of work and the indirect costs of renegotiation that were introduced in the last chapter.
As in chapter 4 we will denote the imposed adaption costs from adjustments, where we distinguish between positive ($A_+ > 0$) and negative ($A_- < 0$) adjustments to revenues, by $\tau_{a_+} \cdot A_+$ and $\tau_{a_-} \cdot A_-$. Note that as before $A = A_+ + A_-$. Adaption costs from extra work are denoted by $\tau_x \cdot X$ and adaption costs from deductions by $\tau_d \cdot D$, where $X > 0$ and $D < 0$.
With this notation at hand we can write the total ex post costs of adaption $K$, as we did in chapter 4, as follows: $$K=\tau_{a_+} \cdot A_+ - \tau_{a_-} \cdot A_- + \tau_x \cdot X - \tau_d \cdot D$$ Now we can subtract this from the revenue function: $$R(b^i)=\sum^T_{t=1} b^i_t \cdot q^a_t + A + X + D - K$$ If we write out $K$ and recall that $A = A_+ + A_-$ we can rewrite the revenue function as: $$R(b^i)= \sum^T_{t=1} b^i_t \cdot q^a_t + A_+ \cdot (1 - \tau_{a_+}) + A_- \cdot (1 + \tau_{a_-}) + X \cdot (1 - \tau_x) + D \cdot (1 + \tau_d)$$ With the adaption costs variable $K$ we will later show that the hypothesis $K=0$ is rejected and thus demonstrate the presence of adaption costs. Note that our whole empirical analysis of adaption costs is about those $\tau$'s, since they capture the adaption costs we are interested in. As this is the most important analysis of this problem set I will give an example:

Assume that we find in our later analysis that $\tau_{a_+}$ equals $0.5$. That means that a bidder expects to spend $0.5$ dollars in adaption costs for every dollar of positive adjustments. Thus, the actual costs of a positive adjustment of one dollar are $1.5$ dollars (the one dollar that is paid to the bidder as adjustment compensation plus the $0.5$ dollars the bidder added to his bid in order to cover the expected adaption costs from the expected positive adjustment). This may seem bizarre, but in chapter 6.1 we will come up with a good story to justify such behavior.

Task: To answer the quiz below you can enter whatever you need here.

#< task_notest
# Enter your code here
#>
#< hint
display("The right calculus for the question is the following:
        For the first:
        10*0.5 + 50*1 + 20*1.5
        For the second:
        10*0.5 + 50*1 + 20*1.5 + 200*0.3")
#>

Assume for the following quiz that $\tau_{a_+} = 0.5$, $\tau_{a_-} = 1$, $\tau_x = 0.3$ and $\tau_d=1.5$.

< quiz "adaption_costs_model1"

question: If we assume the above values, how high (in dollars) are the adaption costs of a contract with positive adjustments of 10 dollars, negative adjustments of 50 dollars and deductions of 20 dollars?
answer: 85
roundto: 0.1

>

< quiz "adaption_costs_model2"

question: If we additionally assume 200 dollars of extra work, what is the dollar value of adaption costs then?
answer: 145
roundto: 0.1

>

As a last step we add a component that captures the loss from submitting an irregular bid. As seen in the example at the beginning (chapter 1), bidders would like to bid zero or nearly zero on items that are expected to underrun. Caltrans has the option to reject such bids, so skewed bids imply expected costs to the bidder. The penalty should be increasing in the skewness of the bid, measured relative to some reasonable price. Caltrans provides us with an estimate $\bar{b_t}$ for the unit cost of item $t$ based on a collection of past bids and market prices (they are written down in the Caltrans Cost Data Book (CCDB)). Thus a measure of skewness is the distance of $b^i$ from $\bar{b}=(\bar{b_1},...,\bar{b_T})$. Let us denote this penalty function by $P[b^i \mid \bar{b}]$. This function should be continuously differentiable, but we need some further assumptions:
1) No penalty from submitting a bid that matches $\bar{b}$, which mathematically means $P[\bar{b} \mid \bar{b}]=0$.
2) If a bid matches the engineers cost estimate, the first order costs of skewing are zero. We can note this mathematically by $$\frac{\partial P[b^i \mid \bar{b}]}{\partial b^i_t}\mid_{b^i_t=\bar{b_t}}=0$$
3) $P[b^i \mid \bar{b}]$ is strictly convex.
4) The penalty on bids that are nearly zero, or converge to zero, shall be extremely high. Given 2) and 3), the marginal penalty is negative below $\bar{b_t}$, so this is noted the following way: $$\lim\limits_{b^i_t \to 0} \frac{\partial P[b^i \mid \bar{b}]}{\partial b^i_t}=-\infty$$ In words, lowering a unit bid towards zero makes the penalty explode. Assumptions 1) and 2) reflect Caltrans practice; 3) and 4) guarantee an interior solution to the bidder's optimization of $b^i$. To keep notation simple we will drop $\bar{b}$ and from now on write $P[b^i]$. Now we have all specifications of the revenue function, summing up to the following: $$\tag{1} R(b^i)=\sum^T_{t=1} b^i_t \cdot q^a_t + A + X + D - K - P[b^i]$$ We are finished with the revenue function and move on to equilibrium bidding behaviour; a purely illustrative example of a penalty function follows as an aside.
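As that aside: one function satisfying assumptions 1) to 4) (my own illustration; BHT leave $P$ general and do not use this specification) is $$P[b^i \mid \bar{b}] = \sum^T_{t=1} \left( b^i_t - \bar{b_t} - \bar{b_t} \cdot \ln \frac{b^i_t}{\bar{b_t}} \right)$$ It is zero at $b^i = \bar{b}$, its partial derivative $1 - \bar{b_t}/b^i_t$ vanishes at $b^i_t = \bar{b_t}$, it is strictly convex because $\partial^2 P / \partial (b^i_t)^2 = \bar{b_t}/(b^i_t)^2 > 0$, and the derivative tends to $-\infty$ as $b^i_t \to 0$, so bids skewed towards zero are penalized without bound.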

Equilibrium Bidding Behaviour

The concept used here is the Bayesian Nash equilibrium of the first-price sealed-bid auction. More information can be found in Krishna (2008, p. 299-304). The game is a scoring auction with independent private values as in Che (1993). Our equilibrium behavior will be determined as if our bidders had a uni-dimensional type. This is due to the fact that, given the scoring rule, the choice of $s^i=b^i \cdot q^e$ is separable from the optimal choice of the actual bid vector $b^i$. To make it clear: given a score $s$, each bidder has an optimal choice of bids conditional on winning, $b^i_t(s)$, and given this optimal bid there is an optimal score $s(\theta^i)$ that is uni-dimensional. Following Lebrun (2006) this results in a Bayesian game with a unique pure strategy monotonic equilibrium. As mentioned above, we can separate bidder $i$'s problem into two parts. In the first we assume some given score $s$ and solve for the optimal bid conditional on winning the auction. With this $b^i(s)=(b^i_1(s),...,b^i_T(s))$ we will solve for the optimal score $s^i$ in the second part. The first part, choosing the optimal bid function given a score $s$, is just maximizing $R(b^i)$ over $b^i(s)$: $$ \tag{2} \underset{b^i(s)}{\mathrm{max}} \sum_{t=1}^T b^i_t \cdot q^a_t -\theta^i + A + X + D - K - P[b^i]$$ subject to the restriction that the score is kept fixed: $$\sum_{t=1}^T b^i_t \cdot q^e_t = s$$ Solving $(2)$ yields $T+1$ first order conditions (FOCs), the first $T$ being $$\tag{3} q^a_t - \frac{\partial P[b^i]}{\partial b^i_t} - \lambda \cdot q^e_t = 0 \; \forall \; t=1,...,T$$ and the last one being the constraint $$\sum_{t=1}^T b^i_t \cdot q^e_t = s$$
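To see where the first $T$ conditions in $(3)$ come from, write down the Lagrangian of problem $(2)$ (a standard step, spelled out here for convenience): $$\mathcal{L} = \sum_{t=1}^T b^i_t \cdot q^a_t - \theta^i + A + X + D - K - P[b^i] - \lambda \cdot \left( \sum_{t=1}^T b^i_t \cdot q^e_t - s \right)$$ Differentiating with respect to $b^i_t$ and setting the result to zero yields exactly $(3)$; differentiating with respect to $\lambda$ returns the constraint.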

Now we assume that we have solved $(2)$ for $b^i(s^i)$ and turn to the bidder's problem of choosing the optimal score $s^i$. With $H_j(\cdot)$ we denote the cumulative distribution function of bidder $j$'s score $s^j$. Thus the probability that $i$ with a score of $s^i$ bids more than $j$ is given by $H_j(s^i)$, and hence the expected profit function of $i$ is $$\pi_i(s^i,\theta^i)= (R(b^i(s^i)) - \theta^i) (\prod_{j \neq i}(1 - H_j(s^i)))$$ If we substitute the revenues with $(1)$ and recall that $\theta^i = \sum_{t=1}^T c^i_t \cdot q^a_t$ we get the following FOC: $$\tag{4} \sum^T_{t=1} (b^i_t(s^i)-c^i_t)q^a_t = \frac{\sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i}(q^a_t - \frac{\partial P[b^i]}{\partial b^i_t})}{\sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)}} - A - X - D + K + P[b^i]$$ If you are interested in how to derive this, check the info box below.

< info "Calculus"

Maximizing $\pi_i(s^i,\theta^i)$ over $s^i$ amounts in our case to calculating the first derivative of $\pi_i(s^i,\theta^i)$ with respect to $s^i$ and setting that derivative to zero (this can be done since our function is assumed to be concave). Thus we will do the differentiation first. If we substitute the revenues with $(1)$ and use $\theta^i = \sum_{t=1}^T c^i_t \cdot q^a_t$ the expected profit function looks as follows: $$\pi_i(s^i,\theta^i)= (\sum^T_{t=1} b^i_t(s^i) \cdot q^a_t + A + X + D -K - P[b^i] - \sum_{t=1}^T c^i_t \cdot q^a_t) \cdot (\prod_{j \neq i}(1 - H_j(s^i)))$$ This can be rewritten as $$\pi_i(s^i,\theta^i)= (\sum^T_{t=1} (b^i_t(s^i) - c^i_t) \cdot q^a_t + A + X + D -K - P[b^i]) \cdot (\prod_{j \neq i}(1 - H_j(s^i)))$$ As we have a product of two terms we need to use the product rule. Let us first calculate the derivative of the left factor: $$\begin{eqnarray} & \frac{\partial}{\partial s^i} & \sum^T_{t=1} (b^i_t(s^i) - c^i_t) \cdot q^a_t + A + X + D -K - P[b^i] \\ & = & \sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i} \cdot q^a_t - \frac{\partial P[b^i]}{\partial s^i} \\ & = & \sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i}(q^a_t - \frac{\partial P[b^i]}{\partial b^i_t}) \end{eqnarray}$$ where the last step applies the chain rule, $\frac{\partial P[b^i]}{\partial s^i} = \sum^T_{t=1} \frac{\partial P[b^i]}{\partial b^i_t} \cdot \frac{\partial b^i_t(s^i)}{\partial s^i}$. Now we calculate the derivative of the right factor: $$\begin{eqnarray} & \frac{\partial}{\partial s^i} & \prod_{j \neq i}(1 - H_j(s^i)) \\ & = & - \sum_{k \neq i} h_k(s^i) \cdot (\prod_{j \neq i, \; j \neq k}(1 - H_j(s^i))) \\ & = & - \sum_{k \neq i} h_k(s^i) \cdot (\frac{(1-H_k(s^i))}{(1-H_k(s^i))} \cdot \prod_{j \neq i, \; j \neq k}(1 - H_j(s^i))) \\ & = & - \sum_{k \neq i} \frac{h_k(s^i)}{1 - H_k(s^i)} \cdot (\prod_{j \neq i}(1 - H_j(s^i))) \\ & = & - \sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)} \cdot (\prod_{j \neq i}(1 - H_j(s^i))) \end{eqnarray}$$ where we made use of the fact that the derivative of $1 - H_j(s^i)$ equals $-h_j(s^i)$. The last step can be done as the product no longer depends on $k$. If you do not know how to do the first step it may be helpful to write out $\prod_{j \neq i}(1 - H_j(s^i))$.
Now that we have both derivatives we can write the derivative of $\pi_i(s^i,\theta^i)$ using the product rule: $$\begin{eqnarray} & \frac{\partial \pi_i(s^i,\theta^i)}{\partial s^i} & \\ & = & (\sum^T_{t=1} (b^i_t(s^i) - c^i_t) \cdot q^a_t + A + X + D -K - P[b^i]) \cdot (- \sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)} \cdot (\prod_{j \neq i}(1 - H_j(s^i)))) \\ & + & (\sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i}(q^a_t - \frac{\partial P[b^i]}{\partial b^i_t})) \cdot (\prod_{j \neq i}(1 - H_j(s^i))) \end{eqnarray}$$ Since we want to maximize $\pi_i(s^i,\theta^i)$ over $s^i$, which in this case is equivalent to setting the first derivative to zero, we set this expression to zero. Both summands contain the factor $\prod_{j \neq i}(1 - H_j(s^i))$, which is positive for any score with a chance of winning, so we can cancel it: $$(\sum^T_{t=1} (b^i_t(s^i) - c^i_t) \cdot q^a_t + A + X + D -K - P[b^i]) \cdot (- \sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)}) + \sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i}(q^a_t - \frac{\partial P[b^i]}{\partial b^i_t}) = 0$$ Now we can rearrange the terms and divide by $\sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)}$: $$\sum^T_{t=1} (b^i_t(s^i) - c^i_t) \cdot q^a_t = \frac{\sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i}(q^a_t - \frac{\partial P[b^i]}{\partial b^i_t})}{\sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)}} - A - X - D + K + P[b^i]$$ which is the same as $(4)$.

>

Note that $H_j(\cdot)$ is differentiable with density $h_j(\cdot)$ by the assumptions on the density of types (uni-dimensional) and on the penalty function $P[\cdot]$ (assumptions 1)-4), especially 3) and 4)). The FOCs of the two stages, $(3)$ and $(4)$, are necessary and sufficient to describe optimal bidding behavior in our case. Thus a Bayesian Nash Equilibrium is given by a collection of bid functions $b^i(\cdot)$ and scores $s^i$ that simultaneously satisfy $(3)$ and $(4)$. As mentioned in the beginning there is a unique monotonic equilibrium in pure strategies, therefore we will use $(4)$ as a basis for further analysis. Another reason to stick with $(4)$ is that it introduces empirically measurable terms, most notably the adaption costs reflected in $K$. We will illustrate this in the following example:
Assume the bidder expects a deduction $D$ of $1000$ dollars. Then the FOC $(4)$ suggests that the bidder will raise his bid by $1000 \cdot (1+\tau_d)$ dollars. As BHT point out, this indicates that the total costs of the deductions, as borne by the firm, are indirectly borne by Caltrans.
If we assume $q^e=q^a$ and no ex post changes, $A = D = X = 0$, then the FOC $(4)$ reduces to the FOC of standard first-price, private value asymmetric auction models: $$\tag{5}s^i - c^i \cdot q^e = ( \sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)} )^{-1}$$ This relates our model to the established literature on bidding without adaption costs and changes, where the left hand side, the markup, reflects a bidder's cost advantage and market power. Note that we estimated $(5)$ in chapter 3.4.
This work's focus is adaption costs, thus the model abstracts from extensions such as replacing the perfect foresight assumption on changes and actual quantities with a common values specification in which each bidder receives signals of $A, \; X$ and $D$. But with this model on hand we will show that the assumption $\tau_l=0 \; \forall \; l \in \{A_+, \; A_-, \; X, \; D \}$, implicitly made in most theoretical and empirical literature, is rejected by the data. Now that we have described the general set up we continue with the reduced form estimates.

< award "Reader"

Congratulations, you made it through this long and complicated chapter.

>

Exercise 6 -- Reduced Form Estimates of Adaption Costs

In the next chapters we are going to estimate adaption costs. To do so, we will re-specify the regression model that you have seen in chapter 3.4 to match equation $(4)$ instead of $(5)$. Even though regressions like the one we ran in chapter 3.4 are common, equation $(4)$ suggests that they suffer from two sources of misspecification. The first is that the dependent variable is the estimated total bid $b^i \cdot q^e$ instead of the actual total bid $b^i \cdot q^a$. Note that $b^i \cdot q^a$ was called bidtotal_act. The second is that we have ignored the anticipated changes to payments that result from adjustments, extra work and deductions. To account for heteroskedasticity related to project size we will divide the actual total bid by $\bar{b}^{(n)} \cdot q^{a,(n)}$, which is the sum over all items of the item price estimate times the actual quantity. This variable is called CCactprojsize in our data set. If we divide the actual total bid by this estimate we get a normalized version of the actual total bid. This variable is called Nbidtotal_act in our data set. As in the chapters before, we need to load the data first.

Task: To load the data set bidders.dta and get the example contract we used in the chapters before, press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
example = filter(dat, contract == "02-356604")
#>

Let us take a look at those new variables for our example contract.

Task: Press check to get a better insight.

#< task
select(example, bidder, bidtotal, engestimate, normalized_bid, bidtotal_act, CCactprojsize, Nbidtotal_act)
#>

Bear in mind that this specific contract contains only extra work, but some items ran over while others ran under. In our example contract the actual total bid is slightly higher than the estimated one. The estimated fair value of this contract, engestimate, was below the actual project size CCactprojsize (which in our case is the sum over all items of the item price estimate times the actual quantity).
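As a quick sanity check (a small sketch that assumes the variables are constructed exactly as described above), Nbidtotal_act should simply equal bidtotal_act divided by CCactprojsize:

# Sanity check: the normalized actual total bid should equal the
# actual total bid divided by the actual project size
all.equal(example$Nbidtotal_act, example$bidtotal_act / example$CCactprojsize)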
Let us now build up the regression model to estimate adaption costs. To match equation $(4)$ and account for heteroskedasticity related to project size, the left hand side of our regression will be Nbidtotal_act. The right hand side will be the same as in chapter 3.4, but we need to add terms reflecting adaptions to estimate adaption costs. Note that we have the data about these stored in adjustments, extrawork and deductions and that we differentiate between positive and negative adjustments, which are stored as posAdj and negAdj. We would like to normalize them by dividing by the actual project size CCactprojsize. The results are stored as NPosAdj, NNegAdj, NEX and NDed. Recall that nbidders is constant within one auction and thus needs to be included differently now, as you will see below. With this information we can set up a regression model: $$\begin{eqnarray} \textrm{Nbidtotal_act}^{(n)}_i =\gamma_n & + & \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{util}^{(n)}_i + \alpha_4 \cdot \textrm{fringe}_i \\ & + & \alpha_5 \cdot \textrm{rivaldist100}^{(n)}_i + \alpha_6 \cdot \textrm{rivalutil}^{(n)}_i + \varepsilon^{(n)}_i \end{eqnarray}$$ where $\gamma_n$ is defined as follows: $$\begin{eqnarray} \gamma_n = \beta_0 & + & \beta_1 \cdot \textrm{nbidders}^{(n)} + \beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} \\ & + & \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \varepsilon_n \end{eqnarray}$$

Also bear in mind that we saw in chapter 3.4 that the regression with project fixed effects seemed to explain the bids best. We will use project fixed effects to estimate the first part of the regression formula. After that we will explain the estimated project fixed effects of this regression with the second part of the regression formula (all variables there are project specific).
Note that nbidders is constant for one specific auction and thus was moved to the part of the regression where we explicitly state the fixed effects. This regression looks similar to the regression from chapter 3.4 but is now consistent with $(4)$. This time we also wish to include over- and underruns as a covariate in our regression to separate those effects from the rest. To do so BHT have constructed a measure sum_ccdbover. This measure is just the sum over all items of the actual quantity minus the estimated quantity, divided by the estimated item price (as you have seen in chapter 4). Mathematically this can be written the following way: $\textrm{sum_ccdbover} = \sum_{i \in \textrm{items}} \frac{q^a_i - q^e_i}{\bar{b}_i}$ (a small code sketch of this construction follows below the next equation). Again we would like to normalize this variable by the project size. The resulting normalized overrun measure is stored as NOverrun. Thus our regression model includes all the needed terms and looks like the one above, but $\gamma_n$ now contains NOverrun:

$$\begin{eqnarray} \gamma_n = \beta_0 & + & \beta_1 \cdot \textrm{nbidders}^{(n)} + \beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} \\ & + & \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \beta_6 \cdot \textrm{NOverrun}^{(n)} + \varepsilon_n \end{eqnarray}$$
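As announced above, here is a small sketch of how sum_ccdbover could be constructed from item-level data. The data frame items and its columns q_act, q_est and ccdb_price are made-up names for illustration; they are not part of bidders.dta.

# Hypothetical sketch: constructing sum_ccdbover from item-level data
library(dplyr)
items = data.frame(contract   = c("A", "A"),
                   q_act      = c(120, 40),  # actual quantities
                   q_est      = c(100, 50),  # estimated quantities
                   ccdb_price = c(10, 20))   # CCDB unit price estimates
items %>%
  group_by(contract) %>%
  summarise(sum_ccdbover = sum((q_act - q_est) / ccdb_price))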

There is one last thing we would like to point out. In chapter 3.4 we saw that the covariates util and rivalutil do not behave as expected and do not seem to be statistically significant in most settings. Thus we exclude them from the analysis of adaption costs. Now we can write our regression model as follows: $$\begin{eqnarray} \tag{6} \textrm{Nbidtotal_act}^{(n)}_i & = & \gamma_n + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{fringe}_i + \alpha_4 \cdot \textrm{rivaldist100}^{(n)}_i + \varepsilon^{(n)}_i \\ \gamma_n & = & \beta_0 + \beta_1 \cdot \textrm{nbidders}^{(n)} + \beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} \\ & + & \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \beta_6 \cdot \textrm{NOverrun}^{(n)} + \varepsilon_n \end{eqnarray}$$

In the next chapters we are going to estimate $(6)$ with project fixed effects for the first part and standard OLS as well as instrumental variables for the second part.

Exercise 6.1 -- Examine Adaption Costs using Project Fixed Effects

In this chapter we estimate $(6)$ with project fixed effects. But let us first load the data.

Task: To load the data set bidders.dta and create a data set dat_c containing only one observation per project, press edit and check afterwards.

#< task
dat=read.dta("bidders.dta")
dat_c = filter(dat, winner == 1)
#>

We will first perform a project fixed effects regression of $$\textrm{Nbidtotal_act}^{(n)}_i =\gamma_n + \alpha_2 \cdot \textrm{dist100}^{(n)}_i + \alpha_3 \cdot \textrm{fringe}_i + \alpha_4 \cdot \textrm{rivaldist100}^{(n)}_i + \varepsilon^{(n)}_i$$ and then explain the project fixed effects $\gamma_n$ of this regression with our ex post changes and the number of bidders. This results in the following model $$\begin{eqnarray} \gamma_n & = & \beta_0 + \beta_1 \cdot \textrm{nbidders}^{(n)} + \beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} \\ & + & \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \beta_6 \cdot \textrm{NOverrun}^{(n)} + \varepsilon_n \end{eqnarray}$$ which we will estimate with standard OLS. Now let us start with the first part where we want to perform a project fixed effects regression of Nbidtotal_act on dist100, fringe and rivaldist100.

Task: Perform such a regression and store the result as FELM_help. We want to cluster our standard errors by c. If you need help just press the hint button.

#< task
# Enter your command here
#>
FELM_help = felm(Nbidtotal_act ~ dist100 + rivaldist100 + fringe | c |0 | c, data = dat)
#< hint
display("Your commands should look the following:
        FELM_help = felm(... ~ ... + ... + ... | c |0 | c, data = dat)")
#>

Now we need to extract the predicted project fixed effects from this regression. Since it is not well documented how one can do so, I will do it for you. The felm object FELM_help contains a variable response. That is exactly what we need here. In the following chunk the entries of response are stored in our data frame as gamma. This is the $\gamma_n$ that we need to perform the second regression.

Task: Perform a normal regression of gamma on nbidders, NPosAdj, NNegAdj, NEX, NDed and NOverrun. Use clustered standard errors where we want to cluster by c (that is by each contract/auction). After you have done so, print out the summary statistics of the regressions FELM_help and FELM.

#< task
dat$gamma = FELM_help$response
# Enter your commands here
#>
FELM = felm(gamma ~ nbidders + NPosAdj + NNegAdj + NEX + NDed + NOverrun| 0 | 0 | c, data = dat)
#< hint
display("Your command should look as follows:
        FELM = felm(... ~ ... + ... + ... + ... + ... + ...| 0 | 0 | c, data = dat)")
#>
reg.summary(FELM_help, FELM)
#< hint
display("Your command should look as follows:
        reg.summary(FELM_help, FELM)")
#>


< award "Regressionmaster Lv. 5"

You have estimated adaption costs for the first time, congratulations.

>

The first regression FELM_help will not be interpreted here (we did so in more detail in chapter 3.4) and is shown only to present the full model. Note that the coefficients on dist100, rivaldist100, fringe and nbidders have the expected sign. We find that the coefficient on nbidders of about $0.004$ is not significant. The coefficient on positive adjustments is $0.8$ and significant at the $0.1$% level. The coefficient on negative adjustments is $-1.7$ and also significant. The coefficient on extra work is $0.17$, the one on deductions about $-1.4$. NOverrun is significant at the $0.1$% level and yields a coefficient of $0.006$.
But what do the coefficients on adaption costs tell us? Since this is the main focus of this problem set, I will explain how we can interpret them in detail. First we need to recall the equation we want to estimate, which we achieved with the following:

$$\tag{4} \sum^T_{t=1} (b^i_t(s^i)-c^i_t)q^a_t = \frac{\sum^T_{t=1} \frac{\partial b^i_t(s^i)}{\partial s^i}(q^a_t - \frac{\partial P[b^i]}{\partial b^i_t})}{\sum_{j \neq i} \frac{h_j(s^i)}{1 - H_j(s^i)}} - A - X - D + K + P[b^i]$$

The left hand side is the markup given actual quantities; the right hand side reflects the cost advantage and market power of a bidder as well as adaption costs and the penalty from skewing bids. Our regression model for the project fixed effects was the following (the nbidders term captures competition rather than adaptions):

$$\begin{eqnarray} \gamma_n & = & \beta_0 + \beta_1 \cdot \textrm{nbidders}^{(n)} + \beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} \\ & + & \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \beta_6 \cdot \textrm{NOverrun}^{(n)} + \varepsilon_n \end{eqnarray}$$

With these equations it may be easier to see how we can access adaption costs. Equation $(4)$ from chapter 5 suggests that the marginal impact of an extra dollar of change identifies the adaption costs in our model. To compare the terms reflecting adaptions in our model with the terms reflecting adaptions from the regression model, we can write one term below the other. Note that I substituted $A$ with $A_+ + A_-$. $$- A_+ - A_- - X - D + K + P[b^i]$$ $$\beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} + \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \beta_6 \cdot \textrm{NOverrun}^{(n)}$$ If we now use $K = \tau_{a_+} \cdot A_+ - \tau_{a_-} \cdot A_- + \tau_x \cdot X - \tau_d \cdot D$ and rearrange the terms we get the following: $$-A_+ \cdot (1 - \tau_{a_+}) - A_- \cdot (1 + \tau_{a_-}) - X \cdot (1 - \tau_x) - D \cdot (1 + \tau_d) + P[b^i]$$ $$\beta_2 \cdot \textrm{NPosAdj}^{(n)} + \beta_3 \cdot \textrm{NNegAdj}^{(n)} + \beta_4 \cdot \textrm{NEX}^{(n)} + \beta_5 \cdot \textrm{NDed}^{(n)} + \beta_6 \cdot \textrm{NOverrun}^{(n)}$$ Now it is easy to figure out which coefficient identifies what: $$\beta_2 = - (1 - \tau_{a_+}) = 0.8 \textrm{ by FELM}$$ $$\beta_3 = - (1 + \tau_{a_-}) = -1.7 \textrm{ by FELM}$$ $$\beta_4 = - (1 - \tau_x) = 0.17 \textrm{ by FELM}$$ $$\beta_5 = - (1 + \tau_d) = -1.4 \textrm{ by FELM}$$ We can rearrange the terms for the $\tau$'s: $$\tau_{a_+} = \beta_2 + 1 = 1.8$$ $$ \tau_{a_-} = -(\beta_3 +1) = 0.7$$ $$\tau_x = \beta_4 + 1 = 1.17$$ $$\tau_d = -(\beta_5 + 1) = 0.4$$
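Since this mapping is easy to get wrong, here is the same arithmetic once more as a small R chunk; the beta values are simply the FELM point estimates quoted above.

# Recovering the adaption cost parameters (tau's) from the FELM coefficients
beta2 = 0.8; beta3 = -1.7; beta4 = 0.17; beta5 = -1.4
tau_pos_a =  beta2 + 1    # per dollar of positive adjustments: 1.8
tau_neg_a = -(beta3 + 1)  # per dollar of negative adjustments: 0.7
tau_x     =  beta4 + 1    # per dollar of extra work: 1.17
tau_d     = -(beta5 + 1)  # per dollar of deductions: 0.4
c(tau_pos_a, tau_neg_a, tau_x, tau_d)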

With this on hand we can interpret the results easily. First, notice that if there were no adaption costs, then $\beta_2 = \beta_3 = \beta_4 = \beta_5 = -1$ would apply. We find different coefficients and this suggests that adaption costs are prevalent in our data set.

Now let us interpret the findings above. The coefficient $\beta_2 = 0.8$, which is significant at the $0.1$% level, yields $\tau_{a_+} = 1.8$, meaning that a bidder expects to spend $1.8$ dollars in adaption costs for every dollar he obtains in positive adjustment compensation. Thus a bidder will actually increase his bid if he expects this additional ex post income. This may seem bizarre, but consider the following: Imagine Caltrans wants to impose a negative adjustment of $-1$ dollar, which will be enforced if the firm does not contest. But the firm can pay haggling costs of $1.8$ dollars, after which it will collect a positive adjustment of $1$ dollar instead. Since the costs of $1.8$ dollars are lower than the benefit of $2$ dollars, this will result in an observation of $1$ dollar in positive adjustments ex post. But keep in mind that it actually increased the ex ante bid by $1.8$ dollars.
The coefficient $\beta_3 = -1.7$, which is also significant, implies $\tau_{a_-} = 0.7$, suggesting that a bidder expects to spend $0.7$ dollars in adaption costs for every dollar in negative adjustments. Here you may think of haggling costs to limit the decrease of the item price imposed by Caltrans due to some overrun on an item.
The coefficient $\beta_4 = 0.17$, which is significant at the $5$% level, implies $\tau_x = 1.17$. This means that a bidder expecting one dollar of extra work expects to spend $1.17$ dollars in adaption costs for this one dollar. Here you can think of costs from disrupting the natural workflow.
The coefficient $\beta_5 = -1.4$, which is not significant, implies $\tau_d = 0.4$. This means that a bidder expecting one dollar of deductions expects to spend $0.4$ dollars in adaption costs. Firstly, this result is not significant, and secondly, the implied adaption costs are low, thus we can assume for the moment that deductions do not cause high adaption costs.

< quiz "Adaption costs 12"

parts:
  - question: 1. How many dollars does a bidder expect to spend in adaption costs if he expects one dollar of positive adjustment?
    sc:
      - 1
      - 1.8*
      - 1.16
    success: Great, your answer is correct!
    failure: Try again.
  - question: 2. How many dollars does a bidder expect to spend in adaption costs if he expects one dollar of deductions?
    sc:
      - 0.5
      - 0.7
      - 0.4*
    success: Great, all answers are correct!
    failure: Not all answers correct. Try again.

>

< award "Quizmaster Lv. 4"

You really understand what adaption costs are and how to interpret them.

>

Now, with these first estimates at hand, whether precise or not, let us examine total adaption costs. This is done in the next code chunk. We declare the $\tau$'s that we computed using the OLS results and then compute the total adaption costs.

Task: Just press check to get a summary of the total adaption costs, the actual project size and the total adaption costs normalized by the actual project size following our estimates from FELM.

#< task
tau_pos_a = 1.8
tau_neg_a = 0.7
tau_x = 1.17
tau_d = 0.4
dat_c$total_adaption_costs = tau_pos_a*dat_c$posAdj - tau_neg_a*dat_c$negAdj + tau_x*dat_c$extrawork - tau_d*dat_c$deductions
dat_c$Ntotal_adaption_costs = dat_c$total_adaption_costs/dat_c$CCactprojsize 
tab.summary(select(dat_c, total_adaption_costs, CCactprojsize, Ntotal_adaption_costs))
#>


An average project, which amounted to a total of about $2.6$ million dollars, contained $473$ thousand dollars of adaption costs. That is nearly $14$ percent of the actual size of an average project. Notice that these estimates rest on the assumption that bidders have perfect foresight about adaptions; to the extent that foresight is imperfect, our findings represent a lower bound for adaption costs. Thus our results suggest that adaption costs are high in public procurement auctions for highway construction work.

Recall that we used effectplot() to get an impression of how strong each effect in a regression is. We will do the same for adaption costs, but as effectplot() does not work with felm() regressions right now, we use a standard OLS regression instead. Note that this does not change the coefficients. The only difference to FELM is that we do not use clustered standard errors here.

Task: Just press check to get an intuition how big the effects of the ex post changes on the normalized actual total bid are.

#< task
OLS = lm(gamma ~ nbidders + NPosAdj + NNegAdj + NEX + NDed + NOverrun, data = dat)
effectplot(OLS)
#>

We find that the biggest impact comes from positive adjustments, followed by extra work. The effects of negative adjustments and deductions seem fairly small. Our measure of quantity changes, NOverrun, has almost no impact, as expected after we found in chapters 2 and 2.1 that incentives to skew bids are small.
Now we have done our first analysis of adaption costs where we used OLS to estimate the ex post changes. The next chapter accounts for possible endogeneity of ex post changes with an instrumental variable regression.

Exercise 6.2 -- Examine Adaption Costs while Accounting for Endogeneity of Ex Post Changes

One concern with the analysis of adaption costs made in chapter 6.1 is that ex post changes may be correlated with the error term. This could be the case since there may be omitted costs observed by the bidders that cannot be accounted for. If this were the case, our error term would include them and thus A2, the assumption that the expected value of the error term of the regression is zero (see chapter 2.1), would be violated. BHT mention the following anecdote in their paper:
A project in a more mountainous area will impose higher production costs and will be more likely to require changes due to the more challenging terrain. If this is true, projects with more changes have higher costs not because of adaption costs but because of higher production costs in such rough terrains. Another factor is that complex projects impose serious delays and difficulties that increase the labor costs of production. If these delays come from adjustments and deductions then the increased bids may actually be a consequence of the increased production costs.
Remember that we have cost data from Caltrans for each project given actual quantities, CCactprojsize. We could run a regression of the actual total bid bidtotal_act on these costs and find out how much variation we can explain with that. BHT do so and find an $R^2$ above $0.97$, meaning that more than $97$% of the variation in the actual total bid can be explained by Caltrans' estimate of the costs, CCactprojsize. BHT mention that with this result one can be fairly sure that any omitted costs will be negligible. Nonetheless we will perform a regression that accounts for that. This can be done with an instrumental variable regression (iv-regression) if we find good instruments for the ex post changes. As we have five ex post changes we need at least five instruments. A good instrument must satisfy two conditions, which can be found in Kennedy (2008, p. 141):

1) The instrument must be uncorrelated with the error term (exogeneity).

2) The instrument must be correlated with the endogenous regressor (relevance).

In our case this means that a good instrument must be uncorrelated with the unobserved project specific production costs but must affect the ex post changes. BHT found instruments which, according to them, fulfil this purpose well: they use the identity of the Caltrans project engineer who supervised the project. Thus we will use dummy variables indicating whether one specific engineer managed that specific project. These variables are stored as re2, ..., re335. Note that we have $334$ different engineers and thus more than enough instruments for our five possibly endogenous variables.
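In case you wonder how such dummy variables can be built in R: the following sketch uses model.matrix() on a made-up engineer identifier. Note that bidders.dta already contains the dummies; engineer_id is an assumed name for illustration only.

# Hypothetical sketch: building engineer dummies like re2, ..., re335
engineer_id = factor(c("e1", "e2", "e2", "e3"))
dummies = model.matrix(~ engineer_id)[, -1]  # drop the intercept / baseline engineer
dummies  # one 0/1 indicator column per engineer except the first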

Now we need to check the two conditions.
The first is not possible to verify directly, since it is an identifying assumption. According to BHT though, it is plausible to assume that our instruments and the error term are uncorrelated since Caltrans assigns the engineers in the following procedure to an auction:
First the Caltrans engineering staff draws up the plans and specifications for a project. Then the project is publicly advertised and the plans and specifications are made available to bidders. The location of the project allows bidders to determine from which district office the engineer will be deputized. In one district office there are just a handful of engineers and they are assigned to a project according to their expertise and availability. Most of the time, the engineer is assigned early on and the name is put on the plans so that bidders can contact him if questions occur. After the bids are submitted, the winner is chosen and work begins. Changes to the project are made based upon work progress and site conditions. Thus we know that our instrument is known at the time of the auction and ex post changes occur only after that. BHT state that then, according to Hansen (1982), our instrument is valid since it cannot be correlated with the forecast error of payoff relevant variables.
The second condition is easy to check with another package called AER. If you want to see how this can be done, check the info box below. If not, just assume that the instruments are highly correlated with the ex post changes. As in every chapter, let us load the data first.

Task: To load the data set bidders.dta, create a data frame example containing our example contract, and create dat_c as before, press edit and check afterwards. In addition we compute the fixed effects regression of Nbidtotal_act on dist100, rivaldist100 and fringe as in chapter 6.1 and store the predicted fixed effects of this regression as gamma.

#< task
dat=read.dta("bidders.dta")
dat_c = filter(dat, winner == 1)
example = filter(dat_c, contract == "02-356604")
FELM_help=felm(Nbidtotal_act ~ dist100 + rivaldist100 + fringe | c | 0 |c,data=dat)
dat$gamma = FELM_help$response
#>

< info "Check if Instruments are Correlated with Regressors"

To check if the dummy variables, indicating whether one specific engineer managed a specific project, are correlated with the ex post changes, we can make use of the package AER. This package contains a function ivreg() to perform iv-regressions and allows us to test certain hypotheses afterwards. I will not explain how to use ivreg(), so if you want to know more, take a look at inside-r.org/packages/cran/AER/docs/ivreg. The following code performs a regression of gamma on the ex post changes while using the dummy variables re2, ..., re335 as instruments for the ex post changes NPosAdj, NNegAdj, NEX, NDed and NOverrun.

# As this is an optional part but evaluated before you start with this chapter, we need to
# read in the data and perform the same regression as done outside this chunk
dat=read.dta("bidders.dta")
FELM_help=felm(Nbidtotal_act ~ dist100 + rivaldist100 + fringe | c | 0 |c,data=dat)
dat$gamma = FELM_help$response
# This is the actual code we need
library(AER)
# These are the dummy variables for our iv regression
res = paste("re", 2:335, sep="")
# This is the whole regression formula with iv options
formula = as.formula(paste("gamma~nbidders+NPosAdj+NNegAdj+NEX+NDed+NOverrun | nbidders +", paste(res, collapse= "+")))
# This is the actual regression
IV_test = ivreg(formula, data=dat)

If we use this function we can pass the parameter diagnostics=TRUE to summary(). The summary statistics will then include three additional tests.
- The weak instruments test checks whether the instruments are correlated with the regressors. Here the null hypothesis is that the instruments are weak, thus if the p-value is low we can be fairly confident that the instruments are not weak.
- The Wu-Hausman test checks whether standard OLS is just as consistent as the iv-regression. Notice that if this is the case, OLS should be preferred since it is more efficient. The null hypothesis of this test is that both estimators are consistent.
- The Sargan test has the null hypothesis that all instruments are exogenous (uncorrelated with the error term). For this test we need more instruments than endogenous regressors. If the Sargan test is rejected, i.e. has low p-values, this suggests that at least one instrument is endogenous. If the test is not rejected, however, we still do not have proof that all instruments are exogenous. You may therefore think of this test the following way: not being rejected by the Sargan test is a necessary condition for exogenous instruments, but it is no evidence for them.

The following code can be used to get such a summary statistic.

summary(IV_test, diagnostics = TRUE)

We see that the weak instruments test is rejected for all ex post changes, thus our instruments are correlated with the regressors. The Wu-Hausman test is also rejected, indicating that iv-regression is needed here. The Sargan test is rejected, meaning that at least one of our instruments is endogenous. This would mean that our instruments are not valid. But in our case we just want to replicate the findings of BHT and thus assume that the instruments are valid. Another caveat regarding the Sargan test may be the large number of instruments used here, since this test is pretty sensitive to the ratio of instruments to regressors.
If you want to dig deeper into iv-regressions I suggest looking at Kennedy (2008, chapter 9).

>

Now let us use iv-regression to estimate the second part of model $(6)$. This can be done with the felm() function. To learn how, just check the info box below.

< info "IV-Regression with felm()"

The felm() function can be used to perform iv-regressions. Assume you want to regress y on x1, x2 and x3 and you think that x2 and x3 are endogenous but you have valid instruments z2, z3 and z4 for x2 and x3. Then you can perform such a regression using z2, z3 and z4 as instruments for x2 and x3 in the following way (note that all variables need to be in the data frame dat):

felm(y ~ x1 | 0 | (x2 | x3 ~ z2 + z3 + z4) | cluster_var, data=dat)

Note that the first zero indicates that we do not use fixed effects, then we have the iv formula followed by the variable cluster_var which is the grouping variable for clustered standard errors.

If you want to know more about the felm() method you can check here rdocumentation.org/packages/lfe/functions/felm

>

This time we perform a regression of gamma, which we computed from FELM_help, on nbidders, NPosAdj, NNegAdj, NEX, NDed and NOverrun. Again, we want to use cluster robust standard errors, clustered by project.
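Before you use the helper function in the task below: iv.formula.felm() merely hides the very long instrument list. Under the model of this chapter it presumably expands to something like the following sketch (the helper's exact output may differ).

# A sketch of the formula that iv.formula.felm() presumably builds:
# gamma regressed on nbidders, with the five ex post changes instrumented
# by the engineer dummies re2, ..., re335, clustered by contract c
res = paste("re", 2:335, sep = "")
iv_part = paste("(NPosAdj|NNegAdj|NEX|NDed|NOverrun ~", paste(res, collapse = " + "), ")")
as.formula(paste("gamma ~ nbidders | 0 |", iv_part, "| c"))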

Task: Perform an iv-regression as described above and store the result as FELM_iv. Since the iv options do not look nice, I wrote a function iv.formula.felm() that you can use to get the full formula for this regression. Just call iv.formula.felm() in the part of felm() where you are asked to give a formula. The regression can be done if you uncomment the code and fill the ... with the right commands. In addition, print out the summary statistics of this regression. This computation may take a few seconds.

#< task
# FELM_iv = felm(..., data = ...)
#>
FELM_iv=felm(iv.formula.felm(), data=dat)
#< hint
display("Your command should look as follows:
        FELM_iv=felm(iv.formula.felm(), data=dat)")
#>
reg.summary(FELM_iv)
#< hint
display("Your second command should look the following:
        reg.summary(FELM_iv)")
#>


< award "Regressionmaster Lv. 6"

You performed your first iv-regression.

>

Now, except for NPosAdj and NOverrun, the results on the ex post changes are no longer significant at standard levels (before, we also had significant results for NNegAdj). The coefficients stayed nearly the same as in the other setting, except for deductions, where the coefficient decreased to $-2.6$. Let us now compute the total adaption costs implied by these coefficients. $$\beta_2 = - (1 - \tau_{a_+}) = 0.9 \textrm{ by FELM_iv}$$ $$\beta_3 = - (1 + \tau_{a_-}) = -1.5 \textrm{ by FELM_iv}$$ $$\beta_4 = - (1 - \tau_x) = 0.23 \textrm{ by FELM_iv}$$ $$\beta_5 = - (1 + \tau_d) = -2.6 \textrm{ by FELM_iv}$$ As before we can recover the $\tau$'s the following way: $$\tau_{a_+} = \beta_2 + 1 = 1.9$$ $$ \tau_{a_-} = -(\beta_3 +1) = 0.5$$ $$\tau_x = \beta_4 + 1 = 1.23$$ $$\tau_d = -(\beta_5 + 1) = 1.6$$

With the $\tau$'s we can again compute the total adaption costs and the normalized total adaption costs as in chapter 6.1.

Task: Just press check to perform the calculation described above and present summary statistics for the two variables.

#< task
tau_pos_a = 1.9
tau_neg_a = 0.5
tau_x = 1.23
tau_d = 1.6
dat_c$total_adaption_costs = tau_pos_a*dat_c$posAdj - tau_neg_a*dat_c$negAdj + tau_x*dat_c$extrawork - tau_d*dat_c$deductions
dat_c$Ntotal_adaption_costs = dat_c$total_adaption_costs/dat_c$CCactprojsize 
tab.summary(select(dat_c, total_adaption_costs, CCactprojsize, Ntotal_adaption_costs))
#>


We find that the total adaption costs resulting from our estimates average $508$ thousand dollars. This is nearly $15$% of the actual project size.
If we assume that bidders have perfect foresight of adaptions, we must assume that they include adaption costs in their total bids. Thus, one last question we can answer now is what percentage of the total bid can be attributed to adaption costs. We just need to divide the total adaption costs by the total bid.

Task: Do the above calculation and store the result as adaption_costs_in_bid in dat_c. Then compute the mean of this variable.

#< task
# Enter your commands here.
#>
dat_c$adaption_costs_in_bid = dat_c$total_adaption_costs / dat_c$bidtotal
#< hint
display("Your first command should look the following:
        dat_c$adaption_costs_in_bid = dat_c$total_adaption_costs / dat_c$bidtotal")
#>
mean(dat_c$adaption_costs_in_bid )
#< hint
display("Your second command should look the following:
        mean(...)")
#>

According to our regression and the resulting $\tau$'s we find that adaption costs account for $13$% of the winning bid.
Remember our example contract (contract = "02-356604") from the chapters in the beginning. Let us take a look at the ex post changes and the actual project size.

Task: Just press check to get the above data.

#< task
select(example, bidder, posAdj, negAdj, extrawork, deductions, CCactprojsize)
#>

Below you find a quiz about adaption costs in this contract. To answer it you may use the code chunk below.

Task: You can enter whatever you think is needed to solve the quiz here.

#< task
# Enter your code here
#>
#< hint 
display("To solve the following quiz note that the only ex post change leading to 
        adaption costs was extra work, thus you can use the following:
        tau_x*example$extrawork
        The second question can be answered if you multiply the following result by 100
        (tau_x*example$extrawork)/example$CCactprojsize ")
#>

< quiz "Adaption costs example"

parts:
  - question: 1. If we use the last estimates of adaption costs, what were the total adaption costs in our example contract?
    answer: 16991.61
    roundto: 1
    success: Great, your answer is correct!
    failure: Try again.
  - question: 2. State those total adaption costs as a percentage of the actual project size.
    answer: 1.690595
    roundto: 0.1
    success: Great, all answers are correct!
    failure: Not all answers correct. Try again.

>

< award "Finisher"

Congratulations, you made it through the whole problem set. I hope you enjoyed it.

>

Now we are finished with reduced form estimations of the adaption costs. The next chapter gives a review about our findings during this problem set.

Exercise 7 -- Conclusion

We set ourselves the goal of finding out whether adaption costs are prevalent in public procurement auctions for highway construction work in California. How did we reach a conclusion? We started by showing that incentives to skew bids are not a major determinant of the observed bids and thus could be ignored. Next we took a look at characteristics influencing the bids. Here we distinguished between measures that influence the costs and measures that influence the market power of a bidder. For the costs we found that the distance to the project and the size of a firm work well, whereas the utilization rate did not fulfil our expectations and thus was excluded. For the market power we used the minimal distance among a firm's rivals and the number of bidders in an auction. We also took a look at the minimal utilization rate among a company's competitors but found that it did not work as expected and excluded it from further analysis. After we showed that our four measures of characteristics influencing the bids were good, we introduced adaptions. We mainly used three adaptions made to the contract: the first were adjustments of compensation, the second extra work and the third deductions. In addition we looked at the total dollar value of over- and underruns on items, since we wanted to separate their effect from the impact of adaptions on the bids. Subsequently, we set up a model of empirical bidding behavior which we could estimate afterwards. Then we used different regression techniques accounting for possible problems while estimating adaption costs.

We found out that a bidder expected to spend $1.9$ dollars in adaption costs for every dollar he obtained in positive adjustment compensation. If a bidder faced one dollar of negative adjustments, he expected to spend $0.5$ dollars in adaption costs. Every dollar of extra work brought expected adaption costs of $1.23$ dollars. One dollar of deductions led to expected adaption costs of $1.6$ dollars. We stated that the highway construction industry is highly competitive and thus bidders include those expected adaption costs in their bids ex ante. This implied that about $13$ percent of the total bid among winners can be traced back to adaption costs. Additionally, we found that adaption costs are between $473$ and $508$ thousand dollars on an average project. Such an average project had an actual value of $2.6$ million dollars, thus adaption costs constituted $14$ to $15$ percent of the actual dollar value of a project. Thus we showed that adaption costs are substantial and should not be ignored. One obvious implication is to put more effort into estimating and specifying projects before the auction. Since our data lacked information about the costs of estimating and specifying projects, we could not investigate the costs and benefits of adding more engineering effort ex ante.

In addition to our approach, BHT formulate a structural model to estimate adaption costs. This approach is based on the two-step nonparametric estimators discussed in Elyakime et al. (1994) and in Guerre, Perrigne, and Vuong (2000). If you want a good overview of structural estimation in auction theory I suggest you take a look at the book by Paarsch and Hong (2006, chapter 4), where the whole procedure is explained in more detail. In a first step the density and the cumulative distribution function of the bid distributions are estimated for each project. In the second step the penalty from skewed bids and the adaption cost coefficients, $\tau_{a_+}, \; \tau_{a_-}, \; \tau_x$ and $\tau_d$, are estimated using the FOC $(4)$ to form a generalized method of moments estimator. BHT find that $\tau_{a_+} = 2.1$, $\tau_{a_-} = 2.4$, $\tau_x = 1.23$ and $\tau_d = 1.5$. The values are close to the reduced form estimates of adaption costs, except for negative adjustments. In the future there may be another problem set explaining the structural approach and findings of BHT.

If you want to see all of the awards that you collected during this problem set, just press edit and check afterwards in the code block below. There were a total of $13$ awards to be earned.

#< task
awards()
#>

Exercise 8 -- References

Bibliography

Bajari, Patrick, Stephanie Houghton and Steven Tadelis (2014). "Bidding for Incomplete Contracts: An Empirical Analysis of Adaptation Costs." American Economic Review, 104(4), 1288-1319.

Che, Yeon-Koo (1993). "Design Competition Through Multidimensional Auctions." RAND Journal of Economics, 24(4), 668-680.

Elyakime, Bernard, Jean-Jacques Laffont, Patrice Loisel and Quang Vuong (1994). "First-Price Sealed-Bid Auctions with Secret Reservation Prices." Annales d'Economie et de Statistique, 34, 115-141.

Guerre, Emmanuel, Isabelle Perrigne and Quang Vuong (2000). "Optimal Nonparametric Estimation of First-Price Auctions." Econometrica, 68(3), 525-574.

Hansen, Lars Peter (1982). "Large Sample Properties of Generalized Method of Moments Estimators." Econometrica, 50(4), 1029-1054.

Kennedy, Peter (2008). "A Guide to Econometrics." 6th edition. Blackwell Publishing.

Krishna, Vijay (2008). "Auction Theory." Academic Press.

Lebrun, Bernard (2006). "Uniqueness of the Equilibrium in First-Price Auctions." Games and Economic Behavior, 55(1), 131-151.

Paarsch, Harry J. and Han Hong (2006). "An Introduction to the Structural Econometrics of Auction Data." MIT Press.

R and Packages in R

R Core Team. "R: A Language and Environment for Statistical Computing." R Foundation for Statistical Computing, Vienna, Austria.

The main packages used in this problem set are foreign (read.dta), dplyr (filter, select), lfe (felm), AER (ivreg) and RTutor.