Author: Paul Erhardt
library(restorepoint) # facilitates error detection
# set.restore.point.options(display.restore.point=TRUE)
library(RTutor)
library(yaml)

#setwd("/users/student1/qad70/Desktop/Masterarbeit/Problemset")
#setwd("C:/Users/Pav/Documents/00-Wichtiges/Masterarbeit/Problemset")
setwd("C:/Users/Nadine/Desktop/Masterarbeit/Problemset")

ps.name = "BiddersCurse"; sol.file = paste0(ps.name, "_sol.Rmd")

# character vector of all packages you load in the problem set
libs = c("dplyr", "ggplot2", "grid", "gridExtra", "lubridate", "margins",
         "scales", "stargazer", "tidyr")

#name.rmd.chunks(sol.file) # set auto chunk names in this file

create.ps(sol.file=sol.file, ps.name=ps.name, user.name=NULL, libs=libs,
          rps.has.sol=FALSE, stop.when.finished=FALSE, use.memoise=TRUE,
          addons="quiz")

show.shiny.ps(ps.name, load.sav=FALSE, sample.solution=TRUE, is.solved=FALSE,
              catch.errors=TRUE, launch.browser=TRUE)

stop.without.error()
Welcome to this interactive problem set, which is part of my master's thesis at Ulm University. It analyses the phenomenon of "overbidding" in online auctions on eBay, that is to say, bidding more for an item than it would cost when bought immediately on the same webpage. This investigation is based on the paper "The Bidder's Curse" by Ulrike Malmendier and Young Han Lee, published in 2011. However, results may slightly differ due to missing data, different calculation methods and rounding errors. You can download the paper as well as additional material like the data sets here: The Bidder's Curse.
The authors examined auctions at the American eBay platform (ebay.com) where the same item was also continuously available for immediate purchase at a fixed price, so called buy-it-now offer (BIN offer). Rational bidders are expected to never bid above that fixed price as they could switch to the BIN offer at any time and purchase the item immediately for the buy-it-now price (BIN price).
However, the authors find a large proportion of auctions with closing prices significantly greater than the respective BIN prices (overbidding). This observation is not restricted to a few specific items but is rather pervasive: it is observable for many different product categories and price levels. The authors denote this phenomenon of overbidding as the "Bidder's Curse", not to be confused with the "Winner's Curse". The winner's curse describes the effect that winning bidders of a common value auction systematically pay too much due to incomplete information. When multiple bidders base their bids on their own estimated value, winning the auction tells the winning bidder that his valuation might be an overestimate of the item's value (Kagel & Levin, 2009). The "curse" in this context describes the effect of realising a "bad deal" whenever someone is picked as winner because of the auction's design.
In the following problem set, you will derive most results of the paper by yourself, interactively using the programming language R. You will investigate the occurrence of overbidding and possible reasons that might cause such a behaviour. This way, you can improve your R programming skills while gaining an insight into an interesting part of behavioural economics. If you need an introduction to R, you can download a beginner guide from Paradis, E. (2002) here: R for Beginners.
The problem set is structured as follows:
Introduction
The Phenomenon of Overbidding
Disproportional Influence of Overbidders
Overbidding at Various Products
(Excursus) - Hypothesis: Overbidding is Significant on Averages
Possible Factors Influencing Overbidding
Regression Analysis
(Excursus) - Availability of BIN Offers
Conclusion
References
In Exercise 1 I introduce you to the first data set, containing information about eBay auctions of a popular board game. We make use of some basic R functions to get a quick overview of the data.
In Exercise 2 we look at the act of overbidding and investigate how overbid auctions are distributed. For this purpose we determine in how many auctions the board game is overpaid and by how much. In doing so, we compare prices excluding shipping costs with shipping-included prices.
In Exercise 3 we compare the frequency of overbid auctions, the number of "overbidders" and the proportion of overbids. This way we can observe how the act of overbidding influences the auction's outcome.
Exercise 4 introduces a new data set. It contains information about eBay auctions as well but for many different items. After making ourselves familiar with these items, we compare the frequency of overbidding among different product categories.
In Exercise 5 we do a hypothesis test in form of an excursus and check whether the average amount of overbidding we observe is significant.
Exercise 6 deals with factors that might be correlated with overbidding. This includes the analysis of bidders' experience and participation length in auctions as well as the division of our data into demographic groups and price levels.
In Exercise 7 we then model the relationship between the probability of a bidder submitting an overbid and some of these influencing factors by performing a probit regression.
Exercise 8 is a small excursus about the handling of time formats in R. The availability of BIN offers on eBay for price comparison is assumed to be given for any point in time. We check whether suitable BIN offers were actually available for all periods in which auctions of our first data set were running.
Finally, Exercise 9 summarises our results.
All exercises can be solved independently from each other. However, I recommend doing them in the given order for content-related reasons. Within an exercise, doing tasks in the right order is mandatory.
Info Boxes:
Info boxes are folded, just click on them to open and show more information. These boxes are constructed to save space as they contain detailed information about functions or variables. These boxes can be skipped, yet reading them is suggested.
Quizzes:
Quizzes are used to test your newly acquired knowledge but are not necessary to proceed. Select one or more options and press check to test your answer.
Code Chunks:
Code chunks are used to enter and run R code. In each exercise, you need to solve a chunk before you can go on with the next one. In order to interact with these chunks, you have several buttons to click on:
edit:
When clicking on edit, you are able to modify the code within the chunk or enter new code. You always have to press this button first.
check:
This button checks your solution and, if correct, makes the next chunk accessible to edit.
hint:
If you need help solving the chunk, the hint button might give you useful advice on what to insert.
run chunk:
This button runs the code without checking it against the deposited solution. This is useful if you want to try something out by running different functions.
data:
This button sends you to the data explorer in which you can take a look at the data sets used.
solution:
Click on this button if you are stuck. It displays a sample solution. After using this button you just have to click on check to proceed.
Tasks:
Tasks are where your involvement is necessary. Here you are supposed to complete the code. Wherever you see a long underscore ___, there is something missing.
Most of the time, you are given the body of the code and are asked to fill in some parts, like new functions. Make sure you remove the underscores when filling in code, otherwise R won't recognize it as runnable code.
Sometimes you will find code chunks without a task to do. In this case, just press edit and then check.
Awards:
You will earn awards for solving difficult tasks or larger exercises. Use awards() in any code chunk and run it to show all the awards you have collected so far.
Navigation
In order to navigate through the problem set, you can either use the tabs to switch exercises or use the button at the bottom saying Go to next exercise... to proceed.
At the start of each exercise, you need to load the required data sets again because data is only available within an exercise. Data from different exercises is not linked.
Let us begin with the first exercise. We will make ourselves familiar with the functioning of eBay auctions and take a brief look at the theory of rational behaviour. Furthermore, we will investigate the type of data we are using most by utilising a few data evaluation functions.
To investigate the Bidder's Curse phenomenon, we are using data tables, generated from the American eBay platform. There are basically four data sets: The first one contains 167 eBay auctions of a popular board game from February to September 2004. The second one contains a history of bids for these auctions. The third data set contains 487 BIN offers for this particular board game from the same time period. The fourth data set consists of 1886 auctions for 94 other products from February, April and May 2007.
The eBay website is an auction platform where bidders can purchase items. When sellers list items, they determine the auction length (usually seven days) and the start price. Bidders can place multiple bids at any time, visible to other bidders. The winner of the auction has to pay the final price, which is the amount of the second-highest bid plus a small increment (usually 1% to 5% of the second-highest bid (eBay (2019))). We neglect this increment for reasons of simplicity. Therefore, we are basically studying bidders' behaviour in a modified open-bid second-price auction. In game theory, a basic setup for this type of auction has a unique symmetric equilibrium depending on the bidders' item valuations and the signals of competing bidders (Harstad, R. M. et al. (1990)). However, multiple bidding and the existence of a fixed price offer change the framework of the game. Thus, determining equilibria is difficult, but it is clear that rational bidders never bid above the fixed price if there are no switching costs or other kinds of uncertainty (Malmendier, U., & Lee, Y. H. (2011)).
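The second-price rule described above can be illustrated with a short R sketch (the bid values are made up for illustration; as in the text, we neglect the small increment):

```r
# Hypothetical bids submitted in one auction (illustrative values only)
bids <- c(95.00, 129.50, 132.50)

# The highest bidder wins the auction ...
winner_bid <- max(bids)

# ... but, neglecting the increment, only pays the second-highest bid
price_paid <- sort(bids, decreasing = TRUE)[2]

winner_bid  # 132.5
price_paid  # 129.5
```

Note that the winner's payment is determined by the competition, not by his own bid; this is exactly why bidding above an always-available BIN price can never pay off for a rational bidder.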
The first data set we use for looking at the Bidder's Curse phenomenon is a table, containing 167 eBay auctions of the board game "Cashflow 101" from February to September 2004. It is already prepared, such that it only contains non-cancelled auctions with a BIN offer available at the same time.
"Cashflow 101" was invented by Robert Kiyosaki (1996). It is more a collection of financial advises than a board game for pure entertainment and that is the reason why it is quite expensive. Do not consider buyers to be irrational just because they bid between $80 and $180 for a board game. In addition, if they do not care about prices, they would buy it instantly instead of spending their time in bidding at an auction. So this game matches our demand for a homogenous item which is also available throughout the whole auction for a stable fixed price.
Source: http://www.smartpinoyinvestor.com/wp-content/uploads/2014/02/
In order to work with the data, we first need to load it into the R environment of this problem set.
There are many different file types and for every one of them, there is an appropriate read command. We will only use .rds files in this problem set for performance reasons. The associated read command is readRDS().
In the following tasks we want to get a brief overview of the Cashflow data and introduce the first bunch of important functions.
Start with loading the data set of Cashflow 101 auctions using readRDS(). After loading the Cashflow data, save it in the variable cf. To do so, just press edit and then check.
#< task
cf <- readRDS("cf.rds")
#>
Good work! You have just earned your first award for importing data correctly.
Now we have made the data set available for use. Let us take a look at it by displaying the first few rows. Make yourself familiar with the head() function, explained in the info box below.
Task: Open the following info box.
There are two very useful base R functions: head(data, n) selects the top n rows of a data set. If n is negative, the function selects all rows except the last n. tail(data, n) works the same way but refers to the end of the data set: with a negative n it drops the first n rows.
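A quick sketch on a simple vector makes the behaviour of both functions, including the negative-n case, easy to see:

```r
x <- 1:10

head(x, 3)   # first three elements: 1 2 3
tail(x, 3)   # last three elements: 8 9 10

# With a negative n, the complementary elements are kept:
head(x, -3)  # everything except the last three: 1 2 3 4 5 6 7
tail(x, -3)  # everything except the first three: 4 5 6 7 8 9 10
```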
Task: Display the first four rows of the Cashflow data cf using the head() function.
#< task
# insert your code here
#>
head(cf, 4)
Sometimes the output is too large to be fully displayed (like in this case). Move the scroll bar at the bottom of the table to the right to see the other variables.
Each row represents an auction for a Cashflow 101 board game. The first auction, for example, starts with a price of $1, which was set by the seller when creating the listing. In addition to the final price of $132.50, the winner lopscrus has to pay $12 shipping costs, which sums up to a total of $144.50. Because there is a BIN offer available throughout the whole auction (from Feb 22 to Feb 29 2004) for $129.95, the auction is considered to be overbid by $2.55. When comparing shipping-included prices, the difference is even bigger ($4.60) because the BIN offer has cheaper shipping costs as well.
In the following info box, the different variables are explained in detail:
itemnumber:
This is the unique auction number, automatically assigned by eBay when a listing is created. eBay uses this continuous number to keep track of their auctions.
startprice:
The starting price of the auction [in $], set by the seller when creating the listing.
finalprice:
The highest bid when the auction is closed and therefore the final price [in $]. The bidder with the highest bid (winbidder) wins and has to pay the final price in exchange for the item.
shippinginfo:
Shipping costs, if available. Otherwise declared as 'NA'.
totalprice:
The final price including shipping costs [in $]. We call it the total price. If there are no shipping costs available, the totalprice is declared as 'NA' as well.
BIN.final:
The corresponding buy-it-now price without shipping costs. This is the lowest price the item can be purchased for, at any time while the auction is running.
numbids:
The total number of bids within the auction.
numbidders:
The amount of different active bidders at an auction. Any person submitting at least one bid is considered to be an active bidder.
winbidder:
The bidder who bids the final price and wins the auction, more specifically, his alias account name.
buyernumfeedback:
The number of feedback ratings the winner has at that time. It states the number of rated transactions on the eBay platform, thus representing the winner's activity and also indicating their experience with eBay auctions.
sellername:
The name of the seller.
overfinal_d:
A binary variable reflecting overbids. Coded with 1 if the auction is overbid, meaning that the final price ends up higher than the price for the BIN offer available at the same time. Coded with 0 if the final price ends up below the BIN price.
overfinal:
The amount of money [in $] by which the final price exceeds the BIN price. It is calculated as finalprice - BIN.final and can be negative, indicating that the auction is not overbid.
overtotal_d:
A binary variable reflecting overbids regarding the total price (including shipping costs). Coded with 1 if the auction is overbid, meaning that the total price ends up higher than the price for the buy-it-now offer including shipping costs. Coded with 0 if the total price ends up below the shipping-included BIN price.
overtotal:
The amount of money [in $] by which the BIN price with shipping is exceeded. It is calculated as totalprice - (BIN.final + BIN shipping costs)
and can be negative, indicating that the auction does not end up overbid.
weekday_auctionend:
The weekday on which the auction ends [as name from Monday to Sunday].
start:
The time of the auction start [timezone "America/Los_Angeles" / UTC-7]. [as date format]
end:
The time of the auction end [timezone "America/Los_Angeles" / UTC-7]. [as date format]
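To make the relationship between these variables concrete, here is a small sketch of how the overbid amount and its dummy could be derived from the price columns (the numbers roughly match the first auction described above; the actual data set already contains these columns):

```r
# Illustrative prices for a single auction
finalprice <- 132.50
BIN.final  <- 129.95

# Amount by which the auction price exceeds the BIN price (can be negative)
overfinal <- finalprice - BIN.final

# Dummy coded 1 if the auction is overbid, 0 otherwise
overfinal_d <- as.integer(overfinal > 0)

overfinal    # 2.55
overfinal_d  # 1
```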
When you look at the data, you might notice that all matching BIN prices you can see in the first few rows of our data frame are $129.95. Actually, there are only two sellers who offer Cashflow 101 games for buying it now. One requests $129.95, the other $139.95. In fact, 138 out of 166 observations have a matching BIN price of $129.95, which is 83% of all cases. Thus, we should find prices below that most of the time and there should not be any bidder buying the game for more than $140. Let's check this.
The capabilities of the programming language R are extended through user-created packages. The library(R-Package) command loads these additional R packages into the workspace so that you can use a whole lot of new R functions that someone has created to complement standard R functions. If you face an error of the form 'could not find function "XY"', try to load the appropriate package again. Loaded packages are only accessible within the same exercise tab.
The function filter(data, condition) contained in the dplyr package is used to generate a subset of a data frame. If you have a data set cf that contains different eBay auctions and you want to keep only auctions that were overbid, you can use the following command:
library(dplyr)
auctions <- filter(cf, overfinal == 1)
Task: Find out which items are sold for a finalprice of more than $140. Use the filter() function for this task. If you are struggling with the syntax, take the code from the info box above as an example. Replace the underscore (___) with the right variable.
#< fill_in
# filter(cf, ___ > 140)
#>
filter(cf, finalprice > 140)
#< hint
cat("Filter for the variable 'finalprice'.")
#>
We observe a lot of auctions (45 out of 167) that end with a final price above $140.
Because the variable cf is a data frame, you can access single columns by using a dollar sign $ between the name of the variable and the name of the column.
Most R functions are quite intuitive, such as computing the length length(x), minimum min(x), maximum max(x), mean mean(x), median median(x) or any other quantile quantile(x) of a vector x.
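The following sketch shows these summary functions on a small vector standing in for a price column such as cf$finalprice (the values are made up for illustration):

```r
# Illustrative prices, as they might appear in a column like cf$finalprice
prices <- c(102.50, 129.95, 132.50, 144.50)

length(prices)  # 4
min(prices)     # 102.5
max(prices)     # 144.5
mean(prices)    # 127.3625
median(prices)  # 131.225
```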
Task: Find the mean final prices and shipping costs for all Cashflow games. Calculate the mean for finalprice (without shipping) and the mean of the variable shippinginfo.
Note: The argument na.rm=TRUE is necessary when a variable contains missing values ('NA'), which occur here for example because there is a shipping option called "Local pickup" on eBay.
#< fill_in
# ___(cf$finalprice)
# ___(cf$___, na.rm = TRUE)
#>
mean(cf$finalprice)
mean(cf$shippinginfo, na.rm = TRUE)
#< hint
cat("Define the function 'mean' correctly. If you face non-numeric values, you need the argument 'na.rm = TRUE'.")
#>
We conclude: The mean final price of $131.96 is quite high, which is surprising as $129.95 is the buy-it-now price for a brand new item almost all of the time. One could argue that clever buyers on eBay consider shipping costs and that these might be higher for BIN offers. However, the mean shipping costs for Cashflow 101 amount to $12.51, which is even more than the shipping costs for BIN offers (we will see later that they are $9.95 and $10.95).
In the next task, we want to find out if the Cashflow 101 board game is something that bidders want to buy several times or if they usually purchase this item just once. To find an answer, we help ourselves with another useful base R function: unique(data) removes all duplicate rows in a data set.
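A minimal sketch of how unique() behaves on a vector of buyer names (the names here are made up; only "lopscrus" appears in the data described above):

```r
# Illustrative winner names with one repeated buyer
winners <- c("lopscrus", "anna", "lopscrus", "bob")

unique(winners)          # "lopscrus" "anna" "bob"
length(unique(winners))  # 3 distinct buyers
```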
question: Do you think that typical buyers of the Cashflow 101 board game like to buy several copies of it? Make a guess.
choices:
- YES
- NO*
multiple: FALSE
success: Good guess, you are right.
failure: Wrong answer. Try again.
Task: How many unique buyers do we have? Find out by creating a vector of unique winbidders and calculating its length.
#< fill_in
# length(___(cf$___))
#>
length(unique(cf$winbidder))
#< hint
cat("Use the function 'unique' and grab the column 'winbidder' via $ sign.")
#>
167 items are sold to 164 different buyers. Hence, it seems like it is not worth buying multiple copies of a Cashflow 101 board game.
In the last task, we want to study on which days of the week auctions typically end. For this purpose, we make use of some more functions. The dplyr package provides some useful tools for data manipulation and restructuring. The function arrange() orders the rows of a data frame by a specific variable (ascending by default). The function group_by() groups a data frame by the value of one or more variables and makes sure that following operations are done for each group separately. It works very well in combination with summarise(), which typically summarises a data frame to a set of single values.
The function arrange(data, variable) sorts a data frame in ascending order. As input parameter, choose a variable to sort by. If you want to sort in descending order, wrap your variable in desc().
library(dplyr)
# sorted ascending
sorted_ascending <- arrange(cf, itemnumber)
# sorted descending
sorted_descending <- arrange(cf, desc(itemnumber))
The function group_by(data, variables) separates a data frame into groups. One group is generated for each value of the grouping variable. You can group by multiple variables as well. Alternatively, you can group by a condition.
library(dplyr)
# grouped by final prices
grouped <- group_by(cf, finalprice)
# grouped by condition "finalprice > 140": TRUE or FALSE
grouped_finalprice_highlow <- group_by(cf, finalprice > 140)
The function summarise(data, functions) aggregates a data frame to a single row of values. If you are using grouped data, the output will be a data frame containing one row for each group. You can choose which functions are used for the summary, but you can only take functions with single output values. Moreover, columns of the resulting data frame can be named within the summarise() function.
library(dplyr)
summarise(cf, avg_startprice = mean(startprice), avg_finalprice = mean(finalprice))
When using data manipulating functions, you usually have to save the output of every operation in a new variable. This produces quite a few lines of code and slows down the run time. In order to avoid saving intermediate results or nesting a bunch of functions into each other, we will use the pipe operator (%>%).
The pipe operator %>% connects functions which are used to perform operations one after another. This operator "pipes" the output from one function to the next one, where it is used as an input. In order to chain functions together, you need to add the %>% operator at the end of each line of code except for the last one. Because output is forwarded, subsequent functions do not need additional input data. The pipe operator works best with dplyr functions or base R functions. Functions from other packages might work as well, but this will often result in syntax errors.
The following example groups the data set cf by the condition that eBay's default start price of $1 is set. As this expression can only be TRUE or FALSE, we will get two groups. After that, the mean number of bids is calculated separately for each group.
library(dplyr)
cf %>%
  group_by(startprice == 1) %>%
  summarise(mean(numbids))
Task: Create a table which lists the number of finished auctions for each weekday. Use group_by() for the variable weekday_auctionend, summarise() the absolute frequency for each group and arrange() the data nicely from Monday to Sunday.
#< fill_in
# cf %>%
#   select(weekday_auctionend) %>%
#   group_by(___) %>%
#   ___(n = n()) %>%
#   arrange(match(weekday_auctionend, c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")))
#>
cf %>%
  select(weekday_auctionend) %>%
  group_by(weekday_auctionend) %>%
  summarise(n = n()) %>%
  arrange(match(weekday_auctionend, c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")))
#< hint
cat("Use the variable 'weekday_auctionend' for grouping and the summarise() command in the next line.")
#>
Good work! You are able to chain multiple operations using the pipe operator correctly.
Auctions at eBay usually last seven days (although there are exceptions). In addition, people have more time to browse eBay's website outside their regular jobs: in the evening and especially on the weekend. Therefore, it is no surprise that most auctions end on a Saturday or Sunday.
Now you know more about the board game Cashflow 101 and how it is sold on eBay. It is available for purchase at auctions where bidders compete in a modified second-price scenario. In addition, it can be purchased immediately at BIN offers for $129.95 or $139.95, depending on the observation period. Moreover, you know how rational bidders should behave in this scenario and in the next exercise we are going to find out how they actually do.
As we have discussed before, overbidding, in the form of bidding more in an auction than the same item would cost in a BIN listing, is puzzling. However, this phenomenon is known in the academic literature. There is evidence of large and persistent overbidding in second-price auctions, observed in laboratory studies (Cooper, D. J., & Fang, H. (2008)). In studies about bidding behaviour in English auctions, overbidding was also observed (Kagel, J. H., Harstad, R. M., & Levin, D. (1987)). In this exercise, we investigate if there is overbidding in our data of Cashflow 101 auctions and visualise how overbid auctions are distributed. For this purpose, we determine how many auctions are overpaid and by how much. In doing so, we compare prices excluding shipping costs with shipping-included prices. Let us start with loading the Cashflow 101 data set again.
Load the Cashflow data again. To do so, just press edit and then check.
#< task
cf <- readRDS("cf.rds")
#>
In our Cashflow data set, the column overfinal contains the amount that is paid too much compared to the buy-it-now price (it can be negative). Remember that overfinal ignores shipping costs and only compares the final price of the auction with the price of a buy-it-now offer available at the same time.
Unfortunately, we have one row containing an "NA" value, probably because of a matching error with the BIN price. We need to remove it, because otherwise it would be counted as an observation later.
The function select(data, variables) from the dplyr package is used to select specific columns of a data frame. The following command, for example, takes the data set cf and only keeps the columns itemnumber, startprice and finalprice.
library(dplyr)
cf_prices <- select(cf, itemnumber, startprice, finalprice)
In order to visualise the problem, run the next code chunk. We select the columns itemnumber and overfinal of the Cashflow data set cf using the pipe operator. In addition, we keep only the rows containing NAs. Press edit and check.
#< task
cf %>%
  select(itemnumber, overfinal) %>%
  filter(is.na(overfinal) == TRUE)
#>
Subsequently, we drop the erroneous data. This is done in the next task with the help of another package.
The tidyr package is useful to make your data "tidier". The functions of this package can basically be used to clump or extend your data, and they complement the dplyr package when working with raw data sets. There are two functions of this package we are interested in: complete() and drop_na().
The first one completes a data frame with missing combinations of data, while drop_na() does the opposite and deletes rows with missing values. In fact, drop_na() works like filtering out NAs but does it for all columns at once.
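A minimal sketch of both functions on a tiny, made-up data frame (assuming the tidyr package is installed):

```r
library(tidyr)

# Small illustrative data frame with a missing value in column n
df <- data.frame(group = c("A", "B", "C"), n = c(2, NA, 5))

# drop_na() removes every row that contains at least one NA (row "B" here)
drop_na(df)

# complete() adds rows for missing combinations; here "D" is added,
# and fill supplies 0 for its n value instead of NA
complete(df, group = c("A", "B", "C", "D"), fill = list(n = 0))
```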
Task: Use the pipe operator to select the columns itemnumber and overfinal of the data set cf and drop all rows containing NAs with the function drop_na(). After that, store the result in the variable rem.NA.
#< fill_in
# library(tidyr)
# rem.NA <- cf %>%
#   select(___) %>%
#   ___
# rem.NA
#>
library(tidyr)
rem.NA <- cf %>%
  select(itemnumber, overfinal) %>%
  drop_na()
rem.NA
#< hint
cat("Select both variables, 'itemnumber' and 'overfinal'. 'drop_na()' does not need any input.")
#>
Now we have 166 auctions left to work with.
We are going to produce breaks with a length of 5, ranging from -50 to 50. Then we mutate a new column interval, where we cut the overbidding amount overfinal on the basis of these breaks. This way we assign every auction to a level of overbidding. After that, we count the number of auctions for every interval.
cut(x, breaks) is a base R function which cuts a vector into intervals. Either the number of intervals or their ranges is set by the parameter breaks. For example, the following code produces intervals from 0 to 200 with a length of 50. Then the final prices of Cashflow 101 are assigned to the respective interval. The output is a vector containing the matching intervals in the same order as the input vector.
# avoid naming the result "c", as that masks the base function c()
intervals <- cut(cf$finalprice, breaks = c(0, 50, 100, 150, 200))
The function mutate(data, new_column = calculation) from the dplyr package adds new columns to an existing data frame. If there is already a column with the same name, it will be overwritten. Calculations like summing up two variables are done row by row.
library(dplyr)
cf <- mutate(cf, totalprice = finalprice + shippinginfo)
Press edit and check to run the code.
Note: The command complete(interval, fill = list(overfinal.n = 0)) in the last line produces zeros if intervals contain no value (otherwise we get problems when trying to plot it).
#< task
b <- c(seq(-50, 50, 5))
overfin_int <- rem.NA %>%
  mutate(interval = cut(rem.NA$overfinal, breaks = b)) %>%
  group_by(interval) %>%
  summarise(overfinal.n = n()) %>%
  complete(interval, fill = list(overfinal.n = 0))
tail(overfin_int)
#>
The vector b defines the breaks at which we cut the intervals of the overbidding amount. In the table overfin_int above we have counted how many auctions are overbid by how much. The worst deals are two games that go for $45-50 more than the buy-it-now price.
Press edit and check to do the same for the shipping-included prices of the variable overtotal, which contains the amount that is overpaid with regard to shipping-included prices.
#< task
overtot_int <- cf %>%
  select(itemnumber, overtotal) %>%
  drop_na() %>%
  mutate(interval = cut(overtotal, breaks = b)) %>%
  group_by(interval) %>%
  summarise(overtotal.n = n()) %>%
  complete(interval, fill = list(overtotal.n = 0))
#>
Before we can plot our results with ggplot, we need to reshape our data into long format.
Run the following code and take a quick look at the joined data frame we want to plot. We make use of the function gather() to combine our columns overfinal.n and overtotal.n.
The function gather(data, key, value, ...) is part of the tidyr package. It is used for reshaping data frames. It is especially useful for transforming wide format data into long format by combining the information of multiple columns. Input parameters are: the data, the name for the column of key variables key, and the name of the value column value. The remaining arguments define which variables will be gathered together; variables that are not selected are kept as columns.
library(tidyr)
data <- data.frame(A = c("low", "medium", "high"),
                   B = c(1, 2, 3),
                   C = c(4, 5, 6),
                   D = c(7, 8, 9))
data
gather(data, key = "Letter", value = "Value", B:D)
Press edit and check to create a combined data frame with overbid auctions per price interval.
#< task
cf_int <- overfin_int %>%
  mutate(overtotal.n = overtot_int$overtotal.n) %>%
  gather(type, n, overfinal.n:overtotal.n)
cf_int
#>
The column interval states the over-/underbid amount in steps of 5, ranging from -$50 to +$50. The column type indicates whether the absolute frequencies of overbidding n belong to final or total prices.
The next code chunk creates a simple bar plot of this data. Just run the following code and get an overview of the overbidding phenomenon; I will explain the functions used for the plot below. Press edit and check.
#< task
library(ggplot2)
ggplot(cf_int, aes(x = interval, y = n)) +
  geom_bar(stat = "identity", aes(fill = type), position = "dodge") +
  ggtitle("Overbidding Amount") +
  xlab("Ranges of over-/underpayment") +
  ylab("Number of auctions")
#>
The ggplot2 package is a powerful visualization tool. It provides many functions to create graphics and offers a wider variety of options than base functions like plot() or hist(). However, ggplot2 is not suited for 3D plots. The package uses a multi-layer concept, whereby layers are connected with a + sign. This allows us to combine different graphic objects in one plot.
Here is a short overview of the functions used in the plot:
ggplot() initializes a ggplot object and the first parameter cf_int specifies the data frame that should be used. aes() defines the overall appearance of the plot, the "aesthetics": for example, the assignment of axes as well as the size, color or shape of plot elements.
stat = "identity" makes the height of each bar equal to the values in the data. In our case, fill = type colors the bars according to the type (with shipping or without), and position = "dodge" arranges the two types next to each other.
geom_...() adds a geometric object and defines its type. Every object can have its own aesthetics aes(). We use a geom_bar() object which generates a bar plot.
ggtitle() adds a title to the graphic, while xlab() and ylab() add text to the respective axes.
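To see the layer concept in isolation, here is a minimal sketch with invented toy data (the data frame toy and all of its values are made up purely for illustration, not taken from our data sets):

```r
library(ggplot2)

# toy data, invented purely for illustration
toy <- data.frame(interval = c("A", "B", "C"),
                  n = c(3, 7, 5))

# each '+' adds another layer to the plot object
p <- ggplot(toy, aes(x = interval, y = n)) +
  geom_bar(stat = "identity") +          # bar heights taken directly from n
  ggtitle("A minimal layered plot") +
  xlab("Group") +
  ylab("Count")
p  # printing the object draws the plot
```

Because each layer is a separate object joined by +, you can build a plot step by step and reuse parts of it.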
As you can see, the number of overbid auctions is quite significant. This holds for prices including shipping (blue) as well as for prices without shipping costs (red). It seems that underpayment is more frequent for final prices. We know from Exercise 1 that BIN offers have lower shipping costs in general, which can be the reason for reduced underpayment in total prices. As a result, overpayment is more frequent for total prices, but only in the interval [$0, $5]. For many items, shipping included or not, the prices paid are not just a few cents above the fixed price but exceed it by $30 in 25% of all cases. Therefore, it is legitimate to neglect the fixed increment. Even though eBay requires the winning bidder to pay an increment of $2.50 on top of the second-highest bid (for prices of $100-$249.99, eBay (2019)), this cannot be the reason for the occurrence of overbidding.
It has to be said, though, that our sample size of 166 auctions is rather small, in particular when we divide the data into 20 intervals like this. You can see that there is no interval with more than 30 observations. As a result, we should be careful when interpreting these results. Nevertheless, it is clearly visible that overbidding is not a marginal phenomenon. In the next exercise, we will focus on the number of bidders who overbid and the number of overbids submitted. Then we will evaluate the influence of such behaviour.
In this exercise, we take a closer look at proportions: the shares of overbidders, overbids and overbid auctions. We investigate whether there really are as many irrational bidders as it seems and how auctions are influenced by overbidding.
We base this investigation on auctions for the Cashflow 101 board game. Besides the data set of 167 Cashflow auctions, we also have information about bids submitted for most of these auctions.
The data set bidhistory contains 2353 single bids for 139 Cashflow games in its rows, sorted by the time the bid was placed.
Press edit and check to load the bidhistory data set.
#< task
bidhistory <- readRDS("bidhistory.rds")
#>
Task: Use the head() function to take a first look at the bidhistory.
#< fill_in
# head(___)
#>
head(bidhistory)
#< hint
cat("Use the variable 'bidhistory' as input.")
#>
The first few rows show consecutive bids for the same item. This can be seen, for example, in the columns itemnumber, winbidder or finalprice: they all share the same values whenever they refer to the same auction. The main differences, though, are in the columns bidvalue, bidprice, biddername and leader. As each row represents another bid, ordered by biddate, bidprices increase continuously until the auction ends: the bidprice rises as soon as a bidder submits a higher bid, and whenever this happens, he becomes the new leader.
The info box below specifies all variables in detail.
itemnumber:
As before, the unique auction number. The reason for multiple rows containing the same item number is of course the fact that all of the respective bids belong to the same item.
startprice:
The starting price of the auction [in $], set by the seller when creating the listing.
bidvalue:
The value of the submitted bid [in $].
bidprice:
The price of the item after the bid is placed. If the bid is high enough, the price will increase to the bidvalue of the previous bid plus a small increment. The increment depends on the last bidprice and usually amounts to 1% to 5%. Bid increments are smaller when the bid price is low and larger at higher price levels. For our Cashflow game auctions, the increment on the American eBay.com site, where we got the data set from, is set at $1 for bids of $25.00-$99.99 and $2.50 for bids of $100-$249.99 (eBay (2019)).
In the end, the winner of an auction does not necessarily pay his last bid but is charged the bidvalue of the bidder before him plus an increment. This is essentially a second-price auction.
Although the amount of the increment is based on the second-highest bid, we can still assume that winning bidders are most likely to pay an increment of $2.50 because almost all Cashflow games (except for one) ended up with a final price of $100 or more. In addition, no Cashflow item reached a final price of $250 or more. For simplicity, we neglect the increment, repeated bidding within a time limit, reserve prices and progressive bid framing of eBay auctions.
biddername:
The name of the bidder who places the bid.
leader:
The current leader of the auction. This variable equals the biddername if the current bidder places a bid above the last bidvalue.
winbidder:
The bidder who bids the final price and wins the auction, more specifically, his alias name.
finalprice:
The highest bid when the auction is closed and therefore the final price [in $]. The bidder with the highest bid (winbidder) wins and has to pay the final price in exchange for the item.
shippinginfo:
Shipping costs of the auction item, if available. Otherwise declared as 'NA'.
totalprice:
The total price of the auction item, calculated as finalprice + shippinginfo. If there are no shipping costs available, the totalprice is declared as 'NA' as well.
numbids:
The total number of bids within the auction.
sellername:
The name of the seller.
auction.overfinal_d:
A binary variable reflecting overbids. Coded with 1 if the auction is overbid, meaning that the final price ends up higher than the price for the BIN offer available at the same time. Coded with 0 if the final price ends up below the BIN price.
auction.overfinal:
The amount of money [in $] by which the BIN price is exceeded. It is calculated as finalprice - BIN.final and can be negative, indicating that the auction is not overbid.
auction.overtotal_d:
A binary variable reflecting overbids regarding the total price (includes shipping costs). Coded with 1 if the auction is overbid, meaning that the total price ends up higher than the price for the BIN offer including shipping costs. Coded with 0 if the total price ends up below the BIN price.
auction.overtotal:
The amount of money [in $] by which the BIN price with shipping is exceeded. It is calculated as totalprice - (BIN.final + BIN shipping costs) and can be negative, indicating that the auction does not end up overbid.
bid.overfinal:
A binary variable indicating if the bid is an overbid, regarding final prices. Coded with 1 if the bid is an overbid, meaning that the bid is higher than the price for the BIN offer available at the same time. Coded with 0 if the bid is below the BIN price.
bid.overtotal:
A binary variable indicating whether the bid is an overbid, regarding total prices. Coded with 1 if the bid is an overbid, meaning that the bid + shipping is higher than the price for the fitting buy-it-now offer (including shipping costs) available at the same time. Coded with 0 if the bid is below the BIN price.
overbid:
A binary variable indicating if the bidder ever overbid in this auction, regarding final prices. Coded with 1 if either the current bid or another bid from the same bidder within the same auction is an overbid (finalprice is higher than the price for the BIN offer). Coded with 0 if the bid is below the BIN price.
biddate:
The time when the bid is placed [as date format].
enddate:
The time when the auction ends [as date format].
totalleadtime_in_days:
The total time a bidder is leader at the auction, summing up all time intervals within the auction run time the bidder is lead bidder until he gets outbid by someone else or until the auction ends.
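The pricing rule described for bidprice above can be sketched in a few lines of base R. This is a deliberately simplified model (the helper functions and the example bids are invented for illustration; eBay's real mechanism also handles proxy bidding, ties and further price bands):

```r
# simplified bid increment, following the price bands quoted in the text
# (eBay (2019)); the value for other ranges is an assumption
increment <- function(price) {
  if (price >= 100 & price < 250) return(2.50)
  if (price >= 25  & price < 100) return(1.00)
  return(0.50)  # simplification for all other price ranges
}

# second-price rule: the winner pays the second-highest bid plus the
# applicable increment, but never more than his own bid
final_price <- function(bids) {
  sorted <- sort(bids, decreasing = TRUE)
  second <- sorted[2]
  min(sorted[1], second + increment(second))
}

final_price(c(120, 131, 95))  # second-highest bid 120 plus $2.50 -> 122.5
```

The min() guard reflects that a winner is never charged more than his own maximum bid, even if the second-highest bid plus increment would exceed it.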
First, we want to make the data set slimmer by only keeping one row per auction. As the variable auction.overfinal_d flags an auction as overbid (1) or not (0), it takes the same value for every bid on this item.
In order to strike out redundant rows, we could use the unique() function again. However, the dplyr package contains a useful alternative called distinct(). It is less complicated to implement when it comes to unique combinations of variables and works within a dplyr chain.
distinct(data, features, .keep_all = FALSE) is another dplyr function and only keeps unique combinations of features. Note that this function always selects the first unique combination it finds when going through a data set (from top to bottom) and drops all following duplicates. By default, all other columns are removed. If you want to keep the entire row, you need to set the parameter .keep_all to TRUE.
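A small toy example (the data frame df is invented for illustration) shows the difference between unique() and distinct() with and without .keep_all:

```r
library(dplyr)

# toy data frame with duplicated ids
df <- data.frame(id  = c(1, 1, 2, 2, 3),
                 bid = c(10, 12, 20, 25, 30))

unique(df$id)                       # the unique ids as a vector
distinct(df, id)                    # data frame with one column: id
distinct(df, id, .keep_all = TRUE)  # keeps the FIRST whole row per id
```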
Task: Use distinct() to only select rows with unique itemnumbers. Count the number of overbid auctions without shipping and mutate a column with the corresponding percentage value. Store all in the variable influence.auction.
#< fill_in
# influence.auction <- bidhistory %>%
#   distinct(___ , .keep_all = TRUE) %>%
#   count(___) %>%
#   mutate(percentage = n/sum(n))
# influence.auction
#>
influence.auction <- bidhistory %>%
  distinct(itemnumber, .keep_all = TRUE) %>%
  count(auction.overfinal_d) %>%
  mutate(percentage = n/sum(n))
influence.auction
#< hint
cat("Use 'distinct()' for 'itemnumber' and count all overbid auctions (based on final prices).")
#>
We count 60 overbid auctions, which is almost half of our data.
Because a colored plot is much nicer to look at than such a table, we make use of ggplot() again. In addition, we compare the proportion of overbid auctions to the shares of overbidders and overbids. Use the code below to plot three simple pie charts, showing the relations of overbid auctions, overbidders and exceeding bids. Press edit and check.
#< task
# define pie1
pie1 <- ggplot(influence.auction, aes(x="", y=percentage, fill=as.logical(auction.overfinal_d))) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start=0) +
  theme_void() +
  geom_text(aes(label=percent(percentage)), position = position_stack(vjust=0.5)) +
  labs(fill="overbid", title="Does the auction end up overbid?") +
  scale_fill_brewer(palette="Paired")

# calculate data for pie2
influence.bidder <- bidhistory %>%
  group_by(biddername) %>%
  summarise("bid.overfinal" = max(bid.overfinal==1)) %>%
  count(bid.overfinal) %>%
  mutate(percentage = n/sum(n))

# define pie2
pie2 <- ggplot(influence.bidder, aes(x="", y=percentage, fill=as.logical(bid.overfinal))) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start=0) +
  theme_void() +
  geom_text(aes(label=percent(percentage)), position = position_stack(vjust=0.5)) +
  labs(fill="overbid", title="Does the bidder ever overbid?") +
  scale_fill_brewer(palette="Spectral")

# calculate data for pie3
influence.bid <- bidhistory %>%
  count(bid.overfinal) %>%
  mutate(percentage = n/sum(n))

# define pie3
pie3 <- ggplot(influence.bid, aes(x="", y=percentage, fill=as.logical(bid.overfinal))) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start=0) +
  theme_void() +
  geom_text(aes(label=percent(percentage)), position = position_stack(vjust=0.5)) +
  labs(fill="overbid", title="Is the bid an overbid?") +
  scale_fill_brewer(palette="PuRd")

# plot pie charts
library(gridExtra)
grid.arrange(pie1, pie2, pie3, ncol=1)
#>
We observe a share of 43.2% overbid auctions, but the share of bidders who ever submit an overbid is only 17%. The share of bids that actually are overbids is even smaller, only 10.6%. A clear conclusion is that a high frequency of overbid auctions of 43.2% does not necessarily mean that the "typical" buyer pays too much. Instead, overbid auctions are generated by a relatively small number of overbids. In summary, it can be said that a small number of bidders with few overbids have a disproportionate influence on the auctions' outcome. This is the nature of auctions, of course. We proceed with our investigation in the next exercise. This time we will test whether our findings also apply to other items than the Cashflow 101 game.
In this exercise, we want to show that the phenomenon of overbidding is not restricted to a single item like the Cashflow 101 game but is also observable for other items. For this purpose, we use data of 94 various products like books, consumer electronics or cosmetics.
If you want to know what items we are talking about in particular, you can take a look at the following info box. It shows a detailed list of all items for which we have data available. For our investigations, however, we will use a different data set containing 1886 auctions for these products. The data set dat has one row for each auction, just like the Cashflow data set. dat is loaded below, so you can skip this info box without coming to harm.
Here you can see the full list of various products (everything but Cashflow 101) if there is at least one observation in form of a completed auction with corresponding BIN offer. In summary there are 1886 auctions for 94 different items.
For future use, each item is assigned to demographic groups. These groups are gender (Female, Male), age (Adult, Teenager, Young) and political conviction (Conservative, Liberal). This assignment refers to the winner of the auction and is based only on an assumption about typical consumer behaviour, thus our data is quite noisy. Products are categorised as follows:
Source: Own illustration
Let us import the data set dat. It contains 1886 auctions from February, April and May 2007, downloaded from eBay by using the advanced search for finished auctions.
The variables of this data set are the same as for the Cashflow game with one exception: the overbidding amount is not given in USD this time but represents a percentage of the BIN price (overfinal_percent). It is calculated as (finalprice - BIN.final) / BIN.final.
A value of 40%, for example, tells us that the corresponding BIN price is exceeded by 40%. Like before, this value can be negative (underpayment).
Load the data. To do so, just press edit and check afterwards.
#< task
dat <- readRDS("dat.rds")
#>
The function top_n(dat, n, wt) of the dplyr package works similar to the head() function but has an additional argument that allows you to sort your data frame before taking the first rows. The optional parameter wt specifies the variable used for ordering. If n is negative, the rows with the lowest values for wt are selected. Note that top_n() will select more than n rows if there are ties in the chosen variable wt.
library(dplyr)
top_n(dat, 3, itemnumber)
Task: Give yourself a short overview of the new data set of auctions. To do so, use the top_n() function and select the top 5 most expensive items. Note: use the argument wt = finalprice.
#< fill_in
# library(dplyr)
# ___
#>
library(dplyr)
top_n(dat, 5, finalprice)
#< hint
cat("Take a look at the example from the info box above.")
#>
If you are interested in a detailed explanation of the variables used in the data set, please check the info box:
observation:
A unique running integer numbering the observations from 1 to 1886.
itemtype:
The category the item belongs to. There are 12 categories in total, like books or automotive products.
finalprice:
The highest bid when the auction is closed and therefore the final price [in $]. The bidder with the highest bid (winbidder) wins and has to pay the final price in exchange for the item.
shippinginfo:
The shipping costs, if available. Otherwise declared as 'NA'.
BIN.final:
The price of the related BIN offer (without shipping), used to indicate overbids. Because we have many different items in this data set, BIN.final is not constant this time.
overfinal_d:
A binary variable reflecting overbids. Coded with 1 if the auction is overbid, meaning that the final price ends up higher than the price for the related BIN offer available at the same time. Coded with 0 if the final price ends up below the BIN price.
overfinal_percent:
The amount of money by which the BIN price is exceeded, as a proportion of the BIN price. It is calculated as (finalprice - BIN.final) / BIN.final and can be negative, indicating that the auction does not end up overbid.
gender, age, political:
Categorical variables indicating the demographic group that might purchase the item at the auction. Because the eBay listing does not contain demographic information about the winner, the original authors estimate these variables based on the type of the item. For example, perfume brands indicate the gender of the buyer, buyers of an Xbox 360 controller are usually teenagers and books like "Audacity of Hope" by Obama are most likely purchased by bidders whose political conviction is liberal (Malmendier, U., & Lee, Y. H. (2011)). You find a detailed description of the categorisation of every single item in the info box "Various products - Full item list" which is located at the start of this exercise.
We would like to know, whether overbidding is restricted to certain item types. Therefore, the first step is to list all item types that are available.
Task: List all item types of our data frame, accessible by $itemtype. There are 12 different groups in total. Make use of the unique() function from last exercise again to filter out duplicates.
#< fill_in
# ___
#>
unique(dat$itemtype)
#< hint
cat("Refer to the item type by using 'dat$itemtype'.")
#>
question: Make a guess. Which kind of item might get overpaid most in auctions?
choices:
- Automotive Products
- Books*
- Computer hardware
- Consumer electronics
- Cosmetics
- DVDs
- Financial software
- Home products
- Perfume
- Personal care products
- Sports equipment
- Toys & Games
multiple: FALSE
success: You're right, but let us check how high the frequency actually is.
failure: Good guess, we will check the answer later.
Run the following code and take a look at the summary. Press edit and check.
#< task
overbidding_categories <- dat %>%
  rename("Itemtype"="itemtype") %>%
  group_by(Itemtype) %>%
  summarise("Observations" = length(overfinal_percent),
            "Mean [Share of BIN]" = mean(overfinal_percent, na.rm = TRUE),
            "Overbids" = length(which(overfinal_d==1)),
            "Overbid_frequency" = length(which(overfinal_d==1))/length(overfinal_d)) %>%
  ungroup() %>%
  # add line with all types
  rbind(list("all types",
             length(dat$overfinal_percent),
             mean(dat$overfinal_percent, na.rm = TRUE),
             length(which(dat$overfinal_d==1)),
             length(which(dat$overfinal_d==1))/length(dat$overfinal_d))) %>%
  arrange(Itemtype)

# plot summary and round numbers for better readability
overbidding_categories %>% mutate_at(2:5, funs(round(., digits=3)))
#>
We have got 1886 observations: completed auctions of items from different categories. The mean tells us by how much the auction price exceeds the respective BIN offer on average. The column Overbids counts all overbid auctions while Overbid_frequency expresses this amount as a proportion of all observations.
For example, sports equipment gets overbid in 56.4% of all cases and, on average, the final price exceeds the BIN price by 50.2%.
Interestingly, books have the highest overbid frequency among all items.
Now it is time to create your first plot on your own.
Task: Use ggplot to visualize the overbid frequency per item type in a bar plot. Use geom_bar for it.
Note: The layer +theme(axis.text.x = element_text(angle = 45, hjust = 1)) is used to turn the labels by 45°.
#< fill_in
# library(ggplot2)
# ggplot(___, aes(___)) +
#   geom_bar(stat = "identity") +
#   labs(fill="overbid", title="Overbidding by Item Type -- Finalprice (Without Shipping)") +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1))
#>
library(ggplot2)
ggplot(overbidding_categories, aes(Itemtype, Overbid_frequency)) +
  geom_bar(stat = "identity") +
  labs(fill="overbid", title="Overbidding by Item Type -- Finalprice (Without Shipping)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
#< hint
cat("First, fill in the name of the table we want to plot. For the aesthetics, just use the item type and the overbid frequency as input parameters. 'ggplot()' will plot these two against each other.")
#>
Good work! You have created a barplot on your own using ggplot.
Here we have plotted the overbid frequencies again for a better visual comparison. We should not overstate these results, however, because the number of observations we have from each item category is vastly different. Incidentally, this is why we have no bar for automotive products: simply none of the nine auctions is overbid. What we can observe in fact is that overbidding is not just a marginal phenomenon restricted to some item categories. In almost all categories, we find an overbid frequency of at least 24%. Automotive products are the negligible exception here due to the small number of observations.
Finally, over all categories combined, we notice a striking 48% of auctions with irrational overpayment. It seems that overbidding is quite common and not limited to single item types.
Note that we only used final prices so far. However, in order to avoid repeating the same calculations again, I can just tell you that the results for shipping included prices are very similar with a little less overbidding in each item category. The total overbid frequency of all item types combined is 40.1%. If you are interested in more details, please open the info box below. It contains runnable code which displays the corresponding table and bar plot.
Press edit and check to display the table and bar plot for total prices. Note: This will take some time to run.
#< task_notest
# load data with total prices
overbidding_categories2 <- readRDS("overbidding_categories2.rds")

# show the table
grid.table(overbidding_categories2 %>% mutate_at(2:5, funs(round(., digits=3))), rows = NULL)

# plot the result
ggplot(overbidding_categories2, aes(Itemtype, Overbid_frequency)) +
  geom_bar(stat = "identity") +
  labs(fill="overbid", title="Overbidding by Item Type -- Totalprice (With Shipping)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
#>
The question whether overbidding for certain item groups is significant is not part of the paper, nor does it belong to the key issue. However, it is still worth investigating, as one might find it interesting to see whether the average amount overbid is significantly different from zero. Therefore, we aim to verify this hypothesis in the form of an excursus. From the point of view of rational bidding behaviour, one might expect that the amount of overbidding is at most 0. In the following section we test an even stronger restriction: the null hypothesis that the amount of overbidding is 0 on average. We speak of statistical significance when it is very unlikely that the observed result occurred under the null hypothesis, so that it can be rejected.
We begin with testing the final prices of our Cashflow game and want to reject the hypothesis that the mean of the overbid amount without shipping costs is 0:
$$H_0: \mu_{overfinal} = 0$$
First, we build a confidence interval for the amount overbid without shipping, overfinal. These intervals have the following form:
$$[\bar{X}_l , \bar{X}_u]$$
$\bar{X}_l$ is the lower bound, $\bar{X}_u$ the upper bound. We determine the bounds of our confidence interval such that the probability for the mean of our sample $\bar{X}$ being inside the interval is:
$$P(\bar{X}_l \le \bar{X} \le \bar{X}_u) = 1-\alpha$$
Based on the assumption that the overbid amount is normally distributed, our confidence interval is calculated as follows:
$$\left[\bar{X} - z_{(1-\frac{\alpha}{2})} \cdot \frac{\sigma}{\sqrt{n}} \ ,\ \bar{X} + z_{(1-\frac{\alpha}{2})} \cdot \frac{\sigma}{\sqrt{n}}\right]$$
where $z_{(1-\frac{\alpha}{2})}$ denotes the $(1-\frac{\alpha}{2})$ quantile of the standard normal distribution while $\sigma$ is the standard deviation of the overbid amount for our sample size $n$.
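As a quick illustration of the formula, here is a minimal base-R sketch with an invented vector of overbid amounts (the vector x is made up for illustration; it is not our Cashflow data):

```r
# toy overbid amounts in $ (invented for illustration)
x <- c(5, -3, 12, 0, 7, -1, 4, 9)

n    <- length(x)
xbar <- mean(x)
s    <- sd(x)        # sample standard deviation as estimate of sigma
se   <- s / sqrt(n)  # standard error of the mean

a <- 0.05
z <- qnorm(1 - a/2)  # roughly 1.96 for the 95% level

# lower and upper bound of the 95% confidence interval
c(lower = xbar - z * se, upper = xbar + z * se)
```

If 0 falls outside these bounds, the null hypothesis of a zero mean would be rejected at the 5% level; the steps below apply exactly this logic to the Cashflow data.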
Step 1: Load the Cashflow 101 data set. To do so, just press edit and check afterwards.
#< task
cf <- readRDS("cf.rds")
#>
Step 2: Task: Calculate the number of observations and the mean of the variable overfinal. In addition, calculate the standard deviation SD as well as the standard error SE of that variable. Summarise all in the data frame overpayment_final.
Remember from Exercise 1 that we have one row containing a 'NA' value, probably because of a matching error to the BIN price. Find a way to work around it in the data set. Note: Use the argument na.rm = TRUE in your functions.
#< fill_in
# overpayment_final <- summarise(cf,
#   Observations = ___(which(!is.na(overfinal))),
#   Mean = mean(overfinal, ___ ),
#   SD = sd(overfinal, ___ ),
#   SE = (SD/sqrt(Observations)))
# overpayment_final
#>
overpayment_final <- summarise(cf,
  Observations = length(which(!is.na(overfinal))),
  Mean = mean(overfinal, na.rm = TRUE),
  SD = sd(overfinal, na.rm = TRUE),
  SE = (SD/sqrt(Observations)))
overpayment_final
#< hint
cat("Make use of the function 'length()' to count your observations. The functions 'mean()' and 'sd()' will have problems with missing values if the argument 'na.rm = TRUE' is missing.")
#>
Step 3: Task: Calculate the bounds of the 95% confidence interval Xl and Xu and add them as columns to our data frame. Use the given a and z as well as the formula for confidence intervals.
#< fill_in
# a <- 0.05
# z <- qnorm(1-a/2)
# overpayment_final <- overpayment_final %>%
#   mutate(Xl = ___) %>%
#   mutate(Xu = ___)
# overpayment_final
#>
a <- 0.05
z <- qnorm(1-a/2)
overpayment_final <- overpayment_final %>%
  mutate(Xl = Mean - z*SE) %>%
  mutate(Xu = Mean + z*SE)
overpayment_final
#< hint
cat("The lower bound 'Xl' is 'z*SE' smaller than the 'Mean', the upper bound 'Xu' is 'z*SE' larger than it.")
#>
In order to decide whether or not the positive mean of the overbidding amount is significant, we test the null hypothesis. If the value overfinal = 0 lies outside of our confidence interval, we can reject the null hypothesis at the significance level a and conclude that overbidding on average is not just a random observation.
Step 4: Task: Check whether 0 lies outside the interval [Xl, Xu]. The function between(x, left, right) returns a logical value indicating whether x lies between the two bounds.
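For example, with invented bounds (the numbers are made up for illustration):

```r
library(dplyr)

between(0, -2.5, 4.1)  # TRUE: 0 lies inside [-2.5, 4.1]
between(0,  1.2, 4.1)  # FALSE: 0 lies below the lower bound
```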
#< fill_in
# between(___)
#>
between(0, overpayment_final$Xl, overpayment_final$Xu)
#< hint
cat("Refer to the lower and upper bound by using 'overpayment_final$'.")
#>
As 0 lies within our confidence interval, we cannot reject the null hypothesis and therefore cannot call the phenomenon of overbidding significant at the 5% level (for the Cashflow 101 game without shipping costs). Note that we cannot conclude the opposite. This does not necessarily mean that overbidding for this item is not significant at all, it might be significant at a different level.
Press edit and check to do the same evaluation for total prices of Cashflow 101 as well as for all other items in the data set of various products. You will receive all tables listed under each other.
#< task
overpayment_cf <- readRDS("overpayment_cf.rds")
overpayment_various <- readRDS("overpayment_various.rds")
overpayment_cf
filter(overpayment_various, `Comparison price` == "final")
filter(overpayment_various, `Comparison price` == "total")
#>
As we can see, overbidding is not significant at the 5% level for some item types. Furthermore, auction prices can be significantly lower than the BIN price. For all product types of various products combined, however, auction prices are significantly higher than the fixed price. The same holds for the Cashflow 101 board game, but only if we consider shipping costs.
Now that we know that overbidding is not only pretty common but also significant for some item types, we are going to investigate possible factors that cause such behaviour in the next exercise.
Good work! You have verified the significance of overbidding in some of our data.
In this exercise, we will do some evaluations and try to find out, where the phenomenon of overbidding comes from. For this purpose, we will examine some factors that might be correlated with overbids and set up the following four theses:
Theses 1 and 2 test for Cashflow games whether the level of experience or the participation length at an auction causes bidders to submit an overbid. Theses 3 and 4 are based on the various products data set. We will divide this data into demographic groups using item information, and into price levels based on final auction prices. We only consider overbids based on final prices (without shipping). This way, we exclude overbids due to low awareness of different shipping costs.
Our first thesis is that experienced bidders know more about eBay's auctions and fixed-price offers and are better at navigating between them.
We measure the experience someone has as a buyer on the basis of his amount of feedback. Every bidder on eBay receives feedback for former transactions from the respective counterparty. The variable buyernumfeedback contains the amount of feedback the bidder has at the time he places his bid. Having a large amount of feedback indicates that this account has bought or sold many items on eBay. We suppose that these people overbid less than inexperienced eBay members.
We take the variable buyernumfeedback out of a modified version of the Cashflow 101 data set: cf_short. It contains only the variables relevant for this task, to reduce the running time of the code.
Start with loading the first data set cf and downsizing the number of variables. To do so, just press edit and check afterwards.
#< task
cf_short <- readRDS("cf.rds") %>%
  select(itemnumber, winbidder, buyernumfeedback, overfinal_d)
#>
In the next step, we want to divide our data into two equally sized groups: experienced bidders and rather inexperienced ones. For this purpose, we compute the middle of our data: the median.
Task: Compute the median of buyernumfeedback in the data set cf_short. Note: Use the $ sign to refer to that variable.
#< fill_in
# ___
#>
median(cf_short$buyernumfeedback)
#< hint
cat("Refer to the number of buyer feedback by using 'cf_short$'.")
#>
Now we form two groups of buyernumfeedback: larger than the median, and below or equal to the median. This way, we make sure to split our data in the middle and obtain two equally sized groups; hence we can compare the overbid frequencies without being biased by different sample sizes.
Task: Use a dplyr chain to group the Cashflow data by the variable buyernumfeedback. Differentiate between > median and <= median. Summarise the number of observations and the overbid frequency. Note: For the median, use the number you calculated above, not a variable.
#< fill_in
# overbid_by_exp <- cf_short %>%
#   drop_na() %>%
#   ___ %>%
#   summarise(n = ___, Overbid_freq = ___)
# overbid_by_exp
#>
overbid_by_exp <- cf_short %>%
  drop_na() %>%
  group_by(buyernumfeedback > 4) %>%
  summarise(n = length(buyernumfeedback),
            Overbid_freq = mean(overfinal_d))
overbid_by_exp
#< hint
cat("Use 'group_by()' to test whether 'buyernumfeedback' is bigger than 4. Afterwards, summarise the length of this variable and the mean of the binary variable 'overfinal_d'.")
#>
Task: Use a geom_bar to plot your results.
#< task_notest
#...
#>
After you have done that, it should look like the plot below, basically showing no difference in overbid frequencies depending on the number of buyer feedback.
The measurement of experience is imperfect since some eBay users do not leave feedback; the feedback count therefore does not match the number of past transactions. However, our measure is sufficient to reject the hypothesis that only inexperienced bidders overbid, as users with a high amount of feedback have completed at least that many transactions. The number of auctions a bidder participated in without winning might be much higher.
It seems that significant experience does not help bidders to learn how to bid more optimally. This is consistent with Garratt, R. J. et al. (2012), who find the same amount of overbidding behaviour in eBay auctions for novices and experienced bidders.
In the following, we want to consider the quasi-endowment effect as one possible explanation for overbidding, that is, valuing a good more highly when one "quasi" possesses it. Quasi-endowment is a sense of ownership that bidders develop during the auction: the loss from losing the item (by losing the auction) is weighted more heavily than the utility gained from obtaining another item of the same type. In other words, bidders might be willing to pay more for the same item if they are the lead bidder and therefore in quasi-possession. This effect should become stronger as the lead time increases. Academic studies suggest that bidders are affected by the endowment effect when participating in auctions (Wolf, J. R., et al. (2005); Heyman, J. E. et al. (2004)). Even though it is questionable whether this effect can explain bidding above the BIN price, we are going to test whether bidders are more likely to overbid in an auction the longer they participate, in particular as the lead bidder.
Let us start with the data set of our bidhistory
, containing all bids for the Cashflow 101 game. In our first analysis we want to filter for the auction winners and take their first bid per auction. Then we can summarise for (non-)overbid auctions how much time is left until the auction ends when bidders first enter it.
The variable timeleft_days
suits our needs. It indicates how many days are left for the auction to go when the bid was placed. On the basis of the variable overbid
we can separate our bidders into overbidders and non-overbidders.
Note that there is one item in our data set for which bidder names were not accessible. Hence we have only 138 auction winners for whom we can make statements about their participation length.
Press edit
and check
to summarise the mean of auction time left for overbidders and non-overbidders.
#< fill_in # bidhistory <- readRDS("bidhistory.rds") # # bidhistory %>% # filter(biddername!="") %>% # filter(biddername==winbidder) %>% # filter for only winners # distinct(itemnumber, biddername, .keep_all = TRUE) %>% # take only first bids # group_by(overbid) %>% # summarise("time left [days]"=mean(timeleft_days),"observations" = length(itemnumber)) %>% # mutate_at(2,funs(round(.,digits=3))) #> bidhistory <- readRDS("bidhistory.rds") bidhistory %>% filter(biddername!="") %>% filter(biddername==winbidder) %>% # filter for only winners distinct(itemnumber, biddername, .keep_all = TRUE) %>% # take only first bids group_by(overbid) %>% summarise("time left [days]"=mean(timeleft_days),"observations" = length(itemnumber)) %>% mutate_at(2,funs(round(.,digits=3))) #< hint cat("Filter for bidders who are winbidders and group by the variable 'overbid'.") #>
A simple comparison of means does not match our assumption: winners who do not overbid enter the auction on average 1.46 days before it ends. Winners who do overbid enter the auction later and therefore participate for a shorter time, on average 1.27 days before the auction's end.
In our second analysis we want to take a look at the total time a bidder is the lead bidder and if we see a possible relation to overbidding.
First, we need to filter for the lead bids. This way, we can calculate the leadtime
[in days] as the time from one lead bid to the next lead bid (from a different bidder).
After that we can again filter for winners and summarise their total lead time per auction.
Press edit
and check
to calculate the mean of total lead time for overbidders and non-overbidders.
#< task bidhistory %>% filter(biddername!="") %>% filter(biddername==leader) %>% group_by(itemnumber) %>% # group by each item, compute time between bids and if it's the last bid ('default=' case), then take the remaining auction time mutate(leadtime = (lead(biddate, default = first(enddate)) - biddate)/ddays(1)) %>% # when subtracting dates, the resulting period is given in seconds. That's why we convert it into days # take only one bid per bidder and auction, compute the total lead time. group_by(itemnumber, biddername) %>% mutate(totalleadtime_in_days = sum(leadtime, na.rm = TRUE)) %>% distinct(itemnumber, biddername, .keep_all = TRUE) %>% ungroup() %>% # take only winning bidders filter(biddername==winbidder) %>% group_by(overbid) %>% summarise("total leadtime [days]" = mean(totalleadtime_in_days, na.rm=TRUE),"observations" = length(itemnumber)) %>% mutate_at(2,funs(round(.,digits=3))) #>
lead(x, n, default = NA)
and lag(x, n, default = NA)
are functions of the dplyr package. They are used to refer to the "next" or "previous" element in a vector. The default is a step of n=1
and a default
value of NA
for missing rows (e.g. at the end of the data frame when there is no further row to refer to).
library(dplyr) x <- c(1,2,3,4) x
A lead of 1.
lead(x, 1, default = NA)
A lag of 2.
lag(x, 2, default = NA)
A lag of 2 with missing values set to 0.
lag(x, 2, default = 0)
We find the same pattern for the time being the lead bidder: winners who overbid are lead bidders for 1.03 days on average by the end of the auction; winners who do not overbid are lead bidders for 1.24 days.
There is a large literature on different consumer behaviour in online auctions depending on demographics like age, gender or education level (Yeh, J. C. et al. (2012)). Even bidders from different local regions of the USA tend to behave differently (Black, G. S. (2007)).
In this section, we want to study if there is different bidding behaviour in our data when it comes to overbidding. We check whether some demographic groups tend to overbid more than others.
In the data set of various products dat
, which contains many different items, we have some binary variables for gender, age and political conviction. These variables are associated with the winner of the auction. Combinations like "female and adult" are possible. However, not all items can be categorised, thus sample sizes differ across the demographic variables.
Because bidder demographics are not directly observable from the listing of an eBay auction, items have been categorised based on an assumption. The original authors estimate these variables from indications like "usually bought by a certain consumer group". For example, perfume brands indicate the gender of the buyer and PlayStation controllers are associated with teenagers (Malmendier, U., & Lee, Y. H. (2011)).
If you want to know how every item is categorised, please take a look at the info box "Various products - Full item list" at the beginning of Exercise 2 when the data set dat
was introduced.
Load the data set dat
. To do so just press edit
and check
afterwards.
#< task dat <- readRDS("dat.rds") #>
sample_n(data, n)
is another function of the dplyr package working just like head()
/tail()
or top_n()
. The difference, though, is that sample_n
does not select the top or bottom of your data set but a random sample. Thus, n
must be positive.
It is commonly used to reduce the size of samples.
library(dplyr) sample_n(dat, n)
Task: Take a look at 5 random rows of the data using sample_n()
.
#< fill_in # sample_n(___) #> sample_n(dat,5) #< hint cat("Take a look at the example in the info box above.") #>
Task: Use the code chunk below and do whatever is necessary to answer the following questions. Use all items in the dataset and aim for the binary variable for overbidding without shipping costs overfinal_d
.
Be aware, that you will face some NA values. Remove them with drop_na()
, filter(!is.na())
or the parameter na.rm=TRUE
for the mean()
or sum()
function.
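All three approaches lead to the same result for a simple mean. Here is a minimal sketch on a toy data frame (the column name x and its values are invented for illustration):

```r
library(dplyr)
library(tidyr)

toy <- data.frame(x = c(1, 2, NA, 4))

# 1) drop all rows containing an NA before summarising
m1 <- toy %>% drop_na() %>% summarise(m = mean(x)) %>% pull(m)

# 2) filter out NA values of one specific column
m2 <- toy %>% filter(!is.na(x)) %>% summarise(m = mean(x)) %>% pull(m)

# 3) let mean() ignore NAs itself
m3 <- toy %>% summarise(m = mean(x, na.rm = TRUE)) %>% pull(m)

c(m1, m2, m3)  # all three give the same mean
```

Note that options 1 and 2 only coincide when x is the sole column containing NAs; drop_na() removes a row as soon as any of its columns is missing.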
#< task_notest #... #>
parts:
- question: 1. When looking at our data, which group has a higher frequency of overbidding, women or men?
choices:
- Women
- Men
multiple: FALSE
success: Great, all answers are correct!
failure: Wrong answer. Try again.
- question: 2. Which of these age groups tend to overbid more often according to our data set?
choices:
- Adult
- Teenager
- Young
multiple: FALSE
success: Great, all answers are correct!
failure: Wrong answer. Try again.
- question: 3. How often do liberal bidders overpay in our data?
choices:
- 18%
- 25%
- 40%*
- 62%
multiple: FALSE
success: Great, all answers are correct!
failure: Wrong answer. Try again.
Good work! You have answered all quizzes about overbidding in demographical groups correctly.
Now it is time to look at the results.
Press edit
and check
to plot the overbid frequencies by demographic group:
#< task consumer_dat <- dat %>% select("gender", "age", "political", "overfinal_d") p_names <- c("Group","Overbid_frequency") # calculate data for plots c1 <- consumer_dat %>% group_by(gender) %>% summarise(mean(overfinal_d)) %>% drop_na() colnames(c1) <- p_names c2 <- consumer_dat %>% group_by(age) %>% summarise(mean(overfinal_d)) %>% drop_na() colnames(c2) <- p_names c3 <- consumer_dat %>% group_by(political) %>% summarise(mean(overfinal_d)) %>% drop_na() colnames(c3) <- p_names # define bar plots library(ggplot2) bar1 <- ggplot(c1, aes(Group, Overbid_frequency))+ geom_bar(stat = "identity")+ ggtitle("Overbidding by Gender")+ theme(axis.text.x = element_text(angle = 45, hjust = 1))+ ylim(0,1) bar2 <- ggplot(c2, aes(Group, Overbid_frequency))+ geom_bar(stat = "identity")+ ggtitle("Overbidding by Age")+ theme(axis.text.x = element_text(angle = 45, hjust = 1))+ ylim(0,1) bar3 <- ggplot(c3, aes(Group, Overbid_frequency))+ geom_bar(stat = "identity")+ ggtitle("Overbidding by Political Conviction")+ theme(axis.text.x = element_text(angle = 45, hjust = 1))+ ylim(0,1) # arrange plots next to each other library(gridExtra) grid.arrange(bar1, bar2, bar3, ncol=3) #>
Basically, there is a significant amount of overbidding in each demographic subset. Therefore, no demographic group seems to be particularly vulnerable to the irrational phenomenon of overbidding.
In the last part, we take a closer look at price categories. Intuitively, one could think that buyers of low-value items are more price sensitive and therefore overbid less. We want to test whether these items are less likely to end up overbid than high-value items. In the following section, we will group all items from the data set of various products by price ranges in order to check whether the amount of overbidding is correlated with the price level. At first, we will do this for all item types together. After that, we will consider each item type separately.
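The grouping into price intervals below relies on base R's cut(), which bins a numeric vector into interval factors. A minimal sketch with invented prices:

```r
prices <- c(5, 12, 27, 249)

# breaks of width 10 from $0 to $250; intervals are right-closed by
# default, so a price of exactly 10 would fall into "(0,10]"
binned <- cut(prices, breaks = seq(0, 250, 10))
binned
```

The resulting factor levels such as "(0,10]" are what group_by() later uses as the price-level groups.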
Press edit
and check
to cut the data set into price intervals and count overbid auctions for each price level.
#< task library(tidyr) # define price levels and n pricelevel <- seq(0,250,10) n <- length(dat$overfinal_d) # calculate data frame with overbid frequencies for each interval overbid_pricelevel_all <- dat %>% group_by(pricelevel=cut(BIN.final, breaks = pricelevel))%>% mutate(observations = n()) %>% complete(pricelevel, fill = list(observations = 0)) %>% mutate(overbid_freq = mean(overfinal_d)) %>% select(pricelevel, observations, overbid_freq) %>% arrange(pricelevel) %>% complete(pricelevel, fill = list(overbid_freq = 0)) %>% ungroup() %>% distinct(pricelevel, .keep_all = TRUE ) %>% drop_na() # show 10 random rows of the data frame sample_n(overbid_pricelevel_all %>% mutate_at(3,funs(round(.,digits=3))) ,10) #>
Press edit
and check
to plot your results.
#< task library(ggplot2) overbid_pricelevel_all <- overbid_pricelevel_all %>% mutate(price =rep(seq(10,250,10))) %>% mutate(l = paste(overbid_freq*observations, "/", observations)) # define labels # plot price levels for all types ggplot(overbid_pricelevel_all, aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="All Item Types", x="price level [$]", y="overbid frequency") #>
Here we can see the overbid frequency for each price range on the basis of the bar height. The numbers above each bar tell you how many items receive an overbid in this price range and how many observations we have. For example, there are 347 out of 494 items overbid in the lowest price category ($0-$10). Be careful with interpreting bar heights of price ranges with very few observations, they might not allow a robust conclusion as possible outliers among these auctions have a higher weight. Please note that there are a few items missing so that we only have 1778 out of 1886 observations. This is simply the case because we plot only prices up to $250 in order to avoid a cluttered, space-consuming graphic.
The following two code chunks will count the frequency of overbids by price level for each item type separately and then plot the result.
Press edit
and check
.
#< task # define price levels and n pricelevel <- seq(0,250,10) n <- length(dat$overfinal_d) overbid_pricelevel <- dat %>% group_by(itemtype, pricelevel=cut(BIN.final, breaks = pricelevel))%>% mutate(observations = n()) %>% complete(pricelevel, fill = list(observations = 0)) %>% mutate(overbid_freq = mean(overfinal_d)) %>% select(pricelevel, itemtype, observations, overbid_freq) %>% arrange(itemtype) %>% complete(pricelevel, fill = list(overbid_freq = 0)) %>% ungroup() %>% distinct(itemtype, pricelevel, .keep_all = TRUE ) %>% drop_na() # show interim results sample_n(overbid_pricelevel %>% mutate_at(4,funs(round(.,digits=3))) , 10) #>
Do not be confused if you see a lot of zeros in this random sample. For some item types, there are no observations at certain price categories and consequently no overbids.
Press edit
and check
to see the overbid frequency over all item categories. This might take more time than usual.
#< task overbid_pricelevel <- overbid_pricelevel %>% mutate(price =rep(seq(10,250,10),12)) %>% mutate(l = paste(overbid_freq*observations, "/", observations)) # define lables # create bar plots p1 <- ggplot(filter(overbid_pricelevel, itemtype=="automotive_products") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Automotive products (n=9)", x="price level [$]", y="overbid frequency") p2 <- ggplot(filter(overbid_pricelevel, itemtype=="books") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Books (n=398)", x="price level [$]", y="overbid frequency") p3 <- ggplot(filter(overbid_pricelevel, itemtype=="computer_hardware") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Computer & Hardware (n=186)", x="price level [$]", y="overbid frequency") p4 <- ggplot(filter(overbid_pricelevel, itemtype=="consumer_electronics") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Consumer electronics (n=332)", x="price level [$]", y="overbid frequency") p5 <- ggplot(filter(overbid_pricelevel, itemtype=="cosmetics") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Cosmetics (n=21)", x="price level [$]", y="overbid frequency") p6 <- ggplot(filter(overbid_pricelevel, itemtype=="dvds") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ 
geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="DVDs (n=74)", x="price level [$]", y="overbid frequency") p7 <- ggplot(filter(overbid_pricelevel, itemtype=="financial_software") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Financial software (n=151)", x="price level [$]", y="overbid frequency") p8 <- ggplot(filter(overbid_pricelevel, itemtype=="home_products") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Home products (n=29)", x="price level [$]", y="overbid frequency") p9 <- ggplot(filter(overbid_pricelevel, itemtype=="perfume_cologne") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Perfume (n=77)", x="price level [$]", y="overbid frequency") p10 <- ggplot(filter(overbid_pricelevel, itemtype=="personal_care_products") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Personal care products (n=282)", x="price level [$]", y="overbid frequency") p11 <- ggplot(filter(overbid_pricelevel, itemtype=="sports_equipment") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Sports equipment (n=55)", x="price level [$]", y="overbid frequency") p12 <- ggplot(filter(overbid_pricelevel, itemtype=="toys_games") , aes(x = price, y = overbid_freq))+ geom_bar(stat = "identity", fill="dodgerblue1")+ 
geom_text(size=3,aes(label=l), position=position_dodge(width=0.9), vjust=-0.25)+ ylim(0,1) + labs(title="Toys and Games (n=164)", x="price level [$]", y="overbid frequency") # arrange plots next to each other grid.arrange(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, ncol=2, nrow=6) #>
Again, the height of each bar shows the overbid frequency. The numbers above each bar indicate how many items receive an overbid in this specific price range and how many observations we have. All in all, we observe no correlation between expensiveness and overbid frequency.
Summarizing our previous findings, we conclude that we find no evidence for any of these theses. However, a simple comparison of means is not conclusive. We can assume relations but we do not know how large and significant these effects are. We will take a look into the field of regression analysis in the next exercise and try to find a better explanation for the overbidding phenomenon.
In this exercise, we are going to model a probit regression in order to predict the probability that a bidder overbids based on his behaviour. More specifically, we are interested in the effect of leadtime. In the last exercise, we took a brief look at the relationship between overbidding and total leadtime (Thesis 2). Although the comparison of means does not indicate a positive relation between leadtime and overbidding behaviour, we want to test this thesis with a model that is more accurate.
At first, we need to find an appropriate model. An auction can be overbid or not, therefore, it makes sense to express that behaviour through the binary variable overbid
which can be either 1 or 0 and predict a probability for overbidding. Linear regressions are most common but not suited for us. The predicted result Y
can exceed the range from 0 to 1, and a prediction of 0.5, for example, is hard to interpret because an auction cannot be overbid "to some degree".
Linear regressions are therefore not well suited to predicting probabilities.
The method of choice is hence a probit regression. It models a non-linear probability score that reflects the probability that an event occurs.
(Le, J. (2018))
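The boundedness argument can be illustrated on simulated data (all names here are invented and not part of the problem set's data): a linear model fitted to a 0/1 outcome can produce fitted values outside [0, 1], while the probit fit stays inside by construction.

```r
# Simulated illustration: linear model vs probit on a binary outcome
set.seed(1)
n <- 200
x <- rnorm(n, sd = 2)
y <- as.numeric(pnorm(1.5 * x) > runif(n))   # binary outcome driven by x

lin  <- lm(y ~ x)                                       # linear probability model
prob <- glm(y ~ x, family = binomial(link = "probit"))  # probit model

range(fitted(lin))   # typically extends below 0 and above 1 here
range(fitted(prob))  # always inside [0, 1]
```

The probit passes the linear predictor through the standard normal CDF, which is exactly what keeps its predictions in the unit interval.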
We want to set up a regression framework where we can test whether the time a bidder spends as the leader affects overbidding, conditional on being outbid. We only consider bidders who are not the winners and check for each auction whether they ever overbid (overbid = 1
) or not (overbid = 0
). We also control for the value of the bidder's last lead bid, as well as the time and price outstanding when the bidder is outbid for the last time.
We set up the following probit model, with which we can test how a set of parameters influences the probability that a bidder overbids in an auction. This probability is given by:
$$ p = \mathbb{P}(overbid=1|x) = F(x, \beta) = \Phi(\beta^T \cdot x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta^T \cdot x} \exp\left(-\frac{1}{2} t^2\right) dt $$
with $\Phi(\beta^T \cdot x)$ denoting the cumulative distribution function (CDF) of the standard normal distribution for a set of explanatory variables $x$ and their respective weights $\beta$.
The probability that a bidder does not overbid an auction is simply given by the complementary probability
$$ \mathbb{P}(overbid=0|x) = 1-\mathbb{P}(overbid=1|x) $$
and the vector of influencing factors is given by:
$$ x = \begin{pmatrix} 1 \\ totalleadtime \\ lastleadbid \\ timeoutbid\_outstanding \\ bidprice \end{pmatrix} $$
totalleadtime
is the sum of all periods the bidder is the lead bidder [in days].
lastleadbid
denotes the value of the last lead bid [in $].
timeoutbid_outstanding
stands for the time left in the auction when the bidder is outbid for the final time [in days].
bidprice
represents the last price outstanding when the bidder is outbid for the final time [in $].
The calculation of the coefficient vector $\beta$ is based on a maximum likelihood estimation:
As the auctions are considered to be independent, our n
observations are drawn from a Bernoulli distribution and the probability function for a bidder to overbid is:
$$ y = p^{overbid} \cdot (1-p)^{1-overbid} $$
The likelihood function is defined as the product of the individual probabilities, where $p_i$ denotes the overbid probability of observation $i$: $$ L=\prod_{i=1}^{n} y_i = \prod_{i=1}^{n} p_i^{overbid_i} \cdot (1-p_i)^{1-overbid_i} $$
This function is then maximized with respect to $\beta$ in order to find the best-fitting parameter weights. Instead of maximizing the likelihood function itself, it is in most cases much easier to maximize the logarithmic likelihood function. Because the first-order condition leads to a non-linear system of equations, an iterative procedure like the Newton-Raphson method is necessary to solve the problem. If you are interested in a detailed description of this approach, please take a look at Davidson, R., & MacKinnon, J. G. (2004).
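The maximization described above can be sketched directly in R. On simulated data (all names invented, not part of the problem set), minimizing the negative log-likelihood with the general-purpose optimizer optim() recovers essentially the same coefficients as R's built-in probit fit:

```r
# A sketch: fitting a probit by maximum likelihood "by hand"
set.seed(42)
n <- 500
x <- rnorm(n)
y <- as.numeric(pnorm(-0.5 + 1 * x) > runif(n))   # true beta = (-0.5, 1)

# negative log-likelihood:
# -sum_i [ y_i * log Phi(b0 + b1*x_i) + (1 - y_i) * log(1 - Phi(b0 + b1*x_i)) ]
negloglik <- function(b) {
  p <- pnorm(b[1] + b[2] * x)
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)   # guard against log(0)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}

fit_optim <- optim(c(0, 0), negloglik)           # Nelder-Mead by default
fit_glm   <- glm(y ~ x, family = binomial(link = "probit"))

rbind(optim = fit_optim$par, glm = coef(fit_glm))   # nearly identical estimates
```

glm() uses iteratively reweighted least squares rather than Nelder-Mead, but both procedures converge to the same maximum of the likelihood.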
The following data set regdata
is our basis and contains bids for all Cashflow 101 auctions. These bids are limited to lead bidders who are outbid at some point. On top of that, only the last bid per bidder and item is used so that our observations are not influenced by multiple bids on the same item.
This means that all bids whose bidvalue exceeds that of the previous bidder are taken, the winning bids are removed (because they are never outbid), and only one observation per bidder (his last bid) is kept.
Load the data set regdata
. To do so, just press edit
and check
afterwards.
#< task regdata <- readRDS("regdata.rds") #>
Take a look at the data regdata
we use for the regression.
Press edit
and check
#< task head(regdata) #>
itemnumber:
This is the unique auction number, automatically generated when a listing is created. eBay uses this number to keep track of its auctions.
bidvalue:
The value of the bid placed [in $]. Because this data set contains only lead bids, this variable is considered to be the lastleadbid
of the bidder.
Note: This data set contains only the last bid of each bidder per auction.
bidprice:
The price that is publicly shown after the bid is placed. It is calculated as previous bidprice + increment
but only if the bidvalue is higher than that. It represents the price at which the item would be sold if the auction ended now.
finalprice:
The highest bid when the auction is closed and therefore the final price [in $]. The bidder with the highest bid (winbidder) wins and has to pay the final price in exchange for the item.
biddername:
The name of the bidder who places the bid.
leader:
The lead bidder of the auction after the bid is placed. It equals the biddername if the current bid is higher than the previous one. When the auction ends, the bidder who is the leader at that moment wins.
Note: This data set contains only bids which make the person a new lead bidder.
winbidder:
The bidder who bids the final price and wins the auction, more specifically, his alias name.
bid.overfinal:
A binary variable indicating if the bid is an overbid, regarding final prices. Coded with 1 if the bid is an overbid, meaning that the bid is higher than the price for the BIN offer available at the same time. Coded with 0 if the bid is below the BIN price.
overbid:
A binary variable indicating whether the bidder overbids the current auction at any time. It equals 1 if this biddername
submits a bid higher than the BIN price (bid.overfinal
== 1) for the current itemnumber
(else 0).
firstbid.overbid:
A binary variable indicating whether the bidder's first bid at this auction is an overbid (1) or not (0) with respect to final prices.
Note: This variable is needed for restriction 1 later on.
firstleadbid.overbid:
A binary variable indicating whether the first bid of the bidder which makes him the leadbidder
at this auction is an overbid (1) or not (0) with respect to final prices.
Note: This variable is needed for restriction 2 later on.
biddate:
The time when the bid is placed [as date format].
enddate:
The time when the auction ends [as date format].
timeleft_days:
The time that is left until the auction ends when the bid is placed [in days].
timeoutbid:
The time when the current bidder is outbid, it equals the biddate
of the next leader
[as date format].
timeoutbid_outstanding:
The time that is left until the auction ends when the bidder is outbid, it equals the timeleft_days
of the next leader
[in days].
leadtime_bid_days:
The amount of time the bidder is leader
after the bid is placed, until the next bidder outbids him and achieves leading position [in days].
Note: All bids with a lead time of 0 were removed from this data set.
totalleadtime:
The summed amount of periods the bidder is leader
in this auction [in days].
Note: Because winning bids were removed from this data set, periods from the very last bid to the auction end are not counted in.
bidmax_per_bidder:
The maximum of all bids the bidder submits in this auction [in $].
Because we are not using all of the variables in our regression, we set up a new data frame with only a few selected variables.
Press edit
and check
to select only the variables we want to make use of in our model.
#< task mydata <- regdata %>% select(overbid, totalleadtime, lastleadbid=bidvalue, timeoutbid_outstanding, bidprice) head(mydata) #>
The following code models the probability of overbidding explained by the variables totalleadtime
, lastleadbid
, timeoutbid_outstanding
and price
.
Press edit
and check
.
#< task myprobit <- glm(overbid ~ totalleadtime + lastleadbid + timeoutbid_outstanding + bidprice, family = binomial(link = "probit"), data = mydata) #>
# glm(Y ~ X1 + X2 + X3, family, data)
Modelling a probit regression works quite similarly to modelling linear models. We will make use of the R standard function glm()
because it fits a wider variety of models, capturing non-linear relationships better than the standard lm()
function for linear models. In order to compute a probit regression, we set the family parameter accordingly: family = binomial(link = "probit")
Use the function stargazer()
from the identically named package to show a summary of our regression. One could also use the R standard function summary()
for a slightly different summary. stargazer()
however, supports a larger number of models and has additional parameters to work with. The option report=('vc*p')
is used to display p-values instead of standard errors. With the option omit.stat
statistical ratios can be hidden in the output.
Task: Create a summary of the regression myprobit
using stargazer
.
#< fill_in # library(stargazer) # ___(myprobit, title="Results", type = "text", report=('vc*p'), omit.stat = "AIC") #> library(stargazer) stargazer(myprobit, title="Results", type = "text", report=('vc*p'), omit.stat = "AIC") #< hint cat("Just use the command 'stargazer()'.") #>
Good work! You have created a nice regression, ready to be interpreted.
On the right, it is recalled that overbid
is our dependent variable, predicted by the model. Beneath, you can see the best estimates for the coefficients $\beta$ with the p-values
right below.
Some coefficients are tagged with stars. These stars label the respective variable as significant on a certain level. These levels are commonly defined as the following:
*
= 5% **
= 1% ***
= 0.1% No stars mean that the coefficient is not significant at the 5% level. These p-values are calculated on the basis of a two-tailed test of the null hypothesis that the respective coefficient is equal to zero (Goodman, S. (2008)).
Observations
counts the number of unique bidder-auction observations.
Log likelihood
denotes the maximum value of the log likelihood function after the last iteration.
As we can see, all coefficients are positive, suggesting that these variables have a positive influence on the probability of overbidding. We observe two significant effects: the value of the last lead bid and the time left when being outbid. However, we find no significant relationship between the total time a bidder has led the auction and the probability of overbidding.
The coefficients in the output of glm()
are often not directly interpretable, as the usual interpretation only holds for linear models (where a coefficient denotes the expected change in overbid
given a unit change in one variable $x_i$, holding all other variables constant).
For this reason, researchers normally opt for alternatives like the marginal effects.
Marginal effects describe the effect that an explanatory variable has on the dependent variable, in our case on the probability of overbidding y
.
Remember that this probability is given by:
$$ y(x) = \Phi(\beta^T \cdot x) $$
A marginal effect measures the influence that a change in a particular explanatory variable has on the predicted probability of overbidding when the other covariates are kept fixed. Therefore, the marginal effect of a parameter $x_i$ is obtained by computing the derivative of the probability function with respect to $x_i$:
$$ \frac{\partial y(x)}{\partial x_i} = \frac{\partial \Phi(\beta^T \cdot x)}{\partial (\beta^T \cdot x)} \cdot \beta_i = \frac{1}{\sqrt{2\pi}} \cdot \exp\left(-\frac{1}{2} (\beta^T \cdot x)^2\right) \cdot \beta_i $$
We see that marginal effects do not depend on just one parameter $\beta_i$ but on the value of $x_i$ and all other influencing variables. Hence, marginal effects are not constant for non-linear models, and their computation is usually based on some form of averaging.
MEMs (Marginal Effects at the Means) One way of calculating marginal effects is to vary one variable while setting all covariates to their sample means, and then compute the effect of these changes on the dependent variable. Because this method is easier, we will use it for a calculation and visualisation of marginal effects in the next task.
AMEs (Average Marginal Effects) A second way is to calculate marginal effects for each individual at their observed levels of the covariates, before taking the average across all individuals. Usually, this way of computing marginal effects is preferred because AMEs average across the variability in the fitted outcomes. AMEs thereby provide a more natural measure, as they do not rely on unrealistic means like MEMs sometimes do (such as a mean of 0.5 for a binary variable that only takes the values 0 and 1) (Leeper, T. J. (2018)).
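The AME can be computed by hand in the same fashion, evaluating the normal density at every observation before averaging. Again a minimal sketch on simulated data; the names (fit, b, X) are illustrative, not from the problem set:

```r
# Hypothetical simulated data, probit link as above
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, pnorm(-0.5 + 0.8 * x1 + 0.3 * x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))
b   <- coef(fit)
X   <- model.matrix(fit)   # one row of covariates per observation

# Observation-level marginal effects averaged over the sample (AME).
# For continuous regressors the coefficient factors out, so averaging
# the densities first gives the same result.
ame <- mean(dnorm(drop(X %*% b))) * b[-1]
ame
```

Because the density is averaged over the observed covariate values rather than evaluated at artificial means, the AME respects the actual distribution of the data.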
If you are interested in further information about marginal effects, you can take a closer look at Bartus, T (2005).
Next, we want to visualise the marginal effect of the total lead time on the probability that the auction is overbid. For this purpose, we use the MEMs by creating a vector of specific values for the variable totalleadtime while setting the other variables equal to their means (e.g. bidprice = 79.24). Because auctions usually last 7 days, we choose a sequence of values from 0 to 7 for the total lead time (there is only 1 case of totalleadtime > 7 days). Once we have created the data frame, we add the probability of an overbid predicted by the model.
Just press edit
and check
to plot the marginal effect of the total leadtime on the probability that a bidder overbids
the auction.
#< task
# set total leadtime to a sequence from 0 to 7 and other variables to their means
newdata <- regdata %>%
  mutate(totalleadtime = seq(from = 0, to = 7, length.out = n())) %>%
  mutate(lastleadbid = rep(mean(bidvalue), n())) %>%
  mutate(timeoutbid_outstanding = rep(mean(timeoutbid_outstanding), n())) %>%
  mutate(bidprice = rep(mean(bidprice), n())) %>%
  select(totalleadtime, lastleadbid, timeoutbid_outstanding, bidprice)

# draw prob of overbidding against total lead time
newdata[, c("overbid")] <- predict(myprobit, newdata, type = "response")
head(round(newdata, 6))
#>
The table shows the first few rows of the data plotted directly below it. The total lead time is set to a sequence of values ranging from 0 to 7 days. All other variables, namely the value of the last lead bid and the time and price outstanding when being outbid for the last time, are set equal to their respective means. The last column shows the probability of an overbid, predicted by the model with:
$$ p = \Phi(\beta^T \cdot x) = \Phi \left[ \begin{pmatrix} -10.579 & 0.006 & 0.078 & 0.125 & 0.002 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ totalleadtime \\ lastleadbid = 89.40 \\ timeoutbid _ outstanding = 3.21 \\ bidprice = 79.24 \end{pmatrix}\right] $$
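We can reproduce this prediction by hand with pnorm(). The sketch below plugs the printed (rounded) coefficient values into the formula, so the result may differ slightly from what predict() returns:

```r
# Predicted probability of an overbid as a function of total lead time,
# with the other covariates fixed at their means (rounded values as above)
p_overbid <- function(totalleadtime) {
  z <- -10.579 + 0.006 * totalleadtime + 0.078 * 89.40 +
        0.125 * 3.21 + 0.002 * 79.24
  pnorm(z)
}

p_overbid(c(0, 7))   # probabilities at 0 and 7 days of lead time
```

The probability rises only slightly over the whole 0 to 7 day range, which already hints at the negligible size of this effect.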
Press edit
and check
to plot the table.
#< task
ggplot(newdata, aes(x = totalleadtime, y = overbid)) +
  geom_line() +
  facet_wrap(~timeoutbid_outstanding)
#>
Good work! You have calculated marginal effects and illustrated them professionally.
Looking at this plot, we observe that the probability of receiving an overbid increases with the time a bidder leads the auction. Although the graph looks like a straight line, marginal effects are usually not linear, as we know. However, this positive effect is almost negligible: the probability increases by only 0.0019% when the total lead time is 2 days instead of 1. Furthermore, this effect is not significant anyway, as we have seen before.
Usually, marginal effects are of course not calculated by hand. The function margins() computes average marginal effects (AME). Using summary() along with it produces some additional output: standard errors, z- and p-values as well as the 95% confidence intervals.
Press edit
and check
to see the AME of our Variables.
#< task
library(margins)
x <- c("totalleadtime", "lastleadbid", "timeoutbid_outstanding", "bidprice")
m <- summary(margins(myprobit)) %>%
  arrange(match(factor, x))
m
#>
We see the smallest average impact for the item price before the bidder was outbid and for the total lead time. The time left when being outbid for the last time has the highest impact: a change of timeoutbid_outstanding by 1 unit (= 1 day) increases the probability of overbidding by 1.2% on average. In summary, as we have already seen in the coefficient table, the bid price outstanding when being outbid and the total lead time are not significant.
The next code chunk visualizes this table of average marginal effects so that we can compare the results better with a single view.
Press edit
and check
to visualise this summary of average marginal effects.
#< task
# define labels
m <- m %>%
  mutate(unit = c("\n+$1", "\n+$1", "\n+1 day", "\n+1 day")) %>%
  mutate(details = paste(factor, unit)) %>%
  mutate(order = factor(details, as.character(details)))

# plot marginal effects
ggplot(data = m, aes(x = order, y = AME)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = percent(AME)), position = position_dodge(width = 0.9), vjust = -0.25) +
  ylab("change of probability in %") +
  xlab("parameter") +
  ggtitle("Average marginal effects on the probability of overbidding")
#>
Remember that price and totalleadtime are not significant. Therefore, we should not consider these effects to be relevant.
To check the robustness of our regression, we are going to modify our data. Because we want to measure the effect of the participation length in the form of the total lead time, bidders who overbid from the start distort our results. Thus, we remove those bidders with distorted estimates of item prices and restrict our sample to bidders whose first bid is not an overbid.
Press edit
and check
to run the code.
#< task
# filter first bids
mydata1 <- regdata %>%
  filter(firstbid.overbid == 0) %>%
  select(overbid.restr1 = overbid, totalleadtime, lastleadbid = bidvalue,
         timeoutbid_outstanding, bidprice)

# regress
myprobit1 <- glm(overbid.restr1 ~ totalleadtime + lastleadbid + timeoutbid_outstanding + bidprice,
                 family = binomial(link = "probit"), data = mydata1)

# summarise
stargazer(myprobit, myprobit1, title = "Results", type = "text",
          report = ('vc*p'), omit.stat = "AIC")
#>
We actually reduced our sample size by 48 observations. When we compare the resulting table to the summary of our original data set, we recognise small changes in the coefficients. However, this modification does not change any level of significance.
Following the idea that total lead time affects overbidding, we can restrict our sample even further to only those bidders whose first bid that makes them the lead bidder is not an overbid. This way, we exclude bidders whose first bid in an auction is below the BIN price but who nevertheless did not accumulate any lead time before submitting an overbid.
Press edit
and check
to run the code.
#< task
# filter first lead bids
mydata2 <- regdata %>%
  filter(firstleadbid.overbid == 0) %>%
  select(overbid.restr2 = overbid, totalleadtime, lastleadbid = bidvalue,
         timeoutbid_outstanding, bidprice)

# regress
myprobit2 <- glm(overbid.restr2 ~ totalleadtime + lastleadbid + timeoutbid_outstanding + bidprice,
                 family = binomial(link = "probit"), data = mydata2)

# summarise
stargazer(myprobit, myprobit1, myprobit2, title = "Results", type = "text",
          report = ('vc*p'), omit.stat = "AIC")
#>
We find an overall reduction of the p-values, which indicates that we actually decrease the noise in our data by restricting it this way. However, this effect is not large enough to change the levels of significance. As far as the parameters are concerned, we see only small deviations so far.
We find a significant positive effect for the value of the last lead bid. This is quite intuitive, as overbidding is defined as exceeding a certain threshold for the bid price: the BIN price, which is constant in 83% of all cases ($129.95). The time outstanding at the last outbid also has a significant positive effect, which is less intuitive than the bid price. It is plausible, however, when we consider the share of overbidders (17%) and their disproportionate influence on auctions. As irrational bidders are the minority, most bidders will not continue bidding once there has been an overbid, so an early overbid should in many cases win the auction right away. We find no significant effect for the value of the last bid price. Although it depends on the last lead bid, the bid price varies with the previous price and the increment. Unfortunately, we find no effect for our primary variable: the relationship between the total time a bidder leads an auction and the probability of overbidding. The same holds if we restrict our sample to only bidders who do not overbid with their first bid or first lead bid. Therefore, we find no direct evidence for the quasi-endowment effect explaining overbidding behaviour.
The data sets we work with in this problem set were already prepared such that every auction definitely has a related BIN offer. In fact, we only consider auctions where a BIN offer for the same item is available throughout the entire auction period. Otherwise, our observations would be distorted, as bidders would not always have the option to buy the item immediately for a fixed price outside the auction.
In addition to the original paper, we investigate in this exercise the presence of "gaps" where BIN offers are not available for the Cashflow game. For this purpose, we take a look into the data set BIN
which contains all buy-it-now offers for Cashflow 101 from Feb 16 to Sep 02 of 2004. Furthermore, you will learn a bit more about how to deal with time formats in R.
Start with loading the data set BIN
. To do so, just press edit
and check
afterwards.
#< task BIN <- readRDS("BIN.rds") #>
Task: Take a first look at the BIN data using the head() function.
#< fill_in
# head(___)
#>
head(BIN)
#< hint
cat("Use the variable 'BIN' as input.")
#>
The columns start and end contain so-called POSIXct elements, an internal R data type classifying the variable as a time object. In order to make it easier to do calculations and comparisons with them, we convert these variables into numeric values. The function as.numeric() converts POSIXct objects into seconds counted from a fixed point in time. This fixed point is "1970-01-01 00:00:00" by default. However, it is irrelevant for doing calculations as long as all numbers have the same basis. More details can be found in the info box below:
The POSIXct class stores date and time values as the number of seconds since the beginning of 1970 (the rear part "ct" stands for calendar time). POSIXct elements can be displayed in a number of different shapes, such as only a date or only hours. Furthermore, they can handle different time zones as well as different formats like the American way of writing dates.
For example, the as.POSIXct() function converts a string into a POSIXct object. By specifying the format and time zone, almost any shape of string can be converted.
ct <- as.POSIXct("02/16/2004 19:27:03", format="%m/%d/%Y %H:%M:%S", tz = "America/Los_Angeles")
ct
The function as.numeric() converts a time object back into a plain number (seconds since the origin).
as.numeric(ct)
The function as.POSIXct() can also convert seconds counted from a basis point back into a time object, but in this case the basis point (origin) must be supplied.
as.POSIXct(1076988423, origin="1970-01-01", tz = "America/Los_Angeles")
There is also a second POSIXt type: POSIXlt keeps the date as a list of time attributes, accessible via the $ sign ("lt" stands for local time).
lt <- as.POSIXlt("2004-02-16 19:27:03")
cat(lt$hour, lt$min, lt$sec)
Furthermore, there exist a few extensions like the "chron" and "lubridate" packages with even more options for working with time formats. If you wish to know more about how to handle times and dates, please take a look at "Handling date-times in R" by Beck, C. (2012).
Task: Add 2 new columns to the data set containing the start and end times in seconds. Make use of the R base function as.numeric(), which converts different types of variables (like date formats) to numeric values.
#< fill_in
# BIN <- BIN %>%
#   mutate(start.numeric = ___) %>%
#   mutate(end.numeric = ___)
# head(BIN)
#>
BIN <- BIN %>%
  mutate(start.numeric = as.numeric(start)) %>%
  mutate(end.numeric = as.numeric(end))
head(BIN)
#< hint
cat("You need the function as.numeric() in front of 'start' and 'end'.")
#>
Now we make use of another helper function: cummax(). It computes for each row the maximum of the end times up to this point. The data frame is sorted by start in ascending order, so we can check whether the maximum end time reaches past the start time of the next BIN offer. In that case, the next BIN offer starts before the last one ends.
Check the info box for a detailed description of this concept:
The R base function cummax(data) computes for each index in a vector the maximum of the vector from the beginning up to the current index.
Here is a little example:
df <- data.frame(S = c(1, 2, 4, 8), E = c(3, 10, 5, 9))
df %>% mutate(cummax = cummax(E))
In this example, you can see that if you just compared the end time of a BIN offer with the start of the next one, you would flag rows 3 and 4 as not overlapping, although there is always a BIN offer active from 1 to 10.
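Applying the same comparison to the running maximum on the toy data frame shows how the overlap check works (assumes dplyr is loaded):

```r
library(dplyr)

df <- data.frame(S = c(1, 2, 4, 8), E = c(3, 10, 5, 9))
df %>%
  mutate(cummax   = cummax(E),
         overlaps = cummax > lead(S, default = NA))
# rows 1 to 3 are flagged TRUE: each interval overlaps the start of the next
```

The last row is NA because there is no next interval to compare with; this mirrors what we will do on the real BIN data below.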
Now that we have our tools together, we use the lead() function again to refer to the next row of the data frame and check for overlaps.
Press edit
and check
to add a new column to the data frame which signals overlapping BIN offers.
#< task
BIN <- BIN %>%
  mutate(cummax = cummax(end.numeric)) %>%
  mutate(overlaps = cummax > lead(start.numeric, default = NA))
head(BIN)
#>
The column overlaps is TRUE for all overlapping periods and indicates a missing BIN offer between the current line and the next one by being FALSE.
Task: Find out at which times there is no overlap of BIN offers. You can, for example, use the R standard function which() or the function filter() from the dplyr package (in which case you will need to call library(dplyr) again). Note that you can display lines from row number x to y by using the command BIN[x:y,].
#< task_notest
#...
#>
#< hint
cat("Hint: No overlaps are in lines 459:460 and 473:474")
#>
question: At which times are BIN offers missing? Note that you can select more than one option.
choices:
- 2004-05-03 02:45:00 -- 2004-05-05 21:14:50
- 2004-05-27 18:45:00 -- 2004-05-30 18:45:00
- 2004-07-15 23:45:00 -- 2004-07-28 23:30:00
- 2004-05-10 19:08:55 -- 2004-05-13 19:08:55
- 2004-08-14 23:15:00 -- 2004-08-20 20:48:22
multiple: TRUE
success: Great, all answers are correct!
failure: Not all answers correct. Try again.
Finally, we can conclude that in the period of data collection for the Cashflow game, from Feb 16 to Sep 02 of 2004, there is always a BIN offer active except for 2 time periods of about a week each. Therefore, we should not evaluate auctions within those periods (which has already been accounted for before creating the data set cf).
Good work! You have dealt with different time formats and identified missing data.
In this interactive problem set, we investigated the Bidder's Curse: the phenomenon of bidding more than an item actually costs and thereby revealing one's irrational behaviour when subsequently being picked as the winner. In Exercise 1, we made ourselves familiar with the auction platform eBay and its functioning. For this assessment, we used the availability of BIN offers where the same item can be purchased for a fixed price at the same time. We found a high proportion of bidders who bid more than the item would cost at the corresponding BIN listing. This applies to the board game "Cashflow 101" (Exercise 2) as well as to almost all other types of items (with automotive products being the only exception) (Exercises 4 and 5). We saw in Exercise 3 that these irrational bidders are indeed the minority; however, the design of auctions selects them as winners.
In Exercise 6, we explored possible explanations for this behaviour and found overbidding to be very persistent throughout all demographic groups and price levels. Furthermore, it seems that experience does not prevent bidders from making unreasonable decisions at auctions: overbidding among experienced bidders is as common as among inexperienced ones. We also considered gaining extra "utility from winning" as one possible reason and examined the quasi-endowment effect. We analysed the influence of accumulated leading time at auctions on the probability of overbidding in Exercise 7 and found no significant effect. The lack of a significant correlation between the time spent and the overbidding probability also rules out other approaches like sunk-cost reasoning: "Individuals who are outbid by others may feel the need to justify their previous bids and their time investments, leading them to continue bidding even when they have reached their limits." (Ku, G., Malhotra, D., & Murnighan, J. K. (2005))
Another explanation for overbidding in auctions is that bidders make estimation errors and the framework of auctions induces the selection of overoptimistic bidders (Compte, O. (2004)). However, this literature investigates auctions only, without a possibility to buy at fixed price offers. In our framework, the BIN offer serves as a reference point for an item's valuation and should eliminate wrong estimations. Approaches like belief-based estimations about the value of items or about the behaviour of other bidders (Eyster, E., & Rabin, M. (2005)) are not suited to explain overbidding in our data as it would be optimal to switch to the BIN offer once the fixed price is exceeded.
Unfortunately, we cannot provide an intuitive explanation for the observed results. It is possible, though, that bidders fail to remember the BIN listing when rebidding. When someone is outbid, eBay sends him a message saying "You have been outbid!" along with a direct link to the auction. This message can be a cause of limited attention towards the fixed price, leading to behaviour different from what traditional auction theory suggests.
If you want to see all the awards you have collected in this problem set, press edit and check afterwards. There is a maximum of 8 achievable awards.
#< task awards() #>
I hope you enjoyed our journey of learning more about bidders' behaviour in auctions and improved your data handling skills in R. If you would like to solve more exercises of this kind, feel free to check out other problem sets about different economic articles on GitHub.
Bartus, T. (2005): "Estimation of marginal effects using margeff". The Stata Journal, 5(3), 309-329.
Beck, C. (2012). "Handling date-times in R".
Black, G. S. (2007). Consumer demographics and geographics: Determinants of retail success for online auctions. Journal of Targeting, Measurement and Analysis for Marketing, 15(2), 93-102.
Compte, O. (2004). Prediction errors and the winner’s curse. Unpublished manuscript.
Cooper, D. J., & Fang, H. (2008). Understanding overbidding in second price auctions: An experimental study. The Economic Journal, 118(532), 1572-1595.
Davidson, R., & MacKinnon, J. G. (2004). "Econometric theory and methods (Vol. 5)". New York: Oxford University Press.
eBay (2019): "Automatic bidding". https://www.eBay.com/help/buying/bidding/automatic-bidding?id=4014 (20.02.2019).
Eyster, E., & Rabin, M. (2005). Cursed equilibrium. Econometrica, 73(5), 1623-1672.
Garratt, R. J., Walker, M., & Wooders, J. (2012). Behavior in second-price auctions by highly experienced eBay buyers and sellers. Experimental Economics, 15(1), 44-57.
Goodman, S. (2008, July). A dirty dozen: twelve p-value misconceptions. In Seminars in hematology (Vol. 45, No. 3, pp. 135-140). WB Saunders.
Harstad, R. M., Kagel, J. H., & Levin, D. (1990). Equilibrium bid functions for auctions with an uncertain number of bidders. Economics Letters, 33(1), 35-40.
Heyman, J. E., Orhun, Y., & Ariely, D. (2004). "Auction fever: The effect of opponents and quasi-endowment on product valuations". Journal of Interactive Marketing, 18(4), 7-21.
Kagel, J. H., Harstad, R. M., & Levin, D. (1987). Information impact and allocation rules in auctions with affiliated private values: A laboratory study. Econometrica: Journal of the Econometric Society, 1275-1304.
Kagel, J. H., & Levin, D. (2009): "Common value auctions and the winner's curse". Princeton University Press.
Kiyosaki, R. (1996): "Cashflow 101". http://www.richdad.com/about/rich-dad (16.01.2019).
Ku, G., Malhotra, D., & Murnighan, J. K. (2005). Towards a competitive arousal model of decision-making: A study of auction fever in live and Internet auctions. Organizational Behavior and Human decision processes, 96(2), 89-103.
Le, J. (2018): "Logistic Regression in R Tutorial". https://www.datacamp.com/community/tutorials/logistic-regression-R (27.03.2019).
Leeper, T. J. (2017): "Interpreting regression results using average marginal effects with R’s margins". Available at the comprehensive R Archive Network (CRAN).
Malmendier, U., & Lee, Y. H. (2011): "The Bidder's Curse". American Economic Review, 101(2), 749-87.
Paradis, E. (2002). "R for Beginners".
Wagner, C. H. (1982): "Simpson's paradox in real life". The American Statistician, 36(1), 46-48.
Wolf, J. R., Arkes, H. R., & Muhanna, W. A. (2005). Is Overbidding in Online Auctions the Result of a Pseudo-Endowment Effect?.
Yeh, J. C., Hsiao, K. L., & Yang, W. N. (2012). A study of purchasing behavior in Taiwan's online auction websites: Effects of uncertainty and gender differences. Internet Research, 22(1), 98-115.
Auguie, B. (2017): gridExtra. "Miscellaneous Functions for 'Grid' Graphics", R package version 2.3, http://CRAN.R-project.org/package=gridExtra
Hlavac, M. (2018): stargazer. “Well-Formatted Regression and Summary Statistics Tables”, R package version 5.2.2, http://CRAN.R-project.org/package=stargazer
Kranz, S. (2019): RTutor. “Creating interactive R Problem Sets. Automatic hints and solution checks.”, R package version 2019.02.11, https://github.com/skranz/RTutor
Leeper, T. J. (2018): margins: "Marginal Effects for Model Objects", R package version 0.3.23, https://CRAN.R-project.org/package=margins
Wickham, H., Francois, R., Henry, L., Muller, K. (2019): dplyr. "A Grammar of Data Manipulation", R package version 0.7.8, http://CRAN.R-project.org/package=dplyr
Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K. (2018): ggplot2. "Create Elegant Data Visualisations Using the Grammar of Graphics", R package version 2.2.1, http://CRAN.R-project.org/package=ggplot2
Wickham, H. (2018): scales. "Scale Functions for Visualization", R package version 1.0.0, https://CRAN.R-project.org/package=scales
Wickham, H., Henry, L. (2019): tidyr. "Easily Tidy Data with 'spread()' and 'gather()' Functions", R package version 0.8.2 https://CRAN.R-project.org/package=tidyr