Problem Set Product Attribute Trade-Offs in the Automobile Sector

ps.dir =  'C:/Users/MariusPC/Desktop/AttributeTradeoffs' # set to the folder in which this file is stored
ps.file = 'Attribute Tradeoffs.Rmd' # set to the name of this file
user.name = '' # set to your user name

library(RTutor)
check.problem.set('Attribute Tradeoffs', ps.dir, ps.file, user.name=user.name, reset=FALSE)

# To check your solution in RStudio save (Ctrl-S) and then run all chunks (Ctrl-Alt-R)

Author: Marius Breitmayer

In his paper "Automobiles on Steroids: Product Attribute Trade-Offs and Technological Progress in the Automobile Sector" Christopher R. Knittel (2012) estimates the technological progress since 1980 and the trade-offs faced when choosing between different attributes such as weight, fuel economy or engine power characteristics. In this interactive problem set, we are going to reproduce his study and discuss it.
(The public data as well as the article are provided on the website of the American Economic Association. You can simply click here to download it.)

Exercise Overview

Have you ever stood at the gas station and wondered how much fuel your car would need if all the innovation achieved over the last 25 years had gone into improving fuel economy instead of engine power, weight or other characteristics?

Manufacturers as well as consumers face technological trade-offs when choosing between fuel economy, engine power characteristics or even weight of a car every time they want to produce or buy one. The goal of this problem set is not only to better understand these trade-offs, but also to try to estimate the technological progress that has occurred over the observed timeframe.
Using data from 1980 to 2006, we will reproduce the estimates for technological trade-offs that manufacturers and consumers face when choosing between fuel economy, weight and engine power characteristics. We will also examine how the relationships between these different factors have changed over that time.

In this problem set we would like to find an answer to the central question of how fuel economy in 2006 would compare to fuel economy in 1980 if we had held size and power constant.

This problem set has the following structure:

Exercise 1: Descriptive statistics

a) Loading the required data

Before we are able to work with data, we need to load it into our working space.

Usually in R this is as simple as assigning a name to the imported data: new_variable = read.table("data_name").

But because our data comes from Stata and is therefore saved as a .dta file, the read.table() command will not work.

In many cases R packages save a lot of time, because they provide solutions to common problems. We will use several different packages within this problem set.

For this situation I recommend the package foreign. Load it with the command library(), passing the name of the package.

# Use library() to load the foreign package.

After loading the foreign package we can now use the command read.dta(). It works the same way as read.table() but as the name already suggests it works with .dta files. Please use the read.dta() command to load the Steroids_AER_data_post.dta and assign it to a variable called dat.

info("read.dta()") # Run this line (Ctrl-Enter) to show info

# Use the `read.dta()` command to load the `Steroids_AER_data_post.dta` and assign it to a variable called `dat`

The data contains model-level data on almost all vehicles sold in the United States between the years 1980 and 2006. In the next exercise, we will take a closer look at how the data is structured.

b) Data Overview I

Since we are now able to work with our data, we will first take a look at which columns are in the data set. Use the colnames() command with our newly generated dat to show the names of the columns.

# use `colnames()` on our loaded data "dat" to show the names of the columns.

If you want to know what the different variables stand for, just click the "Info"-Button.

info("Interesting Variables") # Run this line (Ctrl-Enter) to show info

info("Dummy variable") # Run this line (Ctrl-Enter) to show info

Now that we know which columns are in our data, let's take a look at the data itself. Simply click check to see a brief overview of the data.

# click check to show the first few rows of the data.
head(dat)

c) Data Overview II

Now that we know which variables are part of the data set, let's get a rough idea of their values. The summary() command is a great way to get a first impression. Because the data contains cars as well as trucks, we first need to select the cars only. We will also remove outliers here for the first time. For more information on outliers, see the info box.

info("Outlier") # Run this line (Ctrl-Enter) to show info

To select cars only and get a first impression of the variable mpg (which we will later also refer to as fuel economy), click check.

# filter() comes from the package 'dplyr'
library(dplyr)
# we first want to use cars only, without outliers
cars = filter(dat, d_truck == 0 & outlier == 0)
# now we want to have a summary of the variable mpg
summary(cars$mpg)

Min is the smallest value for mpg in the dataset.

Max is the biggest value for mpg in the data set.

1st Qu. is the 25 percent quantile, and 3rd Qu. is the 75 percent quantile of our data set. This means that 50 percent of all the values for mpg lie between these two values.

Mean is the average mpg of our data set, and Median is the value of mpg that separates the higher and lower halves of the data set.

In case you don't know what the median is, click the info box.

info("Median") # Run this line (Ctrl-Enter) to show info
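To make the entries of summary() concrete, here is a small sketch on a made-up vector. The values in mpg_toy are hypothetical and are not taken from our data set; they only illustrate how the quartiles, the median and the mean relate to each other.

```r
# Hypothetical fuel economy values (mpg) for nine cars; not taken from the data set.
mpg_toy <- c(18, 21, 22, 24, 26, 27, 29, 31, 45)

# summary() reports Min., 1st Qu., Median, Mean, 3rd Qu. and Max. in one call.
summary(mpg_toy)

# The same quantities computed one by one.
q <- quantile(mpg_toy, probs = c(0.25, 0.5, 0.75))
q
mean(mpg_toy)

# Share of observations that lie between the 1st and 3rd quartile (inclusive).
mean(mpg_toy >= q[1] & mpg_toy <= q[3])
```

In this toy vector the single very high value (45) pulls the mean above the median, which is why it is worth looking at both.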

For now, we are mostly interested in the mean values, so we will use the mean() command. Please use the mean() command to compute the average horsepower.

# use the `mean()` command on `cars` for horsepower `hp`.

Great job!

Looking at these values, the average car in our data has a fuel economy of 27.9 miles per gallon and 157 horsepower.

To make statements about the average car sold, the data should be weighted, for example with sales data $w_{i}$:

$$ \bar x = \dfrac{\sum_{i=1}^n w_{i} x_{i}}{\sum_{i=1}^n w_{i}} $$

This way, cars that were sold more often would be more meaningful and we would have a better representation of "the average sold car".

Unfortunately we don't have any sales data in our data.

Therefore, in our data every car's values are weighted equally when computing means:

$$ \bar x = \dfrac{1}{n}\sum_{i=1}^n x_{i} $$

As a result, a car that was sold only once counts exactly as much as a car that was sold a million times.

Even though our mean values are not a proper representation of the average car sold, they still help us to get a first impression of our data.
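The difference between the two kinds of mean can be illustrated with a short sketch. The numbers below are invented for illustration only (our data contains no sales figures); weighted.mean() is part of base R.

```r
# Hypothetical example: three car models with made-up fuel economy and sales.
mpg   <- c(20, 30, 40)
sales <- c(1, 1000, 1000000)

# Unweighted mean: every model counts equally.
unweighted <- mean(mpg)

# Sales-weighted mean: models that were sold more often count more.
weighted <- weighted.mean(mpg, w = sales)

# The same weighted mean written out as sum(w * x) / sum(w).
manual <- sum(sales * mpg) / sum(sales)

unweighted
weighted
manual
```

Because the best-selling model dominates the weights, the weighted mean ends up very close to that model's fuel economy, while the unweighted mean sits in the middle of the three values.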

Now that we have some means over the whole observation period (1980 to 2006), we may also be interested in how these car attributes change over time. For now, let's keep looking at fuel economy (mpg) and horsepower (hp) only.

We will therefore plot the mean values of mpg and hp for each year.

To do so, click check:

# First we use the package 'dplyr'
library(dplyr)
# The idea is to plot the means of every year, with the year on the x-axis and the mean in that year on the y-axis.
# Therefore it is useful to have groups of years in our data. 
# We use group_by from the package dplyr to generate these groups. 
# Every group now contains all the cars produced in the given year. 
# If we now use the mean() command on mpg and hp, we can plot the data easily. 
qdata = summarise(group_by(cars, year), mpg = mean(mpg), hp=mean(hp))
# we now simply plot the values for mpg using qplot from the package 'ggplot2'
# we take year as our x, and mpg as our y values, and use the recently created data `qdata`.
library(ggplot2)
q1 = qplot(x=year, y= mpg, data = qdata)
# then we just show the plot
q1

As you can see, we grouped the cars by year and saved the means for mpg and hp in qdata. Then we plotted every year's mean value for mpg to get an idea of how it changed over time. Your task now is to create the same kind of plot for horsepower. You don't have to summarize and group again; simply use qdata.

# save the plot for hp as q2 and display it.

Let's take a look at the two graphs:

The difference between the two graphs is obvious. While horsepower has increased almost linearly every single year, with a slightly steeper increase in the last years, fuel economy increased drastically in the first five years but then started to fluctuate between 27 and 29 miles per gallon. Although fuel economy was fairly constant after that, a small negative trend can be identified later in the sample. Overall, mpg increased by ~18 percent from 1980 to 2006. Especially in the last years, when horsepower increased drastically, a decline in fuel economy is recognizable. One possible reason for this development of mpg will be discussed in the next exercise.

This exercise refers to page 3377 of the paper.

Exercise 2: Motivation: CAFE - Standards

After getting a rough overview of the data at hand, let us take a small step away from the data and talk about the motivation of our problem set.

In exercise 1, we concluded that horsepower and fuel economy developed differently between 1980 and 2006.

But why was there such a huge increase in horsepower, while fuel economy remained relatively static?

What could have possibly had an impact on that development?

In order to find an answer to this question, it might be interesting to see how policy makers could incentivize manufacturers to produce new cars with higher fuel economy.

The first fuel economy standards, called Corporate Average Fuel Economy (CAFE) standards, were established in 1975 with the Energy Policy and Conservation Act, in response to the 1973 oil embargo.

The purpose of the CAFE standards is to reduce the energy use by increasing the fuel economy of cars and light trucks.

CAFE standards are sales-weighted fleet-wide averages in fuel economy that each manufacturer's fleet has had to achieve every year since 1978.

A manufacturer that does not comply is fined. The fine is $5.50 for every 0.1 mpg below the standard, multiplied by the number of cars in the manufacturer's new car fleet in that year.
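The fine rule can be written down as a short function. This is only a sketch of the arithmetic described above: the function name cafe_fine and the example fleet are hypothetical, and the sketch assumes that the shortfall is not rounded before the fine is computed.

```r
# Sketch of the CAFE fine arithmetic; cafe_fine is a hypothetical helper.
# Assumption: the shortfall enters the formula unrounded.
cafe_fine <- function(fleet_mpg, standard_mpg, fleet_size, rate_per_tenth = 5.50) {
  # A manufacturer that meets or exceeds the standard pays nothing.
  shortfall <- max(0, standard_mpg - fleet_mpg)
  # Number of 0.1 mpg steps below the standard.
  tenths <- shortfall * 10
  rate_per_tenth * tenths * fleet_size
}

# Made-up example: a fleet of 100,000 cars that misses a 27.5 mpg standard by 1 mpg.
cafe_fine(fleet_mpg = 26.5, standard_mpg = 27.5, fleet_size = 100000)

# A fleet above the standard pays no fine.
cafe_fine(fleet_mpg = 28, standard_mpg = 27.5, fleet_size = 100000)
```

In the first call the fine is 5.50 dollars times 10 steps times 100,000 cars, that is 5.5 million dollars, which shows why the fine matters more for manufacturers with large fleets or large shortfalls.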

With these fines, policy makers try to incentivize manufacturers to increase their fleet's fuel economy. Between 1983 and 2003, for example, the penalties collected totaled slightly over $600 million and were mostly paid by small European manufacturers. (Source: Yacobucci, Bamberger, Automobile and Light Truck Fuel Economy: The CAFE Standards (2008), p. CRS-3)

Starting off in 1978, the intention of the CAFE Standards was to double the fuel economy of all new car models sold on the U.S. market. The value to be achieved by 1985 was 27.5 miles per gallon.

If we now take a look at our data, we can try to estimate which manufacturers met the required fuel economy standards and which did not. We have to keep in mind that our mean values are still not weighted by sales and might therefore differ from the values used in reality.

As usual, we will load our data, and filter the data to only have the cars produced in 1985.

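# read.dta() comes from 'foreign', filter() from 'dplyr'
library(foreign)
library(dplyr)
# first we read the data. 
dat = read.dta("Steroids_AER_data_post.dta")
# then we only keep the cars from the year 1985, without outliers and trucks.  
cafe1 = filter(dat, year == 1985 & outlier == 0 & d_truck == 0)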

If we now want to see what the mean fuel economy of each manufacturer looks like in 1985, we can use a command chain from the package dplyr.

We "chain" commands together using the %>% operator.

If you are familiar with UNIX, this can be compared to the "pipe" operator. By using the %>% operator the output of one command becomes the input for the next command.

# load dplyr
library(dplyr)
# after this, we are going to create a command chain using %>%
# we first group our data by manufacturer, since CAFE standards are manufacturer fleet averages
cafe1 = cafe1 %>%
  group_by(mfr) %>%
  # after this we summarise by the mean fuel economy.
  summarise(mpg = mean(mpg)) %>%
  # lastly we keep those manufacturers whose mean fuel economy is lower than 27.5.
  filter(mpg < 27.5)
# finally we show the data frame
cafe1

As we can see in our case Audi, BMW, Ferrari, Fiat, Jaguar, Lotus, Maserati, Mercedes, Peugeot, Pininfari, Porsche, Renault, Saab and Volvo did not achieve the target fuel economy of 27.5 mpg by year 1985.

a) Manufacturer fuel economy over the years

Now that we know what fuel economy across manufacturers looked like in 1985, wouldn't it be interesting to see how fuel economy changed across all manufacturers over the years?

To visualize this, the package googleVis gives us the opportunity to create a Motion Chart.

This allows us to see how specific values have changed over the course of time. Some values are represented on the axis, others with color or size of the bullet.

For more info on googleVis, click the info box.

info("googleVis") # Run this line (Ctrl-Enter) to show info

Click check, to see the MotionChart. Then click the Play-Button to start the animation.

# as before, we will use the pipe command
df = dat %>%
# first we filter outliers and trucks out
  filter(outlier==0 & d_truck==0) %>%
# then we group the data by year and manufacturer
  group_by(year,mfr) %>%
# we set the value for each mfr to the mean values of hp, mpg, accel 
# we also generate a new variable called models, which counts the different car models in each year
  summarise(mpg=mean(mpg), hp=mean(hp), accel=mean(accel), models=n()) %>%
# lastly we create a new column containing the manufacturer name and call it id
  mutate(id = mfr)

library(googleVis)
# Then we use the gvisMotionChart() command to generate the html code and save it as mp. 
# within the command, we then select the generated data, and specify how every attribute should be represented in our plot.
# It is important to set idvar to id (which is equal to mfr), since we are interested in different manufacturers. 
# we set timevar to year, because 'year' is the variable that depicts time in our data set.
# As the variable shown on the x and y axis we assign hp and mpg
# different manufacturers will be represented in different colors and the amount of different models will determine the size.
mp = gvisMotionChart(df, idvar = "id",
                     timevar = "year", xvar = "hp", yvar = "mpg",
                     colorvar = "mfr", sizevar = "models")
plot(mp, tag = "chart")

In our motion chart, each circle represents a different manufacturer.

Let us take a closer look at the motion chart above. You can always watch it again by clicking the play button. In the early years of our sample (1980 - 1985) a trend towards higher fuel economy can be recognized. Most of the circles move upwards in the coordinate system and only a few manufacturers show an increase in horsepower. In the years after 1985, there is hardly any increase in fuel economy for most manufacturers. In contrast, almost all circles move further to the right, which represents an increase in horsepower. This could point to incentives in the early years that led manufacturers to focus on improving fuel economy rather than other characteristics such as horsepower. If we now take a look at the values for 2006, we can see that manufacturers with low fuel economy (<20 mpg) tend to have small circles. This indicates a small number of different models in that year. In our case, the manufacturers Ferrari, Bentley, Aston Martin and Maserati all have fewer than 5 models. This is consistent with Yacobucci and Bamberger, who report that most of the CAFE fines were paid by small European manufacturers.

b) Mean fuel economy vs. CAFE Standards

After getting an idea of what the manufacturers' average fleet (not weighted by sales) looked like over the years, we might now be interested in how the CAFE Standards changed compared to the average fuel economy.

To visualize how the unweighted fuel economy changed compared to CAFE standards over the years, we can plot both values. The data on CAFE standards were taken from the "Summary of Fuel Economy Performance" by the National Highway Traffic Safety Administration.

Here we will use the general unweighted mean value for each year, not taking different manufacturers into account, because we would like to get an overall idea on how all cars have changed. At this point we are not interested in which manufacturers did not reach the target fuel economy.

I have already prepared the data for you. It contains three columns: year, the mean fuel economy in the given year and the CAFE standard requirements in that year. It is saved as cafecars.txt. To see the plot, please click check.

# we load the data first
cafe = read.table("cafecars.txt")
# then we create a plot with year on the x-axis
# afterwards we add two lines, one for the CAFE standards and one for the mean fuel economy
cafecars_mpg = ggplot(cafe, aes(x = year)) +
  geom_line(aes(y = cafe, color = "CAFE standard")) +
  geom_line(aes(y = mpg, color = "Mean fuel economy")) +
  # label the y-axis and the legend explicitly
  labs(y = "fuel economy", color = "Legend")
# last, we display the plot
cafecars_mpg

As we can see, there was a substantial increase in CAFE Standards for cars in the early years. After 1985 the standards no longer rose: they decreased in 1986, returned to the 1985 level in 1990 and stayed there. The average fuel economy of the cars in our sample was higher than the requirements most of the time. We have to note here again that our mean values are not sales weighted. This might result in biased values, as car models with lower sales are "overrepresented", while car models with higher sales are "underrepresented". Nevertheless we can identify a trend. While the standards increased, fuel economy rose as well. After 1990, the year in which the CAFE standards were modified for the last time in our sample, the mean fuel economy decreased almost every year. In 2006, it even fell below the requirement line. Therefore we might be able to find a correlation between fuel economy and CAFE standards.

Let's have a look at the correlation coefficient.

cor(cafe$cafe, cafe$mpg)

Our correlation coefficient is 0.8058689. Even though our sample is quite small, this suggests a strong positive linear relationship: years with higher CAFE standards tend to be years with higher average fuel economy. If this relationship were causal, raising the CAFE Standards would result in higher average fuel economy. This would be one way to explain why fuel economy increased early in the sample, while later improvements went into other characteristics such as horsepower.
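To see what cor() computes and how much uncertainty a small sample carries, consider the following sketch. The two vectors are made up for illustration and are not the contents of cafecars.txt; cor(), cov(), sd() and cor.test() are all part of base R.

```r
# Hypothetical yearly values; NOT the contents of cafecars.txt.
standard <- c(20, 22, 24, 26, 27, 27.5, 26, 26, 26.5, 27.5)
mean_mpg <- c(23, 25, 26.5, 27.5, 28, 28.5, 28, 28.2, 28.1, 28.4)

# Pearson correlation coefficient via the built-in function.
r_builtin <- cor(standard, mean_mpg)

# The same coefficient written out: covariance divided by the product of the standard deviations.
r_manual <- cov(standard, mean_mpg) / (sd(standard) * sd(mean_mpg))

# With only a few observations the 95 percent confidence interval is wide.
ci <- cor.test(standard, mean_mpg)$conf.int

r_builtin
r_manual
ci
```

The confidence interval is a reminder that a coefficient estimated from a few dozen yearly observations is imprecise, and that even a precisely estimated correlation says nothing about the direction of causality.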

info("cum hoc ergo propter hoc") # Run this line (Ctrl-Enter) to show info

What we know after this exercise is that, assuming a causal relationship, an increase in CAFE Standards should result in an increase in fuel economy. As long as the CAFE Standards are met, so that no fine has to be paid, manufacturers will improve characteristics that are important to the customer, such as acceleration or horsepower. Manufacturers who do not comply with the standards value consumer preferences more highly than the resulting fine. But before we use this information to give advice, we have to get a better understanding of how different car attributes influence fuel economy.

This will be the topic of exercise 3.

Source: U.S. Department of Transportation (2014): Corporate Average Fuel Economy (CAFE) Standards, 28.11.15

Source: U.S. Department of Transportation (2003): CAFE - Fuel Economy, 28.11.15

Source: Union of Concerned Scientists: Fuel Economy Basics, 28.11.15

Exercise 3: Graphical Evidence

After getting a first idea of our data and of how policy might create incentives for manufacturers to increase their fuel economy in exercises 1 and 2, we will now develop a deeper understanding of how different car attributes influence fuel economy.

a) Density Plots

First we will get a graphical view of the data. The R package ggplot2 contains a nice array of tools for creating graphics. To begin with, since we still want to learn a little more about our data, we will create some density plots.

info("Density") # Run this line (Ctrl-Enter) to show info

Let me give you an example of how this is done: First, we need to load the needed data. I've already prepared this for you, so just click the check button to load the data.

# read.dta() comes from 'foreign', filter() from 'dplyr'
library(foreign)
library(dplyr)
# we first load our data again
dat = read.dta("Steroids_AER_data_post.dta")
# then, we need to filter our data 
# first off we only use data from the years 1980 and 2006 
# we take out the outliers and trucks, and select only gasoline cars with less than 50 mpg 
dens = filter(dat, year == 1980 | year == 2006, outlier == 0 & fuel == "G" & mpg<50 & d_truck ==0)

Then we will plot the density of Fuel Economy (mpg) for cars in the years 1980 and 2006. 1980 was the first year in our data and 2006 was the last one. By plotting these two years, we can see how the cars have changed over the course of our observation. Just click the check button to see the plot:

# as usual, we need to load a package again 
library(ggplot2)
# we add year as a factor to our data.
dens$Year = as.factor(dens$year)

# With 'ggplot(aes(...), data=dens)' we select the data that ggplot is going to use. 
# In our case it is the one we just created.
# x = mpg describes which values from dens are being used, 
# while fill draws a separate density for each year. 
# geom_density adds a density to the ggplot object. 
# alpha = 0.5 makes the fill color semi-transparent.
p1 <- ggplot(aes(x=mpg,  fill=Year),data=dens) + geom_density(alpha=0.5)
p1

We can see that the fuel economy density in 1980 is narrower than in 2006. This means that in 1980 fuel economy was more similar across cars than in 2006. The peak has also shifted from around 18 miles per gallon in 1980 to roughly 26 miles per gallon in 2006. Because the density is wider in 2006, customers had more options to express their preferences for fuel economy when buying a car. Note also that the density at 23 miles per gallon is roughly the same in 1980 and 2006, which means that a similar share of car models achieved about 23 miles per gallon in both years. The main difference is that in 1980, 23 miles per gallon was in the top half of the fuel economy distribution, while in 2006 it is in the bottom half. Overall, fuel economy has increased from 1980 to 2006.
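How to read the width and the peak of a density can be tried out on simulated data. The sketch below uses base R's density() on two made-up samples; the means and standard deviations are arbitrary choices for illustration and are not estimates from our data.

```r
# Simulated, hypothetical samples: one tightly clustered, one more spread out.
set.seed(1)
narrow <- rnorm(500, mean = 18, sd = 2)
wide   <- rnorm(500, mean = 26, sd = 5)

# Kernel density estimates (evaluated on an equally spaced grid of x values).
d_narrow <- density(narrow)
d_wide   <- density(wide)

# Location of the peak of each density.
peak_narrow <- d_narrow$x[which.max(d_narrow$y)]
peak_wide   <- d_wide$x[which.max(d_wide$y)]

# The area under a density is (approximately) one, so a wider density must be flatter.
area_narrow <- sum(d_narrow$y) * diff(d_narrow$x[1:2])
area_wide   <- sum(d_wide$y) * diff(d_wide$x[1:2])

# Draw both densities in one plot.
plot(d_narrow, xlim = range(c(d_narrow$x, d_wide$x)), main = "Narrow vs. wide density")
lines(d_wide, lty = 2)
```

Since both curves enclose the same area, the height of a density at a given value only tells us about the share of observations near that value, not about their absolute number.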

Now please try to plot the density of accel with ggplot. You do not need to repeat the data preparation from the example. Use the same syntax as in the example above and save your object as p2. Don't forget to show your plot at the end.

# use p2 <- ... here for accel
# then show your plot

Let's take a look at the acceleration density. Cars clearly got faster over the years. Most cars in 1980 took around 13 seconds from 0-80 mph, whereas in 2006 the peak (the time that most cars need from 0-80 mph) is at roughly 8 seconds. Another interesting observation is that even one of the slower cars in 2006 (accel > 12 sec) accelerates a little faster than the average car in 1980.

b) Scatter Plots

So far we have only looked at single variables. In this exercise we will look at how two variables relate to each other. Since we are interested in how fuel economy, represented as mpg, changed over time, it makes sense to plot mpg against another variable. Scatter plots are a good way to visualize how two variables stack up.

info("Scatter Plot") # Run this line (Ctrl-Enter) to show info

If we start thinking about which attributes might be important for fuel economy, we could start with weight. So if we want to see how weight stacks up against fuel economy, we will use a scatter plot for fuel economy and curbwt. I will give you an example here:

# we take our data again
scat = dens 
scat$year = as.factor(scat$year)
# Now we use ggplot again. 
# we would like to plot curbwt on the X-Axis 
# and mpg on the Y-Axis
# We use scat as our data.
# geom_point adds the points, and geom_point(shape=1) changes the appearance of the points.
# geom_smooth() adds the smoothed line through the data for each year.
p3 <- ggplot(aes(x=curbwt,y=mpg, color=year),data=scat) + geom_point() + geom_point(shape=1) + geom_smooth()
# then we show the plot
p3 

This figure suggests that a 3,000 pound passenger car gets roughly 10 more miles per gallon in 2006 than in 1980. This increase is roughly constant over the weight distribution, which can be seen from the smoothed lines fitted through the data points. Conversely, a car with a fuel economy of 30 miles per gallon had a curb weight of 2,000 pounds in 1980 and a curb weight of almost 3,000 pounds in 2006. This equals an increase of almost 1,000 pounds over the given timeframe.

Your task now is to create a scatter plot called p4 for horsepower hp and fuel economy mpg. As before, it is sufficient to start with p4 <- ggplot(...)+...

# to do so, you just have to replace the ??? in the code below with the correct values.
# p4 <- ggplot(aes(x=???,y=???, color=year),data=scat) + geom_point() + geom_point(shape=1) + geom_smooth()
# p4

Good job on that Scatter plot.

It is very interesting that in 1980 a car with more than 200 horsepower was almost a rarity. Most cars had between 80 and 180 horsepower. In 2006, 200 horsepower can almost be considered standard for a new car.
Our scatter plot suggests that a car with a fuel economy of 20 miles per gallon was able to have 280 more horsepower in 2006 than in 1980. Conversely, a car with 200 horsepower gets roughly 15 more miles per gallon in 2006 than in 1980.

c) Google Vision Plot

For this graphic, we need to use a new data set. I took data from the years 1980 to 2006 for six different cars in our dataset. If there were multiple data rows for the same year, I took the one with the highest mpg. The six cars are:

- Honda Accord
- Honda Civic
- Toyota Corolla
- GMC Grand Prix
- Ford Mustang
- GMC Corvette

I then saved the data into a file called gviscars.txt. We will now plot a Google Motion Chart again. This allows us to see how specific values have changed over the course of time. Some values are represented on the axis, others with color or by size of the bullet. Click the check button, followed by the Play button, to see the animation.

# We first load the prepared data
gviscars = read.table("gviscars.txt")
# Then we load the library googleVis in order to have access to the needed commands
library(googleVis) 
# Then we use the gvisMotionChart() command to generate the html code and save it as mp. 
# within the command, we then select the loaded data, and specify how every attribute should be represented in our plot.
# It is important to set idvar to nameplate, since we are interested in the data of different cars. 
# we set timevar to year, because 'year' is the variable that depicts time in our data set.

mp = gvisMotionChart(gviscars, idvar = "nameplate",
                     timevar = "year", xvar = "hp", yvar = "mpg",
                     colorvar = "torque", sizevar = "curbwt")
plot(mp, tag = "chart")

If we now take a look at this motion chart, we can see the six different cars each being represented by a circle.

We can see that the circle representing the Corvette is the one with the lowest miles per gallon over the course of the sample. But it is also the car with the highest horsepower, fastest acceleration (you can display acceleration by changing one of the values, for example mpg to accel), highest weight and highest torque. In contrast, the car with the highest miles per gallon, the Honda Civic, has the lowest values for horsepower, torque and curb weight, and the slowest acceleration. Therefore we could assume that there might be a relationship between these values and fuel economy. We will take a look at this in later exercises.

This exercise refers to pages 3378 and 3379 of the paper.

Exercise 4: Theoretical Model

Before we start using our data to get some empirical results, we will think of a theoretical model.

As we already know, we don't have any sales data. Therefore we can't take sales into account.

What we can do, though, is take costs into account. We assume that the cost of producing a car $i$ with given attributes $mpg_{it}$, $w_{it}$, $hp_{it}$, $tq_{it}$ at a certain time $t$ is represented by a marginal cost function:

$$ c_{it} = C(mpg_{it},w_{it},hp_{it},tq_{it},t) $$

$c_{it}$ are the costs that will arise from this vehicle.

$mpg_{it}$ is the fuel economy of the car to be produced.

$w_{it}$ is the curb weight of the car to be produced.

$hp_{it}$ is the horsepower of the car to be produced.

$tq_{it}$ is the torque of the car to be produced.

$t$ is the year in which the car is to be produced, and will later be used to represent technological progress $T_t$.

For more information on marginal cost, click the info box.

info("marginal cost function") # Run this line (Ctrl-Enter) to show info

Since a normal car consists of more characteristics than $mpg_{it}$,$w_{it}$,$hp_{it}$ and $tq_{it}$, this is obviously not a very accurate representation of a car.

Therefore we will simply add some more characteristics.

If we differentiate between attributes that are related to fuel economy represented as $X_{it}$, and attributes related to other aspects of the vehicle represented as $Z_{it}$, this yields:

$$ c_{it} = C(mpg_{it},w_{it},hp_{it},tq_{it},X_{it},Z_{it}, t) $$

Attributes stored in $X_{it}$ might be a supercharger, a turbocharger or the kind of transmission used in the car.

Attributes stored in $Z_{it}$ are not related to fuel economy. These could be interior quality, a sun roof, a navigation system, a tow-bar and so on.

If we would like to estimate the Technological Progress in our current model, we can try to estimate how this function has changed over time. But there are two major problems:

(1): The dimension of $Z_{it}$, which is needed to control for changes in vehicle attributes across other dimensions, is very large.

(2): We have no cost data available. An obvious proxy would be price data.

But there is also a problem with price data. Given the numerous changes in the industrial structure of the automobile industry, a concern when using price data is that the estimates of technological progress would also capture changes in mark-ups over time. As a result, we will instead focus on the iso-cost curves (level sets) of the function.

Now, we would like to get a more precise model than this. One of the problems is that we cannot control the size of $Z_{it}$. If we assume that the attributes unrelated to fuel economy, $Z_{it}$, are additively separable, our function

$$ c_{it} = C(mpg_{it},w_{it},hp_{it},tq_{it},X_{it},Z_{it}, t) $$

changes to:

$$ c_{it} = C^{1}(mpg_{it},w_{it},hp_{it},tq_{it},X_{it},t) + C^{2}(Z_{it} ,t) $$

This allows us to split our marginal cost function into two separate components:

$C^{1}$ which contains all the fuel economy related attributes

$C^{2}$ which contains the components of the function that are not related to fuel economy.

Because we are interested in how fuel economy has changed and $C^{2}$ does not contain any components related to fuel economy, we can ignore the $C^{2}$ part from here on.
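Why $C^{2}$ can be dropped follows directly from additive separability. As a short worked step, using only the symbols defined above: the effect of fuel economy on cost is

$$ \dfrac{\partial c_{it}}{\partial mpg_{it}} = \dfrac{\partial C^{1}(mpg_{it},w_{it},hp_{it},tq_{it},X_{it},t)}{\partial mpg_{it}} + \dfrac{\partial C^{2}(Z_{it},t)}{\partial mpg_{it}} = \dfrac{\partial C^{1}(mpg_{it},w_{it},hp_{it},tq_{it},X_{it},t)}{\partial mpg_{it}} $$

because $C^{2}$ does not depend on $mpg_{it}$. The same holds for $w_{it}$, $hp_{it}$ and $tq_{it}$, so the trade-offs between these attributes are determined by $C^{1}$ alone and do not depend on $Z_{it}$.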

Since we want to focus on the level sets of our function, we rewrite it in that form. This yields:

$$ mpg_{it} = f(w_{it},hp_{it},tq_{it},X_{it},t | C^{1} = \sigma) $$

The $C^{1} = \sigma$ part of the function indicates that costs are held constant over the years.

If we now assume that technological progress $T_t$ (represented as $t$ in the function before) is "input" neutral, we can multiply our function by $T_t$, yielding

$$ mpg_{it} = T_t f(w_{it},hp_{it},tq_{it},X_{it},\epsilon_{it} | C^{1} = \sigma) $$

We can only obtain consistent estimates of our iso-cost curves, and of how they have shifted because of $T_t$, if the value of $C^{1}$ changes neither over time nor within a year. In our empirical models, the value of $C^{1}$ is absorbed into the error term $\epsilon_{it}$. Our empirical models also do not take expenditures on technology into account. This might lead to two different sources of bias.

info("bias") # Run this line (Ctrl-Enter) to show info

First, we want to estimate how our iso-cost curves have changed over time while holding investment in technology constant. Since we do not observe this investment, our estimated iso-cost curves will be biased in an unknown direction. On the one hand, if companies have increased their spending on technology, our curves will reflect not only technological progress but also that increase in spending. On the other hand, if companies have decreased their spending on technology, our curves will understate technological progress.

Another source of bias could arise from within-year variation in technology investments, if this variation is correlated with the observed characteristics of the cars. In that case, the estimated relationships between fuel economy, engine power and weight will be biased.

Because the observed increase in fuel economy captures changes in the iso-cost curves due to technological progress and increases in how much firms are devoting to technology, the results should be interpreted in this light.

Besides the spending devoted to technology, other factors affect the relationship between fuel economy, engine characteristics and weight. For example, vehicles with a manual transmission can achieve higher fuel economy than vehicles with an automatic transmission. This might change if technology evolves further and more efficient automatic transmissions are invented. As far as our data allow, we will try to control for a number of these factors, labelled $X_{it}$.

Let's start with the empirical work in the next exercise.

This exercise refers to pages 3371 and 3372 of the paper.

Exercise 5.1: Empirical Model: Cobb-Douglas

In this part of the Problem Set, we will focus on a Cobb-Douglas functional form to estimate the level sets.

a) Introductory Model

Before we work with the more complicated models in the paper, it makes sense to look at a simpler model first.

We assume there is a cost function representing the cost of producing a car with a given fuel economy (mpg), horsepower (hp) and torque (torque). This is a very simple representation of a car, but the aim is to get an idea of how the later models work. In 1928, Charles Cobb and Paul Douglas published their paper "A Theory of Production", in which they established a framework that has been widely accepted in empirical investigations.

A Cobb-Douglas production function is widely used to represent the relationship between two or more inputs and the amount of output generated by those inputs. If we assume that all manufacturers have the same production elasticities and that substitution elasticities equal 1, we can use the Cobb-Douglas form.

The formula then looks like this:

$$\tilde c_{it} = mpg_{it}^{\tilde\alpha} * hp_{it}^{\tilde\beta} * torque_{it}^{\tilde\gamma} * \tilde T_t $$

info("Technoligcal Progress T") # Run this line (Ctrl-Enter) to show info

The problem with this formula is that we have data on neither costs nor prices.

One way of solving this is to express one variable in terms of the others. Since we are interested in how fuel economy has changed over time, we will express fuel economy in terms of horsepower and torque.

If we take the logarithm of our formula this results in:

$$ \ln \tilde c_{it} = \tilde\alpha * \ln mpg_{it} + \tilde\beta * \ln hp_{it} + \tilde\gamma * \ln torque_{it} + \ln \tilde T_t $$

Since we want to express fuel economy, we should bring mpg on one side of the equation:

$$ -\tilde\alpha * \ln mpg_{it} = \ln \tilde T_t + \tilde\beta * \ln hp_{it} + \tilde\gamma * \ln torque_{it} - \ln \tilde c_{it} $$

Now we multiply by $-1$:

$$ \tilde\alpha * \ln mpg_{it} = - \ln \tilde T_t - \tilde\beta * \ln hp_{it} - \tilde\gamma * \ln torque_{it} + \ln \tilde c_{it}$$

Since we want to isolate fuel economy, we simply divide by $\tilde\alpha$:

$$ \ln mpg_{it} = - \frac{\ln \tilde T_t}{\tilde\alpha} - \frac{\tilde\beta}{\tilde\alpha} * \ln hp_{it} - \frac{\tilde\gamma}{\tilde\alpha} * \ln torque_{it} + \frac{\ln \tilde c_{it}}{\tilde\alpha}$$

Now we just have to move the costs into the error term $\tilde\epsilon_{it}$ (for more information see the info box):

$$ \ln mpg_{it} = - \frac{\ln \tilde T_t}{\tilde\alpha} - \frac{\tilde\beta}{\tilde\alpha} * \ln hp_{it} - \frac{\tilde\gamma}{\tilde\alpha} * \ln torque_{it} + \tilde\epsilon_{it}$$

info("Error Term epsilon ") # Run this line (Ctrl-Enter) to show info

With the definitions

$$ T_t = - \frac{\ln \tilde T_t}{\tilde\alpha}, \qquad \beta = - \frac{\tilde\beta}{\tilde\alpha}, \qquad \gamma = - \frac{\tilde\gamma}{\tilde\alpha} $$

we get:

$$ \ln mpg_{it} = T_t + \beta * \ln hp_{it} + \gamma * \ln torque_{it} + \tilde \epsilon_{it} $$

The resulting equation describes level sets. For further information, click the info box "level sets".

info("level sets") # Run this line (Ctrl-Enter) to show info

info("Logarithmic Transformation") # Run this line (Ctrl-Enter) to show info

In exercise 5.1, we will not look at technological progress $T_t$. It will be discussed in exercise 5.2.
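The derivation in a) can be checked numerically. The following is a minimal sketch on simulated data, not the paper's data: the exponents `alpha_t`, `beta_t` and `gamma_t` and all distributions are hypothetical values chosen for illustration, and $\ln \tilde T_t$ is set to zero. If costs are unrelated to horsepower and torque, a regression of log mpg on log horsepower and log torque should recover $-\tilde\beta/\tilde\alpha$ and $-\tilde\gamma/\tilde\alpha$.

```r
# Hypothetical structural exponents (assumptions for illustration only)
set.seed(1)
n       <- 5000
alpha_t <- 2.0   # exponent on mpg
beta_t  <- 0.8   # exponent on hp
gamma_t <- 0.4   # exponent on torque

# Simulated log attributes and a noisy log cost level around a common value
lhp     <- rnorm(n, mean = log(120), sd = 0.3)
ltorque <- rnorm(n, mean = log(150), sd = 0.3)
lcost   <- rnorm(n, mean = 5, sd = 0.05)  # ln c, independent of hp and torque

# Solve ln c = alpha*ln mpg + beta*ln hp + gamma*ln torque for ln mpg (ln T = 0)
lmpg <- (lcost - beta_t * lhp - gamma_t * ltorque) / alpha_t

# Regress log mpg on log hp and log torque, as in the final equation of a)
fit    <- lm(lmpg ~ lhp + ltorque)
est    <- coef(fit)[c("lhp", "ltorque")]
target <- c(lhp = -beta_t / alpha_t, ltorque = -gamma_t / alpha_t)
print(rbind(estimate = est, implied = target))
```

The estimated slopes are ratios of the structural exponents, so the regression identifies trade-offs along the iso-cost curve rather than the exponents themselves.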

b) Loading Data

For the next few exercises, we need a special subset of our data. We will use the filter() command on dat to select all observations with d_truck == 0 and outlier == 0, and save them in a new variable called regdata. To do so, just click the check button.

# We load the same data again.
# read.dta() comes from the foreign package, filter() from dplyr.
library(foreign)
library(dplyr)
dat = read.dta("Steroids_AER_data_post.dta")
# Then we remove the trucks and outliers from our data.
regdata = filter(dat, d_truck==0 & outlier==0)

With d_truck==0 we ensure that only data from cars are used, and outlier==0 makes sure that outliers are not used in our regressions.

c) Model 1

After getting an overview of the data in earlier exercises and an idea of how the models work in exercise a), we will now use this knowledge to improve the introductory model.

Let's take our simple model and assume a "more complicated" car. Our car is still described by fuel economy ($mpg_{it}$), horsepower ($hp_{it}$) and torque ($tq_{it}$), but let's add curb weight ($curbwt_{it}$). Since this still covers very few car attributes (missing ones might be the transmission, the exhaust system, ...), we should take more into account. To keep it simple, we add a term $X_{it}$ in which we store other attributes related to fuel economy.

A formula for this might look like this: $$c_{it} = mpg_{it}{}^\alpha * hp_{it}{}^\beta * torque_{it}{}^\gamma * curbwt_{it}{}^\delta* X_{it}{}^B * T_t $$

The vector $B$ contains the coefficients on the characteristics represented in $X_{it}$.

After transforming it in the same way as in our example, this yields:

Model 1: $$ \ln mpg_{it} = T_t + \delta \ln curbwt_{it} +\beta \ln hp_{it} + \gamma \ln tq_{it} + X_{it}B + \tilde \epsilon_{it} $$

We can now estimate the coefficients on these variables with a regression. In our data, there are groups (mfr) within which the values might be correlated. For example, cars from the same manufacturer may share parts or technology (or both), so their error terms might be correlated within the group. In that case, the regular OLS standard errors are biased. If we still assume that the errors are uncorrelated across groups, we can correct for this by using clustered standard errors. For more information, see the info box.

info("clustered standard errors") # Run this line (Ctrl-Enter) to show info
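To make the idea concrete, here is a minimal sketch of a cluster-robust "sandwich" variance computed by hand in base R on simulated data. All names and numbers are hypothetical, and the sketch omits the finite-sample corrections that packages may apply, so it is not meant to reproduce felm() output exactly.

```r
set.seed(42)
G   <- 30                                # hypothetical number of groups
n_g <- 40                                # hypothetical cars per group
group <- rep(seq_len(G), each = n_g)

# Regressor and error both contain a group-level component
x <- rnorm(G)[group] + rnorm(G * n_g)
u <- rnorm(G)[group] + rnorm(G * n_g)
y <- 1 + 0.5 * x + u

fit   <- lm(y ~ x)
X     <- model.matrix(fit)
e     <- residuals(fit)
bread <- solve(crossprod(X))             # (X'X)^{-1}

# Meat: sum over clusters of (X_g' e_g)(X_g' e_g)'
score_by_group <- rowsum(X * e, group)   # one row per cluster
meat <- crossprod(score_by_group)

V_cluster  <- bread %*% meat %*% bread
se_cluster <- sqrt(diag(V_cluster))
se_ols     <- sqrt(diag(vcov(fit)))
print(cbind(se_ols, se_cluster))
```

Because both the regressor and the error are correlated within groups here, the conventional standard error of the slope is too small, and the clustered one is noticeably larger.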

Within the package lfe, the command felm() offers a relatively easy implementation of clustered standard errors. To see how the felm() command is structured, click the info box.

info("felm()") # Run this line (Ctrl-Enter) to show info

We know from exercise 1 that dummy variables can be used in a classical linear regression just like any other explanatory variable, yielding standard OLS results. We can therefore take the dummy variables d_manual+time_d_manual+d_diesel+d_turbo+d_super as our additional vector $X_{it}$.

This results in the following regression:

# In order to use the felm command, we first need to load the package lfe
library(lfe)
# The stargazer package is needed to show the results in a nice way
library(stargazer)

# We use the felm command to regress lmpg on the other variables,
# with year as a factor, no instruments (the 0), and standard errors clustered by mfr.
# As data we use the recently loaded data, regdata
reg1 = felm(lmpg ~ 
              lcurbwt+lhp+ltorque+
              d_manual+time_d_manual+d_diesel+d_turbo+d_super | year |0| mfr, data = regdata)
# Now we just need to show the values for reg1.
# To show them in a nice html format, we use stargazer
stargazer(reg1, type = "html")

If we take a look at the estimates from this regression, we can give a first interpretation of the values:

Ceteris paribus, a 10 percent increase in weight (curbwt) is associated with a 3.977 percent decrease in fuel economy.

The same interpretation applies to horsepower: all else equal, a 10 percent increase in horsepower is associated with a 3.241 percent decrease in fuel economy.

For torque, the relationship is not precisely estimated, as the significance codes show; the point estimate implies that a 10 percent increase in torque is associated with a 0.19 percent decrease in fuel economy.

info("interpretation log-log regression") # Run this line (Ctrl-Enter) to show info
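The percentages above use the usual log-log shortcut (coefficient times percent change). As a small worked example, the sketch below compares this shortcut with the exact change implied by a constant elasticity. The elasticities are backed out from the percentages quoted in the text, and the variable names are chosen for illustration only.

```r
# Elasticities implied by the text (10 % change -> 3.977 %, 3.241 %, 0.19 %)
elasticity <- c(lcurbwt = -0.3977, lhp = -0.3241, ltorque = -0.019)
change     <- 0.10                       # a 10 percent increase in the attribute

# Shortcut: percent change in mpg = elasticity * percent change in attribute
approx_pct <- 100 * elasticity * change

# Exact: mpg scales with attribute^elasticity
exact_pct  <- 100 * ((1 + change)^elasticity - 1)

print(round(cbind(approx_pct, exact_pct), 3))
```

For a change as large as 10 percent, the exact decrease is slightly smaller in magnitude than the shortcut suggests; for small changes the two coincide.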

d) Endogeneity

A variable is called exogenous if it is not correlated with the error term. For example, if we assume that torque is an exogenous variable, then:

$$ Cor(\tilde\epsilon_{it}, torque_{it}) = 0 $$

If this is the case, the regression should recover the true relationship.

In a statistical model, an endogenous variable is one that is correlated with the error term. In our case, $\tilde \epsilon_{it}$ captures the unobserved costs. Let's think of this scenario:

A Ferrari is typically a very expensive car with a lot of horsepower. Spending more money on a Ferrari buys the customer more horsepower. But in our model, spending more money can also buy more fuel economy. Because the unobserved costs sit in the error term, horsepower is correlated with the error term:

$$ Cor(\tilde\epsilon_{it}, hp_{it}) \neq 0 $$

This makes horsepower an endogenous variable. If a regressor is endogenous, the OLS estimators will (typically) be biased and inconsistent.

Source: Wooldridge, Jeffrey M. (2013). Introductory Econometrics: A Modern Approach (Fifth international ed.). Australia: South-Western. pp. 92 and 303.

Source: Herbert Stocker: Methoden der Empirischen Wirtschaftsforschung, Chapter 13.
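The Ferrari argument can be illustrated with a short simulation. This is only a sketch under made-up assumptions: the variable `cost` stands in for the unobserved spending, and the true horsepower elasticity of -0.3 and all other numbers are hypothetical.

```r
set.seed(11)
n    <- 5000
cost <- rnorm(n)                                       # unobserved spending
lhp  <- 0.7 * cost + rnorm(n)                          # spending buys horsepower
lmpg <- -0.3 * lhp + 0.5 * cost + rnorm(n, sd = 0.2)   # ... and fuel economy

# Cost is omitted and ends up in the error term: lhp is endogenous
naive  <- coef(lm(lmpg ~ lhp))[["lhp"]]

# Infeasible benchmark in which cost is observed and controlled for
oracle <- coef(lm(lmpg ~ lhp + cost))[["lhp"]]

print(c(true = -0.3, naive = naive, oracle = oracle))
```

In this setup the naive estimate is pulled towards zero, because the omitted cost raises both horsepower and fuel economy.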

e) Model 2

As you've seen, we might have endogeneity in our model. One way to address it is to use panel data (see the info box) and add fixed effects (see the info box). If we assume that part of the unobserved costs differs across manufacturers but is constant within each manufacturer, we can add manufacturer fixed effects to Model 1.

info("Panel data") # Run this line (Ctrl-Enter) to show info

info("Fixed Effects") # Run this line (Ctrl-Enter) to show info

Let's remember what we have learned in exercise 4:

We are using a marginal cost function that consists of variable as well as fixed costs, and these costs have been moved into the error term $\tilde \epsilon_{it}$.

The costs in $\tilde \epsilon_{it}$ therefore consist of a manufacturer-specific factor $\bar c_i$ and a car-specific factor $k_{it}$.

$$\tilde \epsilon_{it} = \bar c_i + k_{it}$$

By adding manufacturer fixed effects, we eliminate the manufacturer specific component of our error term, yielding

$$\epsilon_{it} = k_{it}$$

By doing so, we try to reduce/eliminate endogeneity.
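What "eliminating the manufacturer-specific component" means can be sketched in base R. The example below uses simulated data with hypothetical names and numbers. It shows that including manufacturer dummies gives the same slope as first subtracting manufacturer means from every variable (the within transformation).

```r
set.seed(7)
G   <- 10                                        # hypothetical manufacturers
n_g <- 50                                        # hypothetical cars per manufacturer
mfr   <- factor(rep(paste0("m", seq_len(G)), each = n_g))
c_bar <- rnorm(G)[as.integer(mfr)]               # manufacturer-specific component
lhp   <- 0.8 * c_bar + rnorm(G * n_g)            # horsepower correlated with c_bar
lmpg  <- -0.3 * lhp + c_bar + rnorm(G * n_g, sd = 0.2)

pooled  <- coef(lm(lmpg ~ lhp))[["lhp"]]         # ignores c_bar
dummies <- coef(lm(lmpg ~ lhp + mfr))[["lhp"]]   # manufacturer dummies

# Within transformation: subtract manufacturer means from both variables
lhp_w  <- lhp  - ave(lhp,  mfr)
lmpg_w <- lmpg - ave(lmpg, mfr)
within <- coef(lm(lmpg_w ~ lhp_w - 1))[["lhp_w"]]

print(c(pooled = pooled, dummies = dummies, within = within))
```

The dummy and within estimates agree up to numerical precision and are close to the simulated value of -0.3, while the pooled estimate is contaminated by the manufacturer component.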

With 'felm' these fixed effects are relatively easy to implement. We simply add mfr to our factor part of the command.

Now it's your turn: Please use the felm command in the same way as before to create reg2, but add mfr as a second factor with a + next to year, and use regdata as the data.

# use felm to create reg2. 
# express lmpg with lcurbwt+lhp+ltorque+d_manual+time_d_manual+d_diesel+d_turbo+d_super
# add year and mfr as factors (year + mfr)
# cluster by mfr
# use regdata.

Great work on that regression.

To show your results next to the ones from the first regression, click Check.

stargazer(reg1, reg2, column.labels=c("OLS","Fixed Effects"), type = "html")

Let me ask you some questions on the results of regression 2:

Question 1:

! addonquizq3

Question 2:

! addonquizsingle

Question 3:

Look at lhp. Add the correct parts to the sentence: "A 10 percent (answer1) in lhp is associated with a (answer2) percent increase in fuel economy"

! addonquizparts

The coefficients associated with manual transmissions and diesel engines suggest fuel economy savings for these two attributes. For our Cobb-Douglas models, the increase in fuel efficiency from diesel technology is between 19 and 21 percent.

The negative estimates for time_d_manual suggest that the gains from a manual transmission fall over time. This might indicate that more and more cars are equipped with an automatic transmission, that the efficiency of those transmissions has increased, or both. Early in our sample, a manual transmission is associated with savings between 8.7 and 10 percent.

Since the efficiency gains of automatic transmissions relative to manual transmissions can also be seen as technological improvements specific to automatic transmissions, we can think of them as a kind of technological progress.

We can also see that the estimated trade-offs (and, as you will see in further exercises, the estimated technological progress too) change only slightly when manufacturer fixed effects are included. This suggests that any additional endogeneity concerns are likely to be small.

Now that we have an idea of how the trade-offs between fuel economy and other vehicle characteristics work, we will take a closer look at technological progress $T_t$ in the next exercise.

This exercise refers to pages 3372 and 3381 of the paper.

Exercise 5.2: Cobb-Douglas: Technological Progress

As you might have already noticed, we have not touched on the $T_t$ in our formula yet. $T_t$ is our measure of technological progress. It should capture the level of technology in a given year $t$ and is, in our models, modeled nonparametrically as a set of year fixed effects.

Technological progress does not only represent improvements in engine technology, but also improvements in, for example, transmissions, rolling resistance, aerodynamics or even fuel composition. As you can see, some of these effects cannot be influenced by manufacturers or customers.

Beginning in the 1980s, numerous technologies were introduced in newly produced cars. On the engine side, examples are the replacement of carburetors with fuel injection and the addition of manual cylinder deactivation. Both led to great improvements in fuel economy. Compared with an engine from around 1980, a modern engine has its camshaft, which lifts the valves as it rotates, placed above the engine head, which reduces friction. Many new cars have multiple valves per cylinder as well as variable valve timing. While multiple valves allow a smoother flow of the fuel/air mixture within the engine, variable valve timing allows the engine to adjust to driving conditions. Both a supercharger and a turbocharger use a compressor to force more air into the engine, resulting in an increase in efficiency.

In recent years, cylinder deactivation and hybrid technology have become more and more common. Hybrid technology combines a traditional engine with an electric motor. This allows a car to run on the engine alone, on the electric motor alone, or on both. The electric motor can be used as long as enough electricity is stored in a battery. The battery charges while the car is in motion and the electric motor is not used. Being able to "generate" energy that is later used to move the vehicle without using any fuel has a large positive impact on fuel economy. Cylinder deactivation allows a car to shut off cylinders when they are not needed, which also raises fuel economy.

Not all improvements are directly related to the engine, though. For example, advanced materials like carbon, innovations by tire manufacturers or better lubricants from suppliers may lead to efficiency improvements as well.

Ideally, we would like the estimates of $T_t$ to increase from year to year. This would imply that technological progress always had a positive effect on fuel economy.

If we now get back to our models, there is one question: How are we able to estimate all of these huge improvements over all these years?

Before we can do anything, we need to load the data as usual.

# first, we should load the data 
dat = read.dta("Steroids_AER_data_post.dta")
regdata = filter(dat, d_truck==0 & outlier==0)

a) Technological progress in model 1

After loading the data again, we will estimate technological progress using our model 1. Since technological progress is a set of year fixed effects, we will simply display the values for year in our model.

# Let's look at our old regression first: 
# we had "lmpg ~ lcurbwt+lhp+ltorque+d_manual+time_d_manual+d_diesel+d_turbo+d_super | year |0| mfr" in our felm command. 
# If we would like to get a value for every year in this regression, we have to take the `year` parameter into account. 
# To do so, we convert "year" into a factor in our data. Without this step we would only get one value for year, but we would like to see how it has changed. 

regdata$year <- factor(regdata$year)

# Next we replace the `year` in the factor part of the felm command with a 0, and add year to our regressors.
reg1t = felm(lmpg ~ 
               lcurbwt+lhp+ltorque+
               d_manual+time_d_manual+d_diesel+d_turbo+d_super+year |0|0| mfr, data = regdata)
# As the last step, we display the results using stargazer
stargazer(reg1t, type = "html")

As you can see, we now have a value for every year. This value represents the level of technology in the given year relative to the base year 1980. Because we already have the values for curb weight, horsepower, torque and $X_{it}$, and they did not change, we are for now only interested in the values of technology. In order to have only the values we need, we will extract them from the results. The tidy() command from the package broom makes this task more convenient.

info("tidy()") # Run this line (Ctrl-Enter) to show info

I have already done this for you. Just click the check button.

If you want to see what is stored in TECH_PROG_MOD1, simply uncomment the last line of this code.

# We use the `tidy` command from the "broom" package to convert the regression results into a data frame.  
library(broom)
M1t <- tidy(reg1t)

# lastly, we need to extract the data regarding year from our data frame "M1t"
TECH_PROG_MOD1 = M1t[10:35, c('term', 'estimate')]

# in case you want to see what this looks like, just uncomment the next line
# TECH_PROG_MOD1

We now have an estimate of technological progress in model 1, saved as TECH_PROG_MOD1 (Technological Progress Model 1).
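Selecting the year rows by position (rows 10 to 35) depends on the order and number of regressors. A less fragile alternative is to select them by name. The sketch below shows the idea on a small simulated lm() fit in base R. The toy data and names are hypothetical, and it assumes the coefficient names start with "year" followed by the year value, as they do for a factor called year.

```r
set.seed(3)
# Toy data: five years, 20 observations each (hypothetical)
toy <- data.frame(year = factor(rep(1980:1984, each = 20)),
                  lhp  = rnorm(100))
toy$lmpg <- -0.3 * toy$lhp + 0.02 * (as.integer(toy$year) - 1) +
  rnorm(100, sd = 0.05)

fit <- lm(lmpg ~ lhp + year, data = toy)

# Put the coefficients in a data frame with a term and an estimate column
co <- data.frame(term = names(coef(fit)), estimate = unname(coef(fit)))

# Keep only the rows whose term starts with "year" and recover the year
year_rows      <- co[grepl("^year", co$term), ]
year_rows$year <- as.integer(sub("^year", "", year_rows$term))
print(year_rows)
```

The base year 1980 has no row of its own, because its effect is normalized to zero.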

b) Technological progress in model 2

Since estimates from only one model are hard to evaluate, we will obtain these values for model 2 as well.

Your task now is to create the regression that gives us $T_t$ for model 2:

# change the felm command as I did in the example.
# You have to leave mfr as a factor, because it represents the manufacturer fixed effects we already added.
# In case you don't remember the command for Model 2, here it is again:
# felm(lmpg ~ lcurbwt+lhp+ltorque+d_manual+time_d_manual+d_diesel+d_turbo+d_super | year+ mfr |0| mfr, data = regdata)
# change it accordingly, then save it as `reg2t`

Good Job on this regression again.

To show both results from reg1t and reg2t side by side, click check.

stargazer(reg1t, reg2t, column.labels=c("OLS","Fixed Effects"), type = "html")

After we now have the values for each year in both models, we only have to extract them for model 2.

I have already prepared the needed code for you, so simply click check once more.

If you would like to see how TECH_PROG_MOD2 looks, just uncomment the last line of code.

# Use tidy to create M2t from reg2t
M2t <- tidy(reg2t)
# Take rows 9:34 from M2t and save it as TECH_PROG_MOD2
TECH_PROG_MOD2 = M2t[9:34, c('term', 'estimate')]
# in case you want to see what this looks like, just uncomment the next line
# TECH_PROG_MOD2

Good job on that regression. We are now able to create a plot of the two estimates for $T_t$. This will be done in exercise c).

c) Comparison

Let's compare these two estimates:

In order to plot both of the results, we will first create a data frame containing the values of TECH_PROG_MOD1 and TECH_PROG_MOD2.

As in previous exercises, if you would like to see how p12 looks, just uncomment the last line of code.

# first we create a vector containing the years
Year = 1981:2006
# then we create a vector for the estimates we got from our regressions and saved 
# as TECH_PROG_MOD1 and TECH_PROG_MOD2
Model1 = TECH_PROG_MOD1$estimate
Model2 = TECH_PROG_MOD2$estimate
# now we have 3 vectors, one for the years, and one for each models technological progress estimates
# we now only have to save them into a data frame. 
# cbind takes a sequence of vectors, matrices or data frames and combines them by columns
# in our case the three vectors 
# to be able to plot them, we need to transform them into a data.frame with data.frame()
p12 = data.frame(cbind(Year,Model1,Model2))
# p12

After this step we have a data frame p12 containing the values of TECH_PROG_MOD1 and TECH_PROG_MOD2 as well as an indicator for the year. As a result, we can now easily plot the values for $T_t$ according to model 1 and model 2.

# Then we use the ggplot command from `ggplot2` to create the plot
library(ggplot2)
progress12 <- ggplot(p12, aes(x = Year)) + 
# now we add one line per model; the colour labels create the legend entries
  geom_line(aes(y = Model1, colour = "Model 1")) + 
  geom_line(aes(y = Model2, colour = "Model 2")) +
# the axis and legend titles are set explicitly
  labs(y = "Estimate", colour = "Model No")
# we simply show the plot
progress12

As you can see, the two models result in very similar estimates for Technological Progress $T_t$. This also indicates that any additional endogeneity concerns are likely to be small.

We can see that progress was fastest early in the sample (1981 to 1986). This is consistent with what we estimated in exercise 2. After these years, technology still improves considerably, but at a slower pace.

d) Possible fuel economy in 2006

The really interesting part is that we are now able to estimate how fuel economy in year $t$ would compare to fuel economy in 1980 if size and power had been held constant. To do so, we hold the values for $\ln curbwt$, $\ln hp$, $\ln tq$ and $X$ at their 1980 levels, and only change $T_t$.

To calculate this, we denote the possible fuel economy in year $t$ by $\widetilde {mpg_{it}}$.

According to our model, this would yield: $$\widetilde {\ln mpg_{it}} = T_t + \delta \ln curbwt_{i1980} +\beta \ln hp_{i1980} + \gamma \ln tq_{i1980} + X_{i1980}B + \tilde \epsilon_{i1980}$$

and the fuel economy in year 1980 as

$$\ln mpg_{i1980} = T_0 + \delta \ln curbwt_{i1980} +\beta \ln hp_{i1980} + \gamma \ln tq_{i1980} + X_{i1980}B + \tilde \epsilon_{i1980}$$

If we now want to calculate the possible increase in log mpg by year $t$, this yields

$$G = \widetilde {\ln mpg_{it}} - \ln mpg_{i1980} = (T_t + \delta \ln curbwt_{i1980} +\beta \ln hp_{i1980} + \gamma \ln tq_{i1980} + X_{i1980}B + \tilde \epsilon_{i1980}) - (T_0 + \delta \ln curbwt_{i1980} +\beta \ln hp_{i1980} + \gamma \ln tq_{i1980} + X_{i1980}B + \tilde \epsilon_{i1980}) $$

This would then equal:

$$ G = T_t - T_0 \overset{T_0 = 0}{=} T_t$$

This way we can say that our estimates for $T_t$ are the increase in log fuel economy by year $t$ compared to 1980.

Therefore we can say that, for Model 1, the log of fuel economy is about 0.522 (0.52174952) greater in 2006 than in 1980. Similarly, for Model 2, the log of fuel economy is about 0.512 (0.51150664) greater in 2006 than in 1980.

We will now use Model 1 to estimate what the fuel economy of a car with the characteristics of 1980 would look like in 2006.

To do so, we will calculate the possible fuel economy in 2006 using the characteristics of 1980. The idea is that if we keep all the values except for $T_t$ at their 1980 levels, we can use our estimates for $T_t$ to calculate the log of fuel economy in each year $t$.

Let's assume a hypothetical car with the mean values of our attributes in 1980. We therefore have to save the mean values of our attributes in the year 1980 in a vector.

# to get these values, we need to take all the cars from 1980: 

cars1980 = filter(dat, d_truck == 0, outlier == 0, year == 1980)


# since you already know how to calculate means from Exercise 1, 
# this time we save all the means in a vector called means1980.
# it contains the mean values of the relevant attributes we used in the regression. 

means1980 = c(mean(cars1980$lcurbwt),
              mean(cars1980$lhp),
              mean(cars1980$ltorque),
              mean(cars1980$d_manual), 
              mean(cars1980$time_d_manual), 
              mean(cars1980$d_diesel),
              mean(cars1980$d_turbo), 
              mean(cars1980$d_super))
# In case you would like to see the saved attributes, uncomment the next line.
# means1980

Now, after we got the mean values, we still need the values for our estimators $\delta$, $\beta$, $\gamma$ and $B$. These are the coefficients we got from our first regression.

# since we want to have the coefficients of our regression, we can use the command `coef()` to get them.
# This command saves the coefficients into a data frame called datreg1
datreg1 = data.frame(coef(reg1t))
show(datreg1)

We now have all the values, so we pick out the ones we need: the values of $\delta$, $\beta$, $\gamma$ and $B$, as well as the value for $T_{2006}$ and the constant:

# We can retrieve values by declaring the index inside a single square bracket "[row,column]" operator.
# get the constant
const = datreg1[1,1]
const
# get the coefficients of our regression
coef = datreg1[2:9,1]
coef
# get the value for Technological progress in Year 2006
T2006 = datreg1[35,1]
T2006

First, we will get a rough idea of how good our regression results are.

To do so, we use the estimates from our regression to predict the mean log fuel economy in 1980. Then we compare it with the actual value from the data.

To calculate the mean log fuel economy in 1980 with our model, we multiply our coefficients (coef) by the mean values of 1980 (means1980), sum the products, and add the constant (const). The 0 represents the value for $T_{1980}$.

# here we are estimating the mean log fuel economy in 1980 with our model
# sum(coef*means1980) multiplies the two vectors element by element and sums the products
# one contains the coefficients, the other the mean values of 1980.
reglmpg1980 = 0 + sum(coef*means1980) + const
reglmpg1980
# this command provides us the real value from our data.
mean(cars1980$lmpg)

So the value we get from our regression is 3.106287, which equals the mean of lmpg in the 1980 data. This is expected: because the regression contains a constant and a full set of year dummies, the OLS residuals average to zero within each year. Our calculation therefore appears to be correct.
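That the prediction at the 1980 means matches the 1980 mean of the dependent variable is a general property of OLS with a constant and year dummies. A minimal sketch on simulated data, with hypothetical names and numbers:

```r
set.seed(5)
# Toy data: three years, 100 observations each (hypothetical)
toy <- data.frame(year = factor(rep(1980:1982, each = 100)),
                  lhp  = rnorm(300, mean = 4.6, sd = 0.3))
toy$lmpg <- 3 - 0.3 * (toy$lhp - 4.6) +
  0.05 * (as.integer(toy$year) - 1) + rnorm(300, sd = 0.1)

fit <- lm(lmpg ~ lhp + year, data = toy)
b   <- coef(fit)

# Prediction for the base year at the base-year mean of the regressor
base        <- toy[toy$year == "1980", ]
pred_1980   <- b[["(Intercept)"]] + b[["lhp"]] * mean(base$lhp)
actual_1980 <- mean(base$lmpg)

print(c(predicted = pred_1980, actual = actual_1980))
```

The two numbers agree up to numerical precision, so this check confirms that the coefficients were extracted correctly, but it does not by itself say how well the model fits.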

If we now want to estimate, ceteris paribus, the fuel economy of the same hypothetical car in 2006, we have to use the estimated $T_{2006}$ instead of $T_{1980}$.

# we use the same calculations as before, but we change T to the value of 2006
tildelmpg2006 = T2006 + sum(coef*means1980) + const
tildelmpg2006

Now that we have those results, we can see that, ceteris paribus, the log of fuel economy in 2006 would be 3.628037. This means that the log of fuel economy is 0.522 greater in 2006 than in 1980. As you might have already realized, this is exactly $T_{2006}$.

If we would like to estimate the increase as percentages, we can take the values for $\widetilde {mpg_{t}}$ (note that this is not $\ln \widetilde {mpg_{t}}$ anymore).

$$ \% increase = \dfrac{\widetilde {mpg_{2006}} -mpg_{1980}}{mpg_{1980}}$$

Since

$$ \ln(\exp(x)) = x, \qquad \exp(\ln(x)) = x $$

we can say that: $$ \widetilde {mpg_{t}} = \exp(\ln \widetilde {mpg_{t}}) $$

Now that we have a value for fuel economy in 2006, we can compare $\widetilde {mpg_{2006}}$ with the mean value of $mpg_{1980}$:

# exp(tildelmpg2006) provides us with the value of possible fuel economy in 2006, by eliminating the log.
# then we just calculate the percentage increase compared to 1980.
progressm1 = (exp(tildelmpg2006)-mean(cars1980$mpg))/mean(cars1980$mpg)
progressm1

Taking the results from Model 1, we can say that an increase in fuel economy by ~64.4 percent could have been possible.
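As a cross-check, a log difference can also be converted directly into a percentage with $\exp(G) - 1$. The sketch below uses only the two log values reported in the text above; the variable names are chosen for illustration.

```r
lmpg_1980 <- 3.106287   # mean log fuel economy in 1980 (from the text)
lmpg_2006 <- 3.628037   # counterfactual log fuel economy in 2006 (from the text)

# Difference in logs, which equals the estimate of T_2006 for model 1
log_gain <- lmpg_2006 - lmpg_1980

# Convert the log difference into a percentage change
pct_gain <- 100 * (exp(log_gain) - 1)

print(c(log_gain = log_gain, pct_gain = pct_gain))
```

This direct conversion gives a somewhat larger figure than the ~64.4 percent above. The reason is that the calculation above divides by the arithmetic mean of mpg in 1980, whereas the exponential of the mean of log mpg is the geometric mean, which is never larger than the arithmetic mean.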

The increase for model 2 will be discussed in exercise 6.3.

This exercise refers to pages 3373, 3382, 3384 and 3385 of the paper.

Exercise 6.1: Robustness: Cobb-Douglas

Now that we have an idea of $T_t$, one might be concerned that a supercharger or a turbocharger is itself a kind of "technological progress". If this is the case, they should not be included in our regression, because they should be represented in $T_t$. So let's take a look at this.

a) Loading data

As usual, we load our data.

# We load the same data again.
dat = read.dta("Steroids_AER_data_post.dta")
# Then we kick the trucks and outlier out of our data.
regdata = filter(dat, d_truck==0 & outlier==0)

b) Market penetration of superchargers & turbochargers

Before we think about changing our models, we will take a look at the market penetration of turbochargers and superchargers.

To get an idea, we are going to plot the market penetration. Because d_turbo and d_super are dummy variables, their mean in a given year equals the share of cars with that attribute in that year.
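Why the mean of a dummy is a share can be seen in a tiny made-up example. The data frame below is hypothetical and only serves as an illustration:

```r
# Five hypothetical cars per year; d_turbo is 1 if the car has a turbocharger
toy <- data.frame(year    = rep(c(1990, 1991), each = 5),
                  d_turbo = c(1, 0, 0, 0, 0,
                              1, 1, 0, 0, 1))

# Mean of the dummy by year = share of turbocharged cars in that year
share_by_year <- tapply(toy$d_turbo, toy$year, mean)
print(100 * share_by_year)   # shares in percent
```

One of five cars has a turbocharger in the first year and three of five in the second, which is exactly what the yearly means report.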

Click check to see an example.

# we take the same data we already used for the past regressions.
# remember exercise 1c): we do the same thing here.
pen = summarise(group_by(regdata, year), d_super=mean(d_super), d_turbo=mean(d_turbo))
# now we need to plot the penetration. We save it as pen1
pen1 = ggplot(aes(x=year, y=d_super), data = pen) + geom_line() +ggtitle("Supercharger Penetration")
# then we need to show pen1
pen1

We can see that the market penetration of superchargers started in 1988. Since then it has increased with a small decrease in 1996. In the years after 1996, we can see that more and more cars have been equipped with a supercharger.

Now it is your turn. Do the equivalent plot for turbocharger.

# plot the market penetration for turbocharger(d_turbo), with main = "Turbocharger Penetration", xlab="Year 1980-2006"
# Look at the graph for Supercharger Penetration.
# change the y variable to the turbocharger variable (d_turbo), and the title to "Turbocharger Penetration"
# then display the plot

Here we can see that the share of new cars equipped with a turbocharger rose sharply early in the sample. After a steep decline between 1989 and 1996, it rose sharply again.

Looking at both market penetrations, we can say that more and more cars are equipped with a supercharger or turbocharger, especially in the later years of our sample.

! addonquizm3

If we don't take d_super and d_turbo into account, we allow our estimates of technological progress to reflect their increased penetration, as well as their effect on fuel economy.
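To illustrate this reasoning, here is a minimal simulation sketch in base R. All numbers are hypothetical and chosen only for illustration, and plain lm() with factor(year) is used as a stand-in for the year effects $T_t$ (it is not the felm() specification of the problem set). When the dummy is left out, the year effects absorb its effect multiplied by the increase in its penetration:

```r
set.seed(1)  # make the simulated example reproducible

# Simulated data (all numbers are hypothetical)
n_per_year  = 2000
year        = rep(1:5, each = n_per_year)
turbo_share = c(0.0, 0.1, 0.2, 0.3, 0.4)[year]   # rising penetration
d_turbo     = rbinom(length(year), size = 1, prob = turbo_share)

# Assumed "true" model: 0.02 log points of progress per year
# plus an assumed turbo effect of 0.10 on log fuel economy
lmpg = 0.02 * (year - 1) + 0.10 * d_turbo + rnorm(length(year), sd = 0.05)

fit_with    = lm(lmpg ~ factor(year) + d_turbo)  # turbo controlled for
fit_without = lm(lmpg ~ factor(year))            # turbo left out

# Year effects relative to year 1: without d_turbo they also contain
# the turbo effect times the increase in turbo penetration.
cbind(with_turbo    = coef(fit_with)[2:5],
      without_turbo = coef(fit_without)[2:5])
```

In this toy setup the year effects estimated without d_turbo are larger in every year, because part of what they measure is the spread of the technology itself.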

This transforms our iso-cost curve to:

Model 3: $$ \ln mpg_{it} = T_t + \delta \ln curbwt_{it} +\beta \ln hp_{it} + \gamma \ln tq_{it} + X'_{it}B + \epsilon_{it} $$

The difference between $X_{it}B$ and $X'_{it}B$ is that d_super and d_turbo are not included in $X'_{it}B$.

For this regression, we will use the felm() command to estimate lmpg ~ lcurbwt+lhp+ltorque+d_manual+time_d_manual+d_diesel with the factors year and mfr, clustered by mfr, using regdata as data. Compared to Model 2, only the set of independent variables changes: we simply leave out d_super and d_turbo:

# we use the felm command again. In comparison to Model 2, we now leave out d_super and d_turbo, so that the year effects reflect their penetration
reg3 = felm(lmpg ~ 
              lcurbwt+lhp+ltorque+
              d_manual+time_d_manual+d_diesel | year+ mfr |0| mfr, data = regdata)
# we show the results
stargazer(reg3, type = "html")

The Cobb-Douglas results imply that, ceteris paribus, a 10 percent decrease in weight is associated with a 4.19 percent increase in fuel economy. Large fuel efficiency gains are also correlated with lowering horsepower; all else equal, a 10 percent decrease in horsepower is associated with a 2.62 percent increase in fuel economy. The relationship between fuel economy and torque is small and not precisely estimated; a 10 percent increase in torque is correlated with a 0.45 percent increase in fuel economy.
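As a short worked example of how such a log-log coefficient translates into percentage changes, the sketch below takes the weight elasticity of about -0.419 implied by the sentence above as an assumed input. It compares the usual linear approximation with the exact change implied by the log-log form:

```r
# Elasticity of fuel economy with respect to weight, as implied by the text
# above (4.19 percent per 10 percent); treated here as an assumed input.
delta = -0.419

# Linear approximation: percentage change in mpg = elasticity * percentage change
approx_change = delta * (-0.10)

# Exact change implied by the log-log form: mpg_new / mpg_old = 0.9^delta
exact_change = 0.9^delta - 1

round(100 * c(approximation = approx_change, exact = exact_change), 2)
```

The approximation reproduces the 4.19 percent quoted above, while the exact change is slightly larger, at roughly 4.5 percent. For small changes the two are close, which is why the coefficients are usually read directly as elasticities.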

For the discussion of this model's technological progress, see Exercise 6.3.

This exercise refers to pages 3372 and 3381 of the paper.

Exercise 6.2: Robustness: Translog

The assumptions made by the Cobb-Douglas model are very restrictive. Therefore, we will use a more flexible model in this exercise: the translog production function.

A general Translog function can look like this:

$$ \ln y = \alpha_0 + \sum_{i} \alpha_i \ln X_i + \dfrac{1}{2} \sum_{i} \sum_{j} \gamma_{ij} \ln X_i \ln X_j$$

info("Translog") # Run this line (Ctrl-Enter) to show info

The advantage of a translog function over a Cobb-Douglas model is its flexible functional form, which places fewer restrictions on production elasticities and substitution elasticities. The disadvantage is that the results are more difficult to interpret. Therefore we will not interpret them in as much detail as we did for the first three models.

A translog function is a generalization of the Cobb-Douglas production function, and therefore we can transform the cost function into level sets in the same way as we did in Exercise 5.1 a).

This results in: $$ \ln mpg_{it} = T_t + f(curbwt,tq,hp) + X_{it}B + \epsilon_{it} $$

which is equal to:

$$ \ln mpg_{it} = T_t + \beta_1 \ln curbwt_{it} +\beta_2 \ln hp_{it} + \beta_3 \ln tq_{it} + \\ \gamma_1(\ln curbwt_{it})^2 +\gamma_2(\ln hp_{it})^2 + \gamma_3(\ln tq_{it})^2 +\\ \delta_1\ln curbwt_{it}\ln hp_{it} + \delta_2\ln curbwt_{it}\ln tq_{it} + \delta_3\ln hp_{it}\ln tq_{it} + X_{it}B + \epsilon_{it} $$
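To see why the translog coefficients are harder to read than the Cobb-Douglas ones, it may help to write out the elasticity of fuel economy with respect to weight implied by the equation above. This is a short derivation from that equation, not a formula quoted from the paper:

$$ \frac{\partial \ln mpg_{it}}{\partial \ln curbwt_{it}} = \beta_1 + 2\gamma_1 \ln curbwt_{it} + \delta_1 \ln hp_{it} + \delta_2 \ln tq_{it} $$

The elasticity now depends on the car's own weight, horsepower and torque, so it has to be evaluated at a chosen point, for example at the sample means. If all $\gamma$ and $\delta$ coefficients are zero, the expression collapses to the constant $\beta_1$, which is the Cobb-Douglas case.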

a) Loading Data

We load the same data as for Cobb-Douglas. Simply click check:

dat = read.dta("Steroids_AER_data_post.dta")
regdata = filter(dat, d_truck==0 & outlier==0)

b) Model 4

For the first translog model we will make the same assumptions as for the first Cobb-Douglas model.

! addonquizt1

Now that you have recalled which characteristics are part of our Cobb-Douglas models, this is what the level sets of our translog model look like:

$$ \ln mpg_{it} = T_t + \beta_1 \ln curbwt_{it} +\beta_2 \ln hp_{it} + \beta_3 \ln tq_{it} +\\ \gamma_1(\ln curbwt_{it})^2 +\gamma_2(\ln hp_{it})^2 + \gamma_3(\ln tq_{it})^2 +\\ \delta_1\ln curbwt_{it}\ln hp_{it} + \delta_2\ln curbwt_{it}\ln tq_{it} + \delta_3\ln hp_{it}\ln tq_{it} + X_{it}B + \epsilon_{it} $$

The difference between this and the "old" Cobb-Douglas level sets is that the translog level set adds the following functional part:

$$ ...+\gamma_1(\ln curbwt_{it})^2 +\gamma_2(\ln hp_{it})^2 + \gamma_3(\ln tq_{it})^2 + \delta_1\ln curbwt_{it}\ln hp_{it} + \delta_2\ln curbwt_{it}\ln tq_{it} + \delta_3\ln hp_{it}\ln tq_{it}+...$$

This allows fewer restrictions on production elasticities and substitution elasticities, but makes the results difficult to interpret.

Our data has columns called lcurbwt2, lhp2 and ltorque2, which are equal to $(\ln curbwt_{it})^2$, $(\ln hp_{it})^2$ and $(\ln tq_{it})^2$. The same applies to lcurbwt_lhp, lcurbwt_ltorque and lhp_ltorque, which are equal to $\ln curbwt_{it}\ln hp_{it}$, $\ln curbwt_{it}\ln tq_{it}$ and $\ln hp_{it}\ln tq_{it}$.
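The data set already contains these columns, so nothing needs to be computed here. As a sketch of what they represent, the toy example below builds columns with the same names from made-up values of weight, horsepower and torque. How the columns were originally generated is an assumption based on the description above:

```r
# Toy values (hypothetical, not taken from the data set)
toy = data.frame(
  lcurbwt = log(c(2500, 3000, 3500)),  # log curb weight
  lhp     = log(c(90, 150, 220)),      # log horsepower
  ltorque = log(c(110, 180, 260))      # log torque
)

# Squared terms
toy$lcurbwt2 = toy$lcurbwt^2
toy$lhp2     = toy$lhp^2
toy$ltorque2 = toy$ltorque^2

# Interaction terms
toy$lcurbwt_lhp     = toy$lcurbwt * toy$lhp
toy$lcurbwt_ltorque = toy$lcurbwt * toy$ltorque
toy$lhp_ltorque     = toy$lhp * toy$ltorque

round(toy, 3)
```

The three original columns plus the six derived ones are exactly the nine attribute terms that enter the translog regression.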

$X_{it}$ is the same as in Model 1: it contains the dummy variables for characteristics related to fuel economy, d_manual+time_d_manual+d_diesel+d_turbo+d_super.

Now, we will use a regression again to calculate the coefficients under the Translog assumption:

# we load a package again
library(lfe)
# we use the felm command to save the regression as reg4  
reg4 = felm(lmpg~ 
              lcurbwt+ lhp+ ltorque+ 
              lhp2+ lcurbwt2+ ltorque2+ 
              lcurbwt_lhp+ lcurbwt_ltorque+lhp_ltorque+ 
              d_manual+ time_d_manual+ d_diesel+ d_turbo+ d_super | year |0| mfr, data = regdata)
# we use stargazer to show the results in a nice html format
stargazer(reg4, type="html")

If we now take a look at the results, one of the first things to notice is that the standard errors for lcurbwt, lhp and ltorque are much larger than in the Cobb-Douglas model.
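One plausible reason for larger standard errors is that a log variable and its square are almost perfectly correlated, which makes the individual coefficients hard to pin down. The following sketch shows this on simulated data. All values are hypothetical, the variable names lw and lh are stand-ins, and plain lm() standard errors are used instead of the clustered ones from felm():

```r
set.seed(42)  # reproducible simulated data (hypothetical, not the car data)

n  = 500
lw = rnorm(n, mean = 8, sd = 0.2)             # stand-in for log weight
lh = 5 + 0.8 * (lw - 8) + rnorm(n, sd = 0.2)  # stand-in for log horsepower
y  = -0.4 * lw - 0.3 * lh + rnorm(n, sd = 0.1)

# Cobb-Douglas style: only linear terms in logs
fit_cd = lm(y ~ lw + lh)

# Translog style: add squares and the interaction
fit_tl = lm(y ~ lw + lh + I(lw^2) + I(lh^2) + I(lw * lh))

# Standard errors of the linear terms in both specifications
se_cd = summary(fit_cd)$coefficients[c("lw", "lh"), "Std. Error"]
se_tl = summary(fit_tl)$coefficients[c("lw", "lh"), "Std. Error"]
rbind(cobb_douglas = se_cd, translog = se_tl)
```

Although the simulated relationship is exactly linear in logs, the standard errors of the linear terms are much larger in the translog-style regression, because the added terms carry almost the same information as the linear ones.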

Since there might be the same problem with endogeneity as in our Cobb-Douglas Model, let's see what happens if we add manufacturer fixed effects to our translog model.

c) Model 5

In this exercise we will add manufacturer fixed effects to our recently developed translog model.

Therefore we repeat the steps we did for Model 2 and add manufacturer fixed effects to our Translog Model:

It is now your task to create the felm() command for this regression. Simply do it the same way as you did before with the Cobb-Douglas model. In case you don't remember why and how you did it, you can either go back to exercise 5.1 e) or follow the instructions given here.

# save reg5 as a felm command. 
# lmpg should be described as lcurbwt+ lhp+ ltorque+ 
# lhp2 +lcurbwt2 +ltorque2+ 
# lcurbwt_lhp+ lcurbwt_ltorque +lhp_ltorque 
# +d_manual +time_d_manual +d_diesel+ d_turbo +d_super
# add manufacturer and year fixed effects to our translog model by adding year + mfr as factors.
# don't forget to cluster by manufacturer

To compare the results, click check

stargazer(reg4, reg5, column.labels=c("OLS","Fixed Effects"), type = "html")

If we look at the Standard Errors of this model, how did they change?

Question 1:

! addonquiztranslog1

Question 2:

! addonquiztranslog2

d) Model 6

If we now take the increased market penetration of turbochargers and superchargers (see exercise 6.1) into account and therefore drop them from our regression, we get the following.

We had: $$ \ln mpg_{it} = T_t + \beta_1 \ln curbwt_{it} +\beta_2 \ln hp_{it} + \beta_3 \ln tq_{it} +\\ \gamma_1(\ln curbwt_{it})^2 +\gamma_2(\ln hp_{it})^2 + \gamma_3(\ln tq_{it})^2 + \\ \delta_1\ln curbwt_{it}\ln hp_{it} + \delta_2\ln curbwt_{it}\ln tq_{it} + \delta_3\ln hp_{it}\ln tq_{it} + X_{it}B + \epsilon_{it} $$

Now we change $X_{it}B$, which consists of d_manual, time_d_manual, d_diesel, d_turbo and d_super, to $X'_{it}B$, which only contains d_manual, time_d_manual and d_diesel. This results in our final translog form:

$$ \ln mpg_{it} = T_t + \beta_1 \ln curbwt_{it} +\beta_2 \ln hp_{it} + \beta_3 \ln tq_{it} +\\ \gamma_1(\ln curbwt_{it})^2 +\gamma_2(\ln hp_{it})^2 + \gamma_3(\ln tq_{it})^2 + \\ \delta_1\ln curbwt_{it}\ln hp_{it} + \delta_2\ln curbwt_{it}\ln tq_{it} + \delta_3\ln hp_{it}\ln tq_{it} + X'_{it}B + \epsilon_{it} $$

The regression for the translog model with manufacturer fixed effects, leaving out d_turbo and d_super to account for their market penetration, looks like this:

reg6 = felm(lmpg~
              lcurbwt+ lhp+ ltorque+ 
              lhp2 +lcurbwt2 +ltorque2 +
              lcurbwt_lhp+ lcurbwt_ltorque+ lhp_ltorque +
              d_manual +time_d_manual +d_diesel | year + mfr |0| mfr, data = regdata)
stargazer(reg4, reg5, reg6, column.labels=c("OLS","Fixed Effects","Fixed Effects no turbo/super"), type = "html")

Now that we have all our regression results, we can see that the standard errors of all translog models are larger than those of the Cobb-Douglas models.

It appears that the translog specification overparameterizes the iso-cost curve.

The coefficients associated with manual transmissions and diesel engines suggest fuel economy savings for these two attributes. For our translog models, the increase in fuel efficiency from diesel technology is between 24 and 27 percent. The gains from a manual transmission are estimated to fall over time, since more and more cars are equipped with an automatic transmission. Early in our sample, a manual transmission is associated with savings between 7.6 and 8.7 percent. Because the efficiency gains of automatic transmissions relative to manual transmissions can also be seen as technological improvements specific to automatic transmissions, we can think of this decline as a kind of technological progress.

This exercise refers to pages 3372 and 3381 of the paper.

Exercise 6.3: Robustness: Technological Progress

Since we are not only interested in the trade-offs between the different attributes, but also in how technology has changed over the course of our data, we now look at the estimates of technological progress across all our models.

Because we will later need the 1980 values of the cars, I have already prepared the required data for you. Simply click check.

dat = read.dta("Steroids_AER_data_post.dta")
cars1980 = filter(dat, d_truck == 0, outlier == 0, year == 1980)

To make this exercise more convenient for you, I have already computed the technological progress estimates for all models beforehand. This was done in the same way as we did for Models 1 and 2 in exercise 5.2. To get the estimates, use read.table() to read the file "Progress.txt" and save it in a variable called progress16 (meaning: progress for Models 1-6).

# use read.table to save "Progress.txt" as progress16

Now that we have loaded the technological progress estimates, let's take a look at them.

#Use the show() command on progress16 to view the estimates

Looking at these estimates, we first notice that the values are pretty close to each other across all models. This corresponds to the results we had earlier when looking only at Models 1 and 2. To get an idea of how close the estimates are across all models, click the check button to see a graph.

progressplot <- ggplot(progress16, aes(x=Year, y= Progress, colour = "Model No")) + 
  geom_line(aes(y = Model.1, colour = "Model.1")) + 
  geom_line(aes(y = Model.2, colour = "Model.2")) +
  geom_line(aes(y = Model.3, colour = "Model.3")) + 
  geom_line(aes(y = Model.4, colour = "Model.4")) + 
  geom_line(aes(y = Model.5, colour = "Model.5")) + 
  geom_line(aes(y = Model.6, colour = "Model.6")) 

progressplot

As we already noticed, the technological progress estimates are very similar across models.

If we take a closer look at the graph, we can see that progress increased fastest early in the sample (1981 to 1986). This is consistent with what we estimated in exercise 2. Besides the CAFE standards, another reason might be that gasoline prices were high early in the sample, so the industry had to come up with ideas to increase fuel economy in order to sell its cars. After these years, technological progress still increases considerably, but at a visibly slower rate.

All results for the Cobb-Douglas models (1-3) are significant at the one percent level, and the same applies to our translog models (4-6). Even though the results are very similar, we can still see small differences between the models: our Cobb-Douglas models yield slightly higher estimates of progress over the years than the translog models. This might come from the additional functional part of the translog function. All of the models imply that, conditional on weight and power characteristics, the log of fuel economy is at least 0.485 greater in 2006 than in 1980.

Since $T_t$ is the absolute increase of $lmpg$ in year $t$, we can estimate the percentage increase in every model quite easily.

This is a faster way of estimating the percentage increase than the one we used in exercise 5.2.

First off, we will extract the values for $T_{2006}$ from our estimates progress16.

T2006 = progress16[26,2:7]
T2006

Because this is the absolute increase, we can add it to the mean value of lmpg in 1980 to get the counterfactual value of lmpg in 2006 for every model. Your task is now to add the mean value of lmpg from cars1980 to our recently created T2006.

# create a new variable called lmpgtilde
# then add the mean value for lmpg from cars1980 to T2006
# display lmpgtilde

As you can see, these are the values of $\ln \widetilde {mpg_{2006}}$ across our models if we hold the other characteristics at their 1980 levels.

Now it is very easy to calculate the percentage increase for each model.

percentincrease = (exp(lmpgtilde) - mean(cars1980$mpg)) / mean(cars1980$mpg)
percentincrease

If we now take a look at our percentage increases, we can see similar differences between the models as before, as expected. The Cobb-Douglas models yield slightly higher increases than the translog models, which results from the larger values of $T_{2006}$ in those models. Overall we can say that, conditional on weight and power characteristics, the log of fuel economy is over 0.485 greater in 2006 than in 1980. At the mean fuel economy in 1980, our models imply that a 58 percent increase in fuel economy could have been possible. Compared with the 18 percent "real increase" we found in exercise 1, this is a pretty big difference.
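The arithmetic behind this step can be sketched on toy numbers. The fuel economy values below are made up, and T_2006 is simply set to the lower bound of 0.485 quoted above. The sketch compares the percentage increase computed against the 1980 mean, as in the chunk above, with the pure effect exp(T) - 1:

```r
# Hypothetical 1980 fuel economy values in miles per gallon
mpg_1980 = c(18, 22, 25, 30)

# Assumed progress estimate, set to the lower bound of 0.485 quoted in the text
T_2006 = 0.485

# Counterfactual log fuel economy: 1980 mean of log mpg plus progress
lmpg_tilde = mean(log(mpg_1980)) + T_2006

# Percentage increase relative to the 1980 mean of mpg
pct_vs_mean = (exp(lmpg_tilde) - mean(mpg_1980)) / mean(mpg_1980)

# Pure effect of T_2006 on a single car: exp(T) - 1
pct_pure = exp(T_2006) - 1

c(relative_to_mean = pct_vs_mean, pure_effect = pct_pure)
```

Because exp() of the mean of log mpg is the geometric mean, which is never above the arithmetic mean, the increase measured against mean(mpg) comes out somewhat below exp(T) - 1, which is about 62 percent for T = 0.485.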

This exercise refers to pages 3382 and 3384 of the paper.

Exercise 7: Conclusion

Before we come to a conclusion, let's see which awards you have earned in this problem set.

To see which awards you achieved during this problem set, click check one last time. If you got all the awards, 10 awards should be shown.

awards(as.html=TRUE)

In conclusion, after analyzing the given data we are able to estimate the trade-offs that consumers and manufacturers face when choosing between fuel economy, vehicle size, and vehicle power. The estimated trade-off between weight and fuel economy suggests that fuel economy increases by over 4 percent for every 10 percent reduction in weight. On average, fuel economy increases by 2.7 percent for every 10 percent reduction in horsepower. However, the effect of torque is less precisely estimated. We are also able to estimate the technological advances that occurred over these dimensions from 1980 to 2006. As a consequence, fuel economy would have been nearly 60 percent higher in 2006 compared to the 1980 level if we had kept vehicle size and power constant at their 1980 levels.

We could also use our results to potentially estimate how fuel economy could look in the future.

Let us look back to exercise 2. There we found out that there was a positive correlation between CAFE standards and fuel economy. But would we be able to achieve the increased fuel economy results we estimated if CAFE standards had been adapted constantly?

In order to answer this, we should look at the question from another perspective. Let's assume you are a customer who would like to buy a new car. Would you choose a new car with the same characteristics as your old car if the only improvements since you bought your old car had been in fuel economy? Since most customers buy new cars with the intention of having more horsepower, better torque, better acceleration and so on, it is quite obvious that such a person would not buy a new car that lacks better engine characteristics.

Therefore the incentive to buy a new car is lost, and sales numbers would decrease considerably.

So what could be a solution?

One way of maintaining incentives for manufacturers to increase fuel economy would be to convince customers of the importance of fuel economy when purchasing a new car. If consumers value fuel economy over other characteristics such as horsepower, manufacturers will have to place more weight on fuel economy. As we have seen, CAFE standards are a good way for policy makers to ensure minimum fuel economy requirements for vehicles. But if the standards are too strict while customers want other characteristics, more and more manufacturers will be willing to pay the fine in order to sell their cars.

We also have to note that all our results are based on an approach from an economic perspective. As a result, we regard the functional relationship within a car or an engine as a "black box". Approaching this question from an engineer's perspective, we should probably take other models into consideration.

Exercise 8: References

Bibliography

R and packages in R

Websites

Licence

Author: Marius Breitmayer

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.



MariusBreitmayer/RTutorAttributeTradeOffs documentation built on May 7, 2019, 2:53 p.m.