knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette illustrates the use of the dataset and utility functions included in the package packr. I initially collected this data set for use in my course GEOG 3LT3: Transportation Geography. As part of this course, students examine some trends in transportation, including energy use and emissions. The objective of the practice is two-fold:
On the side of technology, the students are learning to work with R Notebooks and R. For this reason, all code is documented so that the students can see how things are done.
On the side of transportation geography, the students are learning to discern trends in transportation.
Load the packages used in this vignette:
library(packr)
To load the data, use the function data():
data("energy_and_emissions")
To inspect the dataframe, use the function summary():
summary(energy_and_emissions)
The data frame consists of 10 variables. The variable definitions can be consulted in the help file:
?energy_and_emissions
The dataframe includes information on population, GDP per capita, energy consumption, and emissions for world countries. Energy consumption (in barrels per day) is the total for each country. We can plot population against energy consumption to see if there is a trend. We create a scatterplot with x = Population and y = bblpd, so that the values of population are mapped to the x-axis, and the values of energy consumption are mapped to the y-axis:
# Simple scatterplot of energy consumption against population
plot(energy_and_emissions$Population, energy_and_emissions$bblpd,
     main = "Scatterplot Example",
     xlab = "Population", ylab = "Barrels of oil per day", pch = 19)
Not surprisingly, there is a strong association between these two variables, since countries with large populations tend to consume more energy than countries with small populations. This is not very informative, because the underlying relationship is simply one of size.
Instead of exploring energy consumption by population, we will look at energy consumption per capita. This is a more informative variable, because it normalizes by size, and potentially can tell us something about the intensity and/or efficiency of energy use. However, energy consumption per capita is not one of the variables in the dataset. We need to divide the variable bblpd by Population to add this variable to the dataframe:
energy_and_emissions$EPC <- energy_and_emissions$bblpd/energy_and_emissions$Population
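The same derivation can be tried on a small made-up data frame before applying it to the full data. This is only a sketch: it assumes the column names used above (Country, Population, bblpd), and the countries and numbers are invented for illustration, not taken from energy_and_emissions.

```r
# Toy data frame with invented values (NOT from the package data)
toy <- data.frame(
  Country = c("A", "B", "C"),
  Population = c(1e6, 5e6, 2e7),
  bblpd = c(5e4, 1e5, 2e5),
  stringsAsFactors = FALSE
)

# Energy consumption per capita: barrels per day divided by population
toy$EPC <- toy$bblpd / toy$Population

# The country with the largest total (C) has the smallest per capita value
toy
```

Dividing by population removes the size effect, so the ranking of countries by EPC can differ from their ranking by bblpd.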
Check the descriptive statistics of EPC (energy consumption in barrels per day per person):
summary(energy_and_emissions$EPC)
The maximum consumption is approximately `r round(max(energy_and_emissions$EPC), 2)` barrels per person per day. Which country is that?
energy_and_emissions[energy_and_emissions$EPC == max(energy_and_emissions$EPC), "Country"]
The country with the highest per capita oil consumption in the world according to the data is Singapore.
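The lookup above compares every value of EPC with the maximum. A possible alternative, sketched here on invented values, is which.max(), which returns the position of the largest value directly and skips missing values. The toy data frame and its contents are assumptions made for illustration only.

```r
# Toy data frame with invented values (NOT from the package data),
# including one missing value to show how it is handled
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  EPC = c(0.05, NA, 0.01, 0.08),
  stringsAsFactors = FALSE
)

# max() returns NA when a value is missing, unless na.rm = TRUE is set
max(toy$EPC)
max(toy$EPC, na.rm = TRUE)

# which.max() ignores NA and returns the index of the first maximum
top <- toy$Country[which.max(toy$EPC)]
top
```

If several countries tied for the maximum, which.max() would report only the first, whereas the comparison with == would return all of them.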
Is energy consumption per capita related to GDP per capita? To answer this question, we can create a scatterplot of the two variables:
plot(energy_and_emissions$GDPPC, energy_and_emissions$EPC,
     main = "Scatterplot Example",
     xlab = "GDP per capita",
     ylab = "Energy consumption per capita (bblpd/Population)", pch = 19)
Calculate the correlation between these two variables:
cor(energy_and_emissions$GDPPC, energy_and_emissions$EPC)
There is a moderately strong correlation between these two variables.
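As a possible extension, the sketch below shows two options of cor() that are often useful with country-level data: use = "complete.obs" drops rows with missing values, and method = "spearman" measures a monotonic rather than a strictly linear association. The values are invented for illustration and say nothing about the actual correlation in energy_and_emissions.

```r
# Toy data with invented values (NOT from the package data)
toy <- data.frame(
  GDPPC = c(1000, 5000, 20000, 40000, NA),
  EPC = c(0.002, 0.01, 0.03, 0.05, 0.02)
)

# With a missing value and the default settings, cor() returns NA
r_default <- cor(toy$GDPPC, toy$EPC)

# Use only the rows where both variables are observed
r_pearson <- cor(toy$GDPPC, toy$EPC, use = "complete.obs")

# Rank-based correlation, less sensitive to skewed values such as GDP
r_spearman <- cor(toy$GDPPC, toy$EPC, use = "complete.obs",
                  method = "spearman")

c(default = r_default, pearson = r_pearson, spearman = r_spearman)
```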
What do we learn from this analysis? And how would you extend this analysis?