I have an Excel file with data compiled from a few different sources about energy use and emissions for world countries.

The name of the file is Energy and Emissions.xlsx and it contains two sheets, as follows:

To make it easy to share this dataset I will read it, convert it to an .RData file, and then document it so that it can be bundled with package packr.

Load the packages needed to preprocess the data:

library(tidyverse)
library(readxl)
library(countrycode)

Read the Excel file as a dataframe:

energy_and_emissions <- read_excel("Energy and Emissions.xlsx")

The dataframe consists of 188 observations of 10 variables. Get a summary of the dataframe:

summary(energy_and_emissions)

The names of the countries, the year for the population estimates, and the year for the energy estimates are character, so will change to factor:

energy_and_emissions <- energy_and_emissions %>%
  mutate(Country = factor(Country), PYear = factor(PYear), EYear = factor(EYear))

New summary of the dataframe:

summary(energy_and_emissions)

The sources of these data are as follows:

In addition, I want to append the continent to the countries. I will use the dataframe codelist from the package countrycode. Notice that country names in energy_and_emissions are in upper case, so I will change that from codelist:

country_continent <- codelist %>% transmute(Country = toupper(country.name.en), Continent = factor(continent))

Join the continent information to the table:

energy_and_emissions <- left_join(energy_and_emissions, country_continent, by = "Country")

The data defintions are as follows:

Make sure that there are no missing data:

summary(energy_and_emissions)

There are 15 NAs in continent. Wonder what those are...

Some countries the names don't match or are not in the country code table, apparently. Fill manually:

energy_and_emissions <- energy_and_emissions %>%
  mutate(Continent = case_when(Country == "ANTIGUA AND BARBUDA" ~ "Americas",
                               Country == "BOSNIA AND HERZEGOVINA" ~ "Europe",
                               Country == "CABO VERDE" ~ "Africa",
                               Country == "CONGO, DEMOCRATIC REPUBLIC OF THE" ~ "Africa",
                               Country == "CONGO, REPUBLIC OF THE" ~ "Africa",
                               Country == "COTE D'IVOIRE" ~ "Africa",
                               Country == "GAMBIA, THE" ~ "Africa",
                               Country == "KOREA, NORTH" ~ "Asia",
                               Country == "KOREA, SOUTH" ~ "Asia",
                               Country == "SAINT KITTS AND NEVIS" ~ "Americas",
                               Country == "SAINT LUCIA" ~ "Americas",
                               Country == "SAINT VINCENT AND THE GRENADINES" ~ "Americas",
                               Country == "SAO TOME AND PRINCIPE" ~ "Africa",
                               Country == "TRINIDAD AND TOBAGO" ~ "Americas",
                               Country == "TURKS AND CAICOS ISLANDS" ~ "Americas",
                               TRUE ~ as.character(Continent)),
         Continent = factor(Continent))

Save to data folder:

usethis::use_data(energy_and_emissions,
                  overwrite = TRUE)


paezha/packr documentation built on Oct. 25, 2024, 8:16 p.m.