For these exercises we will use the us_states and us_states_df datasets from the spData package. You must have attached the package, and other packages used in the attribute operations chapter (sf, dplyr, terra) with commands such as library(spData) before attempting these exercises:

library(sf)
library(dplyr)
library(terra)
library(spData)
data(us_states)
data(us_states_df)

us_states is a spatial object (of class sf), containing geometry and a few attributes (including name, region, area, and population) of states within the contiguous United States. us_states_df is a data frame (of class data.frame) containing the name and additional variables (including median income and poverty level, for the years 2010 and 2015) of US states, including Alaska, Hawaii and Puerto Rico. The data comes from the United States Census Bureau, and is documented in ?us_states and ?us_states_df.

E1. Create a new object called us_states_name that contains only the NAME column from the us_states object using either base R ([) or tidyverse (select()) syntax. What is the class of the new object and what makes it geographic?

us_states_name = us_states["NAME"]
class(us_states_name)
attributes(us_states_name)
attributes(us_states_name$geometry)

``{asis 03-ex-e1-asis} - It is of classsfanddata.frame: it has 2 classes. - It is thesfclass that makes in geographic. - More specifically it is the attributes of the object (sf_column) and the geometry column (such asbbox,crs`) that make it geographic.

E2. Select columns from the `us_states` object which contain population data.
Obtain the same result using a different command (bonus: try to find three ways of obtaining the same result).
Hint: try to use helper functions, such as `contains` or `matches` from **dplyr** (see `?contains`).

```r
us_states |> select(total_pop_10, total_pop_15)

# or
us_states |> select(starts_with("total_pop"))

# or
us_states |> select(contains("total_pop"))

# or
us_states |> select(matches("tal_p"))

E3. Find all states with the following characteristics (bonus: find and plot them):

us_states |> 
  filter(REGION == "Midwest")

us_states |> filter(REGION == "West", AREA < units::set_units(250000, km^2), total_pop_15 > 5000000)
# or
us_states |> filter(REGION == "West", as.numeric(AREA) < 250000, total_pop_15 > 5000000)

us_states |> filter(REGION == "South", AREA > units::set_units(150000, km^2), total_pop_15 > 7000000)
# or
us_states |> filter(REGION == "South", as.numeric(AREA) > 150000, total_pop_15 > 7000000)

E4. What was the total population in 2015 in the us_states dataset? What was the minimum and maximum total population in 2015?

us_states |> summarize(total_pop = sum(total_pop_15),
                        min_pop = min(total_pop_15),
                        max_pop = max(total_pop_15))

E5. How many states are there in each region?

us_states |>
  group_by(REGION) |>
  summarize(nr_of_states = n())

E6. What was the minimum and maximum total population in 2015 in each region? What was the total population in 2015 in each region?

us_states |>
  group_by(REGION) |>
  summarize(min_pop = min(total_pop_15),
            max_pop = max(total_pop_15),
            tot_pop = sum(total_pop_15))

E7. Add variables from us_states_df to us_states, and create a new object called us_states_stats. What function did you use and why? Which variable is the key in both datasets? What is the class of the new object?

us_states_stats = us_states |>
  left_join(us_states_df, by = c("NAME" = "state"))
class(us_states_stats)

E8. us_states_df has two more rows than us_states. How can you find them? (Hint: try to use the dplyr::anti_join() function.)

us_states_df |>
  anti_join(st_drop_geometry(us_states), by = c("state" = "NAME"))

E9. What was the population density in 2015 in each state? What was the population density in 2010 in each state?

us_states2 = us_states |>
  mutate(pop_dens_15 = total_pop_15/AREA,
         pop_dens_10 = total_pop_10/AREA)

E10. How much has population density changed between 2010 and 2015 in each state? Calculate the change in percentages and map them.

us_popdens_change = us_states2 |>
  mutate(pop_dens_diff_10_15 = pop_dens_15 - pop_dens_10,
         pop_dens_diff_10_15p = (pop_dens_diff_10_15/pop_dens_10) * 100)
plot(us_popdens_change["pop_dens_diff_10_15p"])

E11. Change the columns' names in us_states to lowercase. (Hint: helper functions - tolower() and colnames() may help.)

us_states %>%
  setNames(tolower(colnames(.)))

E12. Using us_states and us_states_df create a new object called us_states_sel. The new object should have only two variables: median_income_15 and geometry. Change the name of the median_income_15 column to Income.

us_states_sel = us_states |>
  left_join(us_states_df, by = c("NAME" = "state")) |>
  select(Income = median_income_15)

E13. Calculate the change in the number of residents living below the poverty level between 2010 and 2015 for each state. (Hint: See ?us_states_df for documentation on the poverty level columns.) Bonus: Calculate the change in the percentage of residents living below the poverty level in each state.

us_pov_change = us_states |>
  left_join(us_states_df, by = c("NAME" = "state")) |>
  mutate(pov_change = poverty_level_15 - poverty_level_10)

# Bonus
us_pov_pct_change = us_states |>
  left_join(us_states_df, by = c("NAME" = "state")) |>
  mutate(pov_pct_10 = (poverty_level_10 / total_pop_10) * 100, 
         pov_pct_15 = (poverty_level_15 / total_pop_15) * 100) |>
  mutate(pov_pct_change = pov_pct_15 - pov_pct_10)

E14. What was the minimum, average and maximum state's number of people living below the poverty line in 2015 for each region? Bonus: What is the region with the largest increase in people living below the poverty line?

us_pov_change_reg = us_pov_change |>
  group_by(REGION) |>
  summarize(min_state_pov_15 = min(poverty_level_15),
            mean_state_pov_15 = mean(poverty_level_15),
            max_state_pov_15 = max(poverty_level_15))

# Bonus
us_pov_change |>
  group_by(REGION) |>
  summarize(region_pov_change = sum(pov_change)) |>
  filter(region_pov_change == max(region_pov_change)) |>
  pull(REGION) |>
  as.character()

E15. Create a raster from scratch, with nine rows and columns and a resolution of 0.5 decimal degrees (WGS84). Fill it with random numbers. Extract the values of the four corner cells.

r = rast(nrow = 9, ncol = 9, res = 0.5,
         xmin = 0, xmax = 4.5, ymin = 0, ymax = 4.5,
         vals = rnorm(81))
# using cell IDs
r[c(1, 9, 81 - 9 + 1, 81)]
r[c(1, nrow(r)), c(1, ncol(r))]

E16. What is the most common class of our example raster grain?

grain = rast(system.file("raster/grain.tif", package = "spData"))
freq(grain) |> 
  arrange(-count )# the most common classes are silt and sand (13 cells)

E17. Plot the histogram and the boxplot of the dem.tif file from the spDataLarge package (system.file("raster/dem.tif", package = "spDataLarge")).

dem = rast(system.file("raster/dem.tif", package = "spDataLarge"))
hist(dem)
boxplot(dem)

# we can also use ggplot2 after converting SpatRaster to a data frame
library(ggplot2)
ggplot(as.data.frame(dem), aes(dem)) + geom_histogram()
ggplot(as.data.frame(dem), aes(dem)) + geom_boxplot()


Robinlovelace/geocompr documentation built on June 14, 2025, 1:21 p.m.