In pedro-andrade-inpe/colrow: Handling SimU, CR, and LU data

Introduction

This package allows users to handle CR, SimU, and LU data. In order to start, it is necessary to install the package from GitHub.

devtools::install_github("pedro-andrade-inpe/colrow")

If the package will be installed in a Linux machine, it is necessary to manually install some dependencies. Please visit the webpage of the package in GitHub for more details.

Creating shapefiles for a given country

First load the package.

require(colrow)

Download world data from Dropbox here and save them in a single directory. Create a variable in R to store this directory.

dataDir <- "c:/Users/pedro/Dropbox/colrow"

The content of the directory should have the following files:

list.files(dataDir)

To see the available countries in alphabetical order, use getCountries(), passing the data directory as argument. The code below shows the first ten countries.

colrow::getCountries(dataDir)[1:10]

Then use getLU(), getCR(), and/or getSimU() to generate geospatial data for a given country. They get as first argument the country name written in the same way as one of the outputs of getCountries(). The second argument is the directory where the data downloaded from Dropbox was stored. Some of these functions might take time to execute, according to the selected arguments.

country <- "Brazil"

myLU   <- colrow::getLU(country, dataDir)
myCR   <- colrow::getCR(country, dataDir)
mySimU <- colrow::getSimU(country, dataDir)

To see that everything is correct, plot the data.

par(mfrow = c(1, 3), mar = c(5, 0.1, 5, 0.1))

plot(sf::st_geometry(mySimU), main = "SimU (Grouped HRU in .5°x.5°)", col = "red"); box()
plot(sf::st_geometry(myCR),   main = "CR (.5°x.5°)", col = "blue"); box()
plot(sf::st_geometry(myLU),   main = "LU (2°x2°)", col ="green"); box()

Finally, save the data to be used by the processing functions. Each of these files has a column named ID that will be used to match the objects stored in CSV files. Once these files are saved, it is not necessary to create them again every time outputs are processed.

sf::write_sf(myLU, paste0(country, "LU.gpkg"))
sf::write_sf(myCR, paste0(country, "CR.gpkg"))
sf::write_sf(mySimU, paste0(country, "SimU.gpkg"))

Several countries together

The same functions above can be used to get several countries together. Just use the names of the countries as a string vector for the first argument. For example, the code below gets LU data for all nine countries of the Amazon basin.

amazon <- c("Brazil", "Peru", "Colombia", "Venezuela",
            "Ecuador", "Bolivia", "Guyana",
            "Suriname", "FrGuiana")

lu <- colrow::getLU(amazon, dataDir)

To save the file, just use write_sf():

sf::write_sf(lu, "amazonLU.gpkg")

Then we can plot them together. This and several examples of plotting in this tutorial use tmap package.

require(tmap)
sf::sf_use_s2(FALSE)

tm_shape(lu) +
  tm_fill(col = "Country") +
  tm_borders(lwd = 1, col = "black")

It is important to note that, when working with more than one neighbor countries, there can exist two or more LU (or SimU, or CR) objects with the same ID, but belonging to different countries. In order to deal with that, colrow package returns also the name of the country. For example, there are two objects LU08515, one belonging to Brazil and the other to Peru:

lu %>%
  dplyr::filter(ID == "LU08515") %>%
  as.data.frame() %>%
  dplyr::select(Country)

Processing outputs

To process a CSV file, it is necessary to choose the created shapefile that matches the representation used in the CSV. Then, it is necessary to define the attribute names, as the CSV files do not store a header. These names must have one attribute called ID (to match the geometries) and another called VALUE (with the output value). Function processFile() gets the shapefile, the CSV file, and the description of the attributes as arguments. You can use colrow::attrs() to avoid quotes.

csvfile <- system.file("extdata/scenarios/FC/Land_Compare3_FC.csv", package = "colrow")
attributes <- colrow::attrs(COUNTRY, ID, ALTI, SLP, SOIL, USE, SCENARIO, YEAR, VALUE)

result <- colrow::processFile("BrazilCR.gpkg", csvfile, attributes)

This function automatically ignores all attributes that have only one value (COUNTRY, ALTI, SLP, SOIL and SCENARIO above). The other attributes (except ID and VALUE) compose the created attribute names (USE and YEAR above). They are concatenated according to their order in the CSV file. See the names of the attributes below.

names(result)

It is possible to plot a map selecting a given attribute using tmap package:

tm_shape(result) +
  tm_fill(col = "CrpLnd2010")

It is also possible to use processFile() to save the output when processing a file by adding a fourth argument with the output file name. In this case, this function does not return anything.

colrow::processFile(
  "BrazilCR.gpkg",
  csvfile,
  attributes,
  "brazilLandCompare.gpkg"
)

Note that shapefiles have limits: 255 or less columns, ten or less characters in the attribute names. When working with large attribute names, it is necessary to simplify the names. In the example below, seven names will be replaced by shorter names. For instance, CrpLnd will be replaced by cr in the attribute names.

convert <- list(
  CrpLnd = "cr", PriFor = "pr",
  NatLnd = "nl", ForReg = "fr",
  GrsLnd = "gl", MngFor = "mf",
  PltFor = "pl" 
)

If you want to simplify the years to two characters, just run:

for(year in paste(seq(2000, 2050, 10))) # from 2000, 2010, ..., 2050
  convert[[year]] = substr(year, 3, 4) # to 00, 10, ..., 50

unlist(convert)

Finally, these simplifications can be used as fifth argument to processFile().

colrow::processFile(
  "BrazilCR.gpkg",
  csvfile,
  attributes,
  "brazilOutput.gpkg",
  convert
)

Sometimes there are more than one VALUE per unique ID, for each unique combination of attributes. In this case, it is necessary to transform all the values into one single result. As default, processFile() sums all the values, but it is possible to use any other function passed as argument, such as mean(). Note that sum() is used for absolute values and mean() for averages. To use another function, use argument aggregate, passing a function as value, as shown in the example below.

result <- colrow::processFile(
  "BrazilCR.gpkg",
  system.file("extdata/csv/YIELD_COMPARE2.CSV", package = "colrow"),
  colrow::attrs(ID, CROP, ScenYear, VALUE),
  aggregate = sum # default value, could be omitted
)

Plotting outputs

To plot the outputs created above, first we read the Brazilian Biomes to be drawn on top of the ColRows. The shapefile with this data is stored in file br_biomes.gpkg available within colrow package. We will use CR data from brazilLandCompare.gpkg, created previously in this tutorial.

brazil <- sf::read_sf("brazilLandCompare.gpkg")
biomes <- system.file("extdata/shape", "br_biomes.shp", package = "colrow") %>% sf::read_sf()

The next step is to select a palette from the functions available in RColorBrewer, using brewer.pal() (see the description of all colors available here), and to choose the cuts to represent the intervals of each slice. In the code below, as we have seven cuts, we need to define six colors.

max(brazil$GrsLnd2010)

cuts <- c(0, 50, 100, 150, 200, 250, 305)
rdPu <- RColorBrewer::brewer.pal(6, "RdPu")

Finally, use tm_shape(), tm_fill(), and tm_borders() to draw the map.

tm_shape(brazil) +
  tm_fill(col = "GrsLnd2010", palette = rdPu, breaks = cuts, title = "Grass 2010") +
  tm_shape(biomes) +
  tm_borders(lwd = 1, col = "black")

If only part of the map is necessary, it is possible to zoom in the map to a given region by selecting a bbox to the tm_shape():

amazCerradoBox <- c(xmin = -74.5, xmax = -41.4, ymin = -25, ymax = 5.5) %>%
  sf::st_bbox(crs = st_crs(4326))

tm_shape(brazil, bbox = amazCerradoBox) +
  tm_fill(col = "GrsLnd2010", palette = rdPu, title = "Grass 2010") +
  tm_shape(biomes) +
  tm_borders(lwd = 1, col = "black")

To save this or any other plot created within R into a file using tmap package, it is necessary to store the plot into a variable and then call tmap_save(). For example, the following code creates variable result storing the same map created above and then saves it into file amz-cerrado.png in the current directory.

result <- tm_shape(brazil, bbox = amazCerradoBox) +
  tm_fill(col = "GrsLnd2010", palette = rdPu, title = "Grass 2010") +
  tm_shape(biomes) +
  tm_borders(lwd = 1, col = "black")

tmap_save(result, "amaz-cerrado.png")

It is also possible to create several maps using plotAll(). This function automatically uses all the values of several attributes with the same prefix in order to define a common legend, with the same maximum and minimum values. The following code creates six pdf files, each one plotting grassland, from 2000 to 2050.

require(tmap)

plotAll(brazil, "GrsLnd", palette = "RdPu", title = "Grass", additional =     
  tm_shape(biomes) +
  tm_borders(lwd = 1, col = "black")
)

Handling CSV files

You can handle CSV files directly to create charts or to compute values without creating shapefiles. First read data as a tibble using readCSV():

data <- colrow::readCSV(
  system.file("extdata/scenarios/FC/Land_Compare3_FC.csv", package = "colrow"),
  colrow::attrs(COUNTRY, ID, ALTICLASS, SLPCLASS, SOILCLASS, USE, SCENARIO, YEAR, VALUE)
)

To get the unique values per class, just use unique():

unique(data$USE)
unique(data$YEAR)

To check if the sum of all land use areas are the same for each year use some dplyr functions:

result <- data %>%
  dplyr::group_by(YEAR) %>%
  dplyr::summarise(VALUE = sum(VALUE))

result

Then let us now compute the total of each land use area per year and rename the attributes in order to use them directly in the plot.

result <- data %>%
  dplyr::group_by(YEAR, USE) %>%
  dplyr::summarise(VALUE = sum(VALUE)) %>%  # sum by year/use
  dplyr::mutate(Year = YEAR, Use = USE) %>% # rename variables
  dplyr::mutate(Use = dplyr::recode(Use,    # rename attributes
    CrpLnd = "Crop Land",
    ForReg = "Forest Regrowth",
    GrsLnd = "Grass Land",
    MngFor = "Managed Forest",
    NatLnd = "Natural Land",
    PltFor = "Planted Forest",
    PriFor = "Primary Forest"))

result

Finally, just use ggplot() to draw the evolution of each land use area per year.

require(ggplot2)

ggplot(result) +
  aes(x = Year, y = VALUE, colour = Use) +
  geom_line(lwd = 1.5) +
  theme_bw() +
  ylab("Total (ha)")

Subsetting data for Study Area

Sometimes the user needs to use only the results of a given study area. Let us suppose that this area is Amazon biome. We can simply use sf::st_intersects() to subset the objects.

amaz <- biomes[1,] %>% sf::st_transform(crs = sf::st_crs(myLU))
subset <- myLU[apply(sf::st_intersects(myLU, amaz), 1, any),]

result <- tm_shape(subset) +
  tm_borders(lwd = 1, col = "black") +
  tm_fill(col = "blue") +
  tm_shape(amaz) +
  tm_borders(lwd = 2, col = "black")

result