knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tidyrgee)
library(rgee)
library(dplyr)

ee_Initialize()
In addition to building dplyr-style (tidy) syntax wrappers around rgee/GEE functions, we have decided to explore the possibility of introducing a new framework built around a new class of object: "tidyee".
To use this framework, your ImageCollection or Image has to be converted to the tidyee class using the new as_tidyee function, as shown below:
modis_ic <- ee$ImageCollection("MODIS/006/MOD13Q1")
modis_ic_tidy <- as_tidyee(modis_ic)
As you can see below, the new object (modis_ic_tidy) is a named list (of class "tidyee") containing:
ee_ob: the original ee_object (in this case an ImageCollection)
vrt: a virtual table holding key properties of the original ee_object
modis_ic_tidy$ee_ob
modis_ic_tidy$vrt
The virtual table (vrt) data.frame allows us to leverage all the power and functionality of dplyr to filter, mutate, group, etc. An S3 class method, filter.tidyee, has been written which first filters the vrt based on the conditions supplied to filter and then uses the filtered data.frame (vrt) to filter/subset the ImageCollection. The vrt comes with several predefined columns useful for filtering (date, year, month), but mutate can be used to add any new columns/categories for filtering.
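The two-step idea described above can be sketched without Earth Engine. This is only an illustration: the table below is a mock vrt with made-up dates, the column names are the ones mentioned in this vignette, and how tidyrgee actually maps the surviving rows back to images is an assumption here.

```r
library(dplyr)

# Mock virtual table (vrt): one row per image. Values are invented for
# illustration; a real vrt is built by as_tidyee().
dates <- seq(as.Date("2020-01-01"), by = "16 days", length.out = 12)
vrt <- data.frame(
  tidyee_index = seq_along(dates),
  date = dates,
  year = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m"))
)

# Step 1: filter the table client-side with ordinary dplyr.
vrt_filtered <- vrt |>
  filter(month %in% c(3, 4))

# Step 2: the surviving indices identify which images to keep in the
# ImageCollection (the server-side subset itself is not shown here).
keep_index <- vrt_filtered$tidyee_index
keep_index
```

Because the table lives in the R session, the filter condition can be any expression dplyr accepts, including columns added earlier with mutate.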
Below is an example using months to filter:
# library(tidyverse)
# library(tidyrgee)
# filter |> debugonce()
modis_march_april <- modis_ic_tidy |>
  filter(month %in% c(3, 4))
Demo of slice
slice respects R's 1-based indexing rather than GEE's 0-based indexing.
modis_sliced <- modis_ic_tidy |>
  slice(1:2)
# modis_sliced$ee_ob$size()$getInfo()
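To make the indexing difference concrete, here is a minimal sketch of the offset conversion involved. The helper to_zero_based is hypothetical and not part of tidyrgee; it only illustrates the arithmetic, not how slice is implemented.

```r
# Hypothetical helper (not part of tidyrgee): convert R-style 1-based
# positions to the 0-based offsets used on the Earth Engine side.
to_zero_based <- function(i) {
  stopifnot(all(i >= 1))
  as.integer(i - 1)
}

r_positions <- 1:2          # what you pass to slice()
to_zero_based(r_positions)  # the corresponding 0-based offsets
```

So slice(1:2) asks for the first two images, which sit at offsets 0 and 1 in a 0-based list.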
Now we show an example of creating a new category with mutate and then filtering the tidyee by that column:
modis_filt_growing_season <- modis_ic_tidy |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  filter(crop_cycle == "planting")

modis_split_crop_cycle <- modis_ic_tidy |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  group_by(crop_cycle) |>
  group_split()

modis_split_crop_cycle[[2]]$ee_ob$aggregate_array("tidyee_index")$getInfo()
modis_split_crop_cycle[[2]]$ee_ob$aggregate_array("system:time_start")$getInfo()
modis_split_crop_cycle[[2]]$ee_ob$size()$getInfo()

external_group_output <- modis_ic_tidy |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  group_by(crop_cycle) |>
  summarise(stat = "mean", join_bands = FALSE)

external_group_output$ee_ob$size()$getInfo()

external_group_output_multi <- modis_ic_tidy |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  group_by(crop_cycle) |>
  summarise(stat = list("mean", "sd", "min"), join_bands = FALSE)

external_group_output_multi_joined <- modis_ic_tidy |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  group_by(crop_cycle) |>
  summarise(stat = list("mean", "sd"))

modis_ic_tidy |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  group_by(year, crop_cycle) |>
  summarise(stat = list("mean", "sd"))
Here we show how you can compute pixel-level summary statistics with the summarise function. This is typically referred to as compositing in the GEE documentation as well as in other remote sensing literature.
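For readers new to the term, compositing can be pictured with a small base R sketch: a stack of images is reduced across time so that each pixel gets its own statistic per group. The array and grouping values below are made up for illustration, and this is not how tidyrgee or Earth Engine computes composites internally.

```r
# Toy "image collection": 4 images of 2 x 2 pixels stacked along the third
# dimension. Values are invented for illustration.
stack <- array(
  c(1, 2, 3, 4,
    3, 4, 5, 6,
    10, 20, 30, 40,
    30, 40, 50, 60),
  dim = c(2, 2, 4)
)
month <- c(1, 1, 2, 2)  # grouping variable, one value per image

# For each group, reduce across images so every pixel gets its own mean.
composites <- lapply(split(seq_along(month), month), function(idx) {
  apply(stack[, , idx, drop = FALSE], c(1, 2), mean)
})

composites[["1"]]
```

Each element of composites is a single 2 x 2 image, which mirrors how group_by followed by summarise returns one composite per group.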
modis_mean_by_yrmo <- modis_ic_tidy |>
  group_by(year, month) |>
  summarise(stat = list("median", "sd"))

modis_mean_by_yrmo <- modis_ic_tidy |>
  select("NDVI", "EVI") |>
  group_by(year, month) |>
  summarise(stat = "mean")

modis_mean_by_yrmo$ee_ob$map(
  function(img) {
    ex_bnames <- img$bandNames()
    ex_bnames_renamed <- ex_bnames$map(
      rgee::ee_utils_pyfunc(function(bname) {
        ee$String(bname)$replace("_mean$", "_m")
      })
    )
    img$select(ex_bnames, ex_bnames_renamed)
  }
)$first()$bandNames()$getInfo()

modis_mean_by_yrmo$ee_ob$first()
You can also summarise by several different statistics at once.
modis_mean_and_sd_by_yrmo <- modis_ic_tidy |>
  select("NDVI") |>
  group_by(year, month) |>
  summarise(stat = list("mean", "sd"))

modis_mean_and_sd_by_yrmo
Next we show how you can create a new category with mutate and then group by and summarise on that category. This seems to be working well. However, we still need to work out how to deal with attributes that disappear after running dplyr verbs like group_split. This does not seem to affect the results, only the print methods. The vctrs package may offer a solution. However, it might make more sense to store band_names as a column/list-column instead of an attribute.
modis_ic_tidy |>
  # select("NDVI") |>
  mutate(
    crop_cycle = case_when(
      month %in% c(4, 5) ~ "land prep",
      month %in% c(6, 7) ~ "planting",
      month %in% c(8, 9, 10) ~ "growing",
      month == 11 ~ "harvesting",
      TRUE ~ "other"
    )
  ) |>
  group_by(crop_cycle) |>
  summarise(stat = "mean")
select & inner_join example
modis_monthly_baseline_mean <- modis_ic_tidy |>
  select("NDVI") |>
  filter(year %in% 2000:2015) |>
  group_by(month) |>
  summarise(stat = "mean")

modis_monthly_baseline_sd <- modis_ic_tidy |>
  select("NDVI") |>
  filter(year %in% 2000:2015) |>
  group_by(month) |>
  summarise(stat = "sd")

modis_monthly_baseline <- modis_monthly_baseline_mean |>
  inner_join(modis_monthly_baseline_sd, by = "month")

modis_monthly_baseline
point_sample_buffered <- tidyrgee::bgd_msna |>
  dplyr::sample_n(3) |>
  sf::st_as_sf(
    coords = c("_gps_reading_longitude", "_gps_reading_latitude"),
    crs = 4326
  ) |>
  sf::st_transform(crs = 32646) |>
  sf::st_buffer(dist = 500) |>
  dplyr::select(`_uuid`)

ndvi_monthly_mean_at_pt <- modis_monthly_baseline_mean |>
  ee_extract_tidy(y = point_sample_buffered, fun = "mean", scale = 500)

# just to show that it also works on an ImageCollection
modis_monthly_baseline_ic <- modis_monthly_baseline_mean |>
  as_ee()

modis_monthly_baseline_ic |>
  ee_extract_tidy(y = point_sample_buffered, fun = "mean", scale = 500)

# and on an Image
modis_monthly_baseline_img_first <- modis_monthly_baseline_ic$first()

modis_monthly_baseline_img_first |>
  ee_extract_tidy(y = point_sample_buffered, fun = "mean", scale = 500)
Below I list properties of this approach that could be considered downsides, along with potential ways to circumvent or minimize them.
1. The new tidyee object reduces interoperability with rgee
Some potential ideas to improve:
a. Provide very simple functions to switch the resulting tidyee object back to ee$ImageCollection or ee$Image (maybe as_ic, as_img, as_ee).
b. Add an option to make the tidyee on the fly from ee$Image/ee$ImageCollection, and also include something like return_ic as a logical switch which will just return ee$Image or ee$ImageCollection instead of the tidyee class.
So far I lean towards option a, implemented as a single as_ee function.
modis_ic_tidy |> as_ee()
2. as_tidyee makes the process take slightly longer
a. Since as_tidyee relies on client-side operations (primarily rgee::ee_get_date_ic), it requires some start-up time. However, I think this one-time investment will actually save time: the tidyee object carries a data.frame that is updated almost instantly as we filter and process the ImageCollection. This could allow nice print methods and querying without repeatedly running client-side rgee functions such as rgee::ee_print and getInfo in workflows, each of which takes about as long as as_tidyee every time it is run.
b. To make sure these perceived benefits are actual benefits we should: 1) include checks/assertions at the end of each process to ensure the ee_ob and vrt are in perfect agreement, and 2) think about including more information (bands, properties) in the print methods for tidyee.
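One possible shape for such a check is sketched below. The function check_in_sync is hypothetical and not part of tidyrgee; it only compares an image count with the number of rows in the vrt. With a real tidyee object the count would presumably come from x$ee_ob$size()$getInfo(), as used elsewhere in this vignette, but a plain number stands in for it here so the sketch runs without Earth Engine.

```r
# Hypothetical check (not part of tidyrgee): compare the number of images
# reported for ee_ob with the number of rows in the virtual table.
check_in_sync <- function(n_images, vrt) {
  if (n_images != nrow(vrt)) {
    stop("ee_ob has ", n_images, " images but vrt has ", nrow(vrt), " rows")
  }
  invisible(TRUE)
}

# Stand-in values: a 3-row vrt and an image count of 3.
vrt <- data.frame(tidyee_index = 1:3)
check_in_sync(3, vrt)
```

A size comparison is the cheapest test; a stricter version could also compare dates or index values, at the cost of an extra round trip to the server.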
It might be worth prefixing the dplyr-style functions with ee_ to avoid naming conflicts.
modis_ndvi_baseline <- modis_ic_tidy |>
  select("NDVI") |>
  filter(year %in% c(2000:2015)) |>
  group_by(month) |>
  summarise(stat = list("mean", "sd"))

modis_ndvi_current <- modis_ic_tidy |>
  select("NDVI") |>
  filter(year %in% c(2022)) |>
  group_by(month) |>
  summarise(stat = "mean")

modis_mean_and_sd_by_yrmo

modis_monthly_baseline_current <- modis_ndvi_baseline |>
  inner_join(
    modis_ndvi_current |>
      select(NDVI_mean_current = "NDVI"),
    by = "month"
  )

modis_current_with_baseline <- modis_ndvi_current |>
  select(NDVI_mean_current = "NDVI") |>
  inner_join(modis_ndvi_baseline, by = "month")