These are tests to figure out how to include a full time zone map in lutz that is small enough to submit to CRAN (<5MB), but still provides a very high degree of accuracy. Simplify the full time zone map at various levels and compare accuracy vs size. Performance is a secondary consideration.

library(lutz)
library(sf)
library(rmapshaper)
library(purrr)

## Get the full time zone geojson from https://github.com/evansiroky/timezone-boundary-builder
download.file("https://github.com/evansiroky/timezone-boundary-builder/releases/download/2018d/timezones-with-oceans.geojson.zip",
              destfile = "tz.zip")
unzip("tz.zip", exdir = ".")
tz_full <- read_sf("dist/combined-with-oceans.json")

A function that takes a vector of values to pass to the keep argument in rmapshaper::ms_simplify(), an sf POINTS object, and the full unmodified time zone map. It will simplify the map using each value of keep and get time zone values from the simplified map, outputting size, time, accuracy etc.

compare_tz_ver <- function(keep, ll, tz_full) {
  ref_ll_tz <- sf::st_join(ll, tz_full)
  map_df(keep, ~ {
    tz_simp <- rmapshaper::ms_simplify(tz_full, keep = .x, keep_shapes = TRUE,
                                            explode = TRUE, sys = TRUE)
    size <- object.size(tz_simp)
    tmp <- tempfile(fileext = ".rda")
    on.exit(unlink(tmp))
    save(tz_simp, file = tmp, compress = "xz")
    file_size <- file.info(tmp)$size
    timing <- system.time(test_ll_tz <- sf::st_join(ll_sf, tz_simp))
    comp <- ref_ll_tz$tzid == test_ll_tz$tzid
    matches <- sum(comp, na.rm = TRUE)
    mismatches <- sum(!comp, na.rm = TRUE)
    list(keep_arg = .x,
         obj_size = size,
         compressed_size = file_size,
         time = timing["elapsed"],
         matches = matches,
         mismatches = mismatches,
         accuracy = matches / (matches + mismatches),
         ref_nas = sum(is.na(ref_ll_tz$tzid)),
         simp_nas = sum(is.na(test_ll_tz$tzid)))
  })
}

Make an sf points object of n points randomly distributed around the globe. The time zone file only has land, so points in the oceans don't get evaluated.

set.seed(42)
n <- 500000
ll <- data.frame(lat = runif(n, -90, 90), lon = runif(n, -180, 180))
ll_sf <- st_as_sf(ll, coords = c("lon", "lat"), crs = 4326)

tests <- compare_tz_ver(keep = c(0.001, 0.01, 0.05, 0.1, 0.15, 0.2),
                        ll = ll_sf, tz_full = tz_full)

We are looking for a high accuracy but where compressed file size is under 5MB so can still submit to CRAN

knitr::kable(tests)
readr::write_csv(tests, "tests.csv")


ateucher/lutz documentation built on Oct. 17, 2023, 8:46 p.m.