Model to build the Ports database

Ricardo Lemos, Feb 11, 2018

TL;DR

Suppose you have an array of coordinates that spans the globe. You may want to cluster them based on proximity (e.g. within 4 km), then define an enveloping polygon for each cluster (i.e., a convex hull). You may also want to give each polygon a name based on the nearest city, or some other type of structure nearby (e.g. a marine port). Finally, you may want to compute summary statistics for each polygon, based on the characteristics of the points you provided. This package does all that for a particular use case, but hopefully is flexible enough to accommodate other needs.

1. Introduction

Package portsModel was built during the 2018 Fishackathon, to address Challenge #9 ("Ports"), posted by Global Fishing Watch (henceforth, GFW). In short, we were challenged to

2. Approach

2.1. EarthGeo

EarthGeo is a handy object that helps us know how many cells we are going to need to cover the globe with a grid whose cells have 100 m in meridional (i.e., latitudinal) resolution, and also 100 m zonal (longitudinal) resolution at the equator. As we go poleward, the zonal resolution increases, meaning that we have to traverse more cells to cover the same 100 m. For convenience, we place an upper limit of 200 on the number of cells traversed to cover 4 km. This should not pose issues, since few ports exist beyond parallel 66 - the Arctic and Antarctic circles.

library(portsModel)
earthGeo = earthGeoClass()
plot(earthGeo$lat, mapply(earthGeo$lat, FUN = earthGeo$deltaLon), xlab = 'latitude', ylab = 'nTraverse', xaxp  = c(-90, 90, 6))

2.2. Placing anchorages, world ports, and cities in large, sparse matrices

Here we employ unnamed_anchorages_20171120, which is a dataset of anchorages provided by GFW for the 2018 Fishackathon. This dataset contains the latitudes, longitudes, and Google-S2 ID code of 102,974 anchorages, defined by GFW as sites occupied by more than 20 different vessels for more than 48 hours. We also use a database of ports - the World Port Index -, which is a tabular listing of thousands of ports throughout the world, describing their location, characteristics, known facilities, and available services. Lastly, we use cities1000.zip, which is a GeoNames Gazetteer dataset that provides all cities in the world with more that 1000 inhabitants.

With the object anchorages, we place each anchorage in one cell of a sparse matrix with ~200k rows and ~400k columns that represents the globe, on a ~0.001 degree resolution. This procedure results in the loss of 87 anchorages (0.08%) due to superposition, a number that we find acceptable.

anchorages = points2MatrixClass(earthGeo, points = unnamed_anchorages_20171120, latName = 'lat', lonName = 'lon', id = 's2id', country = NULL, verbose = FALSE)
wpi = points2MatrixClass(earthGeo, wpiDB, 'LATITUDE', 'LONGITUDE', 'PORT_NAME', 'COUNTRY', FALSE)
cities1000 = points2MatrixClass(earthGeo, citiesDB, 'latitude', 'longitude', 'name', 'country', FALSE)

2.3. Clustering

Now that the anchorages are gridded, we can employ a clustering algorithm: two anchorages belong to the same port if they are within 4 km in latitude and 4 km in longitude of each other. The algorithm may take a few minutes to run. Close to 7 thousand clusters are identified.

clusters = clustersClass(earthGeo, anchorages, autobuild = TRUE, verbose = FALSE)
clusters$nClust

2.4. Polygons

Let us now take each cluster and construct a port. A "port" is going to be a convex hull, or polygon, that encompasses all the anchorages in a cluster. This definition doesn't give us a real, physical port, but it is our best approximation based on AIS data alone. Below we plot the centroids of the ~7k ports, as well as the anchorages and convex hull of port #1.

polygons = polygonsClass(clusters, verbose = FALSE)
plot(polygons$centroids)
plot(polygons$allPolygons[[1]]$x, polygons$allPolygons[[1]]$y, type = 'b', xlab = 'lon', ylab = 'lat')

2.5. Ports

Now we need to associate each polygon with a WPI port name and a city name. Also, to characterize the types of vessels that use a port, we employ the dataset anchTypes, provided by GFW during Fishackathon 2018. The method getData lets us put all the relevant output produced up to here in a convenient list, for further analysis and plotting (see package shinyPorts).

ports = portsClass(polygons, wpi, wpiDB, cities1000, citiesDB, anchTypes, verbose = FALSE)
ports$getData()


rtlemos/portsModel documentation built on May 17, 2019, 11:16 p.m.