knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of midlines
is to estimate the midline of one or more polygons, taking inputs in sf
formats.
midlines
wraps around several functions from other packages, predominantly sf
. It's development has been a learning experience for the author, but hopefully some users will find it useful. Comments, suggestions and contributions are welcome.
You can install the current version of the package from GitHub with:
# install.packages("devtools") # library(devtools) devtools::install_github("RichardPatterson/midlines")
N.B. to install midlines from GitHub, you will need to install and attach the devtools package if you have not already done so.
The package contains an example polygon representing a short section of the River Thames in central London. This was derived from OpenStreetMap data, downloaded using the OSMdata
package and as such, copyright is retained by OpenStreetMap contributors.
#par(mar = c(4, 4, 0.1, 0.1)) par(mar = c(0,0,0,0)) knitr::opts_knit$set(global.device = TRUE)
# Load libraries library(midlines) library(sf) library(units) plot(thames$geometry)
To estimate the midline of this stretch of the Thames, use the midline_draw
function.
# create a sf collection for the midline m1 = midlines_draw(thames) # plot midline with a different colour for each segment (line_id) plot(m1$geometry, col = m1$line_id, add = TRUE)
We can see the midline has been drawn but there are many unwanted short side branches, especially where the banks of the river are not smooth. To understand why, we need to know what midline_draw
has done. midlines_draw
used the sf
function st_voronoi
to perform an Voronoi
tessellation and then removed all line segments not entirely within the polygon. midlines
can help remove these unwanted side branches, but it is possible you can prevent their creation (see sections on border lines gaps and zig-zagging).
To remove the unwanted branches, the line segments at the end of a line can be identified using a binary variable (removed_flag
) added to the dataset with the midlines_clean
function. The flagged line segments can be explored and removed as required. Specifying the n_removed
option will flag that number of line segments from the end of each line. Previous experimentation indicated that the unwanted branches in this example are made up of ≤ 4 line segments.
knitr::opts_knit$set(global.device = FALSE)
# remove 4 line segments from the end of each line m2 = midlines_clean(m1, n_removed = 4) # plot flagged (red) and unflagged (blue) line segments plot(m2$geometry, col = c("BLUE", "RED")[m2$removed_flag]) # plot only the cleaned midline plot(m2$geometry[m2$removed_flag==0], col = "BLUE")
In addition to the unwanted side branches, midlines_clean
flagged for removal the line segments at the end of the midline itself. If losing the ends of the midline is a problem, midlines_check
can unflag some line segments.
As the unwanted side branches are short, we can unflag longer groups of flagged line segments. We know that the unwanted side branches contain ≤ 4 line segments, whereas the midlines itself is much longer. Therefore, if we remove up to 5 line segments, we know that any removed lines of 5 or more line segments are not unwanted side chains but part of the midline. midlines_check
takes as an input the output from midlines_clean
and adds another binary variable (removed_flag2
), which should flag the unwanted side chains but not the ends of the midlines.
m2 = midlines_clean(m1, n_removed = 5) m3 = midlines_check(m2, n_removed = 5) # plot initially flagged in red and remainder in blue plot(m3$geometry, col = c("BLUE", "RED")[m3$removed_flag2]) # overlay those initially flagged but identifed as having 5+ segments in green plot(m3$geometry[m3$removed_flag==1 & m3$removed_flag2==0], col = "GREEN", add = TRUE)
The ends of the midline are restored (green) but the unwanted side branches are still flagged for removal (red). In addition to adding back longer lines based on the number of line segments, the length
option can be used to specify a length (in Units). In this example, specifying length = set_units(230,"m")
gives the same results as n_removed = 5
There is a trade off between removing all unwanted side branches and retaining the desired midlines. Some experimentation might be required to get the desired results. In this case we've unflagged the forked midline on the right end of our river segment and also some small bits of unconnected line (although see removing unwanted small bits of midlines.
If there is a specific area of interest, this can be used to help control the estimation and cleaning of the midlines. This works best when the polygon used for estimation is larger than the area of actual interest. The border of the area of interest is specified as an sf linestring, as in this example.
# generate a linestring marking the border bbox_line = st_cast(st_as_sfc(st_bbox(c(xmin = 535070, ymin = 177800, xmax = 542560, ymax = 181550), crs = st_crs(thames))), "LINESTRING") m1 = midlines_draw(thames, border_line = bbox_line) plot(thames$geometry) plot(m1$geometry, col = "BLUE", add = TRUE) plot(bbox_line, add = TRUE)
Specifying the boarder_line option in midlines_clean
and midlines_check
can also ensure the line ends that contact the boarder line are not flagged for removal.
m2 = midlines_clean(m1, n_removed = 5) m3 = midlines_check(m2, border_line = bbox_line) plot(m3$geometry[m3$removed_flag2 == 0], col = "BLUE"[m3$removed_flag2]) plot(bbox_line, add = TRUE)
It is also possible to specify the border_line option with midlines_draw and midlines_clean, which crops out midline segments outside the area of interest at the outset and safes time by not flagging and unflagging those lines.
So far we've ignored the side branches north of the the Thames that join the main channel. Attempts to estimate the midline of the left channel has been entirely unsuccessful and the right channel midline has a marked zig-zag pattern. The zig-zagging section below covers that issue but here we look more closely at the lack of any midline in the left of the two side branches.
# generate a polygon to focus on the left side channel(s) bbox_poly = st_as_sfc(st_bbox(c(xmin = 536070, ymin = 180700, xmax = 537800, ymax = 181850), crs = st_crs(thames))) # a slightly smaller area to use as a border line bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 536120, ymin = 180760, xmax = 537750, ymax = 181800), crs = st_crs(thames))),"LINESTRING") plot(thames$geometry) plot(bbox_poly, add = TRUE)
# crop the larger thames polygon to the area we want side1 = st_intersection(thames, bbox_poly) plot(side1$geometry) plot(bbox_line_s, add = TRUE) # estimating midline of the original polygon m_s1 = midlines_draw(side1, border_line = bbox_line_s) plot(m_s1$geometry, add = TRUE, col = "RED")
The lack of lines is due to the narrowness of the channel, or more precisely, the gaps between points in the polygon perimeter relative to the width of the polygon. The gap between points that was sufficient for the wider main River Thames is insufficient to allow successful midline estimation in the narrow channel. The midlines_draw
function has a dfMaxLength
option specify the maximum distance between points and to add additional points as required. This argument is passed to sf::st_segmentize so see that help file for more details.
# some trial and error revealed 8 meters to be the optimal max length between points plot(side1$geometry) plot(bbox_line_s, add = TRUE) m_s1 = midlines_draw(side1, dfMaxLength = set_units(8,"m"), border_line = bbox_line_s) #plot(ml2$geometry, add = TRUE, col = "BLUE") # using the border_line to prevent removal of lines of interest m_s2 = midlines_clean(m_s1, n_removed = 20, border_line = bbox_line_s) plot(m_s2$geometry, col = c("BLUE", "RED")[m_s2$removed_flag], add = TRUE)
Conversely, if the points on the perimeter are too dense and result in a very complicated tessellation or slow computation times, then sf::st_line_sample might also be useful.
A zig-zagging midline results from the offset between points on either side of the polygon when the Voronoi tessellation is done. To illustrate this, there are a series of midlines drawn with very simple oblong polygons.
mat1 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,11,2,9,2,7,2,5,2,3,2,1,2,0,2,0,0),ncol=2, byrow=TRUE) pts1 = st_multipoint(mat1) pol1 = st_polygon(list(mat1)) ml1 = midlines_draw(pol1) plot(pol1) plot(pts1, add = TRUE) plot(ml1$geometry, col = ml1$line_id, add = TRUE)
When the points on the polygon perimeter are offset, this results in the zig-zag pattern. One way to resolve this is to ensure that points are directly opposite one another on the polygon.
mat2 = matrix(c(0,0,2,0,4,0,6,0,8,0,10,0,12,0,12,2,12,2,10,2,8,2,6,2,4,2,2,2,0,2,0,0),ncol=2, byrow=TRUE) pts2 = st_multipoint(mat2) pol2 = st_polygon(list(mat2)) ml2 = midlines_draw(pol2) plot(pol2) plot(pts2, add = TRUE) plot(ml2$geometry, col = ml2$line_id, add = TRUE)
However, its not always possible to control the points that make up the polygons of interest. Another option is to reduce the gap between the points so even though they are offset the zig-zag is less pronounced.
mat3 = matrix(c(0,0,1,0,2,0,3,0,4,0,5,0,6,0,7,0,8,0,9,0,10,0,11,0,12,0, 12,2,11.5,2,10.5,2,9.5,2,8.5,2,7.5,2,6.5,2,5.5,2,4.5,2,3.5,2,2.5,2,1.5,2,0.5,2,0,2,0,0),ncol=2, byrow=TRUE) pts3 = st_multipoint(mat3) pol3 = st_polygon(list(mat3)) ml3 = midlines_draw(pol3) plot(pol3) plot(pts3, add = TRUE) plot(ml3$geometry, col = ml3$line_id, add = TRUE)
Returning to the river example and looking more closely at the zig-zag midline of the right side channel.
# generate a polygon to focus on the right side channel(s) bbox_poly = st_as_sfc(st_bbox(c(xmin = 538700, ymin = 180750, xmax = 539800, ymax = 181850), crs = st_crs(thames))) # a slightly smaller area to use as a border line bbox_line_s = st_cast(st_as_sfc(st_bbox(c(xmin = 538750, ymin = 180800, xmax = 539750, ymax = 181800), crs = st_crs(thames))),"LINESTRING") plot(thames$geometry) plot(bbox_poly, add = TRUE) # crop the larger thames polygon to the area we want side2 = st_intersection(thames,bbox_poly) plot(side2$geometry) plot(bbox_line_s, add = TRUE) # estimating midline of the original polygon m_s1 = midlines_draw(side2, border_line = bbox_line_s) plot(ml1$geometry, add = TRUE, col = "RED")
One option would be to use the dfMaxLength option with midlines_draw like we did on the left side channel.
plot(side2$geometry) m_s1a = midlines_draw(side2, border_line = bbox_line_s, dfMaxLength = set_units(8,"m")) # clean this to remove extraneous lines m_s2a = midlines_clean(m_s1a, border_line = bbox_line_s, n_removed = 5) plot(m_s2a$geometry[m_s2a$removed_flag==0], add = TRUE, col = "RED") #plot(ml2$geometry, add = TRUE, col = "RED")
An alternative is to smooth the midline using a rolling average - basically a wrapper round the rollapply function in the excellent Zoo package.
# smooth with a rolling average m_s2b = midlines_smooth(m_s1) plot(side2$geometry) plot(m_s2b$geometry, add = TRUE, col = "BLUE")
midlines_smooth
will smooth lines in a network of midlines but it does not move the point where two or more lines join. The width
option can be specified for midlines_smooth
, this is passed to rollapply so see documentation there for more details. Beware that the wider the window the greater tenancy to cut the corners and therefore it becomes less like a midline.
Sometime it might be useful to remove small bits of midline that are not connected the to main midline. For example, in our earlier example
plot(m2$geometry) # remove the short bits of line m2 = midlines_debit(m2, length = set_units(500, "m") ) plot(m2$geometry)
In addition to manipulating the density of points on the perimeter of the initial polygon, it is sometimes desirable to manipulate the density of points that form the midline. This results in fewer longer line segments, rather than many shorter line segments. This can be done using midlines_dedensify
either before or after cleaning. Like with the midlines_smooth
function, the points where lines join are not affected but between line joins the number of points can be reduced. This function calls sf::st_line_sample so see this for the relevant options.
The estimation of midlines is done using Voronoi
tessellation via the sf::st_voronoi function. Some understanding of this process can help to understand how the attributes of the polygon will influence it's estimated midline. For a given set of points, Voronoi tessellation identifies the area around each point, i.e., that are closer to that point than any other. Below is an demonstration using the points in the perimeter of the river Thames polygon as the starting point.
j = st_line_sample(st_cast(thames, "LINESTRING"), density = 1/250) pols = st_collection_extract(st_voronoi(do.call(c, st_geometry(j)))) plot(pols)
An alternative use case is to merge two versions of a road network that are measured differently and therefore do not exactly match. Use buffers to creating a polygon around both networks and then estimate the midline of this polygon to give a combined network against which each can be compared. See this blog post for more details.
If you use this package in your work it would be helpful if you would cite it. A suggested citation is:
Patterson, R. midlines: Estimate Polygon Midlines. R package version 0.0.0.9002. DOI: 10.17863/CAM.100113
cmgo
does many of the same things as this package and much more. If your aim to to estimate the midline of a river, you should probably start with cmgo
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.