library(OSMtidy)

knitr::opts_chunk$set(echo = TRUE, out.height = "75%", out.width = "75%")

This vignette describes the OSMtidy workflow. The workflow consists of six steps which are intended to be simple and easy to follow:

  1. Input A shapefile outlining the location
  2. Extract Spatial data – inside the shapefiles ‘bounding box’ – is extracted from OpenStreetMaps servers via the R package osmdata
  3. Cut The extracted data is ‘cookie cutter’-ed to the shapefile extent
  4. Wrangle The data is transformed into a suitable format for filtering
  5. Filter The physical objects are filtered and renamed to follow a simple naming convention
  6. Tidy Collates the outputs to form a streamlined database of physical objects

This vignette also introduces to helper functions which can be applied at all of these six steps: dataSummary() and dataExport().

To get started, you'll need:

1. Input - Using dataShapefile()

Using the function dataShapefile() we can import the shapefile for which data is to be extracted. There are two input arguments:

shp <- dataShapefile(filename = "exampleEdinburgh.shp", crs = 4326) # or dataShapefile("exampleEdinburgh.shp")
shp

1.1 dataSummary()

At each step, you can print a summary of the OSMtidy outputs using the function dataSummary().

dataSummary(shp)

1.2 dataExport()

You can also export any OSMtidy output using the function dataExport(). The export file names have the following convention:

dataExport(shp, "exampleEdinburgh")

2. Extract - Using dataExtract()

The OpenStreetMap data is extracted, via the R package osmdata and the overpass server, using the function dataExtract().

2.1 Features

This data is extracted by feature, and a vector of the names of features consisting of physical objects can be generated via data("features"). A vector of all the vailable features in osmdata can be accessed via the function osmdata::available_features.

data("features")
features <- features[c(2,18,19,22,24)]; features # For this example we'll select a subset of 5 features

osmdata::available_features # A vector of all 209 available features

2.2 dataExtract()

The function dataExtract() has three input arguments:

Details from the R package osmdata

timeout It may be necessary to increase this value for large queries, because the server may time out before all data are delivered. memsize The default memory size for the 'overpass' server in bytes; may need to be increased in order to handle large queries.> See https://wiki.openstreetmap.org/wiki/Overpass_API#Resource_management_options_.28osm-script.29 for explanation of timeout and memsize (or maxsize in overpass terms). Note in particular the comment that queries with arbitrarily large memsize are likely to be rejected.


Note this function may take some time to run. Timestamps and progress are printed while the function is running. It is recommended that you execute the function once to avoid flooding the overpass server. The example below extracts 5 of the 47 features.

dlExtract <- dataExtract(dataShapefile = shp, features = features)

dataSummary(dlExtract)

dataExport(dlExtract, "exampleEdinburgh")

3. Cut - Using dataCut()

In step 2 the data was extracted as a "bounding box" (a rectangle). In step 3, the data is cut to the shapefile using the function dataCut().

The function dataCut has two input arguments:

Timestamps and progress are printed when the function is running.

dlCut <- dataCut(dataExtracted = dlExtract, dataShapefile = shp)

dataSummary(dlCut)

dataExport(dlCut, "exampleEdinburgh")

4. Wrangle - Using dataWrangle()

Using the function dataWrangle we can tidy up (or wrangle) the data before filtering.

There is one input argument:

Timestamps and progress are printed when the function is running.

dlWrangle <- dataWrangle(dataCut = dlCut)

dataSummary(dlWrangle)

dataExport(dlWrangle, "exampleEdinburgh")

5. Filter - Using dataFilter()

5.1 Filters

The main function of OSMtidy is dataFilter(). Here, the data is filtered based on rules set out in an Excel spreadsheet. Default filters, generated as part of the Water Resilient Cities project, can be accessed using the data() function: data("filters").

data("filters")
filters

To see the spreadsheet the default filters are based on, or to access a template to create your own filters, generate the filepaths using the code below.

system.file("extdata", "filters.xlsx", package = "OSMtidy")
system.file("extdata", "filtersTemplate.xlsx", package = "OSMtidy")

See vignette 3 for further details.

5.2 filterOverview()

A filter overview can be generated using the function filterOverview(). The input can be either the filepath as a string or as an object in R. Examples of both are provided below.

filterOverview() with a filepath

filepath <- system.file("extdata", "filters.xlsx", package = "OSMtidyPackage")
filepath
filterOverview(filepath)

filterOverview() with an R object

data("filters")
filters
filterOverview(filters)


5.3 Application

There are three input arguments to dataFilter():

Timestamps and progress are printed when the function is running.

Depending on the location size, number of filters and computer performance, filters can take anything from a couple of minutes (the example ward) to multiple hours to run (City of London and Boroughs).

dlFilter <- dataFilter(dataWrangle = dlWrangle, filters = filters)
dataSummary(dlFilter)
dataExport(dlFilter, "exampleEdinburgh")

6. Tidy - Using dataTidy()

Note that multiple outputs from dataWrangle() and dataFilter() were spreadsheets (.xlsx extension). These spreadsheets may be manually adjusted; this is covered in the third vignette.

The function dataTidy() generates a single tidied output based on any combination (R object and/or spreadsheet) of this filtered, validated, unfiltered and no detail data.

There is one input argument to dataTidy():

In this vignette, the input is the list of outputs from steps 4 and 5. In vignettes 3 and 4 a number of alternative approaches are described.

The tidied geotagged dataset is saved in .RDS, and .csv for use in a range of applications. To export as a shapefile it is necessary to split the geotagged dataset by geometry type first. Still need to update how the final output works and runs

dlTidy <-
  dataTidy(dataList = 
             list(dlWrangle$noDetail,
                  dlFilter$unfiltered,
                  dlFilter$filtered,
                  dlFilter$validate))
dataSummary(dlTidy)
dataExport(dlTidy, "exampleEdinburgh")


avisserquinn/OSMtidy documentation built on June 3, 2023, 7:30 a.m.