So far, we've demonstrated using emodnet.wfs to request complete layers.
However, WFS services allows us to be selective with what features are returned from the EMODnet servers. For example: using CQL filters, we can send a query including a filtering clause that will be executed on the server and return only the subset of the information we request. This can make queries a lot faster and reduce the need for filtering data after download.
For example, let's consider the case where we want to include the territorial sea boundaries in a habitat map.
One way to approach this would be to download the full layer containing all the European maritime boundaries from the human activities service and then filter the sf
object returned for features representing Territorial Seas.
A more efficient approach would be to use a filter and request only the features representing Territorial Seas from the server. Let's have a look at how to do that.
CQL (Common Query Language) is a plain-text language created for the OGC Catalog specification. GeoServer has adapted it to be an easy-to-use filtering mechanism. GeoServer actually implements a more powerful extension called ECQL (Extended CQL), which allows expressing the full range of filters that OGC Filter 1.1 can encode. ECQL is accepted in many places in GeoServer.
GeoServer supports the use of both CQL and ECQL in WFS requests and whenever the documentation refers to CQL, ECQL syntax can be used as well.
First, let's load the library and start a new WFS client with emodnet_init_wfs_client
.
library(emodnet.wfs)
wfs <- emodnet_init_wfs_client(service = "biology") #> ✔ WFS client created successfully #> ℹ Service: "https://geo.vliz.be/geoserver/Emodnetbio/wfs" #> ℹ Version: "2.0.0"
To develop filters, we first need information about the attributes we can filter on. For example, we might want to know the data type and range or distribution of values of each attributes of a given layer.
Let's look at existing layers.
emodnet_get_wfs_info(wfs) #> # A tibble: 36 × 9 #> # Rowwise: #> data_source service_name service_url layer_name title abstract class format #> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> #> 1 emodnet_wfs biology https://geo.vliz.be/geose… mediseh_c… EMOD… "Coral … WFSF… sf #> 2 emodnet_wfs biology https://geo.vliz.be/geose… mediseh_c… EMOD… "Coral … WFSF… sf #> 3 emodnet_wfs biology https://geo.vliz.be/geose… mediseh_c… EMOD… "Cymodo… WFSF… sf #> 4 emodnet_wfs biology https://geo.vliz.be/geose… Species_g… EMOD… "This d… WFSF… sf #> 5 emodnet_wfs biology https://geo.vliz.be/geose… Species_g… EMOD… "This d… WFSF… sf #> 6 emodnet_wfs biology https://geo.vliz.be/geose… Species_g… EMOD… "This d… WFSF… sf #> 7 emodnet_wfs biology https://geo.vliz.be/geose… Species_g… EMOD… "This d… WFSF… sf #> 8 emodnet_wfs biology https://geo.vliz.be/geose… mediseh_h… EMOD… "Haloph… WFSF… sf #> 9 emodnet_wfs biology https://geo.vliz.be/geose… mediseh_m… EMOD… "Maërl … WFSF… sf #> 10 emodnet_wfs biology https://geo.vliz.be/geose… mediseh_m… EMOD… "Maërl … WFSF… sf #> # ℹ 26 more rows #> # ℹ 1 more variable: layer_namespace <chr>
To start, you might want to know the names of a layer's attributes.
layer_attributes_get_names(wfs, layer = "mediseh_zostera_m_pnt") #> [1] "id" "country" "the_geom"
All functions can also be used by providing a service name instead of a wfs
object.
layer_attributes_get_names(service = "biology", layer = "mediseh_zostera_m_pnt") #> ✔ WFS client created successfully #> ℹ Service: "https://geo.vliz.be/geoserver/Emodnetbio/wfs" #> ℹ Version: "2.0.0" #> [1] "id" "country" "the_geom"
The type of filtering you might want to apply will depend on the data type of each attribute.
You can inspect attribute descriptions (metadata) for a given layer with layer_attribute_descriptions()
.
layer_attribute_descriptions(wfs, layer = "mediseh_zostera_m_pnt") #> name type minOccurs maxOccurs nillable geometry #> 1 id integer 0 1 TRUE FALSE #> 2 country character 0 1 TRUE FALSE #> 3 the_geom Point 0 1 TRUE TRUE
You can get summaries of the values of each attribute with layer_attributes_summarise()
. This function basically runs summary()
on the attribute columns of the layer.
layer_attributes_summarise(wfs, layer = "mediseh_zostera_m_pnt") #> gml_id id country #> Length:54 Min. :0 Length:54 #> Class :character 1st Qu.:0 Class :character #> Mode :character Median :0 Mode :character #> Mean :0 #> 3rd Qu.:0 #> Max. :0
You can also inspect individual attributes which, in the case of categorical variables, can give more detailed information on the names and distribution of categories.
layer_attribute_inspect( wfs, layer = "mediseh_zostera_m_pnt", attribute = "country" ) #> # A tibble: 7 × 3 #> . n percent #> <chr> <int> <dbl> #> 1 Croazia 4 0.0741 #> 2 Francia 1 0.0185 #> 3 Italia 30 0.556 #> 4 Libia 2 0.0370 #> 5 Slovenia 8 0.148 #> 6 Spagna 8 0.148 #> 7 Tunisia 1 0.0185
Finally, to enable full inspection and custom processing of attribute data, you can download the full set of attributes as a data.frame
, excluding the geometry column. As the geometries are usually the largest column, this is much faster than downloading the full layer and can be useful in developing attribute filtering rules.
attr_tbl <- layer_attributes_tbl(wfs, layer = "mediseh_zostera_m_pnt") attr_tbl #> # A tibble: 54 × 3 #> gml_id id country #> <chr> <int> <chr> #> 1 mediseh_zostera_m_pnt.1 0 Spagna #> 2 mediseh_zostera_m_pnt.2 0 Spagna #> 3 mediseh_zostera_m_pnt.3 0 Spagna #> 4 mediseh_zostera_m_pnt.4 0 Spagna #> 5 mediseh_zostera_m_pnt.5 0 Spagna #> 6 mediseh_zostera_m_pnt.6 0 Spagna #> 7 mediseh_zostera_m_pnt.7 0 Spagna #> 8 mediseh_zostera_m_pnt.8 0 Francia #> 9 mediseh_zostera_m_pnt.9 0 Italia #> 10 mediseh_zostera_m_pnt.10 0 Italia #> # ℹ 44 more rows
Once we know the values of a categorical attribute of a layer, we can use them to filter the features returned from our query.
For example, let's say we are only interested in results for France.
From our previous interrogation of the attributes associated with the mediseh_zostera_m_pnt
layer, we have seen that the attribute country
contains the country.
We can use the name of the attribute and the value we require to construct our eqcl filter and pass it to our request using the cql_filter
argument.
emodnet_get_layers( wfs, layers = "mediseh_zostera_m_pnt", cql_filter = "country = 'Francia'" ) #> $mediseh_zostera_m_pnt #> Simple feature collection with 1 feature and 3 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: 4.84864 ymin: 43.37637 xmax: 4.84864 ymax: 43.37637 #> Geodetic CRS: WGS 84 #> gml_id id country the_geom #> 1 mediseh_zostera_m_pnt.8 0 Francia POINT (4.84864 43.37637)
The above shows the most basic text comparisons we can make use the equality operator =
.
If we want to match more than one value, we can use the operator IN
instead and provide a list of values to compare to.
For example, to request both France and Spain, we could use the following filter:
emodnet_get_layers( wfs, layers = "mediseh_zostera_m_pnt", cql_filter = "country IN ('Francia', 'Spagna')" ) #> $mediseh_zostera_m_pnt #> Simple feature collection with 9 features and 3 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: -4.167154 ymin: 36.71226 xmax: 4.84864 ymax: 43.37637 #> Geodetic CRS: WGS 84 #> gml_id id country the_geom #> 1 mediseh_zostera_m_pnt.1 0 Spagna POINT (-2.61314 36.71681) #> 2 mediseh_zostera_m_pnt.2 0 Spagna POINT (-3.846598 36.75127) #> 3 mediseh_zostera_m_pnt.3 0 Spagna POINT (-3.957785 36.72266) #> 4 mediseh_zostera_m_pnt.4 0 Spagna POINT (-4.039712 36.74217) #> 5 mediseh_zostera_m_pnt.5 0 Spagna POINT (-4.100182 36.72331) #> 6 mediseh_zostera_m_pnt.6 0 Spagna POINT (-4.167154 36.71226) #> 7 mediseh_zostera_m_pnt.7 0 Spagna POINT (-1.268366 37.55796) #> 8 mediseh_zostera_m_pnt.8 0 Francia POINT (4.84864 43.37637) #> 9 mediseh_zostera_m_pnt.54 0 Spagna POINT (3.291868 42.29132)
It is the same as using the OR
filter:
emodnet_get_layers( wfs, layers = "mediseh_zostera_m_pnt", cql_filter = "country='Francia' OR country='Spagna'" ) #> $mediseh_zostera_m_pnt #> Simple feature collection with 9 features and 3 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: -4.167154 ymin: 36.71226 xmax: 4.84864 ymax: 43.37637 #> Geodetic CRS: WGS 84 #> gml_id id country the_geom #> 1 mediseh_zostera_m_pnt.1 0 Spagna POINT (-2.61314 36.71681) #> 2 mediseh_zostera_m_pnt.2 0 Spagna POINT (-3.846598 36.75127) #> 3 mediseh_zostera_m_pnt.3 0 Spagna POINT (-3.957785 36.72266) #> 4 mediseh_zostera_m_pnt.4 0 Spagna POINT (-4.039712 36.74217) #> 5 mediseh_zostera_m_pnt.5 0 Spagna POINT (-4.100182 36.72331) #> 6 mediseh_zostera_m_pnt.6 0 Spagna POINT (-4.167154 36.71226) #> 7 mediseh_zostera_m_pnt.7 0 Spagna POINT (-1.268366 37.55796) #> 8 mediseh_zostera_m_pnt.8 0 Francia POINT (4.84864 43.37637) #> 9 mediseh_zostera_m_pnt.54 0 Spagna POINT (3.291868 42.29132)
Other text comparisons include text pattern matching using operator LIKE
. For example, to request maritime boundaries for countries starting with an S
, we can use the following filter:
emodnet_get_layers( wfs = wfs, layers = "mediseh_zostera_m_pnt", cql_filter = "country LIKE 'S%'", simplify = TRUE ) #> Simple feature collection with 16 features and 3 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: -4.167154 ymin: 36.71226 xmax: 13.73725 ymax: 45.58252 #> Geodetic CRS: WGS 84 #> First 10 features: #> gml_id id country the_geom #> 1 mediseh_zostera_m_pnt.1 0 Spagna POINT (-2.61314 36.71681) #> 2 mediseh_zostera_m_pnt.2 0 Spagna POINT (-3.846598 36.75127) #> 3 mediseh_zostera_m_pnt.3 0 Spagna POINT (-3.957785 36.72266) #> 4 mediseh_zostera_m_pnt.4 0 Spagna POINT (-4.039712 36.74217) #> 5 mediseh_zostera_m_pnt.5 0 Spagna POINT (-4.100182 36.72331) #> 6 mediseh_zostera_m_pnt.6 0 Spagna POINT (-4.167154 36.71226) #> 7 mediseh_zostera_m_pnt.7 0 Spagna POINT (-1.268366 37.55796) #> 8 mediseh_zostera_m_pnt.39 0 Slovenia POINT (13.73725 45.56948) #> 9 mediseh_zostera_m_pnt.40 0 Slovenia POINT (13.70701 45.58252) #> 10 mediseh_zostera_m_pnt.41 0 Slovenia POINT (13.71894 45.54535)
CQL/ECQL filters can also include any of the filter functions available in GeoServer, including multiple string functions which greatly increases the power of CQL expressions.
For example, we can request countries that contain l
anywhere, including the first letter. To make the request case independent, we can turn country
names to lowercase and then use a like
comparison to a lowercase l
:
emodnet_get_layers( wfs = wfs, layers = "mediseh_zostera_m_pnt", cql_filter = "strToLowerCase(country) LIKE '%l%'", simplify = TRUE ) #> Simple feature collection with 40 features and 3 fields #> Geometry type: POINT #> Dimension: XY #> Bounding box: xmin: 11.71682 ymin: 33.07783 xmax: 14.4714 ymax: 45.72451 #> Geodetic CRS: WGS 84 #> First 10 features: #> gml_id id country the_geom #> 1 mediseh_zostera_m_pnt.9 0 Italia POINT (13.71831 45.70017) #> 2 mediseh_zostera_m_pnt.10 0 Italia POINT (13.16378 45.72451) #> 3 mediseh_zostera_m_pnt.11 0 Italia POINT (13.35982 45.70508) #> 4 mediseh_zostera_m_pnt.12 0 Italia POINT (12.26722 45.25975) #> 5 mediseh_zostera_m_pnt.13 0 Italia POINT (12.31285 45.362) #> 6 mediseh_zostera_m_pnt.14 0 Italia POINT (12.53509 45.53269) #> 7 mediseh_zostera_m_pnt.15 0 Italia POINT (12.48269 45.5065) #> 8 mediseh_zostera_m_pnt.16 0 Italia POINT (12.77334 43.96751) #> 9 mediseh_zostera_m_pnt.17 0 Italia POINT (13.62604 43.50809) #> 10 mediseh_zostera_m_pnt.18 0 Italia POINT (13.46398 43.60892)
A number of additional comparison operators, such as =, <>
, >
, >=
, <
, <=
, arithmetic operators +
, -
, *
, /
as well as comparison and math functions can be used for filtering numeric variables.
This time we'll use the mediseh_posidonia_nodata.
We can inspect again, using some of our interrogative functions to get information on layer attributes.
layer_attributes_get_names( wfs, layer = "mediseh_posidonia_nodata" ) #> [1] "id" "km" "the_geom" layer_attribute_inspect( wfs, layer = "mediseh_posidonia_nodata", attribute = "km" ) #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 0.2371 6.0306 16.6190 40.1771 42.8958 472.9613
We can see that values range between 0.23
and ~473
with a mean of ~40
.
Let's say we are interested in values greater than 400
. We can use the operator >
to create our filter:
emodnet_get_layers( wfs = wfs, layers = "mediseh_posidonia_nodata", cql_filter = "km > 400", simplify = TRUE ) #> Simple feature collection with 4 features and 3 fields #> Geometry type: MULTICURVE #> Dimension: XY #> Bounding box: xmin: -1.572609 ymin: 35.24538 xmax: 28.33243 ymax: 38.68017 #> Geodetic CRS: WGS 84 #> gml_id id km the_geom #> 1 mediseh_posidonia_nodata.8 0 456.3759 MULTICURVE (LINESTRING (2.3... #> 2 mediseh_posidonia_nodata.227 0 464.6573 MULTICURVE (LINESTRING (26.... #> 3 mediseh_posidonia_nodata.228 0 472.9613 MULTICURVE (LINESTRING (26.... #> 4 mediseh_posidonia_nodata.229 0 463.8207 MULTICURVE (LINESTRING (27....
To request a range we can use filter functions between
and and
.
emodnet_get_layers( wfs = wfs, layers = "mediseh_posidonia_nodata", cql_filter = "km between 40 and 400", simplify = TRUE ) #> Simple feature collection with 120 features and 3 fields #> Geometry type: MULTICURVE #> Dimension: XY #> Bounding box: xmin: -2.1798 ymin: 30.26623 xmax: 34.59127 ymax: 45.35511 #> Geodetic CRS: WGS 84 #> First 10 features: #> gml_id id km the_geom #> 1 mediseh_posidonia_nodata.1 0 291.50323 MULTICURVE (LINESTRING (27.... #> 2 mediseh_posidonia_nodata.2 0 75.37950 MULTICURVE (LINESTRING (23.... #> 3 mediseh_posidonia_nodata.4 0 110.34480 MULTICURVE (LINESTRING (19.... #> 4 mediseh_posidonia_nodata.13 0 66.99746 MULTICURVE (LINESTRING (9.1... #> 5 mediseh_posidonia_nodata.5 0 222.44651 MULTICURVE (LINESTRING (8.6... #> 6 mediseh_posidonia_nodata.6 0 58.72822 MULTICURVE (LINESTRING (6.8... #> 7 mediseh_posidonia_nodata.7 0 192.50519 MULTICURVE (LINESTRING (6.4... #> 8 mediseh_posidonia_nodata.10 0 65.74354 MULTICURVE (LINESTRING (-1.... #> 9 mediseh_posidonia_nodata.11 0 138.68819 MULTICURVE (LINESTRING (4.8... #> 10 mediseh_posidonia_nodata.17 0 184.05959 MULTICURVE (LINESTRING (10....
OR
and AND
statementsFilter conditions can also be a logical combination of other conditions using AND
, OR
or NOT
. We showed previously how we could use a list of potential values to match an attribute to. This could also be achieved using an OR
statement.
For example, the following query returns features where country is either Baltic Sea
or Bulgaria
.
wfs <- emodnet_init_wfs_client(service = "geology_seabed_substrate_maps") emodnet_get_layers( wfs = wfs, layers = "seabed_substrate_1m", cql_filter = "country='Baltic Sea' OR country='Bulgaria'", simplify = TRUE )
Let's say we also want to restrict the features returned those having shape_length
> 1. We can include an additional condition which must be met using an AND
statement.
filter_sf1 <- emodnet_get_layers( wfs = wfs, layers = "seabed_substrate_1m", cql_filter = "country='Baltic Sea' OR country='Bulgaria' AND shape_length>1", simplify = TRUE ) filter_sf1
The query returns features where country
is Baltic Sea
or Bulgaria
but we can see that the minimum value is below the minimum we set with our filter. That's because the AND
filter is added only to the country='Bulgaria'
, so while only features with shape_length > 1
are returned where country
is also Bulgaria
, the filter is not applied to shape_length
where country
is Baltic Sea
.
unique(filter_sf1$country) min(filter_sf1$shape_length)
To add the AND
filter to both OR
filters, we can use a parenthesis:
filter_sf2 <- emodnet_get_layers( wfs = wfs, layers = "seabed_substrate_1m", cql_filter = "(country='Baltic Sea' OR country='Bulgaria') AND shape_length>1", simplify = TRUE )
Now the minimum value is indeed above 1. However, we only get features where country
is 'Bulgaria'
. That's because the single Baltic Sea
feature had a shape_length
value smaller than 1 so is filtered out.
min(filter_sf2$shape_length) unique(filter_sf2$country)
Finally filtering for shape_length < 1
returns features with shape_length
values smaller than 1 for both country
values.
filter_sf3 <- emodnet_get_layers( wfs = wfs, layers = "seabed_substrate_1m", cql_filter = "(country='Baltic Sea' OR country='Bulgaria') AND shape_length<1", simplify = TRUE )
max(filter_sf3$shape_length) unique(filter_sf3$country)
There is more that can be accomplished by using the EMODnet WFS services than downloading data. The emodnet.wfs
package is built on top of the ows4R library, meaning that all the functionalities of this package are available for emodnet.wfs. The ows4R returns a special type of R object called R6. You can learn more in Hadley Wickham's chapter on R6 Objects of the Advance R book.
For instance: it is not efficient to read a large dataset into R just and later subset part of it. This requires longer waiting times and more bandwidth usage, and in very large datasets it would simply not be possible. For instance, all the occurrences data available through the EMODnet Biology portal are stored in one table: These are approximately 30 millions rows! In this case, we suggest you access the EMODnet Biology occurrence data through the EMODnet geoviewer or the eurobis R package. This package retrieves occurrences in batches, allowing for large datasets, and uses viewParams
, a geoserver-specific filtering feature for faster requests compared to standard OGC or CQL filters. In addition, one of these specific filters allow to query on species traits such as functional groups.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.