rdwd
is an R package to handle data from the German Weather Service (DWD).
This website has 3 main sections:
Important links:
library(rdwd) # not sure why this is needed to actually make it quiet... options(rdwdquiet=TRUE) # This should suppress all progress and info messages in the website output # while not suggesting users to copypaste with explicit quiet=TRUE all the time. # Especially new users should see the messages and progress bars, I think
The remainder of this intro chapter is a copy of the github README file.
# mirror section of README reme <- readLines('../../README.md') doc <- grep("## Documentation", reme) + 0:4 reme <- gsub("###", "##", reme) cat(reme[-c(1,doc)], sep='\n')
helplink <- function(doc, topic=doc) paste0("[`",doc,"`](https://www.rdocumentation.org/packages/rdwd/topics/",topic,"){target=\"_blank\"}")
library(rdwd) print_short <- function(x) {out <- lapply(x, gsub, pattern=dwdbase, replacement="---") if(is.list(x)) out else unlist(out)}
rdwd
is designed around 6 main functions:
r helplink("selectDWD")
: select time series data, needing:index of files on the FTP server (r helplink("fileIndex", "index")
), see chapter fileIndex
r helplink("nearbyStations")
: alternative to selectDWD
for a given lat/lon location
raster data must be selected manually.
r helplink("dataDWD")
: download a file (or multiple files), without re-downloads for existing files
r helplink("readDWD")
: read that data into R (handling all the various file formats)
r helplink("plotDWD")
: plot time series
r helplink("plotRadar")
: visualize all raster data types nicely on a projected map
The rdwd
package provides a collection of all the metafiles on the
DWD data server.
It is presented as an interactive map below.
When a point is clicked, an infobox should appear. The first line can be copypasted into R to obtain more information on the available files. The map is created with the following code:
rdwd::updateRdwd() # for the latest version library(rdwd) ; data(geoIndex) ; library(leaflet) leaflet(geoIndex) %>% addTiles() %>% addCircles(~lon, ~lat, radius=900, stroke=F, color=~col) %>% addCircleMarkers(~lon, ~lat, popup=~display, stroke=F, color=~col)
The blue dots mark stations for which recent files are available (with >=1 file in 'recent' folder or 'BIS_DATUM' later than one year ago). The red dots mark all stations with only historical datasets.
To see only the stations with recent data, use the following code:
library(rdwd) ; data(geoIndex) ; library(leaflet) leaflet(data=geoIndex[geoIndex$recentfile,]) %>% addTiles() %>% addCircleMarkers(~lon, ~lat, popup=~display, stroke=F)
To request the nonpublic datasets counted in the infobox, please contact cdc.daten@dwd.de or klima.vertrieb@dwd.de. (The DWD cannot publish all datasets because of copyright restrictions).
Note: r helplink("geoIndex", "index")
is created using r helplink("createIndex")
in
updateIndexes
.
See also: interactive app to compare the weather of a certain time period across years, run locally with rdwd::app()
.
overview of the FTP folder structure
The following folders in res/var/per
notation (resolution/variable/period) are available at
dwdbase
.
Here's a full example URL, for a file at hourly/wind/recent: ftp://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/wind/recent/stundenwerte_FF_00164_akt.zip
"<" signifies a split into the folders per
= "recent" and "historical".
"<<" signifies a split into the folders per
= "now", "recent", "historical" and "meta_data".
"-" signifies that there are no further sub-folders.
"+" signifies another structure of subfolders.
The symbols are clickable URLS leading to the corresponding res/var/
FTP folder.
Please note:
1_minute/precipitation/historical
has a subfolder for each year. subdaily/standard_format
does not actually split in subfolders, but has recent/hist information in fileIndex
.
internally, rdwd
uses ftp://opendata.dwd.de URLS, but web browsers no longer support FTP, so the documentation links to the https variant.
library(magrittr) suppressWarnings(suppressPackageStartupMessages(library(huxtable))) options(huxtable.bookdown = FALSE) # no captions on tables f <- readODS::read_ods("FTP_folders.ods", sheet="folders", col_types=NA, as_tibble=FALSE) colnames(f)[1:4] <- c( "res= (columns)","1_minute","5_minutes","10_minutes") fu <- f # f with urls for(i in 2:ncol(f)) fu[-1,i] <- ifelse(f[-1,i]=="", " ", paste0('<a href="',sub("ftp:","https:",dwdbase),'/',colnames(f)[i],'/',f[-1,1],'" target="_blank">',f[-1,i],'</a>')) n <- which(fu$`res= (columns)`=="RES") fu[n,-1] <- colnames(fu)[-1] n <- n+1 # in the huxtable ht <- huxtable::hux(fu, add_colnames=TRUE) ht %>% # set_label("") %>% set_background_color(value="white") %>% set_background_color(3:nrow(ht), 2:ncol(ht), value="gray95") %>% set_all_borders(1) %>% # set_bottom_border(1, everywhere, 2) %>% set_bold(1, -1, TRUE) %>% set_bold(n, -1, TRUE) %>% set_rotation(1, -1, 90) %>% set_rotation(n, -1, 90) %>% set_align(everywhere, -1, "center") %>% set_valign(row=1, col=1, "bottom") %>% map_background_color(by_values(" "="white")) %>% set_position("center") %>% set_escape_contents(value=FALSE) detach("package:huxtable", unload=TRUE) # to suppress further autoformatting of tables # fails, but whatever - looks nice. # Note: date2-date1 needs as.numeric, otherwise huxtable fails with error about # assert_that not being available in chunk uc_berlinstationselection
library(rdwd) data("fileIndex") # online and table folder comparison on <- paste(fileIndex$res, fileIndex$var, fileIndex$per, sep="/") on <- unique(on) on <- on[!grepl("timeseries_overview$", on)] tab <- lapply(colnames(f)[-1], function(r) { if(r=="multi_annual") return(paste0(r,"//",f[f[,r]=="-",1])) per4 <- c("now", "recent", "historical", "meta_data") per2 <- c("recent", "historical") r0 <- f[,r]=="-" r2 <- f[,r]=="<" r4 <- f[,r]=="<<" rv0 <- paste0(r,"/", f[r0, 1], "/") rv2 <- paste0(r,"/", f[r2, 1], "/") rv4 <- paste0(r,"/", f[r4, 1], "/") rv0 <- if(any(r0)) rv0 else ""[0] rv2 <- if(any(r2)) paste0(rep(rv2, each=length(per2)), per2) else ""[0] rv4 <- if(any(r4)) paste0(rep(rv4, each=length(per4)), per4) else ""[0] c(rv0,rv2,rv4) }) tab <- unlist(tab) miss <- on[!on %in% tab] miss <- miss[! miss %in% c("subdaily/standard_format/recent", "subdaily/standard_format/historical", "annual/climate_indices/kl", "annual/climate_indices/precip", "monthly/climate_indices/kl", "monthly/climate_indices/precip", "annual/climate_indices/historical", "annual/climate_indices/recent", "monthly/climate_indices/historical", "monthly/climate_indices/recent" )] l <- length(miss) if(l>0) stop("\n\nThe following ", l, " FTP folders must yet be added to the ", "table (FTP_folder.ods) in section 'Available datasets':\n", paste(miss, collapse="\n")) miss <- tab[!tab %in% on] l <- length(miss) if(l>0) stop("\n\nThe following ", l, " table entries in section 'Available datasets' (from FTP_folder.ods)", " are no longer in the FTP folder:\n", paste(miss, collapse="\n"))
print_short
in this chapter is just a helper function to replace dwdbase with ---
for shorter output.
Weather stations can be selected geographically with the interactive map.
All stations within a certain radius around a given lat-long position can be obtained with
r helplink("nearbyStations")
.
The DWD station IDs can be obtained from station names with r helplink("findID")
:
findID("Potsdam") findID("Koeln", exactmatch=FALSE)
File selection by station name/id and folder can happen directly with r helplink("selectDWD")
.
It is designed to be very flexible:
# inputs can be vectorized, and period can be abbreviated: selectDWD(c("Potsdam","Wuerzburg"), res="hourly", var="sun", per="hist") %>% print_short
selectDWD(c("Potsdam","Wuerzburg"), res="hourly", var="sun", per="hist") %>% print_short
If res/var/per are left NA, an interactive selection is opened with the available folder options for the given station.
The time period can be doubled to get both filenames:
selectDWD("Potsdam", res="daily", var="kl", per="rh") %>% print_short
There may be a differing number of available files for several stations across all folders.
selectDWD(id=c(3467,5116), res="",var="",per="") %>% print_short
Since version 1.3.26 (2020-07-20), SelectDWD
collects warnings from the loop into a single message:
options(warn=0) # out of 43 stations with a partial match for the name "Berlin", only 20 have data in the folowing folder bd <- selectDWD("Berlin", exactmatch=FALSE, res="monthly", var="kl", per="h", quiet=FALSE) str( print_short(bd[bd!=dwdbase]) )
r helplink("indexFTP")
recursively lists all the files on an FTP-server (using RCurl::getURL
).
From those paths, r helplink("createIndex")
generates r helplink("fileIndex", "index")
and r helplink("gridIndex", "index")
.
It also downloads all (ca 84) description files with metadata and creates r helplink("metaIndex", "index")
and r helplink("geoIndex", "index")
.
All indexes are updated irregularly with the internal function updateIndexes
.
r helplink("selectDWD")
helps to query the fileIndex
.
The DWD often (but irregularly) updates or expands datasets, at which point the filenames in historical folders change.
If you find the index to be outdated ("download.file errors: [...] cannot open URL"),
check in the latest commits
if the github development version already has an updated index (install with r helplink("rdwd::updateRdwd()", "updateRdwd")
).
If not, please let me know and I will update it.
Meanwhile, either use current=TRUE in r helplink("selectDWD")
:
# all files at a given path, with current file index (RCurl required): links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)
or, for repetitive usage, create your own file index (for a certain subfolder):
# recursively list files on the FTP-server: files <- indexFTP("hourly/sun") # use dir="some_path" to save the output elsewhere berryFunctions::headtail(files, 5, na=TRUE) # create and use a personal file index: cursun <- createIndex(files) head(cursun) sunlink <- selectDWD("Potsdam", res="hourly", var="sun", per="r", findex=cursun) # with other FTP servers, this should also work... funet <- indexFTP(base="ftp.funet.fi/pub/standards/w3/TR/xhtml11/", folder="") p <- RCurl::getURL( "ftp.funet.fi/pub/standards/w3/TR/xhtml11/", verbose=T, ftp.use.epsv=TRUE, dirlistonly=TRUE)
r helplink("selectDWD")
also uses a complete data.frame with meta information,
r helplink("metaIndex", "index")
(derived from the "Beschreibung" files in r helplink("fileIndex", "index")
).
# All metadata at all folders: data(metaIndex) str(metaIndex, vec.len=2)
View(data.frame(sort(unique(rdwd:::metaIndex$Stationsname)))) # ca 6k entries
r helplink("readDWD")
can correctly read such a data.frame from any folder on the FTP server:
# file with station metadata for a given path: m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE) m_link <- grep(".txt$", m_link, value=TRUE) print_short(m_link) # (Monatswerte = monthly values, Beschreibung = description)
meta_monthly_rain <- dataDWD(m_link) str(meta_monthly_rain)
Meta files may list stations for which there are actually no files. These refer to nonpublic datasets (The DWD cannot publish all datasets because of copyright restrictions). To request those, please contact cdc.daten@dwd.de or klima.vertrieb@dwd.de.
The from and to dates do not always reflect the real time period with avalaible data.
They are read from the DWD _Beschreibung_Stationen.txt
files, e.g. this one, see also issue 32.
For up-to-date metaIndexes, check for updates in the development version (r helplink("rdwd::updateRdwd()", "updateRdwd")
), prompt me to update it, or use your own version:
ll <- selectDWD("", c("hourly","daily"), c("wind","kl"), "r", meta=TRUE) ll <- grep(".txt$", ll, value=TRUE) ll <- ll[!grepl("mn4",ll)] ll <- sub(dwdbase, "", ll) ll ind <- createIndex(ll, dir=tempdir(), meta=TRUE, checkwarn=FALSE) ind$metaIndex$hasfile <- TRUE metaInfo(3987, mindex=ind$metaIndex)
r helplink("readDWD")
has the argument fread
to read datasets through data.table::fread
.
This is significantly faster than base unzip + read.table
, especially for large historical files.
The default is fread=NA
, which checks for availability of the R package data.table
and the system command unzip
.
On Windows, the system command unzip
is not available by default, only through e.g. a git shell or Rtools (source).
If needed, install Rtools (directly at C:/Rtools since compiler paths may not have spaces, as there would be with C:/Program Files/R/Rtools/).
You might then need to add something like C:/rtools40/usr/bin
to PATH, e.g. with the R code
cat('PATH="${RTOOLS40_HOME}\\usr\\bin;${PATH}"', file="~/.Renviron", append=TRUE)
and then restart R (CTRL + SHIFT + F10 should suffice).
Check the availability of unzip
with
Sys.which("unzip") # gives me c:\\Rtools\\bin\\unzip.exe
Note that github user Mightynasty did not get unzip
in Rtools to work correctly.
For more background information, see rdwd issues 22, 23 and 24.
In case you do not need fast reading and don't want the warning about unzip, simply use readDWD("file.zip", fread=FALSE)
.
The following error messages have been reported to me.
[pfile] is the produkt file, e.g. produkt_klima_tag_19370101_19860630_00001.txt for
[path] DWDdata/daily_kl_historical_tageswerte_KL_00001_19370101_19860630_hist.zip
'unzip' is not recognized as an internal or external command, operable program or batch file.
'(unzip -p [path].zip [pfile].txt) > [Rtempfile]'
execution failed with error code 1
File '[Rtempfile]' has size 0. Returning a NULL data.frame. File contains no rows: [path].zip
Error in data.table::fread(paste("unzip -p", f, fp), na.strings = na9(), : File is empty: [Rtempfile]
In addition: Warning messages:
1: running command 'C:\Windows\system32\cmd.exe /c (unzip -p [path].zip [pfile].txt) > [Rtempfile]
had status 1
2: In shell(paste("(", input, ") > ", tt, sep = "")) : '(unzip -p [path].zip [pfile].txt) > [Rtempfile]'
execution failed with error code 1
Der Befehl "unzip" ist entweder falsch geschrieben oder konnte nicht gefunden werden.
The command "unzip" is either misspelled or could not be found.
For observational data at r helplink("dwdbase")
,
r helplink("selectDWD")
is the main function to choose data to be downloaded.
For gridded data at gridbase
, including
data interpolated onto a 1 km raster and radar data up to the last hour,
I don't yet understand the structure of the FTP server as well.
For now, you'll have to query r helplink("gridIndex", "index")
yourself, e.g. with
data(gridIndex) head(grep("historical", gridIndex, value=TRUE)) # currently available files in a given folder: rasterbase <- paste0(gridbase,"/seasonal/air_temperature_mean") ftp.files <- indexFTP("/16_DJF", base=rasterbase, dir=tempdir()) # current index of all grid files (takes > 2 min, yields >30k charstrings >5MB): gridIndexNow <- indexFTP(folder="currentgindex", base=gridbase, filename="grids")
If you send me examples of how you use it, I can then expand this in rdwd
.
For files that are not yet read correctly, you can also consult the
Kompositformatbeschreibung at https://www.dwd.de/DE/leistungen/radolan/radolan.html
Besides at r helplink("dwdbase")
and r helplink("gridbase", "dwdbase")
,
there's yet more data at https://opendata.dwd.de/weather.
Before running the code below, update the package:
rdwd::updateRdwd()
The following overview will usually unzip only a few selected files for speed and memory considerations.
In real life, you probably do not want to unzip to a temporary exdir
.
You can also remove read=FALSE
in dataDWD and add the needed arguments right there,
but I wanted to be explicit here.
The first line in each code block below shows for which FTP folder
at gridbase
this function will be called.
The last line shows what projection and extent to use in r helplink("plotRadar")
.
r helplink("readDWD.raster")
- for actual code, see use case: monthly gridded data
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) link <- "seasonal/air_temperature_mean/16_DJF/grids_germany_seasonal_air_temp_mean_188216.asc.gz" # 0.2 MB file <- dataDWD(link, base=gridbase, joinbf=TRUE, read=FALSE) rad <- readDWD(file) # with dividebyten=TRUE rad <- readDWD(file) # runs faster at second time due to skip=TRUE plotRadar(rad, main=".raster", proj="seasonal", extent=NULL)
r helplink("readDWD.nc")
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) link <- "daily/Project_TRY/pressure/PRED_199606_daymean.nc.gz" # 5 MB file <- dataDWD(link, base=gridbase, joinbf=TRUE, read=FALSE) rad <- readDWD(file) # can also have interactive selection of variable plotRadar(rad, main=".nc", proj="nc", extent="nc", layer=1)
r helplink("readDWD.binary")
- see also use case: recent hourly radar files
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) link <- "hourly/radolan/reproc/2017_002/bin/2017/RW2017.002_201712.tar.gz" # 25 MB file <- dataDWD(link, base=gridbase, joinbf=TRUE, read=FALSE) rad <- readDWD(file, exdir=tempdir(), selection=1:3) plotRadar(rad$dat, main=".binary RW", extent="rw", layer=1)
r helplink("readDWD.binary")
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) link <- "/daily/radolan/historical/bin/2017/SF201712.tar.gz" # 204 MB file <- dataDWD(link, base=gridbase, joinbf=TRUE, read=FALSE) rad <- readDWD(file, exdir=tempdir(), selection=1:3) # with toraster=TRUE plotRadar(rad$dat, main=".binary SF", layer=1)
r helplink("readDWD.asc")
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) link <- "hourly/radolan/historical/asc/2018/RW-201809.tar" # 25 mB file <- dataDWD(link, base=gridbase, joinbf=TRUE, read=FALSE) # dbin -> mode=wb rad <- readDWD(file, selection=1:3, dividebyten=TRUE) plotRadar(rad, main=".asc", layer=1)
r helplink("readDWD.radar")
- for actual code for daily data, see use case: daily radar files
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) suppressPackageStartupMessages(library(R.utils)) links <- indexFTP("hourly/radolan/recent/bin", base=gridbase, dir=tempdir()) # 0.04 MB file <- dataDWD(links[773], base=gridbase, joinbf=TRUE, dir=tempdir(), read=FALSE) rad <- readDWD(file) plotRadar(rad$dat, main=".radar RW")
r helplink("readDWD.radar")
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) rqbase <- "ftp://opendata.dwd.de/weather/radar/radvor/rq" links <- indexFTP("", base=rqbase, dir=tempdir()) # 0.07 MB file <- dataDWD(links[17], base=rqbase, joinbf=TRUE, dir=tempdir(), read=FALSE) rad <- readDWD(file) plotRadar(rad$dat, main=".radar RQ")
r helplink("readDWD.grib2")
par(mar=c(2,2,2.2,5), mgp=c(3,0.7,0)) nwpbase <- "ftp://opendata.dwd.de/weather/nwp/icon-d2/grib/00/t_2m" links <- indexFTP("", base=nwpbase, dir=tempdir()) file <- dataDWD(links[6], base=nwpbase, joinbf=TRUE, dir=tempdir(), read=FALSE) #forecast <- readDWD(file) #plotRadar(forecast, main=".grib2", project=FALSE)
grib files are currently broken (2021-04-08) but would look like this:
Binary files must be downloaded by download.file
with wb=TRUE (at least on Windows, due to CRLF issues).
download.file
will automatically do that for some file endings (like .gz, .zip).
For others (e.g. .tar files in readDWD.asc
or files at weather/radar/radolan),
r helplink("dataDWD")
has a dbin=TRUE option.
Since Version 1.4.18 (2021-04-08), the default is dbin=TRUE
. Report errors here if needed.
If you do not use this, your plots may look partially shifted like this and have the wrong units (image from 2020-06-16 21:30 CEST):
url <- "ftp://opendata.dwd.de/weather/radar/radolan/rw/raa01-rw_10000-latest-dwd---bin" rw_file <- dataDWD(url, read=FALSE, dbin=FALSE) rw_orig <- dwdradar::readRadarFile(rw_file) terra::plot(terra::rast(rw_orig$dat))
Here are some references related to weather data analysis. Expansion suggestions are very welcome via github or berry-b@gmx.de!
R packages for weather data
USA: countyweather, rnoaa
Canada: weathercan
Netherlands: KNMIr
World: GSODR
France: Blogpost, not package
Other resources
UK data website
The Python equivalent of rdwd: the package wetterdienst that also covers Radolan and Radvor and has local caching (even with an SQL interface), but has many dependencies. It accesses more data sources than rdwd and currently has a more active development community.
the Catchment Attributes and MEteorology for Large-sample Studies initiative (CAMELS).
download & read data
rdwd::updateRdwd() library(rdwd) link <- selectDWD("Potsdam", res="daily", var="kl", per="recent") clim <- dataDWD(link, force=NA, varnames=TRUE) str(clim)
plot time series
par(mar=c(4,4,2,0.5), mgp=c(2.7, 0.8, 0), cex=0.8) plot(clim[,c(2,14)], type="l", xaxt="n", las=1, main="Daily temp Potsdam") berryFunctions::monthAxis() ; abline(h=0) mtext("Source: Deutscher Wetterdienst", adj=-0.1, line=0.5, font=3)
par(mar=c(4,4,2,0.5), mgp=c(2.7, 0.8, 0), cex=0.8) link <- selectDWD("Goettingen", res="monthly", var="kl", per="h") clim <- dataDWD(link) clim$month <- substr(clim$MESS_DATUM_BEGINN,5,6) temp <- tapply(clim$MO_TT, clim$month, mean, na.rm=TRUE) prec <- tapply(clim$MO_RR, clim$month, mean, na.rm=TRUE) berryFunctions::climateGraph(temp, prec, main="Goettingen") mtext("Source: Deutscher Wetterdienst", adj=-0.05, line=2.8, font=3)
See also the app to visualize the weather of a given time period, compared to the measurements of the same period in other years.
This can also be run locally with rdwd::app()
.
links <- selectDWD("Potsdam", res="daily", var="kl", per="hr") klima <- dataDWD(links, hr=4) plot(TMK~MESS_DATUM, data=tail(klima,1500), type="l") links <- selectDWD("Celle", res="daily", var="kl", per="hr") klima <- dataDWD(links, hr=4, varnames=TRUE) plotDWD(tail(klima,800), "PM.Luftdruck")
I'ts always good to update before getting started:
rdwd::updateRdwd() library(rdwd)
Now we'll just specify some IDs we're interested in:
ids <- c(3988, 5559, 2456, 3034, 1964, 4549, 2950, 5419, 2641, 3565) links <- selectDWD(id=ids, res="daily", var="weather_phenomena", per="h") phen <- dataDWD(links) names(phen) <- substr(names(phen), 54,58) # only IDs (not paths) as name str(phen, max.level=1)
All further analysis can now be done on this named list.
From the monthly
folder at r helplink("gridbase", "dwdbase")
, we want to
download .asc.gz files for selected years and open them in R for further processing.
data("gridIndex") index <- grep("monthly/precipitation", gridIndex, value=TRUE) # 1'709 index <- grep('2014|2015|2016',index, value=TRUE) # n=36 (3*12) precip <- dataDWD(index[6:8], base=gridbase, joinbf=TRUE)
For .asc.gz files, r helplink("readDWD")
calls r helplink("readDWD.raster")
.
This runs faster if called a second time due to skip=TRUE in gunzip
.
Now we can project and visualize with:
plotRadar(precip[[1]], proj="seasonal", extent=NULL, main=names(precip)[1])
For further processing, we can create a terra raster from the list, which e.g. enables fast and easy indexing.
precip <- terra::rast(precip)
This is expanded upon in the next use case.
A few projection references:
The Beschreibung file
leads to
https://spatialreference.org/ref/epsg/31467/
which is used internally in r helplink("projectRasterDWD")
, currently at
line 70.
Obtain time series of values at a selected location in gridded data.
library(rdwd) # select data index <- indexFTP(folder="annual/air_temperature_max", base=gridbase) index <- index[-(1:2)] # exclude description files index <- index[as.numeric(substr(index,62,65))>=2013] # after year 2013 index
# download & read data: tempmax <- dataDWD(index, base=gridbase, joinbf=TRUE) names(tempmax) <- substr(names(tempmax), 62, 65)
# visual data & projection check: plotRadar(tempmax[[1]], proj="seasonal",extent="seasonal", main="Annual grid of monthly averaged daily maximum air temperature (2m) - 2013")
# terra raster: tempmax_stack <- terra::rast(tempmax) tempmax_stack <- projectRasterDWD(tempmax_stack, proj="seasonal",extent="seasonal") tempmax_stack terra::plot(tempmax_stack, range=range(terra::minmax(tempmax_stack)) )
# Time series at given location: loc <- data.frame(x=12.65295, y=53.06547) # Aussichtspunkt Kyritz-Ruppiner Heide round(unlist(terra::extract(tempmax_stack, loc)[-1]),1)
Update the package first:
rdwd::updateRdwd() library(rdwd)
rdwd::updateRdwd()
Download and read with r helplink("readDWD.radar")
with dividebyten=TRUE
:
# library("rdwd") radbase <- paste0(gridbase,"/daily/radolan/recent/bin/") radfile <- format(Sys.Date()-5, "raa01-sf_10000-%y%m%d1450-dwd---bin.gz") rad <- dataDWD(radfile, base=radbase, joinbf=TRUE)
Project and then plot:
radp <- projectRasterDWD(rad$dat) plotRadar(radp, main=paste("mm in 24 hours preceding", rad$meta$date), project=FALSE)
Save the projected radar image as NCDF file, read back and plot (not executed):
terra::writeCDF(radp, "rad_0714_2350.nc", overwrite=TRUE, varname="pre", longname="precipitation24hrs", zname="nbands") nc <- ncdf4::nc_open("rad_0714_2350.nc") nc pre <- ncdf4::ncvar_get(nc, "pre") lon <- nc$dim$longitude$vals lat <- nc$dim$latitude$vals image(lon, rev(lat), pre[,ncol(pre):1]) # takes several seconds rdwd::addBorders()
Single RW files at https://opendata.dwd.de/weather/radar/radolan/rw should be read
with the underlying readRadarFile
that has been outsourced to dwdradar
to keep the basic rdwd
as lean as possible.
Since these datasets only exist for two days on the FTP Server, I'm storing them in tempdir()
.
Please note that for projecting (see r helplink("projectRasterDWD")
),
the radolan extent seems to be needed, not the rw extent.
rw_base <- "ftp://opendata.dwd.de/weather/radar/radolan/rw" rw_urls <- indexFTP(base=rw_base, dir=tempdir(), folder="", exclude.latest.bin=TRUE) rw_file <- dataDWD(rw_urls[14], base=rw_base, joinbf=TRUE, dir=tempdir(), read=FALSE) rw_file_actual <- R.utils::bunzip2(rw_file) # zipped since between 2022-06 and 2023-04 rw_orig <- dwdradar::readRadarFile(rw_file_actual) str(rw_orig) # NB: this is an rw file, but needs radolan extent instead of rw plotRadar(terra::rast(rw_orig$dat), extent="radolan", main=rw_orig$meta$date)
Choose station in Berlin with longest monthly average recordings (according to metadata, which is not always correct).
ids <- findID("Berlin", exactmatch=FALSE) head(ids) data("metaIndex") berlin <- metaIndex[with(metaIndex, Stations_id %in% ids & res=="monthly" & var=="kl" & per=="historical"),] berlin$ndays <- as.numeric(berlin$bis_datum - berlin$von_datum) berlin <- berryFunctions::sortDF(berlin, ndays) berlin$von_datum <- as.character(berlin$von_datum) # avoid huxtable error berlin$bis_datum <- as.character(berlin$bis_datum) berlin # Dahlem (FU) has data since 1719 !
par(mar=c(2,2,0.2,0.2)) url <- selectDWD("Berlin-Dahlem (FU)", res="monthly", var="kl", per="h") kl <- dataDWD(url, varnames=TRUE) plot(kl$MESS_DATUM, kl$MO_TT.Lufttemperatur, type="l", las=1) # pretty complete
par(mar=c(1.5, 3, 2, 0.2)) monthly <- tapply(kl$MO_TT.Lufttemperatur, format(kl$MESS_DATUM,"%m"), quantile, probs=0:10/10) monthly <- sapply(monthly, I) plot(1, type="n", xlim=c(1,12), ylim=range(monthly), xaxt="n", las=1, xlab="Vis by Berry B, github.com/brry/rdwd", ylab="monthly temperature average", main="Temperature variation in Berlin is highest in winter") axis(1, 2:12-0.5, labels=FALSE) axis(1, 1:12, substr(month.abb,1,1), tick=FALSE, line=-0.5) for(m in 1:12) for(q in 1:5) lines(c(m,m), monthly[c(q,12-q),m], lend=1, col=berryFunctions::addAlpha("red",0.2), lwd=5) for(m in 1:12) points(m, mean(monthly[,m]), pch=3)
Clausius-Clapeyron scaling holds even for very high temperatures, we just don't have enough data yet to have observed the expected extreme rainfall intensities.
If quantiles are estimated by appropriately fitting a GDP and using its quantiles, extreme rainfall intensity estimates continue to rise with air temperature.
Code (with a much older version of rdwd
, might not run out of the box any more):
https://github.com/brry/prectemp/blob/master/Code_analysis.R
Publication:
http://www.nat-hazards-earth-syst-sci-discuss.net/nhess-2016-183
rdwd::updateRdwd() library(rdwd) links <- selectDWD(res="daily", var="more_precip", per="hist") length(links) # ca 5k stations - would take very long to download # select only the relevant files: data("metaIndex") myIndex <- metaIndex[ metaIndex$von_datum < as.Date("2014-01-01") & metaIndex$bis_datum > as.Date("2016-12-31") & metaIndex$hasfile , ] data("fileIndex") links <- fileIndex[ suppressWarnings(as.numeric(fileIndex$id)) %in% myIndex$Stations_id & fileIndex$res=="daily" & fileIndex$var=="more_precip" & fileIndex$per=="historical" , "path" ] length(links) # ca 2k elements - much better
If some downloads fail (mostly because you'll get kicked off the FTP server), you can just run the same code again and only the missing files will be downloaded.
If you really want to download 2k historical (large!) datasets,
you might need to set sleep
in r helplink("dataDWD")
to a relatively high value.
For speed, we'll only work with the first 3 urls.
localfiles <- dataDWD(links[1:3], joinbf=TRUE, sleep=0.2, read=FALSE)
2k large datasets probably is way too much for memory, so we'll use a custom reading function. It will only select the relevant time section and rainfall column. The latter will be named with the id extracted from the filename.
readVars(localfiles[1])[,-3] # we want the RS column read2014_2016 <- function(file, fread=TRUE, ...) { out <- readDWD(file, fread=fread, ...) out <- out[out$MESS_DATUM >= as.Date("2014-01-01") & out$MESS_DATUM <= as.Date("2016-12-31") , ] out <- out[ , c("MESS_DATUM", "RS")] # Station id as column name: idstringloc <- unlist(gregexpr(pattern="tageswerte_RR_", file)) idstring <- substring(file, idstringloc+14, idstringloc+18) colnames(out) <- c("date", idstring) return(out) } str(read2014_2016(localfiles[1])) # test looks good
Now let's apply this to all our files and merge the result.
rain_list <- pbapply::pblapply(localfiles, read2014_2016) rain_df <- Reduce(function(...) merge(..., all=T), rain_list) str(rain_df) # looks nice! summary(rain_df) # 9 NAs in station 00006
plot(rain_df$date, rain_df[,2], type="n", ylim=range(rain_df[,-1], na.rm=T), las=1, xaxt="n", xlab="Date", ylab="Daily rainfall sum [mm]") berryFunctions::monthAxis() for(i in 2:ncol(rain_df)) lines(rain_df$date, rain_df[,i], col=sample(colours(), size=1)) plot(rain_df[,2:4]) # correlation plot only works for a few columns!
Let's see the locations of our stations in an interactive map.
data(geoIndex) ; library(leaflet) mygeoIndex <- geoIndex[geoIndex$id %in% as.numeric(colnames(rain_df)[-1]),] leaflet(data=mygeoIndex) %>% addTiles() %>% addCircleMarkers(~lon, ~lat, popup=~display, stroke=T)
For a static map with scaleBar, use OSMscale:
library(OSMscale) pointsMap("lat", "lon", mygeoIndex, fx=2, fy=1, pargs=list(lwd=3), col="blue", zoom=5)
m <- nearbyStations(49.211784, 9.812475, radius=30, res=c("daily","hourly"), var=c("precipitation","more_precip","kl"), mindate=as.Date("2016-05-30"), statname="Braunsbach catchment center") # Remove duplicates. if kl and more_precip are both available, keep only more_precip: library("berryFunctions") m <- sortDF(m, "var") m <- m[!duplicated(paste0(m$Stations_id, m$res)),] m <- sortDF(m, "res") m <- sortDF(m, "dist", decreasing=FALSE) rownames(m) <- NULL DT::datatable(m, options=list(pageLength=5, scrollX=TRUE))
Interactive map of just the meteo station locations:
library(leaflet) m$col <- "red" ; m$col[1] <- "blue" leaflet(m) %>% addTiles() %>% addCircles(lng=9.812475, lat=49.211784, radius=30e3) %>% addCircleMarkers(~geoLaenge, ~geoBreite, col=~col, popup=~Stationsname)
Download and process data for the stations, get the rainfall sums of a particular day (Braunsbach flood May 2016):
prec <- dataDWD(m$url, fread=TRUE) names(prec) <- m$Stations_id[-1] prec29 <- sapply(prec[m$res[-1]=="daily"], function(x) { if(nrow(x)==0) return(NA) col <- "RS" if(!col %in% colnames(x)) col <- "R1" if(!col %in% colnames(x)) col <- "RSK" sel <- x$MESS_DATUM==as.Date("2016-05-29") if(!any(sel)) return(NA) x[sel, col] }) prec29 <- data.frame(Stations_id=names(prec29), precsum=unname(prec29)) prec29 <- merge(prec29, m[m$res=="daily",c(1,4:7,14)], sort=FALSE) head(prec29[,-7]) # don't show url column with long urls
7 of the files contain no rows. readDWD warns about this (but the warnings are suppressed in this website).
One example is daily/more_precip/historical/tageswerte_RR_07495_20070114_20181231_hist.zip
For a quick look without a map, this works:
plot(geoBreite~geoLaenge, data=m, asp=1) textField(prec29$geoLaenge, prec29$geoBreite, prec29$precsum, col=2)
But it's nicer to have an actual map via OSMscale:
library(OSMscale) map <- pointsMap(geoBreite,geoLaenge, data=m, type="osm", plot=FALSE) pp <- projectPoints("geoBreite", "geoLaenge", data=prec29, to=map$tiles[[1]]$projection) prec29 <- cbind(prec29,pp) ; rm(pp) pointsMap(geoBreite,geoLaenge, data=m, map=map, scale=FALSE) textField(prec29$x, prec29$y, round(prec29$precsum), font=2, cex=1.5) scaleBar(map, cex=1.5, type="line", y=0.82) title(main="Rainfall sum 2016-05-29 7AM-7AM [mm]", line=-1)
Shapefile of Landkreis districts:
https://public.opendatasoft.com/explore/dataset/landkreise-in-germany/export/
(file size 4 MB, unzipped 10 MB)
# Select monthly climate data: data("metaIndex") ; m <- metaIndex m <- m[m$res=="monthly" & m$var=="kl" & m$per=="recent" & m$hasfile, ] # Transform into spatial object: msf <- sf::st_as_sf(m, coords=c("geoLaenge", "geoBreite"), crs=4326) # Read district shapefile, see link above: lk <- sf::st_read("landkreise-in-germany.shp", quiet=TRUE) # intersections: list with msf rownumbers for each district: int <- sf::st_intersects(lk, msf)
https://gis.stackexchange.com/a/318629/36710
# plot to check projection: plot(lk[,"id_2"], reset=FALSE) colPoints("geoLaenge", "geoBreite", "Stationshoehe", data=m, add=T, legend=F) # berryFunctions::colPointsLegend + sf plots = set margins, see note there! axis(1, line=-1); axis(2, line=-1, las=1) points(m[int[[2]], c("geoLaenge", "geoBreite")], pch=16, col=2, cex=1.8)
Running analysis for a few selected districts only to reduce computation time.
Monthly rainfall average per Landkreis.
landkreis_rain <- function(lki) # LandKreisIndex (row number in lk) { rnr <- int[[lki]] # msf row number if(length(rnr)<1) { warning("No rainfall data available for Landkreis ", lki, ": ", lk$name_2[lki], call.=FALSE) out <- data.frame(NA,NA)[FALSE,] colnames(out) <- c("MESS_DATUM", as.character(lk$name_2[lki])) return(out) } urls <- selectDWD(id=m[rnr, "Stations_id"], res="monthly", var="kl", per="r") clims <- dataDWD(urls, varnames=FALSE) if(length(urls)==1) {rainmean <- clims$MO_RR monthlyrain <- clims[c("MESS_DATUM", "MO_RR")] } else { monthlyrain <- lapply(seq_along(clims), function(n) { out <- clims[[n]][c("MESS_DATUM", "MO_RR")] colnames(out)[2] <- names(clims)[n] # no duplicate names out }) monthlyrain <- Reduce(function(...) merge(..., by="MESS_DATUM",all=TRUE), monthlyrain) rainmean <- rowMeans(monthlyrain[,-1], na.rm=TRUE) # check also with median, variation is huge! } out <- data.frame(monthlyrain[,1], rainmean) colnames(out) <- c("MESS_DATUM", as.character(lk$name_2[lki])) return(out) } rainLK <- pbapply::pblapply(c(133,277,300,389), landkreis_rain) rainLK <- Reduce(function(...) merge(..., by="MESS_DATUM",all=TRUE), rainLK) head(rainLK)
Located here: https://opendata.dwd.de/climate_environment/CDC/observations_germany/phenology
This example uses data in the subfolder annual/crops/hist
phenocrop_base <- paste0(sub("climate$", "phenology", dwdbase), "/annual_reporters/crops/historical/") # pheno_urls <- indexFTP("", base=phenocrop_base, dir="Pheno") kohl_url <- "PH_Jahresmelder_Landwirtschaft_Kulturpflanze_Weisskohl_1951_1990_hist.txt"# 9 MB kohl_file <- dataDWD(base=phenocrop_base, url=kohl_url, joinbf=TRUE, read=FALSE) kohl <- read.table(kohl_file, sep=";", header=TRUE) summary(kohl)
This needs rdwd >= 1.5.3 (2021-06-01)
https://opendata.dwd.de/climate_environment/CDC/derived_germany/soil/daily/historical/
dbase <- "ftp://opendata.dwd.de/climate_environment/CDC/derived_germany" soilIndex <- indexFTP(folder="soil/daily", base=dbase) soilIndex <- createIndex(soilIndex, base=dbase) # "res" and "var" are inverted in the derived_germany folder! colnames(soilIndex)[1:2] <- c("var", "res") # non-standard column order, but rdwd should always use names (not positions) head(soilIndex) summary(soilIndex$id) # select URL: soil_link <- selectDWD("Potsdam", res="", var="", per="", base=dbase, findex=soilIndex) # Add monthly data: soilIndexm <- indexFTP(folder="soil/monthly", base=dbase) soilIndexm <- createIndex(soilIndexm, base=dbase) soil_link <- c(soil_link, selectDWD("Potsdam", res="", var="", per="", base=dbase, findex=soilIndexm)) # download and read files: soil_data <- dataDWD(soil_link, base=dbase) # later reading runs are much faster because unzipping is already done :)
To get files for all stations, use the following:
dbase <- "ftp://opendata.dwd.de/climate_environment/CDC/derived_germany" soil_index <- indexFTP(folder="soil/daily", base=dbase) soil_index <- createIndex(soil_index, base=dbase) soil_links <- selectDWD(res="soil", var="daily", per="r", base=dbase, findex=soil_index) soil_data <- dataDWD(soil_links, base=dbase)
In 2016, I published my Master thesis work on high intensity rainfall depending on temperature,
see rainfall use case.
I also started a PhD in flood research and was part of a taskforce to analyze a flashflood in Braunsbach (southwestern Germany), see
nearbyStations use case.
For both these projects, I developed some R code to download and read data for given weather stations.
In June 2016, these scripts got included in my misc package berryFunctions, see the commit history there.
The DWD had just started their CDC FTP server with a serious amount of data the year before, so this was an exciting time to work on it.
rdwd
started life as a standalone package in October 2016.
Back then, it only had dataDWD and readDWD. That quickly expanded, see the
early rdwd commit history.
In January 2017, it was first published on CRAN.
A bit of development was done in early 2017, then I quit my PhD (apparently, software dev is not valued much by grant-acquiring research institutions) and spent 2018 mostly offline building a house.
Early 2019 marked a turning point: I went to France for three months and worked a lot on rdwd
, introducing support for reading asc and binary raster files.
That was outsourced to dwdradar
over summer to keep the rdwd
development version easy to install (i.e. without FORTRAN code).
Systematic testing and raster plotting/projection methods got implemented and documentation took a leap forward.
Over 2019, an increasing number of feature requests spurred further development, including creating this homepage (now I hardly ever need to answer "yes, it's already implemented, look here" anymore ^^).
A lot of automation went into updating the indexes of available files and runnning all local tests (code and index).
Nowadays, I just need to trigger them by calling the two related functions.
2020 (especially the summer) then saw a lot of refinement in details, slowly morphing rdwd
into a mature and stable package.
In 2023, I finally built the app and changed all raster/sp/rgdal code to terra.
More detailed (but still aggregated) changes can nicely be seen at https://github.com/brry/rdwd/releases
I plan to continue maintaining the package, even though its capabilities have long exceeded my personal needs.
Coding simply brings joy - and of course it's also very satisfactory to see my work actually used in many contexts.
A few (rather minor) issues are also still open and I expect that state of things to continue for a long time :).
Lastly, I hope to find some help in understanding the structure of gridded data to improve that part of the package.
If you want to support rdwd
, you can
- fix issues / report bugs
- submit feature requests
- thank me (that's very encouraging)
- suggest me as a trainer (brry.github.io)
rdwd description
Weather Data Germany download with R, Climate Data Germany
Deutscher Wetterdienst R Daten download Klimastationen
DWD Daten mit R runterladen, Wetter und Klimadaten in R
DWD Radolan Raster data
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.