indexFTP: Create a recursive index of an FTP Server

View source: R/indexFTP.R

Description

Create a list of all files (in all subfolders) on an FTP server. Defaults to the German Weather Service (DWD, Deutscher Wetterdienst) OpenData server at https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/.
The R package RCurl must be available to do this.

Running this on all folders is not recommended: it can take a long time, and the FTP server may block you. This package already contains an index of the climate observations at weather stations (fileIndex) and of the gridded datasets (gridIndex). If they are out of date, please let me know!
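Because indexFTP() relies on RCurl, it can help to check for the package before starting a long indexing run. Below is a minimal sketch using base R's requireNamespace(); the helper has_rcurl() is hypothetical and not part of rdwd.

```r
# Hypothetical helper (not part of rdwd): TRUE if RCurl can be loaded
has_rcurl <- function() requireNamespace("RCurl", quietly = TRUE)

# Warn early instead of failing in the middle of an indexing run
if (!has_rcurl()) {
  message("RCurl is not available; install it with install.packages(\"RCurl\").")
}
```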

Getting banned from the FTP Server
Normally, this should no longer happen: since version 0.10.10 (2018-11-26), a single RCurl handle is used for all FTP requests. There is also a fallback for when the FTP server detects bot requests and denies access: if RCurl::getURL() fails, there is still an output, which you can pass to folder in a second run to index the remaining directories. In that case, you may need to wait a while and set sleep to a higher value. Here's an example:

gridindex <- indexFTP("", gridbase)                   # first run, may stop early
gridindex <- indexFTP(gridindex, gridbase, sleep=15)  # resume from the partial output

Of course, with a higher sleep value, the execution will take longer!
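The sleep argument adds a random pause of between 0 and sleep seconds after each folder is read. The sketch below illustrates that behavior in base R, assuming a uniform draw via stats::runif(); the helper random_pause() and its do_sleep switch are hypothetical and not rdwd's internal code.

```r
# Hypothetical helper (not rdwd code): pause for a random number of seconds
# between 0 and `sleep`, assuming a uniform distribution.
random_pause <- function(sleep = 0, do_sleep = TRUE) {
  if (sleep == 0) return(invisible(0))            # sleep = 0: no pause at all
  delay <- stats::runif(1, min = 0, max = sleep)  # seconds in [0, sleep]
  if (do_sleep) Sys.sleep(delay)                  # actually wait, if requested
  invisible(delay)
}

# Mimic the pauses after three folders with sleep = 2, without waiting
set.seed(1)
delays <- vapply(1:3, function(i) random_pause(2, do_sleep = FALSE), numeric(1))
round(delays, 2)
```

Under the same uniform assumption, the average pause is sleep/2 per folder. With sleep=15 and a hypothetical 200 folders, the pauses alone would add about 200 * 7.5 = 1500 seconds (25 minutes) on average.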

Note: Between versions 1.0.17 (2019-05-14) and 1.8.26 (2025-05-20), a tree file provided by the DWD was used to obtain all folders first, eliminating the recursive calls. See issue 47.

Usage

indexFTP(
  folder = "",
  base = dwdbase,
  is.file.if.has.dot = TRUE,
  exclude.latest.bin = TRUE,
  fast = NULL,
  sleep = 0,
  nosave = FALSE,
  dir = locdir(),
  filename = folder[1],
  overwrite = FALSE,
  quiet = rdwdquiet(),
  progbar = !quiet,
  verbose = FALSE
)

Arguments

folder

Folder(s) to be indexed recursively, e.g. "/hourly/wind/". Leading slashes will be removed. Use folder="" to search at the location of base itself. DEFAULT: ""

base

Main directory of the FTP server. Trailing slashes will be removed. DEFAULT: dwdbase

is.file.if.has.dot

Logical: treat input paths that contain a dot as files, i.e. do not try to read them as folders. Only set this to FALSE if you know what you're doing. DEFAULT: TRUE

exclude.latest.bin

Exclude the "latest" file(s) at opendata.dwd.de/weather/radar/radolan? RCurl::getURL() indicates that these are pointers to the most recent regularly named file. DEFAULT: TRUE

fast

Obsolete, ignored. DEFAULT: NULL

sleep

If not 0, a random number of seconds between 0 and sleep is passed to Sys.sleep() after each folder is read, to avoid getting blocked by the FTP server (see note above). DEFAULT: 0

nosave

Logical: do not save the results to disk? If TRUE, dir, filename and overwrite are ignored. DEFAULT: FALSE

dir

Writable directory in which to save the output file. Created if it does not exist. DEFAULT: locdir()

filename

Character: part of the output filename. "INDEX_of_DWD_" is prepended, "/" is replaced with "_", and ".txt" is appended. DEFAULT: folder[1]

overwrite

Logical: Overwrite existing file? If not, "_n" is added to the filename, see berryFunctions::newFilename(). DEFAULT: FALSE

quiet

Suppress progress bars and messages about directory/files? DEFAULT: FALSE through rdwdquiet()

progbar

Logical: show a progress bar at each level? DEFAULT: !quiet, i.e. TRUE unless quiet is TRUE

verbose

Logical: write a lot of messages from RCurl::getURL()? DEFAULT: FALSE (usually, you don't need all the curl information)
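The rules stated for folder, base, is.file.if.has.dot and filename can be illustrated with a few lines of base R. This is a minimal sketch, assuming that simple regular expressions reproduce the described behavior; clean_folder(), clean_base(), looks_like_file() and index_filename() are hypothetical helpers, not rdwd functions, and "example.zip" is a made-up file name.

```r
# Hypothetical helpers illustrating the documented argument rules.
# They are NOT rdwd's internal code; the real implementation may differ.

# folder: leading slashes are removed
clean_folder <- function(folder) sub("^/+", "", folder)

# base: trailing slashes are removed
clean_base <- function(base) sub("/+$", "", base)

# is.file.if.has.dot = TRUE: paths containing a dot are treated as files and
# are not read as folders (assumption: any dot in the relative path counts)
looks_like_file <- function(path) grepl(".", path, fixed = TRUE)

# filename: "INDEX_of_DWD_" prepended, "/" replaced with "_", ".txt" appended
index_filename <- function(filename) {
  paste0("INDEX_of_DWD_", gsub("/", "_", filename, fixed = TRUE), ".txt")
}

base   <- clean_base("https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/")
folder <- clean_folder("/daily/solar")
url    <- paste0(base, "/", folder)  # URL of the folder to be indexed
url
looks_like_file(c("daily/solar", "daily/solar/example.zip"))  # FALSE TRUE
index_filename(folder)                                        # "INDEX_of_DWD_daily_solar.txt"
```

A dot-based test is only a heuristic: a folder whose name contains a dot would be mistaken for a file.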

Value

A character vector of file paths.
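Since the result is a plain vector of paths, base R string functions are enough to explore it. The vector below is a made-up stand-in for an indexFTP() result (the paths are hypothetical, not real server contents); the steps mirror unique(dirname(mon)) from the Examples and add a count per folder and an extension filter.

```r
# Hypothetical stand-in for an indexFTP() result; paths are invented
idx <- c("monthly/kl/recent/file_a.zip",
         "monthly/kl/recent/file_b.zip",
         "monthly/kl/historical/file_c.zip",
         "monthly/kl/historical/stations.txt")

# Folders that contain files
folders <- unique(dirname(idx))
folders

# Number of files per folder
counts <- table(dirname(idx))
counts

# Keep only zip archives
zips <- idx[grepl("\\.zip$", idx)]
zips
```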

Author(s)

Berry Boessenkool, berry-b@gmx.de, Oct 2016

See Also

createIndex(), updateIndexes(), website index chapter

Examples

## Not run:  ## Needs internet connection
sol <- indexFTP(folder="/daily/solar", dir=tempdir())
head(sol)

# with subfolders:
mon <- indexFTP(folder="/monthly", dir=tempdir())
unique(dirname(mon))
# mon <- indexFTP(folder="/monthly/kl", dir=tempdir(), verbose=TRUE)

## End(Not run)


brry/rdwd documentation built on June 11, 2025, 3:58 a.m.