downloadcomextfile: Download bulk data from Comext into a designated folder

View source: R/downloadcomext.R

Description

Download files from the Comext bulk data repository (http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&dir=comext). Make sure that the output of paste0(comextfolder, comextfile), once converted to HTML characters, gives a URL that has a matching file name on the bulk download repository.
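As a rough illustration of that check, the sketch below joins a folder and a file name and percent-encodes the result with utils::URLencode. The helper name buildcomextquery is hypothetical and not part of the package, and the exact encoding the package applies internally is an assumption.

```r
# Hypothetical helper, not part of eutradeflows: build the encoded
# folder + file string to compare against names on the repository.
buildcomextquery <- function(comextfolder, comextfile) {
  # Add a trailing slash when the folder name lacks one
  if (!grepl("/$", comextfolder)) {
    comextfolder <- paste0(comextfolder, "/")
  }
  # reserved = TRUE also encodes "/" as "%2F"
  utils::URLencode(paste0(comextfolder, comextfile), reserved = TRUE)
}

buildcomextquery("comext/201706/data", "nc201701.7z")
```

The result can be compared by eye with the link of the file shown on the bulk download listing.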

If the file doesn't exist, Comext still returns a page containing a table with the following message: "File **** does not exist or is not readable on the server". In this case, the download status will be 0 (success) even though the file failed to download.
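Because a status of 0 is not enough to confirm a download, one possible safeguard is to look for that message in the downloaded file. The sketch below is an assumption about how this could be done; iscomexterrorpage is a hypothetical name and not a function of the package.

```r
# Hypothetical check, not part of eutradeflows: look for the Comext
# error message inside a file that was reported as downloaded.
iscomexterrorpage <- function(path, nlines = 50) {
  # The error page is text, a real .7z archive is binary, so nul
  # bytes are skipped and warnings about odd content are silenced.
  firstlines <- suppressWarnings(
    readLines(path, n = nlines, warn = FALSE, skipNul = TRUE)
  )
  any(grepl("does not exist or is not readable on the server",
            firstlines, fixed = TRUE, useBytes = TRUE))
}

# Demonstration on a temporary file that mimics the error page
errorpage <- tempfile(fileext = ".7z")
writeLines("File nc209901.7z does not exist or is not readable on the server",
           errorpage)
iscomexterrorpage(errorpage)
```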

If the destination folder is empty, download all files. Otherwise, if there are files in the destination folder, download only data from the past recentyears years.
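The rule above could be sketched as follows. This is not the package's own code: firstyeartodownload is a hypothetical helper, and treating "the past recentyears years" as "from the current year minus recentyears onwards" is an assumption.

```r
# Hypothetical sketch of the rule above, not the package's own code.
# Returns the first year to download given the state of the folder.
firstyeartodownload <- function(rawdatafolder, recentyears = 4,
                                currentyear = as.integer(format(Sys.Date(), "%Y"))) {
  if (length(list.files(rawdatafolder)) == 0) {
    # Empty (or missing) folder: no lower bound, download everything
    return(-Inf)
  }
  # Folder already has files: keep only the most recent years
  currentyear - recentyears
}

emptyfolder <- tempfile()
dir.create(emptyfolder)
firstyeartodownload(emptyfolder)
```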

Usage

downloadcomextfile(
  comextfolder,
  comextfile,
  rawdatafolder,
  pause = 0,
  logfile = file.path("~/log", "harvesterrorlog.txt"),
  method = "libcurl"
)

downloadcomextmonthlyrecent(
  rawdatafolder,
  comextfolderpath = "/COMEXT_DATA/PRODUCTS",
  extension = ".7z",
  recentyears = 4,
  pause = 10
)

downloadcomextmetadata(
  rawdatafolder,
  comextfolderpath = "/COMEXT_METADATA/CLASSIFICATIONS_AND_RELATIONS/ENGLISH",
  extension = ".txt",
  recentyears = 4,
  pause = 10
)

downloadcomextmonthlyarchive(
  startyear,
  rawdatafolder,
  pause = 60,
  pattern = "S1\\]",
  extension = ".7z"
)

downloadcomextyearlyarchive(
  startyear,
  rawdatafolder,
  pattern = "S2\\]",
  extension = ".7z"
)

Arguments

comextfolder

name of the folder starting with "comext/"

comextfile

name of the file in the comext platform

rawdatafolder

folder where the raw data will be stored

pause

numeric pause time in seconds before downloading the file (useful for multiple downloads)

logfile

path to a log file, located in the user directory by default

method

see download.file

recentyears

numeric number of years, will only download most recent files for the given number of years. Use a large number (>20) to load all data the first time.

startyear

numeric download files from that year onwards

pattern

character pattern of the folder name, see scrapcomextfoldername

Details

Pause time was introduced because download returned an error status when downloading many files in a row. The pause can be set individually in each of the functions that call downloadcomextfile; a few seconds to a few minutes is usually enough.
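A minimal sketch of the idea, assuming a plain loop that sleeps before each download. downloadwithpause is a hypothetical wrapper and the column names of its result are illustrative; the package functions may be organised differently.

```r
# Hypothetical wrapper, not part of eutradeflows: call a download
# function once per file, sleeping `pause` seconds before each call.
downloadwithpause <- function(files, downloadfun, pause = 10) {
  status <- vapply(files, function(f) {
    Sys.sleep(pause)
    as.integer(downloadfun(f))
  }, integer(1), USE.NAMES = FALSE)
  data.frame(file = files, status = status, stringsAsFactors = FALSE)
}

# Not run, requires network access and the package:
# downloadwithpause(
#   c("nc201701.7z", "nc201702.7z"),
#   function(f) downloadcomextfile("comext/201706/data/", f, "/tmp"),
#   pause = 10
# )
```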

Value

downloadcomextfile returns the download status, see download.file.

downloadcomextmonthlyrecent returns a data frame of file names and paths with their download status.
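The returned data frame can be filtered on its status to find downloads that need to be retried. In the sketch below the data frame is a hand-made stand-in: the status column name appears in the examples of this page, while the file column name and the values are assumptions.

```r
# Stand-in for a returned data frame; only the `status` column name is
# taken from the examples, the `file` column name is an assumption.
dtf <- data.frame(
  file = c("nc201701.7z", "nc201702.7z"),
  status = c(0L, 1L),
  stringsAsFactors = FALSE
)
# download.file returns 0 for success and a non-zero code for failure
failed <- dtf[dtf$status != 0, ]
failed$file
```

A status of 0 only means that a page was returned, so files listed as successful may still need the content check described in the Description.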

Examples

## Not run: 
# This example will get outdated as time goes by
# Check the bulk download repository at
# http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&dir=comext
# for the current name of the comext most recent data folder
# Both should work,
# with trailing slash in the folder name
downloadcomextfile("comext/201706/data/", "nc201701.7z", "/tmp")
# without trailing slash in the folder name
downloadcomextfile("comext/201706/data", "nc201702.7z", "/tmp")
# Test error logging by downloading empty file in empty folder
downloadcomextfile("", "", "/tmp", logfile = "~/comextlog.txt")

## End(Not run)
## Not run: 
# Downloads all recent .7z data files into the /tmp folder
# and returns a dataframe with a status column
# describing the status of the download for each file
dtf <- downloadcomextmonthlyrecent("/tmp")
dtf$status
# Download all recent .txt description files
# describing products and reporting countries into the /tmp folder
dtf2 <- downloadcomextmonthlyrecent(
  "/tmp",
  comextfolderpath = "/COMEXT_METADATA/CLASSIFICATIONS_AND_RELATIONS/ENGLISH",
  extension = ".txt"
)

## End(Not run)
## Not run: 
downloadcomextmetadata(rawdatafolder = "/tmp", pause = 0)

## End(Not run)
## Not run: 
# download monthly archive from 2000
downloadcomextmonthlyarchive(startyear = 2000, rawdatafolder = "/tmp")

## End(Not run)
## Not run: 
# download yearly archive from 2000
downloadcomextyearlyarchive(startyear = 2000, rawdatafolder = "/tmp")

## End(Not run)

stix-global/eutradeflows documentation built on Nov. 13, 2020, 9:23 p.m.