harvest: Harvest: Download and transfer Comext data to the database


View source: R/harvest.R

Description

harvestcomextdata downloads the most recent comext monthly data (as determined by the recentyears parameter) and transfers all sub-products of the given product codes to the database. The raw comext database structure is recreated each time this function is called. The database table name ends with the name of the most recent comext folder.

harvest checks for updates in the Comext bulk download repository and downloads recent data if it is not yet present in the database. If recent data has been updated, it also checks for updates in the archive data and downloads them accordingly.

Usage

harvestcomextdata(
  RMariaDBcon,
  rawdatafolder,
  productcodestart,
  tabletemplate = "raw_comext_monthly_template",
  tablemonthly = "raw_comext_monthly",
  tableyearly = "raw_comext_yearly",
  recentyears = 4,
  template = getOption("tradeharvester")$template
)

harvestcomextmetadata(RMariaDBcon, rawdatafolder, pause = 0)

harvest(
  rawdatafolder,
  dbname,
  startyear,
  productcodestart = tradeharvester::products2harvest$productcode,
  tabletemplatemonthly = "raw_comext_monthly_template",
  template = getOption("tradeharvester")$template,
  logfile = paste0("/mnt/sdb/public/log/harvest", format(Sys.Date(), "%Y"), ".txt"),
  randomsleeptime = 3600,
  recentyears = 4
)

Arguments

RMariaDBcon

database connection object created by RMariaDB::dbConnect

rawdatafolder

character path to a folder where comext files will be downloaded

tabletemplate

character name of the table template giving the data structure

template

character part of the table name to be replaced by the comext folder name

logfile

character path to the main log file. The main log file is not to be confused with the standard output and standard error of Rscript, which can also be sent to a log file; see the details below.

randomsleeptime

numeric maximum number of seconds to wait before harvesting

productcodestart

numeric vector of product codes to transfer to the database

tablename

character name of the database table where data will be stored
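
The way tabletemplate and template combine into a table name can be illustrated with a minimal sketch. The values of template and comextfolder below are assumptions made for illustration: the real template string comes from getOption("tradeharvester")$template, and the real folder name comes from the Comext repository. The package may build the name differently.

```r
# Default table template taken from the Usage section
tabletemplate <- "raw_comext_monthly_template"

# Assumed values, for illustration only
template <- "template"      # hypothetical value of getOption("tradeharvester")$template
comextfolder <- "201805"    # hypothetical name of the most recent comext folder

# Replace the template part of the name by the folder name
tablename <- sub(template, comextfolder, tabletemplate, fixed = TRUE)
print(tablename)
```

Under these assumptions the resulting table name ends with the comext folder name, as stated in the Description.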

Details

The harvest() function extracts [year] and [month] from the names of the raw_comext_monthly_[year][month], raw_comext_monthly_[year]S1 and raw_comext_yearly_[year]S2 tables and compares them with the names of the most recent comext folder and of the S1 and S2 folders. If the most recent comext data is not present in the database, the function harvests it; if the archive folders are not present in the database either, it harvests them as well.

To run harvest periodically as a cron job, edit crontab:

sudo vim /etc/crontab

and enter:

0 3 * * * debian Rscript -e "library(tradeharvester); harvest(rawdatafolder = '/mnt/sdb/public', dbname = 'tradeflows', startyear = 2000)" >> ~/log/harvest$(date +"\%Y\%m\%d").log 2>&1

As explained in https://serverfault.com/questions/117360/sending-cron-output-to-a-file-with-a-timestamp-in-its-name make sure to escape any % with \%.

To keep a detailed log of the harvesting process, this crontab entry writes standard error and standard output to a file. It was inspired by this Stack Overflow question: https://stackoverflow.com/questions/14008139/capturing-rscript-errors-in-an-output-file. You can follow a harvest in progress in that log file with tail -f harvestlogfilename.log. The main log file given in the logfile parameter only contains the date and folder name of major updates.
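
The comparison between table names and folder names described at the start of this section can be sketched roughly as follows. The table names and the folder name are made-up examples, and the regular expression is an assumption about the naming scheme rather than the code harvest() actually uses.

```r
# Hypothetical table names, e.g. as returned by DBI::dbListTables(con)
tables <- c("raw_comext_monthly_201803",
            "raw_comext_monthly_2017S1",
            "raw_comext_yearly_2016S2",
            "raw_comext_monthly_template")

# Keep only recent monthly tables, whose name ends with [year][month]
recent <- grep("^raw_comext_monthly_[0-9]{6}$", tables, value = TRUE)
suffix <- sub("^raw_comext_monthly_", "", recent)

# Extract year and month from the suffix
year <- as.integer(substr(suffix, 1, 4))
month <- as.integer(substr(suffix, 5, 6))

# Hypothetical name of the most recent comext folder
latestfolder <- "201805"

# Recent data needs harvesting if no table carries the folder name
needsharvest <- !(latestfolder %in% suffix)
print(needsharvest)
```

The same idea would apply to the S1 and S2 tables, matching a four-digit year followed by S1 or S2 instead of six digits.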

See Also

crontime, a function that tests if a cron job is working as expected.

Examples

## Not run: 
# Create a database connection object to be supplied as a parameter RMariaDBcon
con <- RMariaDB::dbConnect(RMariaDB::MariaDB(), dbname = "test")
harvestrecent(RMariaDBcon = con, rawdatafolder = "/tmp", productcodestart = c(44,94))
harvestmonthlyarchive(RMariaDBcon = con, rawdatafolder = "/tmp", startyear = 2015, productcodestart = c(44,94))
harvestyearlyarchive(RMariaDBcon = con, rawdatafolder = "/tmp", startyear = 2015, productcodestart = c(44,94))
RMariaDB::dbDisconnect(con)

# Harvest creates its own database connection, dbname is passed as a parameter
harvest(rawdatafolder = "/tmp", dbname = "test", startyear = 2015, randomsleeptime = 0)
harvest(rawdatafolder = "/mnt/sdb/public", dbname = "tradeflows", startyear = 2015, randomsleeptime = 3)

## End(Not run)

stix-global/eutradeflows documentation built on Nov. 13, 2020, 9:23 p.m.