harvestcomextdata
downloads the most recent Comext monthly data (as determined by the recentyears parameter) and transfers all sub-products
of the given product codes to the database.
The raw comext database structure is recreated each time this function
is called.
The database table name ends with the name of the most recent comext folder.
harvest
checks for updates in the Comext bulk download
repository and downloads recent data if it's not yet present in the database.
If recent data has been updated, it also checks for updates in the archive data and
downloads them accordingly.
harvestcomextdata(
RMariaDBcon,
rawdatafolder,
productcodestart,
tabletemplate = "raw_comext_monthly_template",
tablemonthly = "raw_comext_monthly",
tableyearly = "raw_comext_yearly",
recentyears = 4,
template = getOption("tradeharvester")$template
)
harvestcomextmetadata(RMariaDBcon, rawdatafolder, pause = 0)
harvest(
rawdatafolder,
dbname,
startyear,
productcodestart = tradeharvester::products2harvest$productcode,
tabletemplatemonthly = "raw_comext_monthly_template",
template = getOption("tradeharvester")$template,
logfile = paste0("/mnt/sdb/public/log/harvest", format(Sys.Date(), "%Y"), ".txt"),
randomsleeptime = 3600,
recentyears = 4
)
RMariaDBcon | database connection object created by RMariaDB::dbConnect
rawdatafolder | character path to a folder where comext files will be downloaded
tabletemplate | character name of the table template giving the data structure
template | character part of the table name to be replaced by the comext folder name
logfile | character path to the main log file. The main log file is not to be confused with the standard output and standard error of Rscript, which can also be sent to a log file; see the details below.
randomsleeptime | numeric maximum number of seconds to wait before harvesting
productcodestart | numeric vector of product codes to transfer to the database
tablename | character name of the database table where data will be stored
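As a rough illustration of the template and randomsleeptime arguments, the sketch below shows one way a table name could be derived from the table template, and how a random wait bounded by randomsleeptime could be drawn. The template string "template" and the folder name "201912" are assumptions made for this example; the actual template comes from getOption("tradeharvester")$template, and the package's internal implementation may differ.

```r
# Assumed values, for illustration only: the real template string is read
# from getOption("tradeharvester")$template and is not shown on this page.
template <- "template"
tabletemplate <- "raw_comext_monthly_template"
comextfolder <- "201912"  # made-up name of a comext folder

# Replace the template part of the table name with the folder name,
# so that the table name ends with the name of the comext folder.
tablename <- sub(template, comextfolder, tabletemplate, fixed = TRUE)
print(tablename)

# Draw a random waiting time between 0 and randomsleeptime seconds.
# A harvesting script could then call Sys.sleep(sleepseconds).
randomsleeptime <- 3600
sleepseconds <- runif(1, min = 0, max = randomsleeptime)
```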
The harvest() function extracts [year] and [month] from the names of
the raw_comext_monthly_[year][month],
raw_comext_monthly_[year]S1 and raw_comext_yearly_[year]S2 tables and
compares them with the names of the most recent comext folder
and of the S1 and S2 folders.
If the most recent comext data is not present in the database,
this function harvests it; then, if the archive folders are not present,
it harvests them as well.
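The comparison described above can be sketched in base R. This is only an illustration under stated assumptions: the table names and the folder name "202001" are made up, and the variable names (tables, periods, latest_folder, needs_harvest) are hypothetical rather than taken from the package.

```r
# Made-up table names standing in for what the database might contain
tables <- c("raw_comext_monthly_201911", "raw_comext_monthly_201912",
            "raw_comext_monthly_2018S1", "raw_comext_yearly_2018S2")

# Keep the tables whose suffix is [year][month] and extract that suffix
monthly <- grep("^raw_comext_monthly_[0-9]{6}$", tables, value = TRUE)
periods <- sub("^raw_comext_monthly_", "", monthly)

# Fixed-width yyyymm strings sort chronologically, so max() gives the latest
latest_in_db <- max(periods)
year <- substr(latest_in_db, 1, 4)
month <- substr(latest_in_db, 5, 6)

# Hypothetical name of the most recent folder in the Comext repository
latest_folder <- "202001"

# Harvest only if that folder is not yet present in the database
needs_harvest <- !(latest_folder %in% periods)
```

The same pattern, with the suffixes [year]S1 and [year]S2, would apply to the archive tables.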
To run harvest
periodically as a cron job, edit crontab:
sudo vim /etc/crontab
and enter:
0 3 * * * debian Rscript -e "library(tradeharvester); harvest(rawdatafolder = '/mnt/sdb/public', dbname = 'tradeflows', startyear = 2000)" >> ~/log/harvest$(date +"\%Y\%m\%d").log 2>&1
As explained in https://serverfault.com/questions/117360/sending-cron-output-to-a-file-with-a-timestamp-in-its-name make sure to escape any % with \%.
To keep a detailed log of the harvesting process,
this crontab entry writes standard error and standard output to a file.
It was inspired by this Stack Overflow question:
https://stackoverflow.com/questions/14008139/capturing-rscript-errors-in-an-output-file.
You can follow the harvest in progress in that log file with tail -f harvestlogfilename.log.
The main log file given as the function parameter logfile
will only contain the date
and folder name of major updates.
crontime, a function that tests if a cron job is working as expected.
## Not run:
# Create a database connection object to be supplied as a parameter RMariaDBcon
con <- RMariaDB::dbConnect(RMariaDB::MariaDB(), dbname = "test")
harvestrecent(RMariaDBcon = con, rawdatafolder = "/tmp", productcodestart = c(44,94))
harvestmonthlyarchive(RMariaDBcon = con, rawdatafolder = "/tmp", startyear = 2015, productcodestart = c(44,94))
harvestyearlyarchive(RMariaDBcon = con, rawdatafolder = "/tmp", startyear = 2015, productcodestart = c(44,94))
RMariaDB::dbDisconnect(con)
# Harvest creates its own database connection, dbname is passed as a parameter
harvest(rawdatafolder = "/tmp", dbname = "test", startyear = 2015, randomsleeptime = 0)
harvest(rawdatafolder = "/mnt/sdb/public", dbname = "tradeflows", startyear = 2015, randomsleeptime = 3)
## End(Not run)