In EuropeanForestInstitute/tradeflows: Process forest products trade flow data from COMTRADE and COMEXT

getwd()

Installation on a server

More details on the server configuration in this google doc

Rmarkdown requires the latest pandoc version, explanation here

To include the latest required version of pandoc:

install pandoc from source haksell installation was rather bulky. and pandoc installation from source failed:

 setup: At least the following dependencies are missing:
 http-client >=0.3.2 && <0.4 && ==0.4.5
 cabal: Error: some packages failed to install:
 pandoc-1.13.1 failed during the configure step. The exception was:
 ExitFailure 1

Or install Rstudio server

md5sum rstudio-server-0.98.1091-amd64.deb
930eca2738ce335791df41c3f6302fae

Rstudio server requres the installation of a recent libssl version. md5sum should not be used anymore for security reason, authenticity should be checked with

sha256sum libssl0.9.8_0.9.8o-4squeeze14_amd64.deb

Couldn't find the key. So i loaded if from 2 locations and checked that they had the same sum. To manually stop, start, and restart the server you use the following commands:

$ sudo rstudio-server stop
$ sudo rstudio-server start
$ sudo rstudio-server restart

Comtrade issues

Bug request of the Comtrade API
submitted the jsonlite::fromJSON("http://comtrade.un.org/data/cache/classificationHS.json") Warning message: Unexpected Content-Type: application/x-javascript

Raw data

Inspired by the way hadley prepares this flight planes data. The package includes a training dataset: sawnwood bilateral trade data for European countries.

Location of report templates

location inspired by the rapport package. See also their function rapport.ls that lists templates. And their function rapport.read that reads templates from files or form package-bundled templates.

Tools

This is a package

Created based on instructions from Hadley. devtools::load_all() or Cmd + Shift + L, reloads all code in the package. Add packages to the list of required packages devtools::use_package("dplyr") devtools::use_package("ggplot2", "suggests") For data I followed his recommendations in r-pkgs/data.rmd devtools::use_data(mtcars) devtools::use_data_raw() # To create a data-raw/ folder and add it to .Rbuildignore

Tests

Use Ctrl+Shift+T to run the package tests in RStudio.

The test_check function documentation tells us that tests should be placed in tests/testthat.

Example of testing for the devtools package
R style guide of FAOSTAT recommends using tests
Example tests of the stringr package

Code coverage

The covr package can be used to measure code coverage. covr::package_coverage() Shows test coverage of scripts in the ./R directory. Visualise coverage in a shiny application:

x <- package_coverage()
shine(x)

Git

Git command to revert a file one revision back in the "develop" branch:

git checkout develop~1 R/clean.R
# Experiment something
# Then
# Come back to the latest revision
git checkout devlop R/clean.R

Use this to check that a test failed in the past for example. And that it doesn't fail anymore.

Data frame manipulation with dplyr

dplyr uses non standard evaluation. See vignette("nse") NSE is powered by the lazyeval package

# standard evaluation
sawnwood %>% select_(.dots = c("yr", "rtCode" )) %>% head
# is the same as
# lazy evaluation
sawnwood %>% select(yr, rtCode ) %>% head

Error catching with tryCatch

see demo(error.catching).
See also Hadley's article: beyond-exception-handling.

Documentation in long form

How to create package vignettes.

To create a vignette, use the command use_vignette(name)

You can build all vignettes from the console with devtools::build_vignettes()

RStudio’s “Build & reload” does not build vignettes to save time. Similarly, devtools::install_github() (and friends) will not build vignettes by default because they’re time consuming and may require additional packages. You can force building with devtools::install_github(build_vignettes = TRUE). This will also install all suggested packages.

Function documentation using roxygen2

Export documentation in a pdf document at the command line in the man folder run

R CMD Rd2pdf *

You should be able to see the documentation of exported functions by placing a question mark before the function name at the R command prompt.

inspired by the documentation of roxygenize https://github.com/yihui/roxygen2/blob/master/R/roxygenize.R vignette("namespace", package = "roxygen2") says:

If you are using just a few functions from another package, the recommended option is to note the package name in the Imports: field of the DESCRIPTION file and call the function(s) explicitly using ::, e.g., pkg::fun(). Alternatively, though no longer recommended due to its poorer readability, use @importFrom, e.g., @importFrom pgk fun, and call the function(s) without ::. If you are using many functions from another package, use @import package to import them all and make available without using ::.

But Hadley says:

Alternatively, if you are repeatedly using many functions from another package, you can import them in one command with @import package. This is the least recommended solution: it makes your code harder to read (because you can’t tell where a function is coming from), and if you @import many packages, the chance of a conflicting function names increases.

calling packages might have to be changed to follow Hadley's recommendations on how package namespaces: http://r-pkgs.had.co.nz/namespace.html see also vignette("namespace", package = "roxygen2") require(RJSONIO) require(dplyr)

Version tracking system with git

The .git repository is backed on bitbucket. Use devtools::install_bitbucket() to install the package.

Shiny

A demonstration with time series plot and bar chart will be made with shiny and the ggplot2 package, based on the diamond example using.

Screen server tool

Use screen to keep a long process running on a server after you close the ssh session. I started a screen session with:

    screen -S sessionname

In order to find the screen session later you might want to rename it using sessionname. Or on the first screen invocation use the s flag -S sessionname

I started the R software in this screen session, started a long running process. Then detached the session with:

    CTRL-A-D

I could re-attach the session later with:

    screen -r sessionname

If the session was not detached properly, it might be necessary to detach it and re attach it:

   screen -d -r sessionname

Notes to EFI developpers

Notification of version changes

I will try to change the package's version number each time I commit a change that impacts the cleaning procedure. I will also try to tag those versions in git.

Code refactoring

It would be nice to clarify the interface: What R functions are used by the PHP code and bash scripts? This would enable code refactoring. For example the parameter called outputdir is not consistent with inputpath. It would be preferable to call tehm both "dir" or "path". Outputdir is named after the rmarkdown::render() parameter output_dir. What is inputpath named after?

Installation and configuration

See the vignette/installation.Rmd on installation and configuration steps.

Which directories I want to read at https://bitbucket.org/paul4forest/tradeflows/? You want to look at files in the R folders.

database.R is doing the database interaction
clean.R is cleaning the data
inst contains folders which will be installed in the package folder

The configuration table columnnames located in config/column_names.csv now contains 2 column specifying which columns names are used in the trade flows database: "raw_flow" and "validated_flow"

Database configuration

Database configuration file and column names are located under: a location available from shell command prompt, run:

Rscript  -e 'library(tradeflows)' -e 'system.file("config", package="tradeflows")'

Loading data

This is managed by a PHP program. The data to load is contained in this instruction

itto <- classificationitto %>% filter(productcodecomtrade > 10000 & nomenclature =="HS12") %>% select(product, productcodeitto, productcodecomtrade)
write.csv(itto, file="data-raw/ittoproducts.csv", row.names = FALSE)

Cleaning data

The function cleandb() will feed data into the database table(s) validated_flow updates will be done on a product basis, at the 6 digit level. The cleaning script will:

Delete all flows for a product (between all reporter and partner countries in all years),
Enter All validated flows for that product.

The main clean instruction can also run from a system shell directly

Rscript -e 'library(tradeflows)' -e 'cleandbproduct(440799, tableread =  "raw_flow_yearly", tablewrite = "validated_flow_yearly")

Creating reports

createreportfromdb(productcode = , template = "", )

It is not possible to generate the discrepancy plot which I illustrated in a PDF report

There are 6373 distinct bilateral trade flows in the 440799 yearly dataset. Some flows occur only inone year, others are repeated every year. Six thousand plots cannotbe easily represented in one report. This requires an interface.

Ongoing work

[Oct 2014] Use a client side table sorter, such as the JQuery tablesorter The plugin requires the thead and tbody tags, which are generated by Rmarkdown. Calling the javascript files should be possible within a YAML document, see HTML document format.
- I linked to the necessary javascript in docs/www/include/in_header.html
- It basically works, CSS styles need to be added so that we can see little sorting arrows. See sample table in docs/www/index.html

Work for programmers of the production system

Automate data load from comtrade
- Use loadcomtrade_bycode() as a function
- Replace loadcomtrade_bycode() by a software that loads json files from comtrade, and places the content in a raw comtrade table in a SQL database
Automate clean mechanism
- Call R functions in clean.R from the server to clean the raw data from the database. Put cleaned data in the database.

Bugs

20141208 A bug in Lyx prevents me from generating pdfs when the text contains a euro € sign.

Eurostat Comext

Load monthly data Load yearly data Rename columns Copy into a database

TODO by order of ease / importance

Add all reporters and partners to the test dataset
Change the docs/development folder into vignettes.
use an environmennt variable for the yearbegin and yearend of the function that chooses between reporter and partner volume
Create a trac system with externam accesss to track progress.
in checkdbcolumns() Use sqlquery <- "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='tradeflows' AND TABLE_NAME='validated_flow_yearly';" instead of loading a data frame to check the column names. Because loading a data frame doesn't work when the table is empty.
create a function to display product codes available or a function that returns an uncollected tbl dplyr object on which to run arbitrary dplyr statements. --> This might not actually be needed.
calculate regional prices as a ponderation of import and export prices as done in Chasamil2000 --> Actually regional import and export price should be different
discrepancy report in the server function, add a parameter to the loadcomtrade_bycode function to render this optional log validataion status of jsonfiles with fileConn<-file("output.txt") writeLines(c("Hello","World"), fileConn) close(fileConn) Change the docs/development folder into vignettes.
Calculate median prices by region patner and see how using partner prices for conversion impacts the world trade flows
Floating table of content for html reports with a custom csss http://rpubs.com/stevepowell99/floating-css
Use aesthetic to make points more transparent on a coutry * contry grid display trade volume as alpha level. see also docs/development/ggplot2
use package options, inspired by the devtools or knitr package "Devtools uses the following options to configure behaviour:..."
Time plot of HS by quantity, weight or value data (ggplot gant chart) to visualise missing data
load from EUROSTAT comext at 10 digit level
Javascript visualisation: Add googleVis and Rcharts to Markdown documents

Done

20151103 Methodology report add a paragraph on the different types of predefined automated reports that can be generated, with indication of the parameters that can be set. Four different report types: - completeness report - discrepancy report - overview report- trade network analysis 3: Add abstract to country overview report 20151023 commit a58d6e4fb Overview report should be based on the validated data and include quantity besides trade values 20151009 Section titles in the overview report should be those JFSQ-1 names Generate overview report plots accroding to JFSQ product codes. 20151009 Overview report list the 10 largest exporters and 10 largest importers in all plots. 20150904 Include partner data into the quantity estimation for those which have missing partner data. in commit c3d92e77e33008eef2eef64fb465c77d0829bb73 git diff fd724fa080cc c3d92e77e330 # View changes introduced See the function addmissingmirrorflow()

Project issues + requests

SQLite database connection using RSQLite Convert MySQL database to SQLite, using a script such as https://gist.github.com/esperlu/943776#file-mysql2sqlite-sh See also http://stackoverflow.com/questions/3890518/convert-mysql-to-sqlite or simply use R to move content from one base to the other. This might leave a sub-optimal SQL structure though.
place the methodology report datasets and plots in an R script
that can be called from the package
Product report on the number of countries under each flag.
Export prices and conversion factors to Excel files [Next version] Write conversion factors, prices and choice description to the database during the cleaning process
Change loadcomtradeallreporters() output format from RDATA to RDS Object serialisation makes sense when there is only one object.
Export documentation to a github.io site, in the Group functions in sections in packagedocs/rd_index.yaml re-create packagedocs if needed with the command: packagedocs_init("."; "packagedocs")
Make a sankey diagram of a product or a range of products to illustrate all outgoing flows from one country or all incoming flows into one country. or more ambitious, illustrate all outgoing and icoming flows for a given product See multilevel sankeys here: https://developers.google.com/chart/interactive/docs/gallery/sankey#multilevel-sankeys Other Tutorial on Sankeys with R: http://timelyportfolio.github.io/rCharts_d3_sankey/example_build_network_sankey.html http://rcharts.io/viewer/?6001601#.VrHVmZ5XbVR change Sankey at different point in time to illustrate changes http://rcharts.io/gallery/#visualizationType=sankey Use Sankey diagrams to illustrate the difference between raw and validated data by using for example a color by flag or by drawing a sankey for raw data and another Sankey diagram for validated data. Use Sankey diagrams to illustrate discrepancies between reported imports and exports.
add global variables for variables used in NSE dplyr and ggplot code see Hadley's change of mind on the "no visible binding for global variable" issue when running R CMD Check. https://stackoverflow.com/posts/12429344/revisions
Make database connectors to SQLite.
introducing a $/kg price will mean that it has to be in agreement with the $/m3 price and the kg/m3 conversion factor.
Document function arguments to deal with R CMD check warnings of the type: "Undocumented arguments in documentation object " see those warnings on Travis: https://s3.amazonaws.com/archive.travis-ci.org/jobs/109356599/log.txt

EuropeanForestInstitute/tradeflows documentation built on Oct. 7, 2019, 10:57 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

EuropeanForestInstitute/tradeflows
Process forest products trade flow data from COMTRADE and COMEXT

In EuropeanForestInstitute/tradeflows: Process forest products trade flow data from COMTRADE and COMEXT

Installation on a server

Comtrade issues

Raw data

Location of report templates

Tools

This is a package

Tests

Code coverage

Git

Data frame manipulation with dplyr

Error catching with tryCatch

Documentation in long form

Function documentation using roxygen2

Version tracking system with git

Shiny

Screen server tool

Notes to EFI developpers

Notification of version changes

Code refactoring

Installation and configuration

Database configuration

Loading data

Cleaning data

Creating reports

It is not possible to generate the discrepancy plot which I illustrated in a PDF report

Ongoing work

Work for programmers of the production system

Bugs

Eurostat Comext

TODO by order of ease / importance

Done

Project issues + requests

R Package Documentation

Browse R Packages

We want your feedback!

EuropeanForestInstitute/tradeflows Process forest products trade flow data from COMTRADE and COMEXT

In EuropeanForestInstitute/tradeflows: Process forest products trade flow data from COMTRADE and COMEXT

Installation on a server

Comtrade issues

Raw data

Location of report templates

Tools

This is a package

Tests

Code coverage

Git

Data frame manipulation with dplyr

Error catching with tryCatch

Documentation in long form

Function documentation using roxygen2

Version tracking system with git

Shiny

Screen server tool

Notes to EFI developpers

Notification of version changes

Code refactoring

Installation and configuration

Database configuration

Loading data

Cleaning data

Creating reports

It is not possible to generate the discrepancy plot which I illustrated in a PDF report

Ongoing work

Work for programmers of the production system

Bugs

Eurostat Comext

TODO by order of ease / importance

Done

Project issues + requests

R Package Documentation

Browse R Packages

We want your feedback!

EuropeanForestInstitute/tradeflows
Process forest products trade flow data from COMTRADE and COMEXT