
Installation on a server

More details on the server configuration in this google doc

Rmarkdown requires the latest pandoc version, explanation here

To include the latest required version of pandoc:

 setup: At least the following dependencies are missing:
 http-client >=0.3.2 && <0.4 && ==0.4.5
 cabal: Error: some packages failed to install:
 pandoc-1.13.1 failed during the configure step. The exception was:
 ExitFailure 1
md5sum rstudio-server-0.98.1091-amd64.deb

RStudio Server requires the installation of a recent libssl version. md5sum should no longer be used for security reasons; authenticity should be checked with:

sha256sum libssl0.9.8_0.9.8o-4squeeze14_amd64.deb 

I couldn't find the key, so I downloaded the file from 2 locations and checked that both copies had the same sum.

To manually stop, start, and restart the server, use the following commands:

$ sudo rstudio-server stop
$ sudo rstudio-server start
$ sudo rstudio-server restart

Comtrade issues

Raw data

Inspired by the way Hadley prepares this flight planes data. The package includes a training dataset: sawnwood bilateral trade data for European countries.

Location of report templates

The location is inspired by the rapport package. See also their function that lists templates, and their function that reads templates from files or from package-bundled templates.


This is a package

Created based on instructions from Hadley.

devtools::load_all() or Cmd + Shift + L reloads all code in the package.

Add packages to the list of required packages:

devtools::use_package("dplyr")
devtools::use_package("ggplot2", "Suggests")

For data I followed his recommendations in r-pkgs/data.rmd:

devtools::use_data(mtcars)
devtools::use_data_raw() # To create a data-raw/ folder and add it to .Rbuildignore


Use Ctrl+Shift+T to run the package tests in RStudio.

The test_check function documentation tells us that tests should be placed in tests/testthat.

Code coverage

The covr package can be used to measure code coverage.

covr::package_coverage()

This shows the test coverage of scripts in the ./R directory. Visualise coverage in a shiny application:

x <- covr::package_coverage()
covr::report(x) # called shine(x) in early covr versions


Git command to revert a file one revision back in the "develop" branch:

git checkout develop~1 R/clean.R
# Experiment something
# Then
# Come back to the latest revision
git checkout develop R/clean.R

Use this, for example, to check that a test failed in the past and that it no longer fails.

Data frame manipulation with dplyr

dplyr uses non-standard evaluation (NSE). See vignette("nse"). NSE is powered by the lazyeval package.

# standard evaluation
sawnwood %>% select_(.dots = c("yr", "rtCode" )) %>% head
# is the same as
# non-standard (lazy) evaluation
sawnwood %>% select(yr, rtCode ) %>% head

Error catching with tryCatch
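
A minimal sketch of the tryCatch pattern in base R. The helper safe_as_integer() is hypothetical and not part of the package: it converts a product code and returns NA with a message instead of raising a warning or an error.

```r
# Hypothetical helper, for illustration only: convert a product code to an
# integer and return NA instead of letting a warning or an error propagate.
safe_as_integer <- function(x) {
  tryCatch(
    {
      as.integer(x)
    },
    warning = function(w) {
      # as.integer() warns when the text cannot be converted to a number
      message("Conversion warning: ", conditionMessage(w))
      NA_integer_
    },
    error = function(e) {
      message("Conversion error: ", conditionMessage(e))
      NA_integer_
    }
  )
}

ok <- safe_as_integer("440799")
bad <- safe_as_integer("not a code")
```

The handlers return a value, so the caller can continue with the next product instead of stopping at the first problem.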

Documentation in long form

How to create package vignettes.

To create a vignette, use the command use_vignette(name)

You can build all vignettes from the console with devtools::build_vignettes()

RStudio’s “Build & reload” does not build vignettes to save time. Similarly, devtools::install_github() (and friends) will not build vignettes by default because they’re time consuming and may require additional packages. You can force building with devtools::install_github(build_vignettes = TRUE). This will also install all suggested packages.

Function documentation using roxygen2

To export the documentation to a PDF document, run the following at the command line in the man folder:

R CMD Rd2pdf *

You should be able to see the documentation of exported functions by placing a question mark before the function name at the R command prompt.

Inspired by the documentation of roxygenize. vignette("namespace", package = "roxygen2") says:

If you are using just a few functions from another package, the recommended option is to note the package name in the Imports: field of the DESCRIPTION file and call the function(s) explicitly using ::, e.g., pkg::fun(). Alternatively, though no longer recommended due to its poorer readability, use @importFrom, e.g., @importFrom pkg fun, and call the function(s) without ::. If you are using many functions from another package, use @import package to import them all and make available without using ::.

But Hadley says:

Alternatively, if you are repeatedly using many functions from another package, you can import them in one command with @import package. This is the least recommended solution: it makes your code harder to read (because you can’t tell where a function is coming from), and if you @import many packages, the chance of conflicting function names increases.

The way packages are called might have to be changed to follow Hadley's recommendations on package namespaces (see also vignette("namespace", package = "roxygen2")). The code currently uses:

require(RJSONIO)
require(dplyr)
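
A minimal sketch of the explicit :: style recommended above, as an alternative to attaching packages with require(). The function summarise_quantity() and the quantity column are hypothetical; stats stands in for any package that would be listed under Imports: in the DESCRIPTION file.

```r
# Hypothetical function, for illustration only.
# stats::median() is called explicitly with ::, so the reader can see where
# the function comes from and no require() call is needed.
summarise_quantity <- function(dtf) {
  data.frame(
    median_quantity = stats::median(dtf$quantity, na.rm = TRUE),
    n = nrow(dtf)
  )
}

flows <- data.frame(quantity = c(10, 20, NA, 40))
summary_table <- summarise_quantity(flows)
```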

Version tracking system with git

The .git repository is backed up on Bitbucket. Use devtools::install_bitbucket() to install the package.


A demonstration with a time series plot and a bar chart will be made with shiny and the ggplot2 package, based on the diamond example.

Screen server tool

Use screen to keep a long process running on a server after you close the ssh session. I started a screen session with:

    screen -S sessionname

To find the screen session later, you might want to rename it with screen's sessionname command (Ctrl-a, then :sessionname newname), or name it on the first screen invocation with the -S flag: -S sessionname.

I started R in this screen session and launched a long-running process. I then detached the session with the key sequence Ctrl-a d.


I could re-attach the session later with:

    screen -r sessionname

If the session was not detached properly, it might be necessary to detach it and reattach it:

   screen -d -r sessionname

Notes to EFI developers

Notification of version changes

I will try to change the package's version number each time I commit a change that impacts the cleaning procedure. I will also try to tag those versions in git.

Code refactoring

It would be nice to clarify the interface: what R functions are used by the PHP code and bash scripts? This would enable code refactoring. For example, the parameter called outputdir is not consistent with inputpath. It would be preferable to call them both "dir" or "path". outputdir is named after the rmarkdown::render() parameter output_dir. What is inputpath named after?

Installation and configuration

See vignette/installation.Rmd for installation and configuration steps.

Which directories to read: you want to look at the files in the R folders.

The configuration table columnnames, located in config/column_names.csv, now contains 2 columns specifying which column names are used in the trade flows database: "raw_flow" and "validated_flow".

Database configuration

The database configuration file and column names are located in the package's config directory. To print that location from a shell command prompt, run:

Rscript  -e 'library(tradeflows)' -e 'system.file("config", package="tradeflows")'

Loading data

This is managed by a PHP program. The data to load is contained in this instruction:

itto <- classificationitto %>%
    filter(productcodecomtrade > 10000 & nomenclature == "HS12") %>%
    select(product, productcodeitto, productcodecomtrade)
write.csv(itto, file="data-raw/ittoproducts.csv", row.names = FALSE)

Cleaning data

The function cleandb() will feed data into the database table(s) validated_flow. Updates will be done on a product basis, at the 6 digit level. The cleaning script will:

  1. Delete all flows for a product (between all reporter and partner countries in all years),
  2. Enter all validated flows for that product.
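
The two steps above can be sketched on an in-memory data frame standing in for the validated_flow table. This is only an illustration under assumptions: replace_product_flows() and the productcode, year and quantity columns are hypothetical, and the real cleandb() works on the database rather than on a data frame.

```r
# Hypothetical stand-in for the delete-then-insert update, for illustration only.
replace_product_flows <- function(validated, cleaned, productcode) {
  # Step 1: delete all existing flows for the product
  kept <- validated[validated$productcode != productcode, , drop = FALSE]
  # Step 2: enter all newly validated flows for that product
  new <- cleaned[cleaned$productcode == productcode, , drop = FALSE]
  rbind(kept, new)
}

validated <- data.frame(
  productcode = c(440799, 440799, 440710),
  year = c(2010, 2011, 2010),
  quantity = c(1, 2, 3)
)
cleaned <- data.frame(productcode = 440799, year = 2012, quantity = 9)

result <- replace_product_flows(validated, cleaned, 440799)
```

Replacing all flows of a product at once means that a second run for the same product does not leave stale rows from an earlier run.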

The main clean instruction can also be run directly from a system shell:

Rscript -e 'library(tradeflows)' -e 'cleandbproduct(440799, tableread = "raw_flow_yearly", tablewrite = "validated_flow_yearly")'

Creating reports

createreportfromdb(productcode = , template = "", )

It is not possible to generate the discrepancy plot which I illustrated in a PDF report.

There are 6373 distinct bilateral trade flows in the 440799 yearly dataset. Some flows occur only in one year, others are repeated every year. Six thousand plots cannot be easily represented in one report. This requires an interface.
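
A minimal sketch of how such a count of distinct bilateral flows can be obtained in base R, on toy data. The partner column name ptCode and the values are assumptions; yr and rtCode follow the sawnwood example, and the flow direction (import or export) is ignored here.

```r
# Toy data, for illustration only: one row per reporter, partner and year.
flows <- data.frame(
  rtCode = c(250, 250, 250, 276, 276),
  ptCode = c(276, 276, 380, 250, 380),
  yr = c(2010, 2011, 2010, 2010, 2011)
)

# Number of years in which each reporter-partner pair appears
years_per_flow <- aggregate(yr ~ rtCode + ptCode, data = flows, FUN = length)

n_distinct_flows <- nrow(years_per_flow)
single_year_flows <- sum(years_per_flow$yr == 1)
```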

Ongoing work

Work for programmers of the production system


20141208 A bug in LyX prevents me from generating PDFs when the text contains a euro € sign.

Eurostat Comext

  - Load monthly data
  - Load yearly data
  - Rename columns
  - Copy into a database

TODO by order of ease / importance


20151103 Methodology report: add a paragraph on the different types of predefined automated reports that can be generated, with indication of the parameters that can be set. Four different report types:

  - completeness report
  - discrepancy report
  - overview report
  - trade network analysis

3: Add abstract to country overview report

20151023 commit a58d6e4fb Overview report should be based on the validated data and include quantity besides trade values

20151009 Section titles in the overview report should be those JFSQ-1 names. Generate overview report plots according to JFSQ product codes.

20151009 Overview report: list the 10 largest exporters and 10 largest importers in all plots.

20150904 Include partner data into the quantity estimation for those which have missing partner data. In commit c3d92e77e33008eef2eef64fb465c77d0829bb73

git diff fd724fa080cc c3d92e77e330 # View changes introduced

See the function addmissingmirrorflow()

Project issues + requests

