library(tradeflows)
library(knitr)
opts_knit$set(root.dir="../..") # file paths are relative to the root of the project directory
library(dplyr)

Convert from MySQL to SQLite

conversion scirpt found on a Stackoverflow answer. See also below for another version of that script.

Conversion script first version

The script connects to MySQL, creates the dump and then convers to SQLite.

cp ~/downloads/mysql2sqlite.sh /tmp/
cd /tmp
chmod +x mysql2sqlite.sh # Make it executable
# Old version by Esperlu
./mysql2sqlite.sh -u R -p tradeflows | sqlite3 tradeflows.sqlite

The raw database was copied but there was an issue with the validated database: Error: near line 5217030: no such table: validated_flow_yearly.

Conversion script second version

Another version of the conversion script is maintained at https://github.com/dumblob/mysql2sqlite It converts from a MySQL dump into a SQLite dump.

# Same as above for download and permission
# Command is different
cd /tmp
chmod +x mysql2sqlite.sh
./mysql2sqlite.sh raw_ | sqlite3 tradeflows.sqlite

Importing from CSV into SQLite

How to output MySQl query result in CSV format ?

MySQL command to export CSV

select * from raw_flow_yearly into outfile '/tmp/raw_flow_yearly2.csv' fields terminated by ',' enclosed by '"' lines terminated by '\n';

Copy database structure. SQLite datatypes are called affinity. Since the csv file doesn't contain information about the datatypes, I decided to copy the structure from MYSQL into SQLite and use that structure.

SQLite command to import CSV

Browse the newly created SQLite database

# Bash command to start sqlite
sqlite3 tradeflows.sqlite3
# Commands within sqlite
.tables
.schema raw_flow_yearly
select * from raw_flow_yearly limit 5;

Using dplyr to read from SQLite

tf_sqlite <- src_sqlite("/tmp/tradeflows.sqlite3")
tf <- tbl(tf_sqlite, "raw_flow_yearly")

system.time(repprod <- tf %>% select(reporter, productcode, period) %>%
    distinct() %>% collect())
#  user  system elapsed 
#   3.908   0.660   4.566 

Compare with the same operation on MySQL

tf2 <- readdbtbl("raw_flow_yearly")
system.time(repprod2 <- tf2  %>% 
                select(reporter, productcode, period) %>%
                distinct() %>% collect())
#  user  system elapsed 
#   0.072   0.004  13.408 

# Compare the output
repprod <- repprod %>% arrange(reporter, productcode, period)
repprod2 <- repprod2 %>% arrange(reporter, productcode, period)
summary(repprod$productcode - repprod2$productcode)

Using dplyr to write into SQLite

tfdb <- readdbtbl("validated_flow_yearly")
swd99 <- readdbproduct(440799, "validated_flow_yearly")
tfdblite <- src_sqlite("/tmp/tradeflows.sqlite3", create = T)


tf_lite <- copy_to(tfdblite, swd99, temporary = FALSE, 
                   indexes = list("year", "flowcode","reportercode","partnercode","productcode"))


EuropeanForestInstitute/tradeflows documentation built on Oct. 7, 2019, 10:57 a.m.