library(knitr)
opts_knit$set(root.dir="../..") # file paths are relative to the root of the project directory
opts_knit$set(eval = FALSE)
library(tradeflows)
library(dplyr)
library(RMySQL)

mysqldump

Load a dump

The following commands can be used to load mysql dump files: ```{bash eval=FALSE} cat raw_flow_yearly.sql | mysql -u paul -p tradeflows cat validated_flow_yearly.sql | mysql -u paul -p tradeflows

# SELECT using dplyr
* Description and examples in the `?src_mysql` documentation
    To connect to a database :
    ```
    # First create a src
    my_db <- src_mysql(host = "blah.com", user = "hadley",
        password = "pass")
    # Then reference a tbl within that src
    my_tbl <- tbl(my_db, "my_table")
    ```
* [dplyr vignette on databases](http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html) explains the verbs that can be used to communicate with the database: `select`, `filter`, `arrange`, `mutate`, `summarise`.
* [Stackoverflow question on insert using dplyr](http://stackoverflow.com/questions/26568182/is-it-possible-to-insert-add-a-row-to-a-sqlite-db-table-using-dplyr-package) says it is possible to add a row to a database using dplyr. I added an answer mentionning the point below.
* [a newsgroup](https://groups.google.com/forum/#!topic/manipulatr/IsRVJWoMPe4). 
   explained the purpose of the function `dplyr::copy_to`.
   It is intended to create temporary test tables.
   The email exchange ends by suggesting to to use `RMariaDB::dbWriteTable()` 
   to append data to an existing table.


### Check the presence of database columns
```r
# ai'nt working baby

checkdbcolumns(c("validated_flow_yearly"))
# Connect to db
setdatabaseconfig(silent = FALSE)
db <- getOption("tradeflowsDB")
DBwrite <- dbConnect(MySQL(), user=db["user"], host=db["host"],
                     password=db["password"], dbname=db["dbname"])
tableread <- "validated_flow_yearly"
sqlquery <- paste(c("SELECT `COLUMN_NAME`", 
                    "FROM `INFORMATION_SCHEMA`.`COLUMNS` ",
                    "WHERE `TABLE_SCHEMA`='tradeflows' ",
                    "AND `TABLE_NAME`='",tableread,"';"), collapse = "")
res <- dbSendQuery(DBwrite, sqlquery)
columnname <- dbFetch(res)
columnname <- columnname$COLUMN_NAME
# See ?dbSendQuery

# delete colums
deletedbproduct(440799, "validated_flow_yearly")
sqlquery <- "SELECT `COLUMN_NAME` 
FROM `INFORMATION_SCHEMA`.`COLUMNS` 
WHERE `TABLE_SCHEMA`='tradeflows' 
AND `TABLE_NAME`='validated_flow_yearly';"

Query the database using tbl, verbs and collect

DB <- src_mysql(user="R", host="localhost",
                       password="localhost", dbname="tradeflows")
# Create `tbl`
rawdata <- tbl(DB, sql("SELECT * FROM raw_flow_yearly"))

# number of rows
rawdata %>% summarise(nrow = n()) %>%
    collect  # forces computation and brings data back into a data.frame 

# Get data for one reporter
# number of flows by product for Germany in 2011
germany <- rawdata %>% 
    filter(reporter=="Germany" & year==2011) 
germany %>% group_by(productcode, flow) %>%
    summarise(n()) %>% collect %>% kable

# Play with verbs
germimport <- rawdata %>% 
    filter(reporter=="Germany" & flow=="Import" ) %>%
    select(year, flow, reporter, partner, tradevalue) %>%
    collect

db_insert_into, for test purposes only

# Create table
copy_to(DB, swdhead, "enddata2", temporary = FALSE)
# Insert into the new table
db_insert_into(DB = tdb$con, table = "enddata2", fields = swdhead) 

Look at ?db_insert_into for a list of database actions.

If you find yourself overriding many of these functions
it may suggest that you should just override copy_to instead.

Run SQL statements on the database

Number of unique product codes and of unique countries ```{sql eval=FALSE} SELECT count(distinct(productcode)) FROM tradeflows.raw_flow_yearly; / 127 / SELECT count(distinct(reporter)) FROM tradeflows.raw_flow_yearly; / 190 / SELECT count(distinct(partner)) FROM tradeflows.raw_flow_yearly; / 248 /

## SELECT, INSERT and more using RMySQL, 
Source of the package : [RMySQL](https://github.com/jeffreyhorner/RMySQL)

Following developments inspired by :

* [a finnish blog](http://gettinggeneticsdone.blogspot.fi/2011/12/query-mysql-database-from-r-using.html).
* [a question on stackoverflow](http://stackoverflow.com/questions/5431961/connect-to-mysql-database-with-rmysql)


### SELECT or dbReadTable()
```r
tdb2 <- dbConnect(MySQL(), user="tradeflows", host="localhost",
                       password="tradeflows", dbname="tradeflows")
dbListTables(tdb2)
querytradedb <-  function(...) dbGetQuery(tdb2, ...)
rawdatahead <- querytradedb("SELECT * FROM raw_flow LIMIT 2;")
kable(rawdatahead)
rawdata <- dbReadTable(tdb2, "raw_flow")

CREATE a table

querytradedb("CREATE TABLE enddata LIKE tradeflows.raw_trade_data;")
dbListTables(tdb2)

INSERT or dbWriteTable()

Using queries is not convenient with large dataframes

#dbGetQuery(tb2, "INSERT into SELECT * FROM raw_trade_data LIMIT 10;")
nrowinend <- function(){
    dbGetQuery(tdb2, "SELECT COUNT(*) FROM enddata;")
}
nrowinend()
dbWriteTable(tdb2, "enddata", swdhead, append=TRUE, row.names = FALSE)
nrowinend()
# Same instruction a seccond time
dbWriteTable(tdb2, "enddata", swdhead, append=TRUE, row.names = FALSE)
nrowinend()
# Content is not added twice
# If you remove the id
swdhead$id <- NULL
dbWriteTable(tdb2, "enddata", swdhead, append=TRUE, row.names = FALSE)
nrowinend() # content is added again
dbGetQuery(tdb2, "SELECT COUNT(*) FROM raw_trade_data;")
# content is added again

Trade Flows Database structure

Change from the raw_flow table to the validated_flow table.

ALTER TABLE `tradeflows`.`validated_flow` ADD COLUMN `quantitypartner` BIGINT NULL  AFTER `quantityreporter` , CHANGE COLUMN `quantity2` `quantityreporter` BIGINT(20) NULL DEFAULT NULL    ;

Use functions from database.R

Load data

charcoal90 <- loadrawdata(440290)

Write data


Data types

Data types are infered from the function dbDataType, for example

con <- RMariaDB::dbConnect(RMariaDB::MariaDB(), dbname = "test")
RMariaDB::dbDataType(con, 1)
RMariaDB::dbDataType(con, 1L)
RMariaDB::dbDataType(con, Sys.Date())
RMariaDB::dbDisconnect(con)


EuropeanForestInstitute/tradeflows documentation built on Oct. 7, 2019, 10:57 a.m.