library(knitr) opts_knit$set(root.dir="../..") # file paths are relative to the root of the project directory opts_knit$set(eval = FALSE) library(tradeflows) library(dplyr) library(RMySQL)
The following commands can be used to load mysql dump files: ```{bash eval=FALSE} cat raw_flow_yearly.sql | mysql -u paul -p tradeflows cat validated_flow_yearly.sql | mysql -u paul -p tradeflows
# SELECT using dplyr * Description and examples in the `?src_mysql` documentation To connect to a database : ``` # First create a src my_db <- src_mysql(host = "blah.com", user = "hadley", password = "pass") # Then reference a tbl within that src my_tbl <- tbl(my_db, "my_table") ``` * [dplyr vignette on databases](http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html) explains the verbs that can be used to communicate with the database: `select`, `filter`, `arrange`, `mutate`, `summarise`. * [Stackoverflow question on insert using dplyr](http://stackoverflow.com/questions/26568182/is-it-possible-to-insert-add-a-row-to-a-sqlite-db-table-using-dplyr-package) says it is possible to add a row to a database using dplyr. I added an answer mentionning the point below. * [a newsgroup](https://groups.google.com/forum/#!topic/manipulatr/IsRVJWoMPe4). explained the purpose of the function `dplyr::copy_to`. It is intended to create temporary test tables. The email exchange ends by suggesting to to use `RMariaDB::dbWriteTable()` to append data to an existing table. ### Check the presence of database columns ```r # ai'nt working baby checkdbcolumns(c("validated_flow_yearly")) # Connect to db setdatabaseconfig(silent = FALSE) db <- getOption("tradeflowsDB") DBwrite <- dbConnect(MySQL(), user=db["user"], host=db["host"], password=db["password"], dbname=db["dbname"]) tableread <- "validated_flow_yearly" sqlquery <- paste(c("SELECT `COLUMN_NAME`", "FROM `INFORMATION_SCHEMA`.`COLUMNS` ", "WHERE `TABLE_SCHEMA`='tradeflows' ", "AND `TABLE_NAME`='",tableread,"';"), collapse = "") res <- dbSendQuery(DBwrite, sqlquery) columnname <- dbFetch(res) columnname <- columnname$COLUMN_NAME # See ?dbSendQuery # delete colums deletedbproduct(440799, "validated_flow_yearly") sqlquery <- "SELECT `COLUMN_NAME` FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_SCHEMA`='tradeflows' AND `TABLE_NAME`='validated_flow_yearly';"
DB <- src_mysql(user="R", host="localhost", password="localhost", dbname="tradeflows") # Create `tbl` rawdata <- tbl(DB, sql("SELECT * FROM raw_flow_yearly")) # number of rows rawdata %>% summarise(nrow = n()) %>% collect # forces computation and brings data back into a data.frame # Get data for one reporter # number of flows by product for Germany in 2011 germany <- rawdata %>% filter(reporter=="Germany" & year==2011) germany %>% group_by(productcode, flow) %>% summarise(n()) %>% collect %>% kable # Play with verbs germimport <- rawdata %>% filter(reporter=="Germany" & flow=="Import" ) %>% select(year, flow, reporter, partner, tradevalue) %>% collect
# Create table copy_to(DB, swdhead, "enddata2", temporary = FALSE) # Insert into the new table db_insert_into(DB = tdb$con, table = "enddata2", fields = swdhead)
Look at ?db_insert_into
for a list of database actions.
If you find yourself overriding many of these functions it may suggest that you should just override copy_to instead.
Number of unique product codes and of unique countries ```{sql eval=FALSE} SELECT count(distinct(productcode)) FROM tradeflows.raw_flow_yearly; / 127 / SELECT count(distinct(reporter)) FROM tradeflows.raw_flow_yearly; / 190 / SELECT count(distinct(partner)) FROM tradeflows.raw_flow_yearly; / 248 /
## SELECT, INSERT and more using RMySQL, Source of the package : [RMySQL](https://github.com/jeffreyhorner/RMySQL) Following developments inspired by : * [a finnish blog](http://gettinggeneticsdone.blogspot.fi/2011/12/query-mysql-database-from-r-using.html). * [a question on stackoverflow](http://stackoverflow.com/questions/5431961/connect-to-mysql-database-with-rmysql) ### SELECT or dbReadTable() ```r tdb2 <- dbConnect(MySQL(), user="tradeflows", host="localhost", password="tradeflows", dbname="tradeflows") dbListTables(tdb2) querytradedb <- function(...) dbGetQuery(tdb2, ...) rawdatahead <- querytradedb("SELECT * FROM raw_flow LIMIT 2;") kable(rawdatahead) rawdata <- dbReadTable(tdb2, "raw_flow")
querytradedb("CREATE TABLE enddata LIKE tradeflows.raw_trade_data;") dbListTables(tdb2)
Using queries is not convenient with large dataframes
#dbGetQuery(tb2, "INSERT into SELECT * FROM raw_trade_data LIMIT 10;")
nrowinend <- function(){ dbGetQuery(tdb2, "SELECT COUNT(*) FROM enddata;") } nrowinend() dbWriteTable(tdb2, "enddata", swdhead, append=TRUE, row.names = FALSE) nrowinend() # Same instruction a seccond time dbWriteTable(tdb2, "enddata", swdhead, append=TRUE, row.names = FALSE) nrowinend() # Content is not added twice # If you remove the id swdhead$id <- NULL dbWriteTable(tdb2, "enddata", swdhead, append=TRUE, row.names = FALSE) nrowinend() # content is added again dbGetQuery(tdb2, "SELECT COUNT(*) FROM raw_trade_data;") # content is added again
Change from the raw_flow
table to the validated_flow
table.
ALTER TABLE `tradeflows`.`validated_flow` ADD COLUMN `quantitypartner` BIGINT NULL AFTER `quantityreporter` , CHANGE COLUMN `quantity2` `quantityreporter` BIGINT(20) NULL DEFAULT NULL ;
charcoal90 <- loadrawdata(440290)
Data types are infered from the function dbDataType
, for example
con <- RMariaDB::dbConnect(RMariaDB::MariaDB(), dbname = "test") RMariaDB::dbDataType(con, 1) RMariaDB::dbDataType(con, 1L) RMariaDB::dbDataType(con, Sys.Date()) RMariaDB::dbDisconnect(con)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.