knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This note describes the database internals. In addition to the main events
table, there are other tables that impact how the updating process works.
library(icews)
SQLite has a special sqlite_master
table that describes the schema. This can be queried to find out what objects are present in the database, e.g.:
query_icews("select * from sqlite_master;")
With the default setup, four tables should be present:
events
: contains the actual ICEWS eventsstats
: statistics for the event table; this currently only contains the total number of rows/events in the events
table, and is used by dr_icews()
when showing the total number of events in the database (if using the database option)source_files
and null_source_files
: these tables are related to the raw data files (.tab) that have been ingested into the database. If a raw source file only contains duplicate events^[Specifically, records that share both event ID and event date.], no events are ingested into the database and the events
table contains no reference to this file. To avoid an attempt to re-download and re-ingest the file at a future update, it is in this case instead added to the null_sources_files
table, which is taken into account when resolving the various states (dataverse, database, files).Note that no indices are created by default for the events
table. Usually it makes sense to add some to speed up common queries. For example, I have at some points locally added indices for country, year, and yearmonth.
The package installation includes SQL scripts that create these tables. These can be accessed an printed from R like below. (One can also get these from SQLite with something like cat(query_icews("select sql from sqlite_master where name = 'events';")$sql)
.)
cat(readLines(system.file("sql/events.sql", package = "icews")), sep = "\n")
cat(readLines(system.file("sql/stats.sql", package = "icews")), sep = "\n")
cat(readLines(system.file("sql/source_files.sql", package = "icews")), sep = "\n")
If you are running similar queries repeatedly, it might make sense to add indices for the events table. These can dramatically increase the speed of queries if setup correctly. They do take a while to create and take up extra storage space in the database file, though. They will also slow updating the data, and what I did in the past was to remove indices, update, and then re-add the indices.
I found the general guide on indices at Use the index, Luke! to be helpful, as well as the EXPLAIN QUERY PLAN
command, which can be used to tell what, if any, indices are going to be used for a query.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.