knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This note describes the database internals. In addition to the main events table, there are other tables that impact how the updating process works.

library(icews)

SQLite has a special sqlite_master table that describes the schema. This can be queried to find out what objects are present in the database, e.g.:

query_icews("select * from sqlite_master;")

With the default setup, four tables should be present:

Note that no indices are created by default for the events table. Usually it makes sense to add some to speed up common queries. For example, I have at some points locally added indices for country, year, and yearmonth.

Create table statements

The package installation includes SQL scripts that create these tables. These can be accessed an printed from R like below. (One can also get these from SQLite with something like cat(query_icews("select sql from sqlite_master where name = 'events';")$sql).)

Events

cat(readLines(system.file("sql/events.sql", package = "icews")), sep = "\n")

Stats table

cat(readLines(system.file("sql/stats.sql", package = "icews")), sep = "\n")

Source and null source file tables

cat(readLines(system.file("sql/source_files.sql", package = "icews")), sep = "\n")

Indices

If you are running similar queries repeatedly, it might make sense to add indices for the events table. These can dramatically increase the speed of queries if setup correctly. They do take a while to create and take up extra storage space in the database file, though. They will also slow updating the data, and what I did in the past was to remove indices, update, and then re-add the indices.

I found the general guide on indices at Use the index, Luke! to be helpful, as well as the EXPLAIN QUERY PLAN command, which can be used to tell what, if any, indices are going to be used for a query.



andybega/icews documentation built on July 7, 2023, 1:29 p.m.