build_db: building a RDS database from SQL source data

Description Usage Arguments Value

View source: R/build_db.R

Description

The health authority uses access-restricted SQL tables to deliver data to users. This is probably a link to underlying SAS data. But access is also restricted by server side memory. So pulling big data chunks often gives a 'java heap memory' error. To overcome this, we slice big data-sets in chunks by 1- or 2-level prefixes of table keys (cpr or record id) and then store these chunks in a local database made up of RDS files in a structure resembling the original SQL table structure. This solves the problem with java heap memory limits and it also enables us to have parallel access to data (which is not really feasible with one SQL connection).

Usage

1
2
3
4
5
6
7
8
9
build_db(
  projct,
  db_dir,
  prefix,
  prefix.add = NULL,
  mode,
  drv = RJDBC::JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver", "c:/sqljdbc42.jar"),
  conn_def = "jdbc:sqlserver://fm-sql;databaseName=Forsker;integratedSecurity=true"
)

Arguments

projct

4-digit project number, assuming first four are zeros

db_dir

path to where local db should be written. Will overwrite db_dir

prefix

the initial number of characters to use for data slicing. Will be extended by one, if a first character in a key is constant across the table

prefix.add

add names of any tables that will need an extra character on the slicing prefix

mode

'clear' will overwrite everything again, 'fill' will fill those chunks that are expected based on SQL source, but missing in local db. If really big data generates java errors even when slicing, then this option will work for iterative db-building in a loop. Check log and data index carefully when using 'fill' option.

drv

Driver for the sql connection. Is a long URL with MS SQL parameters you get with data delivery.

conn_def

the 'conn' parameter in DBI-requests. This is the definition of the SQL connection. Also acquired when you get data from the Health Data Authority

Value

this function will return a RDS database in a folder-file structure resembling the original SQL tables, but with all data content slices. Will also add a log file in the db_dir.


socioskop/grit documentation built on Dec. 23, 2021, 3:30 a.m.