pseudoDB: Pseudo-database class
In moturoa/shintopseudo: Read, validate, encrypt, and anonymize data

pseudoDB

R Documentation

Pseudo-database class

Description

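Pseudo-database class (R6). Reads the data files listed in a YAML configuration file, validates, anonymizes and encrypts selected columns, and writes the results. Value/hash combinations are stored in an SQLite database so that pseudo-IDs stay consistent across files and runs.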

Public fields

con: Database connection (to the SQLite database)
project: Project block from config
config: Config block from config
files: File names in the configuration file
datalog: Dataframe with file statistics. Filled before and during processing.

Methods

Public methods

pseudoDB$new()
pseudoDB$create_directories()
pseudoDB$open_logfile()
pseudoDB$write_datalog()
pseudoDB$log()
pseudoDB$set_data_log()
pseudoDB$set_status()
pseudoDB$set_error()
pseudoDB$read_config()
pseudoDB$check_files_exist()
pseudoDB$open_sqlite()
pseudoDB$vacuum_sqlite()
pseudoDB$close_sqlite()
pseudoDB$close()
pseudoDB$read_data()
pseudoDB$read_data_fread()
pseudoDB$write_data()
pseudoDB$encrypt()
pseudoDB$decrypt()
pseudoDB$symmetric_encrypt_columns()
pseudoDB$make_hash()
pseudoDB$anonymize_column()
pseudoDB$anonymize_columns()
pseudoDB$read_bag_extract()
pseudoDB$validate_address()
pseudoDB$process_files()
pseudoDB$date_to_year()
pseudoDB$to_age_bracket()
pseudoDB$keep_columns()
pseudoDB$delete_columns()
pseudoDB$clone()

Method `new()`

Create a new object of class 'pseudoDB'. On initialization, the object reads the configuration file, checks for (and creates, if needed) the output directories specified in it, opens a connection to the SQLite database ('shinto_pseudomaker.sqlite'), and checks whether all files listed in the config exist (missing files are skipped later).

Usage

pseudoDB$new(
  config_file,
  secret,
  log_to = c("file", "stdout"),
  max_n_lines = NULL
)

Arguments

config_file: Path to the YML file with settings
secret: Secret key used for (extra) symmetric encryption
log_to: Log to a file or to stdout. This applies only to the old-style logging in .log files; see also shintopseudo.csv in the file output folder(s).
max_n_lines: Max number of lines to read from the input files; used for testing only

Method `create_directories()`

Create the output, log, and SQLite directories if they do not exist

Usage

pseudoDB$create_directories()
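A minimal base-R sketch of the idea, assuming the output, log, and SQLite directory paths have already been read from the config. The helper name `create_dirs_sketch` and the temporary paths are hypothetical and not part of pseudoDB.

```r
# Hypothetical helper (not pseudoDB's code): create each directory only if it is missing
create_dirs_sketch <- function(paths) {
  for (p in paths) {
    if (!dir.exists(p)) {
      # recursive = TRUE also creates any missing parent directories
      dir.create(p, recursive = TRUE)
    }
  }
  # Return (invisibly) whether each directory now exists
  invisible(dir.exists(paths))
}

# Temporary paths standing in for the directories named in the config
root <- file.path(tempdir(), "pseudo_example")
dirs <- file.path(root, c("output", "log", "sqlite"))
create_dirs_sketch(dirs)
```

Calling the helper a second time is harmless, because directories that already exist are skipped.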

Method `open_logfile()`

Opens a log file in the log output directory

Usage

pseudoDB$open_logfile()

Method `write_datalog()`

Writes shintopseudo.csv in the file output directory

Usage

pseudoDB$write_datalog()

Method `log()`

Logs to the old-style logging file

Usage

pseudoDB$log(msg, how = c("info", "fatal", "warn"))

Arguments

msg: Logging message
how: Log level; one of "info", "fatal" or "warn"

Method `set_data_log()`

Update a field in the datalog during processing

Usage

pseudoDB$set_data_log(file, what, value)

Arguments

file: File for which to update the datalog
what: Field (column) to set
value: Value to set
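A sketch of what updating one field of the datalog can look like, assuming the datalog is a data frame with one row per file and a `file` column. The columns `status` and `n_rows` are hypothetical. The real method presumably updates the object's `datalog` field in place; this standalone version returns a modified copy instead.

```r
# Hypothetical datalog: one row per input file
datalog <- data.frame(
  file = c("a.csv", "b.csv"),
  status = c("pending", "pending"),
  n_rows = c(NA_integer_, NA_integer_),
  stringsAsFactors = FALSE
)

# Sketch of set_data_log(): set field `what` to `value` in the row for `file`
set_data_log_sketch <- function(datalog, file, what, value) {
  datalog[datalog$file == file, what] <- value
  datalog
}

datalog <- set_data_log_sketch(datalog, "b.csv", "n_rows", 120L)
datalog <- set_data_log_sketch(datalog, "a.csv", "status", "done")
```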

Method `set_status()`

Set the status of a file in the datalog (e.g. for errors)

Usage

pseudoDB$set_status(file, status)

Arguments

file: Filename to set a status
status: Status to set

Method `set_error()`

Record an error, together with a timestamp, in the datalog for a file

Usage

pseudoDB$set_error(file, error)

Arguments

file: Filename to flag an error
error: Error code
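A sketch of recording an error together with a timestamp, assuming the datalog is a data frame with one row per file and hypothetical `error` and `error_time` columns. The error code "read_failed" and the timestamp format are also assumptions. Like the other sketches here, it returns a modified copy rather than updating an object in place.

```r
# Hypothetical datalog with (initially empty) error columns
datalog <- data.frame(
  file = c("a.csv", "b.csv"),
  status = "pending",
  error = NA_character_,
  error_time = NA_character_,
  stringsAsFactors = FALSE
)

# Sketch of set_error(): store the error code and when it happened
set_error_sketch <- function(datalog, file, error, time = Sys.time()) {
  row <- datalog$file == file
  datalog[row, "error"] <- error
  # Store the timestamp as text so it fits a CSV-friendly column
  datalog[row, "error_time"] <- format(time, "%Y-%m-%d %H:%M:%S")
  datalog
}

datalog <- set_error_sketch(datalog, "b.csv", "read_failed")
```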

Method `read_config()`

Reads the config from a .yml/.yaml file

Usage

pseudoDB$read_config(fn)

Arguments

fn: Path to the .yml/.yaml file

Method `check_files_exist()`

Check if all files mentioned in the config exist

Usage

pseudoDB$check_files_exist()
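A minimal sketch of the check, assuming the full paths of the input files have already been taken from the config. The helper name is hypothetical; per the description of `new()`, files that turn out to be missing are skipped later.

```r
# Hypothetical helper (not pseudoDB's code): flag which configured files exist
check_files_exist_sketch <- function(paths) {
  ok <- file.exists(paths)
  if (any(!ok)) {
    message("Missing files (will be skipped): ",
            paste(basename(paths[!ok]), collapse = ", "))
  }
  ok
}

present <- tempfile(fileext = ".csv")
writeLines("id;name", present)
absent <- file.path(tempdir(), "does_not_exist.csv")
found <- check_files_exist_sketch(c(present, absent))
```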

Method `open_sqlite()`

Opens a connection to the SQLite database with 'DBI::dbConnect(RSQLite::SQLite(), ...)' and creates an empty 'datadienst' table if it does not already exist.

Usage

pseudoDB$open_sqlite()

Method `vacuum_sqlite()`

Runs VACUUM on the SQLite database. By default this is done automatically before the connection is closed (see $close_sqlite).

Usage

pseudoDB$vacuum_sqlite()

Details

From sqlite.org: "The VACUUM command rebuilds the database file, repacking it into a minimal amount of disk space [...] Frequent inserts, updates, and deletes can cause the database file to become fragmented - where data for a single table or index is scattered around the database file. Running VACUUM ensures that each table and index is largely stored contiguously within the database file."

Method `close_sqlite()`

Close the database connection, by default after performing a vacuum

Usage

pseudoDB$close_sqlite(vacuum = TRUE)

Arguments

vacuum: Whether to vacuum the SQLite database before closing. See the $vacuum_sqlite method.

Method `close()`

Close everything, including the log file

Usage

pseudoDB$close()

Method `read_data()`

Reads a file listed in the config. Supports multiple read methods (see Details).

Usage

pseudoDB$read_data(fn)

Arguments

fn: Bare filename to read (full path is read from config).

Details

By default, $read_data_fread is used. If readmethod = 'json', the file is read with 'jsonlite::fromJSON' and the function given in the config setting 'post_read_function' is applied to the result; this lets you flatten a JSON structure into a tidy, CSV-like table.
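A sketch of the dispatch described above. To keep it runnable with base R only, `utils::read.csv2` stands in for `data.table::fread`, and a hypothetical `json_reader` argument stands in for `jsonlite::fromJSON`; the function and argument names are illustrative, not pseudoDB's API.

```r
# Hypothetical sketch of the dispatch in read_data(); not pseudoDB's code
read_data_sketch <- function(path, readmethod = "fread",
                             post_read_function = NULL, json_reader = NULL) {
  if (identical(readmethod, "json")) {
    parsed <- json_reader(path)
    # The configured post-read function turns the parsed JSON into a flat table
    return(post_read_function(parsed))
  }
  # Stand-in for data.table::fread (semicolon-separated input)
  utils::read.csv2(path, stringsAsFactors = FALSE)
}

# Default path: a small semicolon-separated file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id;name", "1;Ann", "2;Bob"), csv_path)
from_csv <- read_data_sketch(csv_path)

# "json" path: a fake reader returning a nested list, plus a flattening function
fake_json_reader <- function(path) {
  list(records = list(list(id = 1, name = "Ann"), list(id = 2, name = "Bob")))
}
flatten_records <- function(x) {
  do.call(rbind, lapply(x$records, as.data.frame, stringsAsFactors = FALSE))
}
from_json <- read_data_sketch("ignored.json", readmethod = "json",
                              post_read_function = flatten_records,
                              json_reader = fake_json_reader)
```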

Method `read_data_fread()`

Default method to read the CSV using 'data.table::fread'.

Usage

pseudoDB$read_data_fread(fn, quote, sep, fill, skip = 0, encoding = NULL)

Arguments

fn: Filename including the full path (unlike $read_data)
quote: Argument 'quote' in fread()
sep: Argument 'sep' in fread()
fill: Argument 'fill' in fread()
skip: Argument 'skip' in fread()
encoding: Either "UTF-8" or "Latin-1"; leave NULL to fall back to fread's 'unknown', which is not very reliable

Method `write_data()`

Writes an output CSV with 'data.table::fwrite'

Usage

pseudoDB$write_data(data, fn)

Arguments

data: Dataframe
fn: Filename

Method `encrypt()`

Symmetrically encrypt a vector using the secret

Usage

pseudoDB$encrypt(x)

Arguments

x: A character vector

Method `decrypt()`

Symmetrically decrypt an encrypted vector using the secret

Usage

pseudoDB$decrypt(x)

Arguments

x: A character vector

Method `symmetric_encrypt_columns()`

Symmetric encryption for multiple columns at once

Usage

pseudoDB$symmetric_encrypt_columns(data, columns, new_names = NULL)

Arguments

data: A Dataframe
columns: Vector of column names
new_names: Optional vector of names for the encrypted columns, which are added to the output dataframe in addition to the original columns.
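A sketch of the column handling only. `encrypt_fun` is a hypothetical placeholder for `pseudoDB$encrypt()`, and the toy function used below merely reverses strings; it is NOT encryption. Overwriting the original columns when `new_names` is NULL is an assumption of this sketch.

```r
# Hypothetical sketch of the column handling in symmetric_encrypt_columns()
encrypt_columns_sketch <- function(data, columns, new_names = NULL, encrypt_fun) {
  # Assumption: without new_names, the original columns are overwritten
  if (is.null(new_names)) new_names <- columns
  stopifnot(length(new_names) == length(columns))
  for (i in seq_along(columns)) {
    # With distinct new_names, the original column is kept alongside the new one
    data[[new_names[i]]] <- encrypt_fun(as.character(data[[columns[i]]]))
  }
  data
}

# Toy stand-in for encryption: reverses each string (illustration only)
reverse_strings <- function(x) {
  vapply(strsplit(x, ""), function(ch) paste(rev(ch), collapse = ""), character(1))
}

d <- data.frame(id_number = c("123", "456"), stringsAsFactors = FALSE)
out <- encrypt_columns_sketch(d, "id_number", new_names = "id_number_enc",
                              encrypt_fun = reverse_strings)
```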

Method `make_hash()`

The most basic function: generates the hashes (9 characters by default) from which all pseudo-IDs are made.

Usage

pseudoDB$make_hash(n = 1, n_phrase = 9)

Arguments

n: Number of hashes to make
n_phrase: Length of the hash (default = 9 chars)
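A sketch of generating such hashes, assuming they are random strings of lowercase letters and digits; the character set and the helper name are assumptions, not taken from pseudoDB.

```r
# Hypothetical sketch: n random strings of n_phrase characters each
make_hash_sketch <- function(n = 1, n_phrase = 9) {
  chars <- c(letters, 0:9)  # coerced to a character vector of 36 symbols
  vapply(seq_len(n), function(i) {
    paste(sample(chars, n_phrase, replace = TRUE), collapse = "")
  }, character(1))
}

set.seed(1)
h <- make_hash_sketch(n = 3)
```

Random strings can collide, so an implementation along these lines would presumably need to check new hashes against the ones already stored.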

Method `anonymize_column()`

Anonymize a column. This is the largest and most crucial method.

Usage

pseudoDB$anonymize_column(
  data,
  column,
  db_key = NULL,
  store_key_columns = NULL,
  normalise_key_columns = NULL,
  file = NULL
)

Arguments

data: Dataframe
column: Column name to hash
db_key: Key name of the column
store_key_columns: Special method; do not use.
normalise_key_columns: Add a normalized ASCII version of the column to the dataframe (special characters replaced with ASCII 'equivalents')
file: Unused argument; ignore

Details

Replaces every value in the column with a hash, so that identical values always receive the same hash. Hashes for values that were hashed before are read from the SQLite database, so the same value/hash combinations are used in every file and in every run of the process. Values not previously hashed get a new value/hash combination, which is written to the SQLite database.
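A sketch of the lookup logic described above. A data frame `key_table` stands in for the value/hash pairs stored in the SQLite database, and a deterministic counter stands in for $make_hash so the result is reproducible. Leaving missing values as NA is a choice of this sketch, not documented behaviour.

```r
# Hypothetical sketch of the value -> hash lookup in anonymize_column()
anonymize_column_sketch <- function(data, column, key_table, new_hash) {
  values <- as.character(data[[column]])
  # Values not yet in the key table (NA is left alone in this sketch)
  unseen <- setdiff(unique(values[!is.na(values)]), key_table$value)
  if (length(unseen) > 0) {
    # New values get a new value/hash combination, added to the key table
    key_table <- rbind(key_table,
                       data.frame(value = unseen, hash = new_hash(length(unseen)),
                                  stringsAsFactors = FALSE))
  }
  # Identical values map to the same hash
  data[[column]] <- key_table$hash[match(values, key_table$value)]
  list(data = data, key_table = key_table)
}

# Deterministic stand-in for make_hash: h00000001, h00000002, ...
make_counter_hash <- function() {
  i <- 0L
  function(n) {
    ids <- i + seq_len(n)
    i <<- i + n
    sprintf("h%08d", ids)
  }
}

key_table <- data.frame(value = "Ann", hash = "existing1", stringsAsFactors = FALSE)
d <- data.frame(name = c("Bob", "Ann", "Bob", NA), stringsAsFactors = FALSE)
res <- anonymize_column_sketch(d, "name", key_table, make_counter_hash())
```

Passing `res$key_table` into the next call mimics how the stored pairs keep hashes stable across files and runs.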

Method `anonymize_columns()`

See $anonymize_column; this is the vectorized version for multiple columns

Usage

pseudoDB$anonymize_columns(data, columns, db_keys, file, ...)

Arguments

data: See $anonymize_column
columns: See $anonymize_column
db_keys: See $anonymize_column
file: See $anonymize_column
...: Further passed to $anonymize_column
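A sketch of the "vectorized" pattern, assuming `columns` and `db_keys` are paired by position and that extra arguments are forwarded through `...`. The single-column function is passed in as a hypothetical `anonymize_one` argument, and the toy `tag_column` only labels values; it does not anonymize anything.

```r
# Hypothetical sketch (not pseudoDB's code): apply a one-column function to many columns
anonymize_columns_sketch <- function(data, columns, db_keys, anonymize_one, ...) {
  stopifnot(length(columns) == length(db_keys))
  for (i in seq_along(columns)) {
    # Each column is handled with its own key; `...` is passed through unchanged
    data <- anonymize_one(data, columns[i], db_keys[i], ...)
  }
  data
}

# Toy single-column function: prefixes each value with its key name
tag_column <- function(data, column, db_key, sep = ":") {
  data[[column]] <- paste(db_key, data[[column]], sep = sep)
  data
}

d <- data.frame(name = c("Ann", "Bob"), city = c("Utrecht", "Delft"),
                stringsAsFactors = FALSE)
out <- anonymize_columns_sketch(d, c("name", "city"), c("person", "place"),
                                tag_column, sep = "/")
```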

Method `read_bag_extract()`

Only used for a very specific case; its use is not encouraged.

Usage

pseudoDB$read_bag_extract(path)

Arguments

path: Filename

Method `validate_address()`

Only used in a very specific case. Not supported or encouraged.

Usage

pseudoDB$validate_address(data, column, columns_out, bag_path)

Arguments

data: Dataframe
column: Column name
columns_out: Names of output columns
bag_path: Path to BAG file

Method `process_files()`

Run the entire process: read the files, anonymize, encrypt, write the output, and log.

Usage

pseudoDB$process_files(files = NULL)

Arguments

files: Optional vector of filenames to process; if NULL, all files in the loaded config are processed.
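A sketch of the overall loop, assuming each file passes through a sequence of steps and that an error in one file is recorded (as with $set_error) without stopping the others. The step functions are toy stand-ins, and the per-file error handling is an assumption of this sketch.

```r
# Hypothetical sketch (not pseudoDB's code): run every file through a list of steps
process_files_sketch <- function(files, steps) {
  status <- character(length(files))
  errors <- rep(NA_character_, length(files))
  for (i in seq_along(files)) {
    status[i] <- tryCatch({
      out <- files[i]
      for (step in steps) out <- step(out)
      "done"
    }, error = function(e) {
      # Record the message for this file, similar to a datalog error entry
      errors[i] <<- conditionMessage(e)
      "error"
    })
  }
  data.frame(file = files, status = status, error = errors, stringsAsFactors = FALSE)
}

# Toy steps standing in for read, anonymize, encrypt, and write
steps <- list(
  read = function(f) if (f == "bad.csv") stop("cannot read") else data.frame(x = 1),
  anonymize = identity,
  encrypt = identity,
  write = function(d) invisible(d)
)
result <- process_files_sketch(c("good.csv", "bad.csv"), steps)
```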

Method `date_to_year()`

Converts dates in a column to years. Specific to dd-mm-yyyy dates in the data; not configurable (and not used in any application).

Usage

pseudoDB$date_to_year(data, column)

Arguments

data: Dataframe
column: Name of column
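A base-R sketch, assuming the column holds dd-mm-yyyy strings and is replaced by the (integer) year; returning NA for values that cannot be parsed is an assumption of this sketch.

```r
# Hypothetical sketch of date_to_year() for dd-mm-yyyy strings
date_to_year_sketch <- function(data, column) {
  dates <- as.Date(data[[column]], format = "%d-%m-%Y")  # NA if unparseable
  data[[column]] <- as.integer(format(dates, "%Y"))
  data
}

d <- data.frame(birth_date = c("03-11-1985", "29-02-2000", "not a date"),
                stringsAsFactors = FALSE)
out <- date_to_year_sketch(d, "birth_date")
```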

Method `to_age_bracket()`

Converts ages in years to age brackets (5-10, 10-15, etc.)

Usage

pseudoDB$to_age_bracket(data, columns)

Arguments

data: Dataframe
columns: Names of the columns
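A sketch of the bracketing, assuming 5-year brackets with an inclusive lower bound, so that an age of exactly 10 falls in "10-15". The source only gives the examples 5-10 and 10-15, so both the width and the boundary rule are assumptions.

```r
# Hypothetical sketch of to_age_bracket(): age -> "lower-upper" label
to_age_bracket_sketch <- function(data, columns, width = 5) {
  for (col in columns) {
    age <- data[[col]]
    lower <- floor(age / width) * width
    # Missing ages stay missing
    data[[col]] <- ifelse(is.na(age), NA_character_,
                          paste0(lower, "-", lower + width))
  }
  data
}

d <- data.frame(age = c(7, 10, 14.5, NA))
out <- to_age_bracket_sketch(d, "age")
```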

Method `keep_columns()`

Keep only the given columns

Usage

pseudoDB$keep_columns(data, columns)

Arguments

data: Dataframe
columns: Names of the columns
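A base-R sketch on a plain data frame. Silently ignoring names that are not present, and returning columns in the requested order, are assumptions of this sketch.

```r
# Hypothetical sketch of keep_columns(): keep only the requested columns
keep_columns_sketch <- function(data, columns) {
  # drop = FALSE keeps a data frame even when a single column remains
  data[, intersect(columns, names(data)), drop = FALSE]
}

d <- data.frame(a = 1:2, b = 3:4, c = 5:6)
out <- keep_columns_sketch(d, c("c", "a"))
```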

Method `delete_columns()`

Delete the given columns

Usage

pseudoDB$delete_columns(data, columns)

Arguments

data: Dataframe
columns: Names of the columns
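The complementary base-R sketch for deleting columns; again, ignoring names that do not exist is an assumption of this sketch.

```r
# Hypothetical sketch of delete_columns(): drop the listed columns
delete_columns_sketch <- function(data, columns) {
  data[, setdiff(names(data), columns), drop = FALSE]
}

d <- data.frame(a = 1:2, b = 3:4, c = 5:6)
out <- delete_columns_sketch(d, c("a", "c", "not_there"))
```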

Method `clone()`

The objects of this class are cloneable with this method.

Usage

pseudoDB$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

moturoa/shintopseudo documentation built on Nov. 21, 2023, 6:57 p.m.
