knitr::opts_chunk$set(echo = TRUE)

Agenda

Agenda

Data

Data

Data is a business's lifeblood. For some companies, it's their entire value proposition. The generation, access, and retention of data are paramount. This yields a few rules of thumb:

Types of data

Challenges


Data + Docker

Data + Docker

Whenever you kill a container, you lose its contents, so data can't be stored in the container itself. So what's the point?

External volumes

Docker containers can access a file share external to them.

This is a great way to persist data, especially if you use a managed service like Azure File Storage or Amazon S3 so the provider handles all the infrastructure for you.

Creating external volumes

# Create a volume called "logs" using the azurefile volume driver,
# backed by the "logs" file share in the Azure storage account
docker volume create \
       --name logs \
       -d azurefile \
       -o share=logs

Using external volumes

# Run a container with the "logs" volume mounted at /logs
docker run \
    -v logs:/logs \
    stephlocke/ddd-simplewrites

Demo Setup

Process

  1. Create a docker-machine on Azure (see the sketch after this list)
  2. Configure docker-machine to use external file system plugin
  3. Create mapped volumes
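
A minimal sketch of step 1, using docker-machine's azure driver; the machine name and subscription ID are placeholders, and the full set of options used in the talk lives in azure-docker-machine.sh.

# Provision a Docker host on Azure (placeholder name and subscription ID)
docker-machine create \
    --driver azure \
    --azure-subscription-id "$AZURE_SUB_ID" \
    dockerdatademo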

Core script {data-background-iframe="https://github.com/stephlocke/datadockerdisconbobulating/blob/master/setup/azure-docker-machine.sh"}

azure-docker-machine.sh

Plugin script {data-background-iframe="https://github.com/stephlocke/datadockerdisconbobulating/blob/master/setup/azure-file-plugin.sh"}

azure-file-plugin.sh

All together

Basic Demo

Write to a file system

Multiple containers writing to same file
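
A minimal sketch of the demo, assuming the stephlocke/ddd-simplewrites image writes to whatever is mounted at /logs; running it twice against the same volume gives two containers writing to one file share.

# Two containers sharing the "logs" volume, both writing under /logs
docker run -d --name writer1 -v logs:/logs stephlocke/ddd-simplewrites
docker run -d --name writer2 -v logs:/logs stephlocke/ddd-simplewrites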

Why is this way bad?

Reading data

Databases

Starting a database

Get a Docker container up and running. This will initialise the database files in the mounted volume.

# Run MySQL detached, storing its data files in the "dbs" volume
# and publishing container port 3306 on host port 6603
docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6603:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb \
   mysql
Make a database
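
A hedged example of creating a database from the host via the mysql client inside the container; the database name demo is just a placeholder.

# Create a database inside the running container
docker exec -it mydb \
    mysql -uroot -pmypassword \
    -e "CREATE DATABASE demo;"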

Attach to existing database

# Run a fresh container against the same "dbs" volume so it picks up
# the existing database files. Stop and remove the original container
# first (or change --name and the host port) to avoid conflicts.
docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6603:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb \
   mysql

Attach to existing

Multiple databases running off same files

  • Can we do this multiple times with MySQL? (see the sketch below)
  • What's the problem, even if we could?
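
A sketch of the experiment, assuming the first mydb container is still running; a second MySQL instance pointed at the same data directory will typically fail to start or risk corrupting data, because InnoDB expects exclusive access to its files.

# A second MySQL container against the same "dbs" volume -- don't do this!
docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6604:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb2 \
   mysql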

Multiple databases, same files

Database challenges

Primary challenges

Refreshable data

Reference data can be stored in a number of ways:

  1. A core DB that gets replicated into local DBs
  2. A core DB and cross-DB queries
  3. Take this data out of the DB and into caches (see the sketch below)
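
For option 3, one illustrative approach (an assumption, not part of the original demos) is to run a cache container alongside the databases and load reference data into it.

# Stand up a Redis container to hold cached reference data
docker run -d --name refcache -p 6379:6379 redis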

Scaling access

To scale access, you need to avoid locks:

  1. Performance tuning goes a long way
  2. Distributed databases
  3. Sharding

Disaster-safe

Keeping your data up and available:

  1. Self-healing DB clusters
  2. Backups and restore
  3. Let someone else take care of it

Security

Data needs to be secure, especially in a multi-tenant model:

  1. ACLs and row-level security
  2. Physically separated databases

Translating challenges to technical solutions

Per instance databases

File / NoSQL dbs

DBaaS

Self-healing Docker clusters

Schema changes

  1. Use something like Flyway and migrate the schema on new container creation (see the sketch after this list)
  2. Use something schemaless
  3. Use DBaaS and apply changes in one location, using feature flags etc. for rollout
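
A minimal sketch of option 1 using Flyway's official Docker image; the migration directory, JDBC URL, and credentials are placeholders that would need to match your own setup.

# Apply pending migrations from ./sql against the demo database
docker run --rm \
    -v "$PWD/sql":/flyway/sql \
    flyway/flyway \
    -url=jdbc:mysql://mydb:3306/demo \
    -user=root -password=mypassword \
    migrate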

Further reading

Wrapup

Maybe Docker will solve some challenges for us?

Docker has acquired Infinit, who've been building a distributed file system that Docker could utilise. Watch that space!

A contrasting opinion

Read the Joyent piece on persisting data

Wrapup


