knitr::opts_chunk$set(echo = TRUE)

Agenda

Agenda

Data

Data

Data is a business's lifeblood. For some companies, it's their entire value proposition. The generation, access, and retention of data are paramount. This yields a few rules of thumb:

Types of data

Challenges


Data + Docker

Data + Docker

Whenever you kill a container, you lose its contents, so data can't be stored in the container itself. So what's the point?

External volumes

Docker containers can access a file share external to them.

This is a great way to persist data, especially if you use a managed service like Azure File Storage or Amazon S3 so the provider handles all the infrastructure for you.

Creating external volumes

# Create a volume called "logs" using the azurefile volume driver,
# backed by the "logs" file share in the Azure storage account
docker volume create \
       --name logs \
       -d azurefile \
       -o share=logs

Using external volumes

# Run a container with the "logs" volume mounted at /logs
docker run \
    -v logs:/logs \
    stephlocke/ddd-simplewrites

Demo Setup

Process

  1. Create a docker-machine on Azure (see the sketch after this list)
  2. Configure docker-machine to use external file system plugin
  3. Create mapped volumes
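
A minimal sketch of step 1, using docker-machine's azure driver; the machine name and subscription ID are placeholders, and the full set of options used in the talk lives in azure-docker-machine.sh.

# Provision a Docker host on Azure (placeholder name and subscription ID)
docker-machine create \
    --driver azure \
    --azure-subscription-id "$AZURE_SUB_ID" \
    dockerdatademo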

Core script {data-background-iframe="https://github.com/stephlocke/datadockerdisconbobulating/blob/master/setup/azure-docker-machine.sh"}

azure-docker-machine.sh

Plugin script {data-background-iframe="https://github.com/stephlocke/datadockerdisconbobulating/blob/master/setup/azure-file-plugin.sh"}

azure-file-plugin.sh

All together

Basic Demo

Write to a file system

Multiple containers writing to same file
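
A minimal sketch of the demo, assuming the stephlocke/ddd-simplewrites image writes to whatever is mounted at /logs; running it twice against the same volume gives two containers writing to one file share.

# Two containers sharing the "logs" volume, both writing under /logs
docker run -d --name writer1 -v logs:/logs stephlocke/ddd-simplewrites
docker run -d --name writer2 -v logs:/logs stephlocke/ddd-simplewrites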

Why is this way bad?

Reading data

Databases

Starting a database

Get a Docker container up and running. This will initialise the database files in the mounted volume.

# Run MySQL detached, storing its data files in the "dbs" volume
# and publishing container port 3306 on host port 6603
docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6603:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb \
   mysql
Make a database
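
A hedged example of creating a database from the host via the mysql client inside the container; the database name demo is just a placeholder.

# Create a database inside the running container
docker exec -it mydb \
    mysql -uroot -pmypassword \
    -e "CREATE DATABASE demo;"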

Attach to existing database

# Run a fresh container against the same "dbs" volume so it picks up
# the existing database files. Stop and remove the original container
# first (or change --name and the host port) to avoid conflicts.
docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6603:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb \
   mysql

Attach to existing

Multiple databases running off same files

  • Can we do this multiple times with MySQL? (see the sketch below)
  • What's the problem, even if we could?
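
A sketch of the experiment, assuming the first mydb container is still running; a second MySQL instance pointed at the same data directory will typically fail to start or risk corrupting data, because InnoDB expects exclusive access to its files.

# A second MySQL container against the same "dbs" volume -- don't do this!
docker run \
   -d -v dbs:/var/lib/mysql \
   -p 6604:3306 \
   --env="MYSQL_ROOT_PASSWORD=mypassword" \
   --name mydb2 \
   mysql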

Multiple databases, same files

Database challenges

Primary challenges

Refreshable data

Reference data can be stored in a number of ways:

  1. A core DB that gets replicated into local DBs
  2. A core DB and cross-DB queries
  3. Take this data out of the DB and into caches (see the sketch below)
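
For option 3, one illustrative approach (an assumption, not part of the original demos) is to run a cache container alongside the databases and load reference data into it.

# Stand up a Redis container to hold cached reference data
docker run -d --name refcache -p 6379:6379 redis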

Scaling access

To scale access, you need to avoid locks:

  1. Performance tuning goes a long way
  2. Distributed databases
  3. Sharding

Disaster-safe

Keeping your data up and available:

  1. Self-healing DB clusters
  2. Backups and restore
  3. Let someone else take care of it

Security

Data needs to be secure, especially in a multi-tenant model:

  1. ACLs and row-level security
  2. Physically separated databases

Translating challenges to technical solutions

Per instance databases

File / NoSQL dbs

DBaaS

Self-healing Docker clusters

Schema changes

  1. Use something like Flyway and migrate the schema on new container creation (see the sketch after this list)
  2. Use something schemaless
  3. Use DBaaS and apply changes in one location, using feature flags etc. for rollout
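
A minimal sketch of option 1 using Flyway's official Docker image; the migration directory, JDBC URL, and credentials are placeholders that would need to match your own setup.

# Apply pending migrations from ./sql against the demo database
docker run --rm \
    -v "$PWD/sql":/flyway/sql \
    flyway/flyway \
    -url=jdbc:mysql://mydb:3306/demo \
    -user=root -password=mypassword \
    migrate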

Further reading

Wrapup

Maybe Docker will solve some challenges for us?

Docker has acquired Infinit, who've been building a distributed file system that Docker could utilise. Watch that space!

A contrasting opinion

Read the Joyent piece on persisting data

Wrapup


