Tutorial: PipelineDB Persistence with Flocker and Docker Swarm

[Header image: "Pipes" by Nick Saltmarsh, Creative Commons 2.0]

PipelineDB is an open-source relational database that runs SQL queries continuously on streams, incrementally storing results in tables. A compelling feature for users interested in PipelineDB is that it is built on the PostgreSQL core and can be used as a drop-in replacement for PostgreSQL without making any application code changes.

Flocker is a container data volume manager that is designed to allow databases like PipelineDB to easily run in containers in production. When running a database in production, you have to think about things like recovering from host failure. Flocker provides tools for managing data volumes across a cluster of machines like you have in a production environment. For example, as a PipelineDB container is scheduled between hosts in response to server failure, Flocker can automatically move its associated data volume between hosts at the same time. This means that when your PipelineDB container starts up on a new host, it has its data. This operation can be accomplished manually using the Flocker API or CLI, or automatically by a container orchestration tool that Flocker integrates with like Docker Swarm, Kubernetes or Mesos.
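
For instance, moving a dataset by hand comes down to updating its primary node through the Flocker control service's REST API. A rough sketch, assuming the control service is listening on its default port 4523 and you have the API client certificates at hand (the UUIDs below are placeholders; see the Flocker API docs for details):

# Move dataset <dataset-id> to the node <target-node-uuid>
curl --cacert cluster.crt --cert user.crt --key user.key \
  -XPOST -H "Content-Type: application/json" \
  -d '{"primary": "<target-node-uuid>"}' \
  https://<control-service-ip>:4523/v1/datasets/<dataset-id>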

In this example, we’ll manually remove the container and move it to another node using Docker Swarm, Docker Compose, and Swarm’s constraints feature. We’ll also run a service that continually streams data into our PipelineDB database. Future blog posts will show how to do the same thing using all the orchestration tools that Flocker supports.

Why run PipelineDB with Docker?

As your database workload scales up, you will want to make sure your PipelineDB server has enough CPU, RAM and network bandwidth to handle near-term and long-term capacity needs. Running your PipelineDB server in a container makes it portable, so you can manually or automatically move that container to a more powerful machine with ease. You can also respond better to system failures like crashed servers. This is where Flocker comes in: it makes sure your data directory moves to the new host along with your container, reducing downtime and headaches. The same thought process applies if you want to downsize the host server to something more affordable with moderate performance.

What you will learn

In this tutorial, we will demonstrate running a basic PipelineDB server container on a host machine using Docker. The PipelineDB container will be created with the Flocker plugin for Docker, declared within our docker-compose file. Flocker will automatically create and mount a dataset to your host for storage of PipelineDB's /mnt/pipelinedb/data directory.

When the PipelineDB container is shut down and started on a new host using the same --volume-driver flocker flag, Flocker will automatically recognize that the container has moved, unmount its data volume from the first host, and remount it on the new host. This means that when your PipelineDB container starts up, it will have all of its data.
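
Under the hood, the compose entry used later in this tutorial maps to a plain docker run invocation. A minimal sketch of the equivalent command (the image, volume name, and constraint match the compose file shown below):

$ docker run -d -p 5432:5432 \
    -e "constraint:flocker-node==1" \
    --volume-driver flocker \
    -v pipelinedb:/mnt/pipelinedb/data \
    pipelinedb/pipelinedb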

Tutorial

Architecture

In this tutorial there are 4 nodes:

  • 1 client node that we will execute Docker commands from
  • 1 master node with our Flocker Control Service and Swarm Master installed
  • 2 nodes with our Flocker Agent Services and Docker installed (our database is going to move between these two nodes)

In this example, we will be running our nodes on Amazon EC2 and creating and attaching volumes from Amazon’s EBS service.

Getting your Swarm cluster set up

We have a simple walkthrough on setting up this cluster on Amazon Web Services using CloudFormation.

Setup Docker Swarm cluster using CloudFormation

If you are using an existing cluster or setting up your Swarm cluster manually, restart the Docker daemons with a label on each node: flocker-node=1 for the first node and flocker-node=2 for the second node. (Note that daemon labels use a single =, while the Swarm constraint expressions used later use ==.)

$ docker daemon --label flocker-node=1   # on node 1
$ docker daemon --label flocker-node=2   # on node 2
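
To sanity-check the labels, point a Docker client at the Swarm master and inspect the cluster info; each node is listed with its labels (the endpoint below is a placeholder for your Swarm master's address and port):

$ docker -H tcp://<swarm-master-ip>:<swarm-port> info
# each node should show a Labels line containing flocker-node=1 or flocker-node=2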

Running PipelineDB with Docker on Node 1

SSH into our client node using its public IP address.

user@laptop $ ssh -i PEM_KEY_FILE user@<client-node-public-ip>

Clone this GitHub repo for the sample Docker Compose files you’ll use with your cluster.

git clone https://github.com/myechuri/flocker-pipelinedb.git
cd flocker-pipelinedb

Create our PipelineDB container using this Docker Compose file (pipelinedb-node1.yml).

pipelinedb:
  image: pipelinedb/pipelinedb
  ports:
    - "5432:5432"
  environment:
    - "constraint:flocker-node==1"
  volume_driver: flocker
  volumes:
    - 'pipelinedb:/mnt/pipelinedb/data'

user@client-node $ docker-compose -f pipelinedb-node1.yml up -d

We can confirm that a volume was created and attached when this container was created by querying the Flocker control service with flockerctl list.

user@client-node $ flockerctl list

DATASET        SIZE     METADATA              STATUS        SERVER
2fafa058c356   75.00G   name=pipelinedb   attached ✅   b3134669 (10.0.81.220)

Let’s connect to our PipelineDB container using psql.

user@client-node $ docker run -it --rm postgres sh -c 'exec psql -h <public ip address of your pipeline container> -p 5432 -U pipeline'

# you can also connect to your PipelineDB instance from your local environment via
# $ psql -h <public ip address of your pipeline container> -p 5432 -U pipeline

# enter password for user pipeline:
# the default user/password for this container is "pipeline"/"pipeline"

Once you have logged into your PipelineDB server, execute the following SQL commands.

pipeline=# create database twitter;

CREATE DATABASE

pipeline=# \c twitter
You are now connected to database "twitter" as user "pipeline".

twitter=# create stream tweets ( content json );
CREATE STREAM

twitter=# CREATE CONTINUOUS VIEW tagstream as
twitter-# SELECT json_array_elements(content #>
twitter(# ARRAY['entities','hashtags']) ->> 'text' AS tag
twitter-# FROM tweets
twitter-# WHERE arrival_timestamp >
twitter-# ( clock_timestamp() - interval '1 hour' );
CREATE CONTINUOUS VIEW
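
Before wiring up live data, you can sanity-check the continuous view by inserting a row into the stream by hand; in PipelineDB, writing to a stream is a plain SQL INSERT (the hashtag values below are just examples):

twitter=# INSERT INTO tweets (content) VALUES
twitter-#   ('{"entities": {"hashtags": [{"text": "docker"}, {"text": "flocker"}]}}');
INSERT 0 1
twitter=# SELECT * FROM tagstream;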

Get Twitter app credentials

Before we can launch our containerized Twitter stream service, we will need to retrieve Twitter app credentials and user tokens.

Twitter App Signup

Run Twitter stream service to generate some data

For this container to run correctly, you will need to set it up with environment variables that allow it to connect to the Twitter streaming API. You can get your credentials by registering a Twitter application at https://apps.twitter.com/.

twitterstream:
  image: stephenitis/python-twitterstream:latest
  environment:
    CONSUMER_KEY: ''
    CONSUMER_SECRET: ''
    ACCESS_TOKEN_KEY: ''
    ACCESS_TOKEN_SECRET: ''
    PIPELINE_SERVER_HOST_IP: <pipeline-container-ip>
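
Fill in your credentials and the PipelineDB container's IP, then bring the stream service up the same way as before (assuming the service definition is saved as twitterstream.yml; check the repo for the exact filename):

user@client-node $ docker-compose -f twitterstream.yml up -d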

Move the PipelineDB container to node 2

Now let’s remove our container from node 1. This step is necessary because you cannot have the same dataset mounted on multiple hosts.

docker-compose -f pipelinedb-node1.yml stop
docker-compose -f pipelinedb-node1.yml rm

Start up our pipeline container on node 2

docker-compose -f pipelinedb-node2.yml up -d
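
For reference, pipelinedb-node2.yml should look just like the node 1 file with only the constraint changed; a sketch (see the repo for the exact contents):

pipelinedb:
  image: pipelinedb/pipelinedb
  ports:
    - "5432:5432"
  environment:
    - "constraint:flocker-node==2"
  volume_driver: flocker
  volumes:
    - 'pipelinedb:/mnt/pipelinedb/data'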

You will see your Flocker dataset move through the "attached" -> "detached" -> "attached" states.

Flocker will detect that you want to run the same container on this host, unmount the volume from node 1 and remount it on node 2, and, boom, your PipelineDB server will still have your data.

Let’s connect to our PipelineDB container via psql again and verify.

user@client-node $ docker run -it --rm postgres sh -c 'exec psql -h <public ip address of your pipeline container> -p 5432 -U pipeline'

Check for our data.

pipeline=# \c twitter
twitter=# select * from tagstream limit 5;
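
If the migration worked, you should see hashtags collected before the move; the output will look something like this (the tags are illustrative, yours will reflect the live Twitter stream):

   tag
---------
 docker
 flocker
 devops
 database
 swarm
(5 rows)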

Take-aways

This was a basic example of manually migrating a database container from one node to another. PipelineDB’s documentation covers additional options such as replication, streaming replication and high availability. Using Flocker along with a mix of these strategies can provide a resilient PipelineDB cluster with persistent storage.

Thanks

I want to give huge thanks to Madhuri Yechuri of ClusterHQ for working on this demo with Josh Berkus, Derek Nelson, and Usman Masood of PipelineDB.


We’d love to hear your feedback!
