Tuesday, September 23, 2014

Kubernetes Under The Hood: Etcd

Kubernetes is an effort which originated within Google to provide an orchestration layer above Docker containers.  Docker operations are limited to a single host.  Kubernetes attempts to provide a mechanism to manage large sets of containers on a cluster of container hosts.  Above that will eventually sit job management services like Mesos or Aurora.

Anatomy of a Kubernetes Cluster


A Kubernetes cluster is made up of three major active components:

  1. Kubernetes apiserver
  2. Kubernetes kubelet agent
  3. etcd distributed key/value database

The apiserver is the front end of the Kubernetes cluster.  It accepts requests from clients to create and manage pods, services and replication controllers within the cluster. This is the control interface of Kubernetes.

The kubelet is the active agent.  It resides on a Kubernetes cluster member host.  It polls for instructions or state changes and acts to execute them on the host.

The etcd service is the communications bus for the Kubernetes cluster.  The apiserver posts cluster state changes to the etcd database in response to commands and queries.  The kubelets read the contents of the etcd database and act on any changes they detect.

There's also a kube-proxy process which does the Service network proxy work, but that's not relevant to the larger operation.

This post is going to describe and play with etcd.

OK, so what is Etcd?


Etcd (or etcd) is a service created by the CoreOS team to provide a shared distributed configuration database.  It's a replicated key/value store.  The data are accessed using ordinary HTTP(S) GET and PUT queries.  The status, metadata and payload are returned as members of a JSON data structure.

Etcd has a companion CLI client for testing and manual interaction, called etcdctl.  Etcdctl is merely a wrapper that hides the HTTP interactions and the raw JSON used for status and payload.

Installing and Running Etcd


Etcd (and etcdctl, the CLI client) aren't yet available in RPM format from the standard repositories, or if they are they're very old. If you're running on 64 bit Linux you can pull the most recent binaries from the Github repository for CoreOS. Download them, unpack the tar.gz file and place the binaries in your path.


curl -s -L https://github.com/coreos/etcd/releases/download/v0.4.6/etcd-v0.4.6-linux-amd64.tar.gz | tar -xzvf -
etcd-v0.4.6-linux-amd64/
etcd-v0.4.6-linux-amd64/etcd
etcd-v0.4.6-linux-amd64/etcdctl
etcd-v0.4.6-linux-amd64/README-etcd.md
etcd-v0.4.6-linux-amd64/README-etcdctl.md
cd etcd-v0.4.6-linux-amd64


Once you have the binaries, check out the Etcd and Etcdctl github pages for basic usage instructions.  I'll duplicate a little of that here just to get moving.

Etcd doesn't run as a traditional daemon.  It remains connected to STDOUT and logs activity.  I'm not going to demonstrate here how to turn it into a proper daemon.  Instead I'll run it in one terminal session and use another to access it.

NOTE 1: Etcd does not use standard longopt conventions.  All of the options use single leading hyphens.
NOTE 2: Etcdctl does follow the longopt conventions.  Go figure.

./etcd
[etcd] Sep 23 10:36:04.655 WARNING   | Using the directory myhost.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Sep 23 10:36:04.656 INFO      | myhost is starting a new cluster
[etcd] Sep 23 10:36:04.658 INFO      | etcd server [name myhost, listen on :4001, advertised url http://127.0.0.1:4001]
[etcd] Sep 23 10:36:04.658 INFO      | peer server [name myhost, listen on :7001, advertised url http://127.0.0.1:7001]
[etcd] Sep 23 10:36:04.658 INFO      | myhost starting in peer mode
[etcd] Sep 23 10:36:04.658 INFO      | myhost: state changed from 'initialized' to 'follower'.
[etcd] Sep 23 10:36:04.658 INFO      | myhost: state changed from 'follower' to 'leader'.
[etcd] Sep 23 10:36:04.658 INFO      | myhost: leader changed from '' to 'myhost'.

As you can see, the daemon listens by default on port 4001/TCP for client interactions and on port 7001/TCP for clustering communications, advertising both on the localhost address.  See the output of etcd -help for detailed options.  You can also see the process whereby the new daemon attempts to connect to peers and determine its place within the cluster.  Since there are no peers, this one elects itself leader.
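
If you want a predictable node name and data directory instead of the defaults, you can supply them on the command line. This is just a sketch; the option names below match the 0.4 series (single hyphens, per the note above), but verify them against etcd -help for your build:

./etcd -name myhost -data-dir /var/lib/etcd -addr 127.0.0.1:4001 -peer-addr 127.0.0.1:7001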

The log output suggests that etcd is running. I can check by querying the daemon for its version and some other information.

curl -s http://127.0.0.1:4001/version
etcd 0.4.6

I can also get some stats directly from the daemon:

curl -s -L http://127.0.0.1:4001/v2/stats/self | python -m json.tool
{
    "leaderInfo": {
        "leader": "myhost",
        "startTime": "2014-09-23T10:37:04.839453766-04:00",
        "uptime": "5h10m13.053046076s"
    },
    "name": "myhost",
    "recvAppendRequestCnt": 0,
    "sendAppendRequestCnt": 0,
    "startTime": "2014-09-23T10:37:04.83945236-04:00",
    "state": ""
}

So now I know it's up and responding.

Playing with Etcd


Etcd responds to HTTP(S) queries both to set and retrieve data.  All of the data are organized into a hierarchical key set (which for normal people means that the keys look like files in a tree of directories).  The values are arbitrary strings. This makes it very easy to test and play with etcd using ordinary CLI web query tools like curl and wget. The binary releases also include a CLI client called etcdctl which simplifies the interaction, allowing the caller to focus on the logical operation and the result rather than the HTTP/JSON interaction. I'll show both methods where they are instructive, choosing the better one for each example.

The examples here are adapted from the CoreOS examples on Github.  There's also a complete protocol document for the REST interface.

Once etcd is running I can begin working with it.

Etcd is a hierarchical key/value store. This means that each piece of stored data has a key which uniquely identifies it within the database. The key is hierarchical in that it is composed of a set of elements that form a path from a fixed starting point known as the root. Any given element in the database can be either a branch (directory) or a leaf (value).  Directories contain other keys and are used to create the hierarchy of data.

This is all formal gobbledy-gook for "it looks just like a filesystem". In fact a number of the operations that etcdctl offers are exact analogs of filesystem commands: mkdir, rmdir, ls, rm.
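
For example, assuming your build of etcdctl includes the directory commands, you can create and remove an empty directory just as you would on a filesystem (the /playpen key is just a throwaway name for illustration):

./etcdctl mkdir /playpen
./etcdctl rmdir /playpen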

The first operation is to look at the contents of the root of the database. Expect this to be boring because there's nothing there yet.


./etcdctl ls /


See? There's nothing there. Boring.

It looks a little different when you pull it using curl.

curl -s http://127.0.0.1:4001/v2/keys/ | python -m json.tool
{
    "action": "get",
    "node": {
        "dir": true,
        "key": "/"
    }
}

The return payload is JSON. I use the python json.tool module to pretty print it.

I can see that this is the response to a GET request. The node hash describes the query and result. I asked for the root key (/) and it's an (empty) directory.

Life will be a little more interesting if there's some data in the database. I'll add a value and I'm going to put it well down in the hierarchy to show how the tree structure works.

./etcdctl set /foo/bar/gronk "I see you"
I see you

Now when I ask etcdctl for the contents of the root directory I at least get some output:

./etcdctl ls /
/foo

But that's much more interesting when I look using curl.

curl -s http://127.0.0.1:4001/v2/keys/ | python -m json.tool
{
    "action": "get",
    "node": {
        "dir": true,
        "key": "/",
        "nodes": [
            {
                "createdIndex": 7,
                "dir": true,
                "key": "/foo",
                "modifiedIndex": 7
            }
        ]
    }
}

This looks very similar to the previous response, with the addition of the nodes array.  I can infer that this list contains the set of directories and values that the root contains.  In this case it holds a single subdirectory named /foo.

Creating a new value is also more fun using curl:

curl -s http://127.0.0.1:4001/v2/keys/fiddle/faddle -XPUT -d value="popcorn" | python -m json.tool
{
    "action": "set",
    "node": {
        "createdIndex": 8,
        "key": "/fiddle/faddle",
        "modifiedIndex": 8,
        "value": "popcorn"
    }
}

The return payload is the REST acknowledgement of the PUT query. It looks similar to the GET response, but not identical. The action is (appropriately enough) set. Only a single node is returned, not the node list you get when querying a directory, and the value is echoed back as well.  The REST protocol (and the etcdctl command) allow a number of modifiers for queries. Two I'm going to use a lot are sort and recursive.
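
On the REST side, sort shows up as the sorted query parameter. A quick sketch against the key I just created; the response has the same shape as before, but any nodes arrays come back in key order:

curl -s 'http://127.0.0.1:4001/v2/keys/fiddle?sorted=true' | python -m json.tool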

If I want to see the complete set of nodes underneath a directory I can use etcdctl ls with the --recursive option:


./etcdctl ls / --recursive
/foo
/foo/bar
/foo/bar/gronk
/fiddle
/fiddle/faddle

That's a nice pretty listing. As you can imagine, this gets a bit messier if you use curl for the query. This is probably the last time I'll use curl for a query here.  

curl -s http://127.0.0.1:4001/v2/keys/?recursive=true | python -m json.tool
{
    "action": "get",
    "node": {
        "dir": true,
        "key": "/",
        "nodes": [
            {
                "createdIndex": 7,
                "dir": true,
                "key": "/foo",
                "modifiedIndex": 7,
                "nodes": [
                    {
                        "createdIndex": 7,
                        "dir": true,
                        "key": "/foo/bar",
                        "modifiedIndex": 7,
                        "nodes": [
                            {
                                "createdIndex": 7,
                                "key": "/foo/bar/gronk",
                                "modifiedIndex": 7,
                                "value": "I see you"
                            }
                        ]
                    }
                ]
            },
            {
                "createdIndex": 8,
                "dir": true,
                "key": "/fiddle",
                "modifiedIndex": 8,
                "nodes": [
                    {
                        "createdIndex": 8,
                        "key": "/fiddle/faddle",
                        "modifiedIndex": 8,
                        "value": "popcorn"
                    }
                ]
            }
        ]
    }
}

Clustering Etcd


Etcd is designed to allow database replication and the formation of clusters.  When two etcds connect, they use a different port from the normal client access port.  An etcd that intends to participate listens on that second port and also connects to a list of peer processes which are also listening.

You can set up peering (replication) using the command line arguments -peer-addr and -peers, or you can set the values in the configuration file /etc/etcd/etcd.conf.
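
Here's a sketch of what starting a second member on the same machine might look like, using the 0.4-style flags (single hyphens again; verify the names against etcd -help):

./etcd -name myhost2 -data-dir /var/lib/etcd2 -addr 127.0.0.1:4002 -peer-addr 127.0.0.1:7002 -peers 127.0.0.1:7001

The new member contacts the existing peer on port 7001, joins as a follower and begins replicating the database.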

Complete clustering documentation can be found on Github.

Etcd and Security


Etcd communications can be encrypted using SSL, but there is no authentication or access control. This makes it simple to use, but it makes it critical that you never place sensitive information like passwords or private keys into etcd. It also means that when using etcd you must assume there are no malicious actors in any network with access to it.  Any process with network access can both read and write any keys and values within etcd.  It is absolutely essential that access to etcd be protected at the network level, because there's nothing else restricting access.

Instructions for enabling SSL to encrypt etcd traffic are also on Github.

Etcd can be configured to restrict access to queries which present a client certificate, but this provides very limited access control.  Clients are either allowed full access or denied entirely.  There is no concept of a user, authentication or access control policy once a connection has been allowed.
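
For the record, here's a sketch of the flags involved, based on the 0.4 security guide; the certificate paths are hypothetical, and supplying -ca-file is what switches on the client certificate check:

./etcd -cert-file /etc/etcd/server.crt -key-file /etc/etcd/server.key -ca-file /etc/etcd/ca.crt
curl -s --cacert /etc/etcd/ca.crt --cert client.crt --key client.key https://127.0.0.1:4001/v2/keys/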

Additional Capabilities of Etcd


Don't make the mistake of thinking that Etcd is a simple networked filesystem with an HTTP/REST protocol interface. Etcd has a number of other important capabilities related to its role in configuration and cluster management.

Each directory or leaf node can have a Time To Live or TTL value associated with it.  The TTL indicates the lifespan of the key/value pair in seconds.  When a value is set, if the TTL is also set then that key/value pair will expire when the TTL drops to zero.  After that the value is no longer available.
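
A sketch with a throwaway key: the etcdctl flag is --ttl and the REST form field is ttl, both in seconds (depending on the build you may need the flag before or after the arguments). Ten seconds later, a get on either key returns a "key not found" error.

./etcdctl set --ttl 10 /ephemeral "gone soon"
curl -s -XPUT http://127.0.0.1:4001/v2/keys/ephemeral2 -d value=gone-soon -d ttl=10 | python -m json.tool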

It is also possible to create hidden nodes. These are nodes that will not appear in directory listings.  To access them the query must specify the correct path explicitly.  Any node name which begins with an underscore character (_) will be hidden from directory queries.
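
A quick sketch with a hypothetical key. The ls output will not include /_stash, but the direct get returns the value:

./etcdctl set /_stash "you can't see me"
./etcdctl ls /
./etcdctl get /_stash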

Most importantly, it is possible for clients to wait for changes to a key.  If I issue a GET query on a key with the wait flag set, the query will block, leaving the request incomplete and the TCP session open.  Assuming the client doesn't time out, the query will remain open and unresolved until etcd detects (and executes) a change request on that key.  At that point the waiting query completes and returns the new value.  This can be used as an event notification or messaging system to avoid unnecessary polling.
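
Here's a sketch using the key created earlier. In one terminal start the blocking query, and in a second terminal change the value; the waiting query returns as soon as the set lands. Etcdctl wraps the same mechanism in its watch command.

curl -s 'http://127.0.0.1:4001/v2/keys/foo/bar/gronk?wait=true'
./etcdctl set /foo/bar/gronk "now I see you"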

Etcd in Kubernetes


Etcd is used by Kubernetes as both the cluster state database and the communications mechanism between the apiserver and the kubelet processes on the minion hosts.  The apiserver places values into etcd in response to requests from users for things like new pods or services, and it queries values from it to get status on the minions, pods and services.

The kubelet processes also both query and update the contents of the database.  They poll for desired state changes and create new pods and services in response.  They also push status information back to etcd to make it available to client queries.

The root of the Kubernetes data tree within the etcd database is /registry. Let's see what's there.



./etcdctl ls /registry --recursive
/registry/services
/registry/services/specs
/registry/services/specs/db
/registry/services/specs/msg
/registry/services/endpoints
/registry/services/endpoints/db
/registry/services/endpoints/msg
/registry/pods
/registry/pods/pulpdb
/registry/pods/pulpmsg
/registry/pods/pulp-beat
/registry/pods/pulp-resource-manager
/registry/hosts
/registry/hosts/10.245.2.3
/registry/hosts/10.245.2.3/kubelet
/registry/hosts/10.245.2.4
/registry/hosts/10.245.2.4/kubelet
/registry/hosts/10.245.2.2
/registry/hosts/10.245.2.2/kubelet

I'm running the Vagrant cluster on Virtualbox with three minions.  These are listed under the hosts subtree. 

I've also defined two services, db and msg which are found under the services subtree.  The service data is divided into two parts.  The specs tree contains the definitions I provided for the two services.  The endpoints subtree contains records which indicate the actual locations of the containers labeled to accept the service connections.

Finally, I've defined four pods which make up the service I'm building (which happens to be a Pulp service). Each host is listed by its IP address at the moment. Work is ongoing to allow the minions to be referred to by host name, but that requires control of the name service that is available inside the containers. Without a universal name service for containers, IP addresses are the only way for processes inside a container to find hosts outside.

Some of the values here will look familiar to anyone who has created pods and services using the kubecfg client.  They are nearly identical to the JSON query and response payloads from the Kubernetes apiserver.
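
For example, here's a peek at one of the pod records listed above; assuming (as it appears) that the value is a JSON document, json.tool makes it readable:

./etcdctl get /registry/pods/pulpdb | python -m json.tool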

I don't recommend making any changes or additions to the etcd database in a running Kubernetes cluster. I haven't yet looked deeply enough into how the apiserver and kubelet interact with etcd, and I think it would be very easy to upset them.  For now I'm content to query etcd to confirm that my commands have or have not taken effect and compare what I see to what I expect.


Summary


Etcd is a neat tool for storing and sharing configuration data.  It's only useful (so far) in limited cases where there are no malicious or careless users, but it's a very young project.  I speculate that etcd is a temporary component of Kubernetes.  It provides the features needed to facilitate the development of the apiserver and kubelet, which are the core functions of Kubernetes.  Once those are stable, anyone who feels the need for a more secure or scalable component can swap one in.  The configuration payload can remain; only the communications mechanism would need to be replaced.

