Tuesday, March 28, 2017
I’m tired of having the same conversation over and over again with people so
I figured I would put it into a blog post.
Many people ask me if I have tried or what I think of Solaris Zones / BSD Jails. The
answer is simply: I have tried them and I definitely like them. The conversation
then heads towards them telling me how Zones and Jails are far superior to
containers and that I should basically just give up with Linux containers and use VMs.
Which to be honest is a bit forward to someone who has spent a large portion of
her career working with containers and trying to make containers more secure.
Here is what I tell them:
The designs of Solaris Zones, BSD Jails, VMs, and containers are very different.
Solaris Zones, BSD Jails, and VMs are first class concepts. This is clear from
the Solaris Zone Design Spec and the BSD Jails Handbook.
I hope it can go without saying that VMs are very much a first class object
without me having to link you somewhere :P.
Containers on the other hand are not real things. I have said this in many
talks and I’m saying it again now.
A “container” is just a term people use to describe a combination of Linux
namespaces and cgroups. Linux namespaces and cgroups ARE first class objects.
I am trying to make this distinction very clear to make a point. The designs
are different. PERIOD.
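To make the "namespaces and cgroups are the first class objects" point a bit
more concrete, here is a minimal Python sketch, assuming a Linux host with
/proc mounted. The helper names (`list_namespaces`, `list_cgroups`) and the
`proc_root` parameter are made up for illustration; they are not part of any
container runtime. All the sketch does is read what the kernel already exposes
for a process. Notice there is no "container" entry anywhere in there.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Map each namespace kind (net, pid, mnt, ...) to its identifier."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    # Each entry is a symlink whose target looks like "net:[4026531992]".
    return {
        name: os.readlink(os.path.join(ns_dir, name))
        for name in sorted(os.listdir(ns_dir))
    }


def list_cgroups(pid="self", proc_root="/proc"):
    """Return the raw cgroup membership lines for a process."""
    with open(os.path.join(proc_root, str(pid), "cgroup")) as f:
        return [line.strip() for line in f if line.strip()]
```

Calling `list_namespaces()` and `list_cgroups()` with no arguments inspects
the current process. Reading these files for another process may require extra
privileges.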
Let’s go over some of the things you can do with containers that you CANNOT do
with Jails or Zones or VMs.
Since containers are made from specific building blocks, namespaces, you can
do some super neat things like sharing namespaces.
There are many different namespaces, but I will give a couple of examples.
The first example, sharing a net namespace, can be seen in a demo by Arnaud
Porterie from our talk at Dockercon EU in 2015. You can
have your application running in one container; then, in a different
container sharing its net namespace, you can run wireshark and inspect the
packets from the first container.
You could also do the same with sharing a pid namespace, except instead of
running wireshark you can run strace and debug your application from an
entirely different container.
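If you want to check that two processes really do share a namespace, a rough
sketch like the one below works, assuming Linux with /proc mounted and enough
privileges to read the other process's entries. The function names and the
`proc_root` parameter are hypothetical, chosen just for this example. It only
checks for sharing; setting the sharing up is the job of whatever runtime
starts the containers.

```python
import os


def namespace_id(pid, kind, proc_root="/proc"):
    """Return the namespace identifier, e.g. "net:[4026531992]", for a pid."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", kind))


def shares_namespace(pid_a, pid_b, kind, proc_root="/proc"):
    """True when both processes sit in the same namespace of the given kind."""
    # Two processes share a namespace exactly when the links resolve to the
    # same identifier.
    return namespace_id(pid_a, kind, proc_root) == namespace_id(pid_b, kind, proc_root)
```

For the wireshark case you would call `shares_namespace(app_pid, debug_pid, "net")`,
and for the strace case you would pass `"pid"` instead.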
Sharing X socket
I assume if you are on my blog you are familiar with my posts on running
containers on your desktop.
To really drive home a point I’m going to make an analogy describing each of
these things in terms of legos.
VMs, Jails, and Zones are as if you bought the legos already put together AND
glued. So it’s basically the Death Star, and you
don’t have to do any work: you get it pre-assembled out of the box. You can’t even take it apart.
Containers come with just the pieces so while the box says to build the Death
Star, you are not tied to that. You can build two boats connected by a flipping
ocean and no one is going to stop you.
This kind of flexibility allows for super awesome things but of course comes at
a cost.
Complexity == Bugs
Now is the point where the person I would be having the conversation with starts
yelling at me that containers are not secure. Hello, thank you, I am aware.
Also if anyone gives a shit about actually fixing this, it’s me.
Again, containers were not a top level design; they are something we build
from Linux primitives. Zones, Jails, and VMs are designed as top level
isolation.
The cool things I expressed above allow for a level of flexibility and control that Zones,
Jails, and VMs do not. By design.
This extra complexity leads to bugs that lead to container escapes. Don’t get
me wrong, you could also escape a VM, Jail, or Zone, but their design is not as
complicated as that of the primitives that make up containers.
Less is more, and the less complexity you have, the less likely you are to
have odd edge case bugs.
The point I am trying to make is that Jails, Zones, VMs and containers were
designed and built in different ways. Containers are not a Linux isolation primitive, they
merely consume Linux primitives which allow for some interesting interactions.
They are not perfect; nothing is.
We can make them better by reducing some of the complexity and building
hardening features around them, which is a goal I have been working toward and
will continue to work toward.
You can get a sandbox level of isolation with containers, which I wrote about
in more detail here.
But this requires doing the work of building the Death Star from your pieces of
Seccomp, AppArmor, and SELinux profiles.
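As one small piece of that Death Star, here is a hedged sketch of generating a
seccomp allowlist profile in Python. The function names are hypothetical, and
the JSON layout follows the Docker-style seccomp profile format as I understand
it (`defaultAction` plus a `syscalls` list of `names` and `action`), so check it
against your runtime's documentation before relying on it. The three syscalls
in the usage note are far too few for any real program; they are only there to
show the shape.

```python
import json


def build_seccomp_allowlist(allowed_syscalls):
    """Build a profile that denies everything except the listed syscalls."""
    return {
        # Anything not explicitly allowed fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


def render_profile(profile):
    """Serialize the profile as JSON text, ready to write to a file."""
    return json.dumps(profile, indent=2)
```

For example, `render_profile(build_seccomp_allowlist(["read", "write", "exit_group"]))`
gives you JSON text to save and hand to your runtime. AppArmor and SELinux
profiles are separate pieces you would still need to build on top of this.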
I personally love Zones, Jails, and VMs and I think they all have a particular
use case. The confusion with containers primarily lies in assuming they fulfill
the same use case as the others, which they do not. Containers allow for a flexibility
and control that is not possible with Jails, Zones, or VMs. And THAT IS A FEATURE.