Poor Man's Container

In this exercise, I play with some of the building blocks of Linux application containers. The goal is to be able to create simplified application containers by hand.

Motivation

After using Docker for about two years, I finally set out to understand how a Linux application container runtime works. Jérôme Petazzoni’s presentation (slides) explains both the concepts and the implementation details of an application container runtime. To me, the most interesting part of the presentation is the live demo of launching a container without any container manager or runtime such as Docker or rkt.

One missing piece of the demo is that the network setup portion doesn’t fully work, so this is also a good chance for me to review Linux networking and get it working.

Requirement

From a developer’s point of view, Docker provides more value for deployment. Features like resource limiting with cgroups and copy-on-write storage with btrfs are great in production environments, but might not be that useful if we want to bring the container concept into the development cycle.

On the other hand, isolation by namespaces is quite useful for development, especially for the file system and networking.

There are tools for creating isolated compilation or runtime environments for specific programming languages, such as Virtualenv for Python, RVM for Ruby, and Go’s native GOROOT and GOPATH, but file system isolation is a simple, one-size-fits-all solution for all kinds of programming languages and tools.

Network isolation is also pretty useful. Suppose you are implementing a software load balancer and want to test it locally against multiple instances of an upstream service. It’s possible to achieve this by either of the following:

  • Tweaking the upstream services to listen on different ports
  • Creating virtual network interfaces and tweaking each upstream service to listen on a specific interface

But using network namespaces is a generic solution that requires no special tweaking of the applications.
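To make the load balancer scenario concrete, here is a minimal sketch that only prints the commands for starting one upstream per network namespace, all listening on the same port. The helper name `upstream_cmds` is my own, `python3 -m http.server` merely stands in for a real upstream service, and the namespaces `c0`, `c1`, ... are assumed to have been created already with `ip netns add`.

```shell
# Print (do not run) one command per namespace; every upstream uses the
# same port because each namespace has its own network stack.
# upstream_cmds is a hypothetical helper, not part of any tool.
upstream_cmds() {
    count=$1
    port=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "ip netns exec c$i python3 -m http.server $port"
        i=$((i + 1))
    done
}

upstream_cmds 2 8080
```

The printed lines can be reviewed and then run as root, one per terminal, since each command stays in the foreground.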

So in this exercise, I’ll figure out how to do file system isolation with the well-known chroot, which changes the current process’s root (/) directory, and use ip to manage network interfaces and namespaces.

Let’s Do It!

Prepare Host Network

# create a network bridge, containers will use host's external network
# via this bridge
ip link add pmc0 type bridge
ip addr add 10.0.0.1/24 dev pmc0
# route for traffic from host to bridge will be added automatically
# when the interface is up
ip link set pmc0 up
# enable NAT so the container network can reach the Internet; eth0 is
# assumed to be the host's external interface
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
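The NAT rule above hardcodes `eth0`, which is not the external interface name on every host. As a hedged sketch, the interface can be read from the default route instead. `parse_default_iface` is a hypothetical helper of mine that parses the output of `ip route show default` from standard input, and the sample route line below is purely illustrative.

```shell
# Extract the interface name that follows "dev" on a default-route line.
# Reads text in the format printed by `ip route show default` on stdin.
parse_default_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# Illustrative input; on a real host, pipe `ip route show default` instead.
echo 'default via 192.168.1.1 dev enp3s0 proto dhcp metric 100' | parse_default_iface

# Possible use as root (left commented out on purpose):
# iptables -t nat -A POSTROUTING -o "$(ip route show default | parse_default_iface)" -j MASQUERADE
```

If the host has several default routes, this picks the first one listed, which may or may not be the one you want.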

Prepare Container Network

# create a veth pair, vthc0h will be in host’s bridge and vthc0c will
# be in container
ip link add dev vthc0h type veth peer name vthc0c
ip link set dev vthc0h up
# put vthc0h in bridge
ip link set vthc0h master pmc0
# create a network namespace
ip netns add c0
# put vthc0c in the namespace
ip link set vthc0c netns c0
# bring up container’s loopback device
ip netns exec c0 ip link set dev lo up
# bring up vthc0c in container’s network namespace
ip netns exec c0 ip addr add 10.0.0.2/24 dev vthc0c
ip netns exec c0 ip link set dev vthc0c up
# make the bridge default gateway for the container
ip netns exec c0 ip route add default via 10.0.0.1
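The same steps can be repeated for more containers by deriving every name from an index. The sketch below only prints the commands. The helper `container_net_cmds` and the addressing scheme (container N gets 10.0.0.N+2) are my own assumptions, chosen so that index 0 reproduces the c0 setup above.

```shell
# Print the network setup commands for container number N.
# Naming follows the article: namespace cN, veth pair vthcNh / vthcNc.
# Assumption: container N uses 10.0.0.(N+2), so N must stay below 253.
container_net_cmds() {
    n=$1
    ns="c$n"
    host_if="vthc${n}h"
    cont_if="vthc${n}c"
    addr="10.0.0.$((n + 2))/24"
    cat <<EOF
ip link add dev $host_if type veth peer name $cont_if
ip link set dev $host_if up
ip link set $host_if master pmc0
ip netns add $ns
ip link set $cont_if netns $ns
ip netns exec $ns ip link set dev lo up
ip netns exec $ns ip addr add $addr dev $cont_if
ip netns exec $ns ip link set dev $cont_if up
ip netns exec $ns ip route add default via 10.0.0.1
EOF
}

container_net_cmds 1
```

After checking the output, something like `container_net_cmds 1 | sh` run as root would apply it, provided the pmc0 bridge already exists.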

Prepare Container File System

mkdir -p /export/containers/images
# Download rootfs of Alpine for Docker
curl -L -v -o /export/containers/images/alpine-3.4-rootfs.tar.gz https://github.com/gliderlabs/docker-alpine/raw/45ba65c1116aaf668f7ab5f2b3ae2ef4b00738be/versions/library-3.4/rootfs.tar.gz
# create container's root directory
mkdir /export/containers/c0
tar -C /export/containers/c0 -xf /export/containers/images/alpine-3.4-rootfs.tar.gz
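Before mounting anything into the extracted tree, it can help to confirm that the directories used later actually exist. This is a small sketch under my own assumptions: `check_rootfs` is a hypothetical helper, and the list of paths is simply what the following steps touch (proc, sys, dev, etc, and /bin/sh).

```shell
# Return 0 if the directory looks like a usable root file system for the
# steps in this article, otherwise print what is missing and return 1.
check_rootfs() {
    root=$1
    for d in proc sys dev etc; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d" >&2
            return 1
        fi
    done
    # /bin/sh may be an absolute symlink that only resolves inside the
    # chroot, so accept a symlink even if its target is not visible here.
    if [ ! -e "$root/bin/sh" ] && [ ! -L "$root/bin/sh" ]; then
        echo "missing shell: $root/bin/sh" >&2
        return 1
    fi
    return 0
}

# Example (left commented out; the path is the one used in this article):
# check_rootfs /export/containers/c0 && echo "rootfs looks usable"
```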

Switch to Container Context

# mount special file systems 
mount -t proc proc /export/containers/c0/proc/
mount -t sysfs sys /export/containers/c0/sys/
mount -o bind /dev /export/containers/c0/dev/
mount -t devpts pts /export/containers/c0/dev/pts/
echo nameserver 8.8.8.8 > /export/containers/c0/etc/resolv.conf
# run a shell in the container’s root fs and network namespace, with
# a private UTS namespace so the hostname change below stays in the container
unshare --uts ip netns exec c0 chroot /export/containers/c0 /bin/sh
# change container hostname
hostname c0
# check network interface
ifconfig
# verify connection
apk add --no-cache curl
curl www.google.com
# Done! You are in a container
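Once you exit the container shell, the setup can be undone in roughly the reverse order. The sketch below only prints the commands. Both helper names are hypothetical, and the host part assumes the bridge and the NAT rule were created exactly as shown earlier, including `eth0`. Deleting the namespace also removes the veth pair, so no separate `ip link del` for it is listed, and `ip_forward` is deliberately left as it is.

```shell
# Print the commands that unmount the container's special file systems
# (innermost mount first) and delete its network namespace.
container_teardown_cmds() {
    name=$1
    root=$2
    cat <<EOF
umount $root/dev/pts
umount $root/dev
umount $root/sys
umount $root/proc
ip netns del $name
EOF
}

# Print the commands that remove the bridge and the NAT rule on the host.
host_teardown_cmds() {
    cat <<EOF
ip link del pmc0
iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
EOF
}

container_teardown_cmds c0 /export/containers/c0
host_teardown_cmds
```

Run the printed commands as root on the host, not inside the chroot. The root directory itself is left in place so the container can be started again.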

References

Also posted on https://www.linkedin.com/pulse/poor-mans-container-steven-chin by Steven Chin.