Description
The Containers and Checkpoint/Restore MC at Linux Plumbers is the opportunity for runtime maintainers, kernel developers and others involved with containers on Linux to talk about what they are up to and agree on the next major changes to kernel and userspace.
Common discussion topics tend to be improvements to the user namespace, opening up more kernel functionality to unprivileged users, new ways to dump and restore kernel state, Linux Security Modules and syscall handling.
Opening session
openat2 landed in Linux 5.6 and makes it easier to implement safer container runtimes, but unfortunately attackers still have quite a few remaining tricks they can use against container runtimes. This talk will give a quick overview of the remaining issues, some proposals for how we might fix them, and how libpathrs will make use of them. In addition, a brief update on...
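To show why openat2 helps runtimes, here is a minimal Python sketch that calls the raw syscall through ctypes and resolves a path as if a container's rootfs were "/". The struct layout and RESOLVE_* values come from <linux/openat2.h>; the syscall number 437 is an assumption that holds for x86_64 and the asm-generic table (arm64, riscv) but not necessarily for every architecture. The wrapper names `openat2` and `open_in_container` are hypothetical, not libpathrs API.

```python
import ctypes
import os

# Assumption: 437 is __NR_openat2 on x86_64 and asm-generic architectures.
SYS_openat2 = 437
# Flag values from <linux/openat2.h>.
RESOLVE_NO_MAGICLINKS = 0x02
RESOLVE_IN_ROOT = 0x10


class OpenHow(ctypes.Structure):
    # Mirrors: struct open_how { __u64 flags; __u64 mode; __u64 resolve; };
    _fields_ = [("flags", ctypes.c_uint64),
                ("mode", ctypes.c_uint64),
                ("resolve", ctypes.c_uint64)]


_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def openat2(dirfd: int, path: str, flags: int, resolve: int, mode: int = 0) -> int:
    """Hypothetical thin wrapper around openat2(2); raises OSError on failure."""
    how = OpenHow(flags, mode, resolve)
    fd = _libc.syscall(ctypes.c_long(SYS_openat2), ctypes.c_int(dirfd),
                       ctypes.c_char_p(os.fsencode(path)), ctypes.byref(how),
                       ctypes.c_size_t(ctypes.sizeof(how)))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


def open_in_container(rootfs_fd: int, path: str) -> int:
    """Open `path` read-only, treating `rootfs_fd` as "/": absolute symlinks
    and ".." stay inside the rootfs, and magic links (/proc/*/fd/*) are refused."""
    return openat2(rootfs_fd, path, os.O_RDONLY | os.O_CLOEXEC,
                   RESOLVE_IN_ROOT | RESOLVE_NO_MAGICLINKS)
```

A runtime would typically open the rootfs once, e.g. with os.open(rootfs, os.O_PATH | os.O_DIRECTORY), and pass that descriptor; the kernel then enforces the containment during the lookup instead of the runtime re-checking paths after the fact.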
OpenVZ and Virtuozzo containers use CRIU as the core technology for container migration in production. Virtuozzo containers are slightly different from what most people imagine containers to be today. They are "system containers": ones with a full systemd inside, ones you would enter via ssh, analogous to a virtual machine in which the user gets root...
CRIU is not easy to use for the average user. What to do with the file system? How and where to store images?
We developed an easy-to-use checkpoint/restore tool that uses the CRIU engine. It provides the following features:
* It does not require root access to operate. Only an empty container (e.g. kubernetes) is required.
* It provides time virtualization, which is critical when migrating (java)...
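The abstract does not say how the tool virtualizes time. One kernel mechanism for this is the time namespace (Linux 5.6+), whose offsets are written as "<clock-id> <secs> <nanosecs>" lines to /proc/PID/timens_offsets, as described in time_namespaces(7). The sketch below assumes that mechanism and computes an offset so that a restored process's clock continues from the value recorded at checkpoint time. The helper name `timens_offset_line` is hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def timens_offset_line(clock_id: str, checkpoint_ns: int, restore_host_ns: int) -> str:
    """Hypothetical helper: format one timens_offsets entry so that, inside the
    new time namespace, `clock_id` reads `checkpoint_ns` at the moment the
    host clock reads `restore_host_ns`."""
    if clock_id not in ("monotonic", "boottime"):
        raise ValueError(f"unsupported clock: {clock_id}")
    offset = checkpoint_ns - restore_host_ns
    # The nanosecond field must lie in [0, 1e9); divmod floors, so a negative
    # offset such as -1.5 s becomes (-2 s, 500000000 ns).
    secs, nsecs = divmod(offset, NSEC_PER_SEC)
    return f"{clock_id} {secs} {nsecs}"
```

In this sketch, restore_host_ns would be read on the destination with time.clock_gettime_ns(time.CLOCK_MONOTONIC), and the line would be written into a fresh time namespace before the restored process starts. The design makes the clock appear paused during migration, so timers measured with the monotonic clock (for example in a JVM) do not jump forward.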
Containers are by far the biggest use case for overlayfs.
Yet there seems to be very little crosstalk between the overlayfs and containers mailing lists.
This talk is going to present some opt-in overlayfs features that were added in recent years (redirect_dir, index, nfs_export, xino, metacopy).
Most of those features have not been enabled by most container runtimes, because of various...
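For reference, the opt-in features listed above are ordinary overlayfs mount options. The following is a minimal sketch of how a runtime might build the option string. The helper name `overlay_options` is hypothetical, and the sketch does not reproduce the kernel's rules about which feature combinations it accepts; those are validated at mount time.

```python
# The opt-in features named in the abstract. Some accept values other than
# "on"/"off" (e.g. xino=auto), so values are passed through unchanged.
OPT_IN_FEATURES = {"redirect_dir", "index", "nfs_export", "xino", "metacopy"}


def overlay_options(lowerdirs, upperdir, workdir, **features):
    """Hypothetical helper: build the -o option string for an overlayfs mount.

    lowerdirs are listed top-most first and joined with ':' as overlayfs expects.
    Paths containing ',' or ':' would need escaping, which this sketch skips.
    """
    unknown = set(features) - OPT_IN_FEATURES
    if unknown:
        raise ValueError(f"unknown overlayfs features: {sorted(unknown)}")
    opts = [f"lowerdir={':'.join(lowerdirs)}",
            f"upperdir={upperdir}",
            f"workdir={workdir}"]
    opts += [f"{name}={value}" for name, value in features.items()]
    return ",".join(opts)
```

The resulting string is what would be passed as the data argument to mount(2), or as `mount -t overlay overlay -o <options> <mountpoint>`.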
CRIU is the most advanced checkpoint/restore project on Linux. But even with CRIU, it is currently not feasible to checkpoint and restore all possible topologies of processes and namespaces. Even the relatively simple case of a process tree with two UTS/IPC namespaces is not supported by CRIU, not to mention more complex cases such as a process tree with more than one PID namespace.
In...
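To make the "topology" problem concrete, a checkpointer first has to discover which processes share which namespaces. Per namespaces(7), processes in the same namespace have /proc/PID/ns/<type> links that resolve to the same inode, shown in the link text as e.g. "uts:[4026531838]". The sketch below groups PIDs by that inode. It is an illustration only, not CRIU's code, and the helper names are hypothetical.

```python
import os
import re
from collections import defaultdict

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link: str):
    """Split a /proc/PID/ns/* link target such as 'uts:[4026531838]'
    into (type, inode)."""
    m = _NS_LINK.match(link)
    if m is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return m.group(1), int(m.group(2))


def namespace_groups(pids, ns_type="uts"):
    """Group PIDs by the namespace of `ns_type` they belong to.
    More than one key means the tree spans several namespaces of that type."""
    groups = defaultdict(list)
    for pid in pids:
        _, inode = parse_ns_link(os.readlink(f"/proc/{pid}/ns/{ns_type}"))
        groups[inode].append(pid)
    return dict(groups)
```

A tree for which namespace_groups(pids, "uts") or namespace_groups(pids, "ipc") returns two keys is exactly the "two UTS/IPC namespaces" case described above.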
New cloud offerings such as Google preemptible VMs are up to 5x cheaper than regular machines. These VMs come with tight eviction deadlines (~30 seconds). This introduces a new goal: how can we evacuate an application from a machine as fast as possible?
Note that this problem is different from live migration, which aims at minimizing application downtime.
To do fast checkpointing, we...
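As a back-of-the-envelope illustration (our framing, not the talk's method), saving M bytes within a deadline of T seconds, after reserving some fixed overhead, requires a sustained write rate of at least M / (T - overhead). The helper name and the numbers in the usage note are hypothetical.

```python
def required_throughput(bytes_to_save: int, deadline_s: float, overhead_s: float = 0.0) -> float:
    """Hypothetical helper: minimum sustained write rate (bytes/s) needed to
    save `bytes_to_save` before an eviction deadline, after reserving
    `overhead_s` for fixed work such as freezing the tasks."""
    budget = deadline_s - overhead_s
    if budget <= 0:
        raise ValueError("overhead leaves no time to write the image")
    return bytes_to_save / budget
```

For example, with a 30 second notice, a hypothetical 56 GiB of memory to save and 2 seconds of fixed overhead, required_throughput(56 * 2**30, 30, overhead_s=2) comes to 2 GiB/s. This is why evacuation speed, not downtime, becomes the metric to optimize.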
We would like to discuss a proposal for more advanced in-kernel idmap isolation.
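The abstract does not describe the proposal itself. As background, today's user namespace idmaps are written as "<inside> <outside> <count>" lines in /proc/PID/uid_map (see user_namespaces(7)), and containers are isolated from each other when their host ("outside") ranges do not overlap. The following sketch illustrates that userspace view; all helper names are hypothetical.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map-style text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = map(int, line.split())
            entries.append((inside, outside, count))
    return entries


def map_to_host(entries, uid):
    """Translate a container uid to the host uid, or None if it is unmapped."""
    for inside, outside, count in entries:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None


def host_ranges_overlap(a, b):
    """True if any host range of map `a` intersects any host range of map `b`."""
    for _, out_a, len_a in a:
        for _, out_b, len_b in b:
            if out_a < out_b + len_b and out_b < out_a + len_a:
                return True
    return False
```

With one container mapped as "0 100000 65536" and another as "0 165536 65536", uid 0 in the first maps to host uid 100000, and host_ranges_overlap reports the two as isolated.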
This is a first brainstorm around building a sensible, better capability model on top of pidfds.
This summarizes my (not-so-good) experience with using the kernel API exposed as /proc/*/mount{s,info} in various container projects (docker, runc, aufs, cri-o, cilium, etc.) and outlines various problems with this API and its (ab)use.
The mountinfo API is quite adequate for tens of mounts (systems without containers). With containers, each one adds a few mounts, and there might be thousands of...
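For context, this is roughly what consumers of /proc/self/mountinfo have to do today. The sketch below parses the line format documented in proc(5): six fixed fields, zero or more optional fields terminated by "-", then filesystem type, mount source and super options, with spaces and similar characters escaped as octal (e.g. "\040"). It is illustrative only, and the names are hypothetical.

```python
import re
from dataclasses import dataclass

_OCTAL = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    # The kernel escapes space, tab, newline and backslash as \ooo octal.
    return _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), field)


@dataclass
class MountInfo:
    mount_id: int
    parent_id: int
    major_minor: str
    root: str
    mount_point: str
    mount_options: str
    optional: list
    fstype: str
    source: str
    super_options: str


def parse_mountinfo_line(line: str) -> MountInfo:
    fields = line.rstrip("\n").split(" ")
    sep = fields.index("-", 6)  # optional fields end at the "-" separator
    pre, post = fields[:sep], fields[sep + 1:]
    return MountInfo(int(pre[0]), int(pre[1]), pre[2], _unescape(pre[3]),
                     _unescape(pre[4]), pre[5], pre[6:], post[0],
                     _unescape(post[1]), post[2])


def read_mountinfo(path="/proc/self/mountinfo"):
    with open(path) as f:
        return [parse_mountinfo_line(line) for line in f]
```

Note that every lookup in this sketch re-reads and re-parses the whole table, so its cost grows with the number of mounts on the system.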