# Linux Plumbers Conference 2021

20-24 September 2021
US/Pacific timezone

# LPC 2021 Microconferences

## Containers and Checkpoint/Restore MC

CFP Ends: Aug 15, 2021

The Containers and Checkpoint/Restore Microconference focuses on both userspace and kernel related work. The microconference targets the wider container ecosystem, ideally with participants from all major container runtimes as well as init system developers.

Contributions to the microconference are expected to be problem statements, new use cases, and feature proposals, in both kernel and userspace.

Suggested Topics:

• How to best use CAP_CHECKPOINT_RESTORE in CRIU to make it possible to run checkpoint/restore as non-root
• Extending the idmapped mount feature to unprivileged containers, i.e. agreeing on a sane and safe delegation mechanism with clean semantics.
• Porting more filesystems to support idmapped mounts.
• Making it possible for unprivileged containers and unprivileged users in general to install fanotify subtree watches.
• Discussing and agreeing on a concept of delegated mounts, i.e. the ability for a privileged process to create a mount context that can be handed off to a less privileged process which can interact with it safely.
• Fixing outstanding problems in the seccomp notifier to handle syscall preemption cleanly. A patchset for this is already out but we need a more thorough understanding of the problem and its proposed solution.
• With more container engines and orchestrators supporting checkpoint/restore, the idea has come up to provide an optional interface through which applications can be notified that they are about to be checkpointed. One example is a JVM that could perform cleanups which do not need to be part of a checkpoint.
• Discussing an extension of the seccomp API to make it possible to attach a seccomp filter to another task, i.e. the inverse of the current caller-based model, enabling full supervisor-based sandboxing.
• Integration of the new Landlock LSM into container runtimes.
• Although checkpoint/restore can handle cgroupv1 correctly, cgroupv2 support is very limited, and we need to figure out what is still missing to support v2 just as well as v1.
• Isolated user namespaces (each with a full 32-bit uid/gid range) and an easier way for users to create and manage them.
• Figure out what is missing at the checkpoint/restore level, and maybe the container runtime level, to support optimal checkpoint/restore integration at the orchestration level. The pod concept of Kubernetes in particular introduces new challenges which have not been part of checkpoint/restore before (for example, containers sharing namespaces).
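
The idmapped-mount and isolated-user-namespace topics above both revolve around translating ownership through a uid/gid range mapping. The sketch below illustrates that core idea only; the struct and helper names are invented for illustration and are not the kernel's actual API.

```c
#include <stdint.h>

/* Hypothetical sketch: ownership on disk is translated through a
 * [lower -> mapped] range before it is shown to the caller, the same
 * shape as a line in /proc/<pid>/uid_map. Names are illustrative. */
struct uid_range_map {
    uint32_t lower;  /* first uid covered on disk           */
    uint32_t mapped; /* what that uid appears as to callers */
    uint32_t count;  /* number of consecutive ids mapped    */
};

/* Translate an on-disk uid through the mapping; returns (uint32_t)-1
 * (an invalid id) when the uid falls outside the mapped range. */
static uint32_t map_uid(const struct uid_range_map *m, uint32_t uid)
{
    if (uid < m->lower || uid >= m->lower + m->count)
        return (uint32_t)-1;
    return m->mapped + (uid - m->lower);
}
```

With a map of `0 100000 65536` (a common unprivileged-container setup), on-disk uid 0 would be presented as 100000, and ids past the range stay unmapped.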

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Containers and Checkpoint/Restore MC" for the "Track". More topics will be added based on CfP for this microconference.

• Stéphane Graber <stgraber@stgraber.org>, Mike Rapoport <mike.rapoport@gmail.com>, Adrian Reber <areber@redhat.com>, and Christian Brauner <christian.brauner@ubuntu.com>

## Confidential Computing MC

CFP Ends: Sept 1, 2021

The Confidential Computing microconference focuses on the development and use of state-of-the-art encryption technologies for live encryption of data, and on how to utilize the technologies from AMD (SEV), Intel (TDX), s390, and ARM Secure Virtualization for secure computation in VMs, containers, and more.

Suggested Topics:

For more references, see:

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Confidential Computing MC" for the "Track". More topics will be added based on CfP for this microconference.

• Joerg Roedel <joro@8bytes.org>

## Confidential Computing MC Summary

• TDX Live Migration[video][slides]
• Uses separate Migration Trusted Domains (TDs): SrcMigTD and DstMigTD
• MigTDs are part of the TCB, and they do pre-migration checks and prepare the encryption key for migrating guest states

• Guest state encryption/decryption is done by the TDX Module when VMM exports/imports it via SEAM calls

• MigTD to host communication can use a VMM agnostic transport based on vmcall, or a VMM specific transport

• virtio-vsock

• hyperv-vsock

• vmci-vsock

• Intel provides a rust-based reference implementation for MigTD

• MigTD is provided by Hypervisor, Guest TD can measure MigTDs provided by the cloud vendor

• Interface to QEMU is a kvm-migration-device which calls TDX module to re-encrypt pages for migration

• How to track guest private and shared pages?

• Bitmap is fast but sparsely populated

• Region list is slower but likely sufficient
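
The bitmap-versus-region-list trade-off above can be made concrete with a toy sketch (not QEMU/KVM code) of the two schemes for tracking which guest pages are shared, at the 2M granularity mentioned earlier; the sizes and names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define GPA_BITS     30                 /* toy guest: 1 GiB           */
#define CHUNK_SHIFT  21                 /* 2 MiB tracking granularity */
#define NCHUNKS      (1u << (GPA_BITS - CHUNK_SHIFT))

/* Scheme 1: bitmap - O(1) lookup, but sparsely populated when only a
 * few ranges are shared. */
static uint8_t shared_bitmap[NCHUNKS / 8];

static void bitmap_mark_shared(uint64_t gpa)
{
    uint64_t c = gpa >> CHUNK_SHIFT;
    shared_bitmap[c / 8] |= 1u << (c % 8);
}

static bool bitmap_is_shared(uint64_t gpa)
{
    uint64_t c = gpa >> CHUNK_SHIFT;
    return shared_bitmap[c / 8] & (1u << (c % 8));
}

/* Scheme 2: region list - linear scan on lookup, slower but compact
 * and likely sufficient for a handful of contiguous shared ranges. */
struct region { uint64_t start, end; };   /* [start, end) */

static bool regions_contain(const struct region *r, int n, uint64_t gpa)
{
    for (int i = 0; i < n; i++)
        if (gpa >= r[i].start && gpa < r[i].end)
            return true;
    return false;
}
```

The bitmap costs memory proportional to guest size regardless of how much is shared; the region list costs memory proportional to the number of shared ranges.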

• Live Migration of Confidential Guests[video][slides]

• Problem: How to migrate AMD SEV guests

• Solution one: Using only the AMD Platform Security Processor (PSP)

• PSP establishes migration key and re-encrypts guest pages for migration

• Approach is slow because PSP has low re-encryption bandwidth

• Solution two: Use a migration helper in a separate VM sharing the same encryption context as the migrated VM

• Faster because re-encryption happens on the CPU

• Migration helper needs to be loaded at guest launch and is part of the measurement

• Implementation uses a region list to track shared guest memory regions

• Open problem: Find a common solution for QEMU which works for SEV (including SEV-ES and SEV-SNP) and TDX

• TDX Linux Support[video][slides]

• Patches under development

• SWIOTLB used for IO, Buffers are shared memory

• Work in progress to split the SWIOTLB spin-lock

• Lots of time (20%) spent on the spin-locks

• Hyper-V has a better bounce buffer implementation

• Lazy memory accept is not yet implemented, work ongoing

• Needs an approach which can be shared with AMD SEV-SNP

• Memory must not be accepted twice

• Current approach uses a bitmap with 2M granularity

• Acceptance information needs to be preserved across kexec

• Trusting device drivers

• Traditionally Linux trusts the hardware, but in a TDX guest the hardware is emulated and becomes untrusted

• Drivers need to be hardened against malicious Hypervisor device emulations

• A driver white-list of allowed drivers in a TDX guest is needed

• Debug Support for Confidential Guests[video][slides]

• Debugging AMD SEV guests

• TDX debug support builds on top of SEV debug support

• AMD PSP is used to encrypt and decrypt guest memory

• Add a QEMU.MemTxAttr.debug flag to indicate memory accesses from a debugger

• Additional debug ops in 'struct MemoryRegion'

• Needed for further SEV-ES and SEV-SNP development

• Possibly implement decryption of guest register state on automatic exits like SHUTDOWN

• Currently only possible with out-of-tree patches

• Upstream solution to this would help a lot

• Confidential Computing with Secure Execution (IBM Z)[video][slides]

• Uses a boot image fully encrypted with an asymmetric key

• Key is specific to the host machine and so is the image

• Decryption happens in Ultra-visor, data not visible to QEMU

• Ultra-visor is a combination of hardware and firmware and part of the guest TCB

• RootFS is encrypted with LUKS and dm-crypt

• Kernel and Initrd are encrypted with hardware public key

• In confidential containers (e.g. Kata), attestation can be substituted

• Keys for the decryption and verification of container images can be baked into the initrd

• Initrd encrypted with host key

• Confidential Containers[video]

• Protect containers and host from inter-container attacks

• Remove Container Service Provider (CSP) from the TCB

• Put containers into confidential VMs based on kata-containers

• One VM per pod

• Problem: Container image provisioning

• Container images usually come from an untrusted registry

• Protect them from the host

• Move container image management inside the guest

• Container images need to be signed and have verifiable integrity

• kata agent refuses to run unsigned containers

• Maybe using dm-integrity and dm-verity

• Deploying Confidential VMs via Linux[video][slides]

• Experience report of deploying SEV in Google Compute Engine (GCE)

• GCE provides VMs based on SEV (CVMs), SEV-ES not yet supported

• Problem: Getting fixes into LTS kernels

• For example: Fixes for SWIOTLB bug were only partially accepted for LTS kernels

• Need to establish a process to get fixes into distributions

• SUSE looks at stable patches and at patches with Fixes tags

• Updating the images with hardened device drivers is important

• GCE VMs do not use virtio, so no virtio hardening was done

• Problem: Testing of encrypted virtualization environments

• Working on a self-test framework for SEV which can be used by distribution maintainers

• SEV-ES enablement for kvm-unit-test is being worked on

• A good start for further testing is to run all tests for unencrypted VMs in encrypted VMs as well

• How to prioritize SEV and TDX testing with respect to upstreaming new features?

• Attestation and Secret Injection of Confidential VMs, Containers and Pods[video][slides]

• SEV and SEV-ES support pre-attestation only

• Measurement and secret injection happen before guest starts executing

• Anything which is not measured needs to be gated by a secret

• Approach: Bundle OVMF and Grub and measure them together

• Grub retrieves key from injected secrets and decrypts root partition

• Also compute hashes of kernel, initrd and command line and put them into guest memory, so that they become part of the initial measurement

• OVMF compares the hashes with kernel and initrd loaded from disk

• Kernel exports secret later via SecurityFS

• Used for attestation of software components in the running system

• Approach can be used for Confidential VMs too, not limited to containers

• In general, the discussion was about how to consolidate attestation and measured-boot workflows

• It was agreed that more discussion is needed

• Different approaches for containers are discussed in the Confidential Container Initiative
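
The hash-gating flow described above (hash kernel, initrd, and command line into measured guest memory; firmware recomputes and compares before booting) can be sketched as follows. FNV-1a stands in for a real measurement digest, and all names are illustrative, not OVMF's actual code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Toy digest standing in for a real measurement hash (e.g. SHA-256). */
static uint64_t fnv1a(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Placed in guest memory pre-launch, so it is covered by the initial
 * measurement even though kernel/initrd themselves are loaded later. */
struct measurements {
    uint64_t kernel, initrd, cmdline;
};

/* Firmware-side check: recompute over what was actually loaded from
 * disk and refuse to boot on any mismatch. */
static bool boot_allowed(const struct measurements *m,
                         const uint8_t *kernel, size_t klen,
                         const uint8_t *initrd, size_t ilen,
                         const uint8_t *cmdline, size_t clen)
{
    return fnv1a(kernel, klen)  == m->kernel &&
           fnv1a(initrd, ilen)  == m->initrd &&
           fnv1a(cmdline, clen) == m->cmdline;
}
```

The point of the indirection is that only the small measurement table must be present at launch; anything not covered by it must instead be gated by an injected secret, as the summary notes.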

• Securing Trusted Boot of Confidential VMs[video][slides]

• Decentriq has been running SGX in production for several years

• Problem: Providing Control Flow Integrity (CFI)

• Minimizing code base by disabling kernel features

• Hardening of the remaining kernel features

• Code size does not matter as much as number of communication points between guest and hypervisor

• TDX supports attestation of kernel and initrd loaded from unencrypted media

• Removing grub from the TCB

• Difficult with standard distributions

• OVMF too heavy, need a minimal firmware which just boots a Linux kernel

• libkrun is a possible solution, provides a minimal firmware based on qboot which just jumps to the 64-bit kernel entry point

## Scheduler MC

CFP Ends: Aug 31, 2021

The Scheduler microconference focuses on deciding what process gets to run when and for how long. With different topologies and workloads, it is no easy task to give the user the best experience possible. Schedulers are one of the most discussed topics on the Linux kernel mailing list, but many of these topics need further discussion in a conference format. Indeed, past scheduler microconferences have helped many of these topics make progress.

Suggested Topics:

• Cgroup interface and other updates for core-scheduling
• Capacity Awareness – For busy systems
• Interrupt Awareness

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Scheduler MC" for the "Track". More topics will be added based on CfP for this microconference.

• Dhaval Giani <dhaval.giani@oracle.com>
• Daniel Bristot de Oliveira <bristot@redhat.com>
• Chris Hyser <chris.hyser@oracle.com>
• Juri Lelli <juri.lelli@redhat.com>
• Vincent Guittot <vincent.guittot@linaro.org>

## Performance and Scalability MC

CFP Ends: Aug 31, 2021

The Performance and Scalability microconference focuses on enhancing performance and scalability in both the Linux kernel and userspace projects.  In fact, one of the purposes of this microconference is for developers from different projects to meet and collaborate – not only kernel developers but also researchers doing more experimental work.  After all, for the user to see good performance and scalability, all relevant projects must perform and scale well.

Because performance and scalability are very generic topics, this track is aimed at issues that may not be addressed in other, more specific sessions. The structure will be similar to what was followed in previous years, including topics such as synchronization primitives, bottlenecks in memory management, testing/validation, lockless algorithms and RCU, among others.

Suggested topics:

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Performance and Scalability MC" for the "Track". More topics will be added based on CfP for this microconference.

• Davidlohr Bueso <dave.bueso@gmail.com>
• Daniel Jordan <daniel.m.jordan@oracle.com>
• Pavel Tatashin <pasha.tatashin@soleen.com>

## IoThree's Company MC

CFP Ends: Aug 24, 2021

The IoThree's Company microconference is moving into its third year at Plumbers.  Talks cover everything from the real-time operating systems in wireless microcontrollers, to products and protocols, to Linux kernel and tooling integration, userspace, and then all the way up to backing cloud services. The common ground we all share is an interest in improving the developer experience within the Linux ecosystem.

Suggested topics:

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "IoThree's Company MC" for the "Track". More topics will be added based on CfP for this microconference.

• Christopher Friedt <chris@friedt.co>
• Jason Kridner  <jkridner@beagleboard.org>
• Drew Fustini <drew@beagleboard.org>

## IoThree's Company MC Summary

• Overview of LoRa and LoRaWAN support in Zephyr [slides][video]

• LoRa:

• LoRa API supported: config, send, recv, test

• P2P, no gateway needed

• LoRaWAN:

• Complete list of APIs in slides

• Only Class-A tested, but other classes should work

• Battery level, ADR, OTAA, and ABP are supported

• Based on LoRaWAN Spec v1.0.4 and Regional Parameters 2-1.0.1

• Improvements planned in Zephyr for LoRa/LoRaWAN

• Persistent storage / non-sensitive data in EEPROM

• WIP code by Andreas Sandberg

• Key management / sensitive data in Secure Element (SE)

• WIP code by Andreas Sandberg

• Proposal for LoRa and LoRaWAN in Linux kernel

• A socket for LoRa and LoRaWAN

• PF_LORA, PF_LORAWAN

• LoRa as PHY: device drivers for LoRa modules

• LoRa as Soft MAC: stack written from scratch

• Long-standing effort by Andreas Farber, Jian-Hong Pan

• Very much needs volunteers to upstream

• Not updated in 3 years

• Needs to be merged upstream in small, reviewable parts

• Wireshark has lorawan filters

• Good devices to use with Zephyr / Linux support

• mikroBUS Driver Add-on Boards [video][slides][demo (no audio)]

• mikroBUS:

• an add-on board socket standard by MikroElektronika

• includes SPI, I2C, UART, PWM, ADC, GPIO and power

• 800 Click modules available today!

• Uses 1-wire EEPROM for board identification

• Status in Linux:

• Expose mikroBUS as a probe-able bus

• Devices are probed with combination of

• Devicetree overlay fragment

• Board-specific EEPROM ID

• How is mikroBUS different from Greybus?

• mikroBUS is a variant of the gbphy class

• gbphy cport devices created

• gbphy device has associated SPI, I2C, and GPIO controllers

• probe devices according to Greybus Manifest

• Probe board ID / metadata in EEPROM

• Instantiate mikroBUS

• 140 mikroBUS Click add-on boards tested and ready to use today!

• Sort of like a transport + discovery mechanism for non-discoverable buses (SPI, I2C, UART, PWM, GPIO)

• Originally for Project Ara modular phone

• RPC protocol to manage and control remote devices

• Greybus allows nodes to be locally connected or remote via TCP / IP

• Devices and busses appear in Linux as if they were locally connected

• Keep intelligence in the host, not the remote sensor

• What Next?

• IoT Gateway Blueprint with Thread and Matter [slides][video]

• Current State of IoT (in Home Automation context)

• A box from every different vendor (Apple, Google, Amazon, …) - needs to be reduced!

• Ignore branding and vendor lock-in - technology only!

• Brainstorming: ASOS Gateway (based on Yocto & Linux)

• WiFi AP, BLE, OpenThread Border Router

• Basic Web UI

• Matter protocol support

• containerized services / software-related updates

• Yocto used to build Linux & Zephyr

• Gateway will run Linux, devices will run Zephyr

• Share libraries between host and device (mbedTLS, openthread, matter)

• Predictions

• Vendors will eventually not bundle gateways, but..

• .. unlikely that there will ever be 1 box for all HA

• 6lowpan / IPv6 will become dominant

• Devices may be legally required to be capable of OTA updates

• Guidelines

• Use end-to-end solutions (IPv6 device to device or device to cloud)

• Focus on IPv6-only solutions (OpenThread, Matter)

• ISPs: provide IPv6 prefix delegation to the home, if not already

• NAT64 for IPv4-only transit networks

• OpenThread

• Very active Open Source project

• Product and App compatibility

• Supported by numerous Zephyr devices

• IPv6-based (WiFi, Ethernet, 802.15.4)

• Matter

• Active Open Source project

• Connectivity Standards Alliance (formerly Zigbee)

• IPv6-based (WiFi, Ethernet, 802.15.4)

• Member-only forums

• IEEE 802.15.4 / linux-wpan updates

• MLME scanning, joining, coordinator, beacons need to be mainlined!

• Very similar conceptually to 802.11 (WiFi)

• Several companies have tried over the years but become too busy maintaining their own gateway (ironically)

• More functionality is needed in-kernel rather than in userspace if we want a common linux-wpan platform abstraction

• Apps not boilerplate, leveraging Android’s CHRE in Zephyr [slides][video]

• Main motivation

• Get rid of code and processes that are done over and over again

• Reduce time to market

• Reduce cost of maintainership

• Reduce implementation complexity

• Improve testability

• Improve modularity

• Focus on WHAT to build instead of HOW

• Zephyr is currently being integrated into Chromium OS Embedded Controller

• CHRE in a nutshell

• nanoapps have 3 entry points:

• nanoappStart()

• nanoappStop()

• nanoappHandleEvent()

• nanoapps run in a low-power environment

• Offload processing from applications processor

• E.g. lid angle calculation, wifi events, sensor calibration, geofencing

• CHRE manages events in a pub/sub model between nanoapps, devices, and other event sources

• CHRE itself is implemented in a restricted subset of C++

• nanoapp entry points have C linkage (to simplify RTOS integration)

• Application is responsible for transport

• Data serialized / deserialized using Protocol Buffers (gRPC?)

• Sensor Framework included with CHRE

• Sample-rate arbitration

• Sample-batching

• Supports power management
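
The three nanoapp entry points listed above can be sketched as a minimal skeleton. The event type and the bodies are invented for illustration; a real nanoapp builds against the actual CHRE headers, whose declarations may differ from these.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative event type; real CHRE defines its own event numbers. */
#define EVT_SENSOR_SAMPLE 1

static int samples_seen;

/* Called once when the nanoapp is loaded; subscribe to event sources
 * here. Returning false aborts the load. */
static bool nanoappStart(void)
{
    samples_seen = 0;
    return true;
}

/* Called by the runtime for each published event the nanoapp has
 * subscribed to, e.g. to feed a lid-angle calculation. */
static void nanoappHandleEvent(uint16_t type, const void *data)
{
    (void)data;
    if (type == EVT_SENSOR_SAMPLE)
        samples_seen++;
}

/* Called before unload; release any held resources. */
static void nanoappStop(void)
{
}
```

Keeping the entry points with C linkage, as the summary notes, is what makes it simple to slot nanoapps into different RTOS integrations such as Zephyr.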

• Embedded Linux & RTOSes: Why not both? [slides][video]

• Board Support:

• Linux and Zephyr both do this well, with a device model, interface-based drivers, Devicetree (or ACPI) interfaces for platform data

• Kconfig for software configuration, Devicetree for hardware configuration (Zephyr)

• Other solutions include CMSIS (ARM Cortex-M)

• Vendor HALs are generally not portable

• Real-time:

• Linux has come a long way for soft-real time

• RTOSes are designed for hard real-time (e.g. via interrupts)

• Programming Languages:

• Linux supports virtually any language on most platforms

• RTOSes typically support C, but we are seeing increased support for MicroPython, Rust, WebAssembly, etc

• Distros

• Linux provides shared code / portability, learning resources, and professional support. Multiple distros, even distribution builders like Yocto

• RTOSes are typically proprietary with paid support, sponsorship for cloud services

• Linux uses e.g. apt, yum, containers, etc

• RTOSes can use Mender, SWupdate, RAUC, and Hawkbit

• Updates for RTOS firmware can be painful though, and there are a lot of areas where Linux is far ahead in terms of software (firmware) updates

• Software Integrity

• Linux and RTOSes use SBOM, RTOSes also seek e.g. safety-critical certification (e.g. MISRA)

• Confidential Computing

• Linux and RTOSes both have several frameworks available

• In summary, Linux does a number of things really well, and there are a number of areas where RTOSes could take advantage. Zephyr does a great job of this by adopting Devicetree and Kconfig from the Linux kernel. RTOSes generally are able to offer more traditional real-time capabilities.

## Tracing MC

CFP Ends: Sept 10, 2021

The Tracing microconference focuses on improvements to the Linux kernel tracing infrastructure: ways to make it more robust, faster, and easier to use. The tooling around the tracing infrastructure will also be discussed: what interfaces can be used, how to share features between the different toolkits, and how to make it easier for users to see what they need from their systems.

Suggested topics:

• Tracepoints that allow faults. It may be necessary to read a user space address, but because tracepoints currently disable preemption, they can neither sleep nor fault. There are also possibilities of causing locking issues.

• Function parameter parsing. Now that function tracing on x86 has full access to the arguments of a function, it is possible to record them as they are being traced. But reading the parameters may be difficult, because it requires knowing the prototype of the function. Having some kind of mapping between functions and how to read their parameters would be useful; BTF is a likely candidate.

• Consolidating tracing of the return of a function. Currently there are three use cases that hook into the return of a function, and they all do it differently: kretprobes, the function graph tracer, and eBPF.

• User space libraries. Now that libtraceevent, libtracefs, and libtracecmd have been released, what tooling can be built around them? Also, improving the libtraceevent API to be more intuitive.

• Improving the libtracefs API to handle kprobes and uprobes more easily.

• Python interface. Working on giving the libraries a Python interface to allow full tracing from within Python scripts.

• Tracing containers. What would be useful to expose when creating and running containers.

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Tracing MC" for the "Track". More topics will be added based on CfP for this microconference.

• Steven Rostedt <rostedt@goodmis.org>

## Tracing MC Summary

• DTrace based on BPF and tracing facilities: challenges [video][slides]

• DTrace scripts are compiled into BPF functions, and there are pre-compiled functions in C to BPF. BPF is still "bleeding edge", but production kernels usually prefer stability over bleeding edge. BPF was missing bounding of values loaded and read from the stack (it was stated that this was fixed on 7/13/2021 - v5.14-rc4), but this is still not in production systems. Although it was backported, there are other issues (not specified) that are not.
• Current solution uses BPF maps, but this has limitations: the verifier cannot validate anything stored in or loaded from the maps, and the values are pointers.
• Possible suggested solutions:
• Allow BPF maps to have a larger size

• Use multiple map values (but is cumbersome)

• New per-cpu memory resource. Does not need to be visible to user space.

• Preallocated with bpf_malloc/free helpers.

• Complex scripts and function loops. Perhaps a BPF helper for loops can be added that is safe to use.

• There is currently no way to save pt_regs from tracepoints. But as tracepoints are jumps and not breakpoints, they are similar to calling a function. How would one save registers from a function call?

• Issues with the verifier where GCC/LLVM can produce "verified" code that nevertheless fails the kernel verifier.

• Enabling user mode programs to emit into trace_event / dyn_event [video][slides]

• Want a way to allow user mode applications to send data to a user-mode-defined trace event. Currently possible with uprobes, but it is hard to attach C#, Java, etc. to uprobes.

• Problem: Many processes running in cgroups using different languages (Python, Rust, Go, C/C++, C#, Java), but want a single monitoring agent in the root namespace. There could be multiple tools tracing (LTTng, uprobes, perf, etc.), so a way is needed to have consistent data across the tools. Do not want daemons or agents running in the container namespaces.

• Proposed Solution:

• Have the user applications create / open a "user_events_data" file in tracefs. Get an event_id from this file by passing in an event name via an ioctl(). (Similar to the "inject" file for tracepoints).

• Use an mmapped page shared between the kernel and user space, where the bits of a byte let you know what is attached (bits 0-6 for ftrace, perf, etc.; bit 7 reserved for "others"). Zero means nothing is consuming/tracing that event. The user application checks this byte, and if it is set, writes the trace event data into a file which is passed to the connected consumers (the tracers).

• The trace_marker file was mentioned, but it does not have all the features that a trace event has (attaching triggers, histograms, synthetic events, etc.).

• Discussion over field argument formats were made.

• Issues with it being a binary blob; it should be easily parsable.

• Needs to be accessible by non-root users. Permissions of files in tracefs can be changed.
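
The proposed user-space fast path above can be sketched as follows. A plain variable stands in for the byte in the mmap()ed page, and the helper names are invented; they are not the proposed kernel interface.

```c
#include <stdint.h>

/* Byte in the page shared with the kernel; the kernel flips bits 0-6
 * as tracers (ftrace, perf, etc.) attach to the event. Here it is a
 * plain variable standing in for the mmap()ed page. */
static volatile uint8_t event_status;

static int events_written;

/* Stand-in for formatting the event and write()ing it to the
 * user_events_data file, which forwards to attached consumers. */
static void emit_event(const void *payload)
{
    (void)payload;
    events_written++;
}

/* Fast path in the instrumented application: when the status byte is
 * zero, nothing is consuming the event and the write is skipped
 * entirely, keeping disabled events nearly free. */
static void maybe_trace(const void *payload)
{
    if (event_status)
        emit_event(payload);
}
```

This mirrors how kernel tracepoints are cheap when disabled: the application pays only a single byte load until a tracer actually attaches.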

• Container tracing: [video][slides]

• First, how do we define a container? As containers are just a bunch of user tasks. Yordan Karadzhov said it is trivial to figure out what tasks are part of a container if you start tracing it during its creation. But finding out what is part of a container after the fact is more difficult. Christian Brauner said most of the time the PID name space defines the container, but that is not always the case. He suggested using a "tagging" method, where one could tag the process creating a container, or adding tasks to a container, and all the children will maintain this "tag". Then to define the container the tag will be used.

• Mathieu said that LTTng traces by inode numbers. But is missing the "user given" name of the container, and tagging would solve this too.

• Mathieu also said that we need to be concerned about "nested containers".

• Masami asked about tracing from within the container, but Steven shot that down because once you can see the events of the kernel, you are no longer isolated inside the container. It should be fine for privileged containers, though.

• Christian said there are some ways to get by that with eBPF from inside a container.

• Mathieu suggested having "safe" tracepoints that the container is allowed to use.

• Yordan asked about other use cases for container tracing, like for security, network activity across nodes, etc, but the conversation still went back to what is a container.

• Christian mentioned that there are mechanisms in core scheduling that could be used as well.

• Tracepoints that allow faults [video][slides]

• Tracing of system calls happens at the start of the system call. If there's a pointer to user space and that address is not yet mapped in, it cannot be retrieved, because tracepoints are protected by disabled preemption and retrieving user space memory requires scheduling.

• Suggested extending the tracepoint and tracepoint APIs to include callbacks with preemption enabled, but the tracers will need to know this, as they currently expect preemption disabled. Will require updating to use task trace RCU mechanisms.

• Steven mentioned using synthetic events to connect the entry and exit of a system call, triggering the event on the exit of the system call, where the user space addresses would be resolved. Mathieu asked about exec, as the old addresses would be discarded. The exec system call usually doesn't suffer from not having the address loaded, though, as the path is usually loaded as one of the exec system call parameters.

• Mathieu suggested that we could just have preemption disabled before calling the perf and ftrace tracers.

• Peter Zijlstra noticed that RCU task trace sends IPIs to CPUs that are running NOHZ user space tasks, which must be fixed first.

• LTTng as a fast system call tracer [video][slides]

• How to get at least part of LTTng into mainline.

• Need to break it up and find a minimal set for upstreaming. Could look at system call tracing with page faults first.

• Steven wants more kernel developers to be interested in having LTTng upstreamed, or at least users that are pushing for it. Having a demand is needed, instead of just Steven and Mathieu deciding it.

• Need a way to pass argument types "user" and "kernel" to the tracers.

• Masami suggested using ioctl() as they have lots of data structures.

• Steven suggested using BTF for defining arguments for system calls.

• Eventfs based upon VFS to reduce memory footprint [video][slides]

• The tracefs memory footprint is large due to the events directory. Every event system has a directory dentry, as well as an enable and filter file.

• The events themselves each have their own directory as well as several files to enable the events and to enable triggers, filters, and to read the event formats. The events directory was measured at around 9 MB. When an instance is created (for more than one trace buffer), it duplicates the events directory, allocating another 9 MB of memory. There needs to be a way to have just one copy of the metadata and dynamically create the directories and files as they are referenced. Ideally, this will remove 75% of the 9 MB.

• Ted Ts'o mentioned that sysfs has the same complaints about the size of the files for its objects, and it could use this too.

• There was a lot of acceptance that this would be good to achieve, not only for the proposed eventfs, but for the other pseudo file systems as well.

• Function tracing with arguments [video][slides]

• Currently function tracing only records the name of the function being traced and its parent. As of 5.11, function tracing (on x86_64) gets access by default to the registers and stack needed to retrieve the parameters of every function. Now all that is needed is a way to know, on a function-by-function basis, how to read the arguments. BTF is in the kernel with that information, but there isn't a fast way to retrieve it (needed at the time the functions are being traced).

• Masami mentioned that BTF describes the arguments for each function but does not describe the registers used to pass those arguments; that is different on each arch. It was mentioned that the raw data could be recorded (just knowing which registers and how much stack need to be saved into the trace buffer), with the names parsed in post-processing at the time the trace is read.

• BTF information may be tricky, as finding data for modules may be different than for the core kernel. The split BTF base for modules may not be globally unique.

• Merging the return caller infrastructures [video][slides]

• Currently, three subsystems do tricks to add a callback to the return of a function: kretprobes, the function graph tracer, and BPF direct trampolines. Kretprobes and the function graph tracer "hijack" the return pointer and insert a return to a trampoline that does the callback and then returns to the original return address, which was saved on a shadow stack. BPF has the trampoline call the function being traced and simply has the return side of the trampoline do the callback and return normally itself. But this has issues if there are parameters on the stack, as those need to be copied again when calling the traced function.

• Peter stated that having one infrastructure would be helpful for the CFI "shadow stack" verification code.

• Steven stated that function_graph tracer is simplest because it is fixed in what it does (record return and timestamp). The kretprobe calls generic code that may expect more (need to know more about the state of the system at the return, regs, etc).

• Masami wants to make a single "shadow stack" that both function graph tracer and kretprobes can use. Having multiple users of the shadow stack can probably work the same way tail calls work. The first caller "hijacks" the return address and places the address of its return trampoline on the stack, then when the next user of the shadow stack does its replacement, it will "hijack" the stack in the same location and save the return to the previous return trampoline onto the shadow stack and replace it on the main stack with the address of its return trampoline. When its return trampoline is called, it will put back the return to the previous trampoline and call that.

• Masami mentioned that kretprobes can be updated to use a generic shadow stack, as it currently uses a shadow stack per probe. Peter said that he had code to do that somewhere, but doesn't think it went anywhere; that needs further investigation. Steve said that he would rip out the logic of the function graph tracer's shadow stack and work on making it generic. On x86, Peter said that task stacks are now 16K each. Steven thinks that 4K for the shadow stack should work, but the tracer needs to test whether it actually gets the stack and be able to fail safely when there is no room left.

• BPF can't do it generically, because it only saves the necessary args per function.

## Toolchains and Kernel MC

CFP Ends: Aug 14, 2021

The Toolchains and Kernel microconference focuses on topics of interest related to building the Linux kernel. The goal is to get kernel developers and toolchain developers together to discuss outstanding or upcoming issues, feature requests, and further collaboration.

Suggested topics:

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Toolchains and Kernel MC" for the "Track". More topics will be added based on CfP for this microconference.

• Jose E. Marchesi <jose.marchesi@oracle.com>

## Real-time MC

CFP Ends: Sept 10, 2021

The Real-time microconference focuses on finishing the last lap of getting the PREEMPT_RT patch set into mainline. Many of the missing pieces, however, are not at the core of real-time features (like locking and scheduling) but in other subsystems that compose the kernel, like file systems and memory management. Making these Linux subsystems compatible with PREEMPT_RT requires finding solutions that are acceptable to the subsystem maintainers, without having these subsystems suffer from performance or complexity issues.

Suggested topics:

• New tools for PREEMPT_RT analysis.
• How do we teach the rest of the kernel developers how not to break PREEMPT_RT?
• Stable maintainers tools discussion & improvements.
• The usage of PREEMPT_RT on safety-critical systems: what do we need to do?
• Making NAPI and the kernel-rt work better together.
• Migrate-disable and the problems it causes for rt tasks.
• It is time to discuss the "BKL"-like style of our preempt/bh/irq_disable() synchronization functions.
• How do we close the documentation gap?
• The status of the merge, and how can we resolve the last issues that block the merge.
• Invite the developers of the areas where patches are still under discussion to help to find an agreement.
• How can we improve the testing of -rt to follow the problems raised as Linus's tree advances?
• What’s next?

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Real-time MC" for the "Track". More topics will be added based on CfP for this microconference.

• Daniel Bristot de Oliveira <bristot@redhat.com>
• Clark Williams <williams@redhat.com>
• Steven Rostedt <rostedt@goodmis.org>
• Dhaval Giani <dhaval.giani@oracle.com>
• Kate Stewart <stewart@linux.com>

## Real-time MC Summary

• Maintaining PREEMPT_RT: Now and Then [video][slides]

• Regarding the current approach to managing patches, PREEMPT_RT uses quilt for queueing patches, because git is not viable for the initial modifications. The changes that are applicable upstream are submitted to Linus's tree.
• One of the main questions for the future is how things will work upstream. Sebastian asked that the RT developers be consulted before attempting a fix for a warning. Daniel Bristot then asked: how will developers know who to contact for -rt bugs once the rt merge happens? Steven Rostedt and Clark Williams asked about a methodology for knowing who to contact in case of problems, but Thomas said that things will work the way they work nowadays with existing problems: it's obvious who to talk to when testing picks up a splat with PREEMPT_RT on that goes away with it off. Steve agreed that this problem is not just rt but applies to other hard areas such as RCU. Clark worries that people outside of the community may still not know who to go to. The final agreement is that these problems should be discussed on the linux-rt-users list.
• Juri Lelli suggested more automated CI to automate the BUG reports, and Luis Claudio suggested a list of tests to use as CI, using the already existing infrastructure from companies (Intel, Red Hat, ...). Mark Brown mentioned that there are other companies, mostly in the embedded world, that could also help with that. Daniel Wagner is working on getting Lava testing working for PREEMPT_RT on embedded boards. Clark mentioned the list of problems we can have with different configs, and also mentioned the BIOS being a problem for getting real-time performance, with tuning that can interfere with automated testing of real-time performance. Daniel replied that the BIOS problem is not necessarily an issue: instead of comparing all machines against one another, the results should be compared only against previous versions on the same machine.

• RTLA: An Interface for osnoise/timerlat tracers [video][slides]

• Nowadays there are two main types of RT workloads: periodic (timer-based) and polling (busy loop). cyclictest is the most used periodic test, and sysjitter/oslat are poll-mode tests. When a user sees bad results, it is hard for most people to get started troubleshooting. The tracers themselves already give valuable information, though with the risk of high volume (natural for a tracing session).
• Recently, two new tracers were added to the kernel, the timerlat and osnoise tracers, which aim to provide metrics similar to cyclictest and sysjitter/oslat. While they are good at providing snapshots of the results in the trace, they are not good for collecting long-term results. Daniel is proposing a tool named RTLA to overcome this limitation.
• RTLA stands for Real-Time Linux Analysis tool. It is an interface for the osnoise and timerlat tracers and provides a "benchmark like" interface for these tools.
• Thomas asked why not in Rust, and Daniel said that most of the infrastructure around it is in C, like libtracefs. Daniel also mentioned that he is not using eBPF for now, but will likely use it in the future. Daniel and John Kacur discussed how rtla might be integrated as a measurement module in rteval, and it seems to be feasible.
• Finally, Daniel asked questions about tracing features and discovered that he had misunderstood the "size" of kernel histograms; he will fix that in the next version. Daniel raised the idea of adding features to libtracefs. Steven said that requests should be filed in Bugzilla. Daniel requested non-consuming text output from the trace, but that would be hard, so Daniel will continue with the multiple-instances approach.

• Linux kernel support for kernel thread starvation avoidance [video][slides]

• Using PREEMPT_RT for 5G networks requires less than 10 us latency. In these cases, the users want to run a busy-loop tool that takes 100% of the CPU time, using FIFO priority. As a side effect, this starves the kernel's housekeeping threads that need to run on all CPUs. The starvation of these threads can cause malfunctions in other subsystems; for example, a container destroy can hang immediately.
• While the kernel has a mechanism to avoid starvation, named real-time throttling, it lacks precision on the us scale. To work around this problem, people at Red Hat have developed a tool named stalld. The tool works in user space, parsing sched debug information, detecting real-time tasks causing starvation, and boosting starving threads using SCHED_DEADLINE. The tool works but presents some limitations.
• The authors mentioned that stalld does not scale because it needs one monitoring thread per CPU, and that the tool can itself starve on locks, most notably on those taken to log. The authors proposed an in-kernel, per-cpu algorithm to detect starving threads using hooks into the scheduler events and per-cpu high-resolution timers.
• Daniel Bristot (stalld's author) mentioned that the cited limitations of stalld are gone. Daniel has implemented a single-threaded version that has no drawback compared to the multi-threaded version, and its CPU usage was dramatically reduced. To solve the starvation limitation, stalld can run with FIFO or even SCHED_DEADLINE priority.
• The main point raised by Daniel is that a per-cpu approach that uses interrupts to track starvation can itself cause noise to the busy-loop thread, which is not what users want, and that is why stalld monitors the CPUs remotely. Stalld does not run any code on the monitored/isolated CPU, thus reducing the system noise to the minimum possible.
• Thomas said the real issue is with the other areas of the kernel running threads on the NOHZ_FULL CPU. Daniel said this was a reason why stalld was implemented in user space: when the fully isolated CPU becomes a reality, stalld can be thrown away without having added restrictions to the kernel.

• futex2: next steps [video][slides]

• futex2 is a new set of syscalls to solve long-standing issues with the current interface: NUMA awareness, waiting on multiple futexes, and variable-size futexes. futex2 is already a patchset under discussion, being refactored to make the patches smaller and easier to comprehend.
• The discussion turned to the problem of the structure used to define the time. timespec is not the best way to go; Peter would like to see __u64 for timeouts. André asked whether that is a good way, and Thomas said that he agrees with __u64, even if it will be a problem in the long-term future. Steven mentioned working around the future problem via libc. Thomas said we want both absolute and relative values for the time value. In the end, the recommendation was to stay with __kernel_timespec rather than a __u64 time value, and to add a clockid argument to futex_waitv() so that the clock used for the time value can be selected.
• For the futex_waitv structure, dropping the __reserved member to save memory was considered, with argument over whether the memory saved is worth the effort. Arnd asserted that we should optimize for 64-bit performance going forward. The conclusion is to move forward with the __u64 uaddr struct. NUMA-aware futex2 is interesting but needs a much harder look, and needs buy-in from users (glibc, databases, JITs using raw futexes).

• printk and real-time [video][slides]

• The topic started by explaining why printk is so important to PREEMPT_RT users. One of the main challenges is to ensure that printk can print from any context. To do that, the printk call path needs to be decoupled from the actual printing output to the console (of any type), and this is done using a ring buffer to store the printk messages.
• Kernel 5.15 will have the completely safe lockless ringbuffer, but the rest of the rework is not upstream yet. The idea going forward is to print from the printk caller's context only when not in panic(), with atomic consoles used in that situation (panic consoles). Panic consoles do not need to worry about synchronization, with the exception of the serial terminals?
• KGDB, where we are trying to debug the kernel, is a special case. There is debate about how to transfer ownership of the console cpu-lock to kgdb. Renaming the lock to "cpu_sync" could be an option, since the console cpu-lock isn't really a lock but just a mechanism to synchronize access to consoles (somewhat cooperatively).
• Atomic consoles are implemented in mainline with polling APIs, but on PREEMPT_RT they are only implemented for the physical 8250 UART. Developers are currently trying to find the best path for implementing them for PREEMPT_RT.

• PREEMPT_RT: Status and QA [video]

• The first question was regarding the PREEMPT_RT merge. The answer was that kernel 5.15 has the locking and mm patches for PREEMPT_RT, and the major parts of PREEMPT_RT are already mainline. But it still won't boot due to open issues. For instance, namespace issues still exist, but they are not the major concern. Thomas points to problems on the networking side as the most critical part. There are also some mm and fs problems, but they are not a big point because those problems are not strictly related to PREEMPT_RT.
• Softirq latencies are a problem on the non-rt kernel too, mainly because of latency problems faced by Android: is any work in the RT patchset helping there? Thomas replied that all the work done in this regard is a band-aid; we need to get away from softirq and the softirq-disable limitations.
• Another question was whether there is any plan to get ktimersoftd (the timer-softirq-specific thread) back.
• Thomas said that it might happen, but it is not a priority; developer bandwidth is limited.
• Do we need to care about arch-specific problems for PREEMPT_RT? Thomas said that there is not too much to worry about there.
• Finally, in a discussion about system safety, Thomas said that there are already cases of Linux being used in critical systems, like military systems, but things are way more complex in ADAS. Daniel Bristot said that there would be a talk about this topic the next day: A maintainable, scalable, and verifiable SW architectural design model for the Linux Kernel.

## Testing and Fuzzing MC

CFP Ends: Sept 10, 2021

The Testing and Fuzzing microconference focuses on advancing the current state of testing of the Linux kernel. We aim to create connections between folks working on similar projects, and help individual projects make progress.

We ask that any topic discussions will focus on issues/problems they are facing and possible alternatives to resolving them. The Microconference is open to all topics related to testing & fuzzing on Linux, not necessarily in the kernel space.

Suggested topics:

• KernelCI: Extending coverage and improving user experience.
• Growing KCIDB, integrating more sources.
• Better sanitizers: KFENCE, improving KCSAN.
• Using Clang for better testing coverage: Now that the kernel fully supports building with clang, how can all that work be leveraged into using clang's features?
• How to spread KUnit throughout the kernel?
• Testing in-kernel Rust code.

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Testing and Fuzzing MC" for the "Track". More topics will be added based on CfP for this microconference.

• Sasha Levin <sashal@kernel.org>
• Guillaume Tucker <guillaume.tucker@collabora.com>

## File System MC

CFP Ends: Sept 15, 2021

The File system microconference focuses on a variety of file-system-related topics in the Linux ecosystem: enabling new features in the file system ecosystem as a whole, interface improvements, interesting work being done, really anything related to file systems and their use in general. Oftentimes file system people create interfaces that are slow to be adopted, or that get used in new and interesting ways we did not think about initially. Having these discussions with the larger community will help us work towards more complete solutions and happier developers and users overall.

Suggested topics:

• DAX - are we finally ready for prime time?
• Optimizing for cloud block devices. How do we deal with unstable transport? Do we need to rethink our IO path?
• Atomic writes, and FIEXCHANGE_RANGE
• Writeback throttling - we have a lot of different solutions, are we happy with the current state of affairs?
• Page Folios
• RWF_ENCODED
• Performance testing

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "File System MC" for the "Track". More topics will be added based on CfP for this microconference.

• Josef Bacik <josef@toxicpanda.com>
• Amir Goldstein <amir73il@gmail.com>
• Ted Ts'o <theodore.tso@gmail.com>
• Jan Kara <jack@suse.cz>

## File System MC Summary

• Efficient buffered I/O (Page folios) [video]

• Matthew Wilcox talked about the work that filesystem developers need to do in order to convert disk or network filesystems to use Folios

• Josef Bacik and Ted Ts’o offered to help with performance testing

• Covered better by https://lwn.net/Articles/869942/

• Idmapped mounts [video][slides]

• Christian Brauner gave a quick overview about the idmapped mounts feature that was merged for v5.12, what they are used for and how they are implemented in the VFS.

• More background can be found at https://lwn.net/Articles/837566/

• At the moment, only privileged users (typically systemd or a container runtime) are allowed to create idmapped mounts for use by processes running inside a user namespace

• In a followup session, Jan Kara talked about Project ids and what their semantics should be w.r.t idmapped mounts use cases

• The existing project id semantics are quite old, so the community may propose new semantics to meet modern use cases, but first, a proper specification for the new semantics needs to be drafted

• Atomic file writes [video]

• Darrick Wong talked about a proposal for implementing filesystem level atomic writes using FIEXCHANGE_RANGE ioctl (https://lwn.net/Articles/851392/)

• The proposed API is a lot more flexible and predictable than the hardware atomic write capability available on some storage arrays, which is very hard to use in portable applications

• Filesystem shrink [video][slides]

• Allison Henderson talked about the technical challenges of shrinking the logical size of a filesystem, in cases where thin provisioning is not provided by the underlying block layer.

• Ted Ts'o explained that the requirement is driven by the fact that Cloud vendors charge customers by the logical size of the provisioned block device and not by the actual Cloud backend storage usage -  if we could get Cloud vendors to change their pricing model, there would probably be no need to shrink filesystem logical size

• Bad Storage vs. File Systems [video][slides]

• Ted Ts’o and Darrick Wong talked about lessons learned from running filesystems on top of unreliable Cloud backend storage.

• Different use cases vary greatly in what the best thing to do is when an I/O error occurs while writing data or metadata blocks

• Josef Bacik held a strong opinion that error handling should be delegated to applications and that it should not be a filesystem decision

• Ted Ts’o argued that some mechanisms, such as forcing a kernel panic, make sense to do in the kernel without involving userspace. Another mechanism, delegating the decision to system administrators, might involve using eBPF.

• Darrick Wong talked about some of the XFS work done in 2021 and what is planned for 2022

• Josef Bacik talked about some of the btrfs work done in 2021 and what is planned for 2022

• Matthew Wilcox talked about some more plans for improvements of page cache after Folios

• Josef Bacik, Ted Ts’o and Darrick Wong talked about their regression test setups and how to better coordinate regression tracking efforts

## VFIO/IOMMU/PCI MC

CFP Ends: Sept 10, 2021

The VFIO/IOMMU/PCI micro-conference focuses on coordination between PCI devices, the IOMMUs they are connected to, and the VFIO layer used to manage them (for userspace access and device passthrough). The related kernel interfaces and userspace APIs need to be designed in sync and in a clean way for all three sub-systems, and the kernel code that enables these new system features often requires coordination between the VFIO, IOMMU and PCI sub-systems.

Suggested topics:

• VFIO

• Write-combine on non-x86 architectures

• I/O Page Fault (IOPF) for passthrough devices

• Shared Virtual Addressing (SVA) interface

• Single Root I/O Virtualization (SR-IOV)/Process Address Space ID (PASID) integration

• PASID in SRIOV virtual functions

• Device assignment/sub-assignment

• IOMMU

• IOMMU virtualization

• IOMMU drivers SVA interface

• I/O Address Space ID Allocator (IOASID) and /dev/ioasid userspace API (uAPI) proposal

• Possible IOMMU core changes (e.g., better integration with device-driver core, etc.)

• PCI

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "VFIO/IOMMU/PCI MC" for the "Track". More topics will be added based on CfP for this microconference.

• Alex Williamson <alex.williamson@redhat.com>
• Bjorn Helgaas <bjorn@helgaas.com>
• Joerg Roedel <joro@8bytes.org>
• Krzysztof Wilczyński <kw@linux.com>
• Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

## Open Printing MC

CFP Ends: Sept 10, 2021

The Open Printing microconference focuses on improving and modernizing the way we print in Linux.

Suggested topics:

• Changes in CUPS 2.4.x
• Print sharing changes for mobile
• OAuth support to replace Kerberos
• Printer drivers replaced with Printer Applications
• TLS/X.509 changes
• CUPS in containers
• CUPS 3.0
• Future CUPS development
• Identify supported platforms
• Key printing system components
• Discuss integration with Printer Applications and application stores like Snap Store
• Print Management GUI
• Migrating from working with CUPS queues to IPP services
• Handling legacy devices that do not handle IPP services
• Common Print Dialog Backends
• CPDB, CUPS backend.
• Separating GUI toolkits and the print technology support to be independent from each other.
• Printer/Scanner Driver Design and Development

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Open Printing MC" for the "Track". More topics will be added based on CfP for this microconference.

• Aveek Basu <basu.aveek@gmail.com>
• Till Kamppeter <till.kamppeter@gmail.com>
• Michael Sweet <msweet+lpc@msweet.org>
• Ira McDonald <blueroofmusic@gmail.com>

## RISC-V MC

CFP Ends: Sept 10, 2021

The RISC-V microconference focuses on the development of RISC-V.

Suggested topics:

• Platform specification progress, including SBI-0.3 and the future plans for SBI-0.4. There has been significant progress on the platform specifications, including a server profile, that needs discussion.
• Privileged specification progress, possible 1.12 (which is a work in progress at the foundation).
• Support for the V and B specifications, along with questions about the drafts. The V extension is of particular interest, as there are implementations of the draft extension that are likely to be incompatible with what will eventually be ratified, so we need to discuss what exactly user ABI compatibility means.
• H extension / KVM discussion, which is probably part of the drafts.  The KVM port has been hung up on the H extension ratification process, which is unlikely to proceed any time soon. We should discuss other options for a KVM port that avoid waiting for the H extension.
• Support for the batch of SoCs currently landing (JH7100, D1)
• Support for non-coherent systems
• How to handle compliance.

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "RISC-V MC" for the "Track". More topics will be added based on CfP for this microconference.

• Palmer Dabbelt <palmer@dabbelt.com>
• Atish Patra <atish.patra@wdc.com>

## Kernel Dependability and Assurance MC

CFP Ends: Sept 10, 2021

The Kernel Dependability and Assurance Microconference focuses on infrastructure to be able to assure software quality and that the Linux kernel is dependable in applications that require predictability and trust.

Suggested topics:

• Identify missing features that will provide assurance in safety critical systems.
• Which test coverage infrastructures are most effective to provide evidence for kernel quality assurance? How should it be measured?
• Explore ways to improve testing framework and tests in the kernel with a specific goal to increase traceability and code coverage.
• Regression Testing for safety: Prioritize configurations and tests critical and important for quality and dependability

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Kernel Dependability and Assurance MC" for the "Track". More topics will be added based on CfP for this microconference.

• Shuah Khan <skhan@linuxfoundation.org>
• Gabriele Paoloni <paoloni.gabriele@gmail.com>

## Kernel Dependability MC Summary

The Kernel Dependability and Assurance Microconference focuses on infrastructure to be able to assure software quality and that the Linux kernel is dependable in applications that require predictability and trust.

• Runtime redundancy and monitoring for critical subsystem/components [video] [slides]

• ASIL decomposition: separate the simpler component from the complex one by breaking the top-level requirement into multiple simpler requirements. An ASIL-B hypervisor separates the QM(B) Linux kernel from the ASIL-B Safe OS, but this is expensive.

• One way to accomplish ASIL decomposition is separating out the subsystems that have these requirements. The monitor and the element must be separate. Things to consider are:

• Can runtime verification be used for the monitor?

• How do we separate the monitor and the element as they are running in the same address space?

• The monitor is driven by events (via trace events?). It keeps variables to save state in read-only mode, and must keep the element from corrupting the monitor variables. One way to accomplish this is to make it all read-only, but have the fault handler on write check whether the write comes from the monitor and, if so, perform it.

• The hypervisor will detect a kernel crash; the triple-fault concern is covered by this.

• Could the monitor be used for security purposes too?

• Perhaps keep the element's address space from corrupting the kernel. But still need to protect against the kernel itself corrupting the monitor. The monitor must be verified.

Outcome: Work on a few ideas to design a monitor, implement a prototype, and get feedback.

• Traceability and code coverage: what we have in Linux and how it contributes to safety [video] [slides]

• Overview of RH Automotive initiative

• OS distribution based on RHEL to support automotive safety critical applications

• Overview of CKI and Kernel CI

• CKI is Red Hat's public-facing CI and a member of KernelCI

• Importance of code coverage in ISO26262:

• Code coverage is needed to evaluate the completeness of the test coverage, and can also be used to verify correct traceability from the top-level requirements down to the design elements of the kernel

• Code Coverage Analysis

• fetches a pre-built RHEL gcov kernel

• uses lcov to generate reports

• multiple reports can be combined to have an overall view

• Code Coverage Example in the slide set.

• Targeted testing: a tool called KPET triggers tests automatically as soon as a patch is submitted, to verify whether it is covered.

• Possible improvement for CKI:

• gating patches that are missing code coverage;

• Integrating code coverage analysis into the pipeline

• Discussion

• Who is using code coverage? Is it useful?

• What is the % coverage target to aim for? Is 90% achievable?

• How often should we generate coverage reports? Doing it for every patch is expensive

• What is the alternative to code coverage?

• The idea of gating patches with missing targeted tests is appealing; currently this is done manually. The process will be automated by integrating code coverage into the pipeline

• What code is covered by customer workloads? gcov has performance impacts that make it inappropriate for customer-deployed kernels. It is hard to get a report if a test misbehaves and stops the test run. Looking at different parts of the kernel, we are still a way off from making gating patches a feasible feature, but it is a good long-term goal, and we need the maintainers' commitment to make it happen. There is no single correct coverage target: no project is known to achieve a 90% target, and in most code bases a target between 70% and 80% is achievable. Tackling this with maintainers, subsystem by subsystem, is a reasonable way to approach it.

• Coverage is important to make sure that we tested all the meaningful states of the system. In automotive, code coverage analysis is used at the subsystem level to evaluate the coverage trend and make sure it is positive. Once we have a model for the subsystem, we can identify the states we are missing to trigger, and use RV monitoring to understand which paths are being taken; the two are correlated. It is important to come up with a strategy for sharing with maintainers what their coverage looks like, in concise and easy-to-read reports. The challenge is to have the requirements that motivate new patches also be systematically recorded, and the appropriate testing tracked. It would be good if requirements were written in a way that helps derive test cases

• syzkaller provides coverage reports: https://syzkaller.appspot.com/upstream
• Coverage reports are very useful for assessing how good the testing is, what's missing, etc. They are the only way to ensure that what developers think is covered really is covered.

Outcome: Agreement that having code coverage would be useful, challenge is now how to make it practical and accessible to maintainers. Goal would be to evolve to coverage on patches matched back to requirements.

• Adding kernel-specific test coverage to GCC's -fanalyzer option [video] [slides]

• -fanalyzer is a compiler option that enables static analysis inside the compiler. It is neither complete nor fully correct; it shows false negatives as well as false positives

• How to extend the analyzer: detecting information leaks (uninitialized kernel memory copied to user space) and untrusted data used by a trusted kernel service. Where should the code to enable this live? Right now David is using his own branch with a few kernel hacks

• Across trust boundaries: the suggestion was to use __user (an annotation on functions that copy across trust boundaries). David can reinvestigate whether gcc provides an attribute space that the analyzer can pick up on. David also invented the attribute "tainted" to mark the risk of untrusted data coming in as input to a function; "untrusted" is preferred over "tainted" in the kernel, and it should be used via a define.

• The kernel-specific parts stay at Red Hat until a responsible-disclosure path is figured out. Without LTO it is hard to check the whole call tree where data can be tainted. False positives can be quieted by turning individual checkers off until fixed and then turning them back on; this should probably be a config option, defaulting to off, so time is not necessarily an issue. Sparse runs all the time. There was a request to share the tools so we can be one step ahead of black hats; the best way to integrate is to send it to the kernel community to figure out integration.

Outcome: Like to see this in the kernel and available.

• A bug is NOT a bug is NOT a bug [video] [slides]

Differences in bug classes, bug tracking, and bug impact: what is a bug, how do we track it, and what is its impact?

• What is a bug: a spelling mistake? a compiler warning? a race condition?

• every commit that doesn't add a feature is a bug-fix

• Exploring which tools are available to: detect bugs, track bugs, and report bugs upstream

• Classifying the different classes of bugs: found by humans, by automated tools, or by fuzzing; reported by humans, via bugzilla, or by automated testing systems

• How are these bugs tracked? How are the fixes linked back to the failing tests?

• Compiler-warning bug reports: is the parsing precise? How does the reporting work?

• Bug reports identified by fuzzing: the fuzzer uses its own bug-tracking tool

Discussion

Can we find a way to consistently report bugs without all these tools spread around? Assuming we find a tool that reports all the bug classes, how do we present meaningful information to the maintainers?
How can we build something that reports bugs fast enough?

• A bug that gets into linux-next is still fine as long as it does not land upstream; however, a bug implies a rework, which means another patch on top of linux-next.
• Not all spelling mistakes are a security threat, but typo squatting is a vulnerability, so some could be considered bugs. It depends where the spelling mistake is: in comments, no; in code, it could be.
• It would be good to get to a point where we reject features that break kselftest. A final test by Linus involves a compile test, backed by a set of CIs that we can point to showing there are no bugs. Doing sanity checks is a good idea, as they can spot bugs even though they don't guarantee there are no integration issues in linux-next.
• Every maintainer has their own test suite. It isn't possible to test on all supported hardware; CI rings don't or can't host all supported systems and hardware, and it isn't practical to expect them to. It would be nice to have a checklist for contributors to meet before sending a patch.

• How do you know which tests to run for a subsystem? Should maintainers re-run the tests identified for a patch? How do you prove that the tests were run?

• Threshold for spelling mistakes: compiler warnings are dealt with before pulling into linux-next, whereas spelling mistakes do get pulled into Linux

Outcome: No clear outcome from this discussion.

• Kernel cgroups and namespaces: can they contribute to freedom from interference claims? [video] [slides]

FFI (freedom from interference) is the absence of cascading failures from one item to another in the system. Dependent failure analysis, as well as fault tree analysis, can be very valuable in supporting FFI claims. Containers are enabled by namespaces and cgroups.

How do we mitigate timing interference caused by storage bandwidth being eaten up? For instance, Intel has CAT (Cache Allocation Technology), but it is never used to prevent this type of interference. From a QE point of view we can develop tests to verify this. How can you isolate the CPU resource itself? It is doubtful that you can isolate CPUs using cgroups: cgroups allow pinning, but they do not make CPUs invisible. Isolating CPUs can provide the right visibility of available CPUs to applications running on top, or deny a view of resource allocation to prevent security attacks.

Discussion

• Do we have enough namespaces or cgroups controllers?

• Controller subsystem negotiation (granularity ok)?

• cgroup v2 does not yet control RT processes; all RT tasks must be in the root cgroup for the cpu controller to be enabled. Any updates?

• Any thoughts about KVM? (Long version: can hypervisors enable an I/O contention that existing control surfaces cannot ameliorate?)

• Are virtualized GPU functions under control?

Outcome: No clear outcome from this discussion.

• Kernel testing frameworks [video] [slides]

GCOV: summaries alone don’t tell you whether your testing is good or bad. kselftest and KUnit combined can achieve the goals; a test plan helps you think about the paths being tested. Keeping these frameworks in the kernel tree means the tests are kept with the code. KCOV is quite different from GCOV, and one cannot simply be replaced with the other. There's also the testing overview page in the kernel docs, which covers the differences between KUnit and kselftest: https://www.kernel.org/doc/html/latest/dev-tools/testing-overview.html

Outcome: Feel free to continue conversation at next ELISA workshop.

## System Boot and Security MC

CFP Ends: Sept 15, 2021

The System Boot and Security microconference focuses on the firmware, bootloaders, system boot and security around the Linux system. It also welcomes discussions around legal and organizational issues that hinder cooperation between companies and organizations to bring together a secure system.

Suggested topics:

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "System Boot and Security MC" for the "Track". More topics will be added based on CfP for this microconference.

• Daniel Kiper <dkiper@net-space.pl>
• Piotr Król  <piotr.krol@3mdeb.com>
• Matthew Garrett <mjg59-plumbers@srcf.ucam.org>

## Android MC

CFP Ends: Sept 7, 2021

The Android microconference focuses on cooperation between the Android and Linux communities.

Suggested topics:

• Alignment issues between Android and Cgroups v2: Issues in refactoring Android's use of cgroups to utilize cgroups v2
• Thermal: Issues around performance and thermal handling between the kernel and Android's HAL
• Fuse/device-mapper/other storage: Discuss out-of-tree dm/storage drivers and how they might go upstream or better align with upstream efforts
• In-kernel memory tracking: Tracking/accounting of GPU (and other multi-device shared) memory and how it might fit with cgroups
• fw_devlink: Remaining fw_devlink issues to resolve, now that it's enabled by default
• Hermetic builds/Kbuild
• GKI updates: What's new this year in GKI and where it's going next year
• Rust in AOSP / Kernel / Binder: How Android is adopting Rust for userland and potentially for kernel drivers
• Android Automotive OS Reference Platform: Details on recent Android Automotive work
• Community devboard/device collaboration: Ways to better collaborate on enabling various devboards against AOSP, without needing close interlock with Google

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Android MC" for the "Track". More topics will be added based on CfP for this microconference.

• Karim Yaghmour <karim.yaghmour@opersys.com>
• John Stultz <john.stultz@linaro.org>
• Amit Pundir  <amit.pundir@linaro.org>
• Sumit Semwal <sumit.semwal@linaro.org>
• Ric Wheeler <ricwheeler@fb.com>

## GPU/media/AI buffer management and interop MC

CFP Ends: Sept 10, 2021

The GPU/media/AI buffer management and interop microconference focuses on Linux kernel support for new graphics hardware that is coming out in the near future.  Most vendors are also moving to firmware control of job scheduling, additionally complicating the DRM subsystem's model of open user space for all drivers and API. This has been a lively topic with neural-network accelerators in particular, which were accepted into an alternate subsystem to avoid the open-user space requirement, something which was later regretted.

As all of these changes impact both media and neural-network accelerators, this Linux Plumbers Conference microconference allows us to open the discussion past the graphics community and into the wider kernel community. Much of the graphics-specific integration will be discussed at XDC the prior week, but particularly with cgroup integration of memory and job scheduling being a topic, plus the already-complicated integration into the memory-management subsystem, input from core kernel developers would be much appreciated.

Suggested topics:

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "GPU/media/AI buffer management and interop MC" for the "Track". More topics will be added based on CfP for this microconference.

• Daniel Stone <daniel@fooishbar.org>

## Diversity, Equity and Inclusion MC

CFP Ends: Sept 10, 2021

The Diversity, Equity and Inclusion microconference focuses on identifying how we can improve the diversity of new contributors and retain existing developers and maintainers in the kernel community. As the Linux kernel community turns 30 this year, we need to understand what is working and where we can improve (and how). Experts from the DEI research community will share their perspectives, together with perspectives from Linux community members, to help determine some next steps.

Suggested topics:

• What are the challenges in attracting and retaining a diverse group of developers that are worth focusing on?
• Do the Code of Conduct and inclusive-naming efforts help people from diverse groups feel at home? What else is missing?
• How effective have the kernel mentoring initiatives been? Are there best practices emerging that help the limited pool of mentors be more effective?
• What will be the most effective next steps for advancing Diversity, Equity and Inclusion that will improve the trends, and help us scale?

If you are interested in participating in this microconference and have topics to propose, please use the CfP process, and select "Diversity, Equity and Inclusion MC" for the "Track". More topics will be added based on CfP for this microconference.