System Update

Introduction

This page compares different system update mechanisms. The purpose is to help the project pick a suitable mechanism to support going forward. Users may also find this page useful for picking a mechanism that suits their specific needs.

A system update mechanism must ensure that a device running an older release of the operating system ends up running a more recent release once the update is done. This includes updating everything that defines the system (rootfs, kernel, bootloader, etc.), restarting running processes, and potentially rebooting. An ideal mechanism:

  • never ends up in an inconsistent state (atomic update),
  • always keeps the device usable (fallback to previous state when there are problems, or at least supporting a recovery mode),
  • requires little additional resources (disk space, RAM),
  • minimizes downtime while updating,
  • works in combination with security technology (integrity protection),
  • is secure (does not install or execute software created by an attacker).

These are conflicting requirements. Different mechanisms will have different strengths and weaknesses. Therefore the first chapter provides a more detailed definition of the different aspects and has a table comparing the mechanisms. The following sections then describe each mechanism in more detail.

Some talks at Embedded Linux Conference presented an overview of the current mechanisms:

Comparison

Type
Block-based update mechanisms directly modify blocks in the partition(s) that they update, without going through the filesystem. This implies that the partition has to be the same for all devices and that devices must use exactly the same partition size. File-based update mechanisms modify files and directories. Therefore devices with different partition sizes can use the same update data and it may be possible to update without a reboot.
Disk layout
Dependencies on boot loader, number and kind of partitions, etc. Flexible mechanisms make no or only few assumptions about the system, but typically then require additional work to integrate with a specific setup.
Rootfs
The partition which contains the OS. May be strictly read-only (block-based update mechanisms) or read/write (file-based). Some update mechanisms support installing and updating a subset of the full OS.
Updates from
Describes where the update mechanism gets the update from.
Updates what
Describes which parts of the overall system the mechanism updates.
Code stability
Based on how long the code has been in use, personal experience, security track record in existing deployments, etc.
OE/Yocto integration
Whether the mechanism is already available and who supports it.
Resource requirements on server
Affects both build time and long-term storage capacity. Likely to depend on the complexity of the changes.
Resource requirements on client
Amount of temporary disk space, CPU/network load, ..., again for different scenarios.
Failure resilience
Summarizes how the mechanism copes with potential problems.
Complexity
Some mechanisms are harder to use correctly than others (usability). Also includes how difficult it is to set up the mechanism because of dependencies.
Downtime
How long normal operation of the device needs to be interrupted for an update.
Security
Compatibility with other technology, protection of the update mechanism itself.
Mechanism: swupd
  • Type: file-based
  • Disk layout: flexible
  • Rootfs: read/write
  • Updates from: HTTP(S) server, local media
  • Updates what: depends on setup
  • Code stability: relatively stable, under active development
  • OE/Yocto integration: meta-swupd
  • Resource requirements on server: moderate, suitable for frequent updates
  • Resource requirements on client: minimal download, needs sufficient free space in rootfs
  • Failure resilience: favors fast updates over failure resilience
  • Complexity: some planning required
  • Downtime: minimal, reboot optional
  • Security: Compatible with Linux IMA, Smack, SELinux. Signed update data, HTTPS transfer protection.

Mechanism: sbabic's swupdate
  • Type: block-based
  • Disk layout: flexible
  • Rootfs: depends on setup
  • Updates from: local and remote (plain HTTP(S) or custom server)
  • Updates what: depends on setup
  • Code stability: Code relatively stable, 3 months release cycle
  • OE/Yocto integration: meta-swupdate
  • Resource requirements on server: archives full image per build
  • Resource requirements on client: download and write full image
  • Failure resilience: no builtin mechanism, to be added as part of system
  • Complexity: easy to use (but requires customization!?)
  • Downtime: reboot required
  • Security: signed and encrypted images, HTTPS

Mechanism: Mender
  • Type: block-based
  • Disk layout: fixed, U-Boot as boot loader
  • Rootfs: conceptually read-only
  • Updates from: remote
  • Updates what: Complete rootfs, including kernel.
  • Code stability: in development
  • OE/Yocto integration: meta-mender
  • Resource requirements on server: archives full image per build
  • Resource requirements on client: download and write full image
  • Failure resilience: integrated rollback
  • Complexity: easy when using meta-mender
  • Downtime: reboot required
  • Security: HTTPS

Mechanism: OSTree
  • Type: file-based
  • Disk layout: flexible, but supports only limited set of bootloaders
  • Rootfs: read/write, OS trees bind mounted read-only, /etc and /var writable
  • Updates from: local and remote repositories
  • Updates what: kernel and filesystem
  • Code stability: relatively stable, significant user base, under active development
  • OE/Yocto integration: meta-ostree (WIP), meta-updater (public)
  • Resource requirements on server: generating commits based on new builds, storing them in repository
  • Resource requirements on client: updating local repository, hard links for sharing unchanged content between deployments
  • Failure resilience: rollback to a different deployed OS tree
  • Complexity: some work required
  • Downtime: reboot required
  • Security: GPG-signed commits

Mechanism: Robust Auto-Update Controller (RAUC)
  • Type: block-based?

swupd

Disk layout
swupd updates files in a single partition. Other than that, it makes no assumptions about disk layout, filesystem and boot mechanism.
Rootfs
Files provided by the OS are read-only, everything else is read/write (/etc, /var) and preserved during updates. OS can be split up into a core OS (always installed) and optional bundles which may or may not be installed.
Updates from
Uses libcurl to fetch update data and thus supports all URL schemes supported by libcurl, in particular http(s) and local files.
Updates what
swupd itself updates files in the rootfs; other components (boot loader, kernel, firmware, ...) need to be updated via custom plugins.
Code stability
Used and maintained in Clear Linux OS. Code relatively stable, but would benefit from a rewrite (evolved from a prototype). Code changes not written for Clear Linux OS tend to get merged only slowly or not at all (examples: assumption about filesystem, error checking, path configuration).
OE/Yocto integration
Layer available, not part of Yocto releases, experimental. Supports building incremental updates (= only changes since last build are stored for new build) and deltas (= optimized update pack for specific previous builds, typically major milestones).
Resource requirements on server
Build time and storage for each update linear with total number of files (file system analysis, zero packs) plus linear with number of modified files (compression). When using swupd purely as update mechanism (i.e. no bundles), space requirement on the server could be reduced to linear with the number of modified files by not creating the zero packs. Optionally can prepare deltas from certain previous builds, which is linear with the number of modified files since each of those builds.
Resource requirements on client
In the best case (delta prepared by server), a single archive with just some file diffs gets downloaded, unpacked and applied. In other cases, each new or modified file gets downloaded and unpacked. Staging new content needs free space in the rootfs partition, i.e. partition must be at least twice as large as the base OS.
Failure resilience
No recovery mechanism built into swupd itself. Short period of time where interrupted update may leave behind inconsistent rootfs. No updates possible when there is not enough free space left. Updating files while system services are running reduces downtime, but is also more risky if system services aren't prepared for it. Could be extended to do updates without that risk, at the expense of increased downtime.
Complexity
Upgrade path must be considered as part of release process (deltas, incompatible changes).
Downtime
Downloading and staging in parallel to normal operation. Services are kept running until after the update, at which point the device admin needs to restart services or reboot depending on what was updated (not automated at the moment in meta-swupd layer).
Security
Compatible with Linux IMA, Smack, SELinux. Conceptually incompatible with dm-verity. Relies on HTTPS and (optionally) signing to protect integrity of update data (not integrated into meta-swupd yet).
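
The idea behind the incremental updates mentioned above (only changes since the last build are stored for the new build) can be illustrated with a small sketch. It is not swupd's actual manifest format or code: the function names and the path-to-SHA-256 manifest layout are assumptions made purely for illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each regular file below root (relative path) to its SHA-256 digest.

    Hypothetical manifest layout, chosen for illustration only.
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Return the paths that an incremental update from old to new must touch."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
    }
```

Only the entries listed under "added" and "modified" would have to be packed and downloaded, which is why the cost of an update grows with the number of changed files rather than with the size of the whole image.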

sbabic's SWUpdate

Disk layout
There is no constraint on how software is stored. SWUpdate supports raw flash (NOR, NAND), UBI volumes, disk partitions or can update files (provided in a tarball) into an existing filesystem. Each artifact can be stored on a different storage device.
Rootfs
No constraint on where software is stored. During an update, a single partition, multiple partitions or, more generally, multiple different storage devices can be updated.
SWUpdate is often used in one of the following setups:
  • rescue : The system reboots into maintenance mode and SWUpdate is started from a ramdisk. Just one copy of the software is stored on the system.
  • dual-copy : Two copies of the software (rootfs, kernel) are stored on the system and SWUpdate installs the update into the stand-by copy.
Updates from
  • local provisioning : USB, SD, etc.
  • generic URL : HTTP(S), FTP. It uses the libcurl library and supports what libcurl provides.
  • Webserver : SWUpdate integrates a webserver (Mongoose).
  • External backend connector (suricatta mode) to bind with an external backend server. Currently, the hawkBit server (https://github.com/eclipse/hawkbit) is supported. Open to further backends.
Updates what
  • bootloader (risky!)
  • kernel
  • interface to the bootloader (U-Boot): allows changing U-Boot variables; plugins can be used to make changes to other bootloaders.
  • disk partitions
  • provides an interface to update FPGAs, external microcontrollers, etc.
  • custom handlers: an interface allows adding your own installers written in C or Lua.
Resources on client
  • rescue : meta-swupdate provides a recipe to generate a compressed ramdisk with a small footprint. Including support for signed images, the whole rootfs is ~4MB. The minimal requirement for a complete rescue system (bootloader, kernel for SWUpdate and ramdisk) is 8MB. This allows putting the rescue system in small storage such as SPI-NOR, while the software is stored on another, bigger device (NAND, eMMC).
  • dual-copy : Needs active/passive rootfs partition, and bandwidth to download compressed rootfs image. No additional space is required if the image is directly streamed to the stand-by copy.
Failure resilience
There is no recovery mechanism built in. If the bootloader has the capability to check whether a boot failed, it could boot into maintenance mode again.
Downtime
  • rescue : The system needs to reboot into maintenance mode and, once the image is installed, reboot again into the new production image.
  • dual-copy : Update is downloaded and applied during normal operation. Afterwards one reboot is required, no other downtime.
Security
  • Connection with HTTPS to the external server.
  • Signed images : it is possible to sign the images used for the update in order to check their integrity.
  • Split into several processes: the connection to the internet can run with a different userid / groupid than the installer. The installer often runs with high privileges because it has to write to the hardware.
  • Support for encrypted artifacts
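
The dual-copy setup described above, where the image is streamed directly to the stand-by copy so that no additional space is required, can be sketched as follows. This is not SWUpdate code: the slot names "A"/"B", the function names and the chunk size are assumptions, and a real installer would write to a block device and verify a signature rather than just compute a checksum.

```python
import hashlib
import io
from typing import BinaryIO

SLOTS = ("A", "B")  # hypothetical names for the two software copies


def standby_slot(active: str) -> str:
    """Return the slot that is currently not in use and may be overwritten."""
    if active not in SLOTS:
        raise ValueError(f"unknown slot: {active!r}")
    return SLOTS[1] if active == SLOTS[0] else SLOTS[0]


def stream_image(source: BinaryIO, target: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Copy source to target chunk by chunk and return the SHA-256 of the data.

    Only one chunk is held in memory at a time, so no temporary copy of the
    full image is needed on the device.
    """
    digest = hashlib.sha256()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()
```

In this sketch, `source` would be the download stream and `target` the opened stand-by partition; the returned checksum could then be compared against a value from trusted (signed) metadata before the boot loader is told to switch slots.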


Mender

Disk layout
No upper limit on partitions, minimum four. Dual rootfs partitions with two extra partitions, a boot and a data partition. Depends on U-Boot as boot loader for the automatic rollback.
Rootfs
Stored in one active and one passive partition, read/write while in use, but modifications to it get lost during updates when switching partitions, so persistent data should be stored in the data partition. Kernel is stored on rootfs.
Updates from
Mender Server over HTTPS (managed mode)
manual provisioning : local file system/URL, e.g. USB, SD, HTTP(S) (standalone mode)
Code stability
Stable release. Upgrade path fully supported and tested.
Resources on server
Needs to store one compressed rootfs image for each update, plus small meta data section.
Resources on client
Needs active/passive rootfs partition, and bandwidth to download compressed rootfs image. Needs no additional space on device beyond the partitions (update is streamed), space for the Mender binary, and a tiny local database.
Failure resilience
Automatic rollback if the device either fails to boot, or the Mender daemon cannot connect to the Mender server afterwards.
Complexity
Relatively easy to build with Yocto. More complex if not using Yocto.
Downtime
Update is downloaded and applied during normal operation. Afterwards one reboot is required, no other downtime.
Security
Secure connection to the server (TLS). Artifact signing is being implemented.
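
The automatic rollback described under failure resilience can be modelled as a small state machine shared between boot loader and update client. The sketch below is only an illustration of that idea, not Mender's implementation: the `BootEnv` fields, the function names and the limit of one boot attempt are assumptions and do not correspond to actual Mender or U-Boot variables.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 1  # assumed limit: one boot attempt for an uncommitted update


@dataclass
class BootEnv:
    """Hypothetical persistent state shared by boot loader and update client."""
    active: str = "A"              # rootfs partition to boot
    upgrade_pending: bool = False  # update installed but not yet committed
    boot_attempts: int = 0


def other(part: str) -> str:
    return "B" if part == "A" else "A"


def install_update(env: BootEnv) -> None:
    """Called once the image has been written to the passive partition."""
    env.active = other(env.active)
    env.upgrade_pending = True
    env.boot_attempts = 0


def bootloader_select(env: BootEnv) -> str:
    """Runs on every boot and returns the partition to boot from."""
    if env.upgrade_pending:
        if env.boot_attempts >= MAX_ATTEMPTS:
            # The update was tried but never committed: roll back.
            env.active = other(env.active)
            env.upgrade_pending = False
            env.boot_attempts = 0
        else:
            env.boot_attempts += 1
    return env.active


def client_after_boot(env: BootEnv, server_reachable: bool) -> bool:
    """Commit the update if the server is reachable; return True if a reboot
    (which will trigger the rollback) is needed."""
    if not env.upgrade_pending:
        return False
    if server_reachable:
        env.upgrade_pending = False
        env.boot_attempts = 0
        return False
    return True
```

Both failure cases from the text end the same way in this model: if the new rootfs never gets far enough to commit (boot failure) or the client cannot reach the server, the next pass through `bootloader_select` switches back to the previous partition.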

OSTree

Disk layout
Only limited set of bootloaders supported (those that support The Boot Loader Specification and some exceptions). The root file system has to have the /boot directory managed by OSTree. The actual deployed OS trees have certain conventions and requirements. For example, /etc and /var need to be writable, while /usr is mounted read-only.
Rootfs
Files provided by OS are expected to be in /usr, which is mounted read-only. /etc and /var are writable.
Updates from
Local and remote repositories. OSTree works by replicating a repository to the target device and then "checking out" deployment directories from the repository. Local and HTTP methods for replicating the repository are available.
Updates what
OSTree (atomically) updates root file system trees and the kernel. If anything else needs to be updated, it needs to happen by running custom code on the target device.
Code stability
OSTree is relatively stable. It's used by many distributions and projects, such as Fedora Atomic and Flatpak. OSTree is under active development and has an open-source community around it.
OE/Yocto integration
Meta-ostree layer is work in progress.
Resource requirements on server
OSTree generates "commit objects" based on the filesystem changes between builds. These commit objects are stored in a commit chain, the same way git does. The commit objects are transmitted over the network as binary diffs when the remote and local repositories are synchronized.
Resource requirements on client
The client has a local copy of the repository. The repository contains the objects, which map to the files in the OS trees. The deployments of the OS trees (or "checkouts" in git terminology) are made using hard links to the repository content. This means that the required space only increases when the data changes: the static data files are shared between deployments.
OSTree allows deleting of data. This means that the full operating system history doesn't need to be stored in the repository. A typical OSTree-based system might have two deployed OS trees: one that is being currently used and one fallback.
Failure resilience
OSTree controls the booting to the different deployed trees using bootloader entries. If booting a deployed OS tree fails, a different bootloader entry can be chosen for booting into a different OS tree.
Complexity
Some work is needed to fit the root filesystem into the conventions that OSTree likes. Some documentation of the needed steps is available here.
Downtime
Reboot is required after updates. Changing between deployed OS trees is done by selecting a different boot loader entry.
Security
The commits can be signed with GPG-based signatures. OSTree repositories store file extended attributes, which means that the security mechanisms using extended attributes should be functional with OSTree.
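
How hard links let several deployments share unchanged content can be shown with a toy content-addressed store. The sketch below is not OSTree's repository format or API: the flat `objects` directory, the function names and the use of a plain SHA-256 over the file content are simplifying assumptions. Hard links also require that repository and deployments live on the same filesystem.

```python
import hashlib
import os
from pathlib import Path


def commit_tree(tree: Path, objects: Path) -> dict[str, str]:
    """Store every file of tree in objects under its content hash.

    Returns a hypothetical "commit": a map from relative path to object name.
    Content that is already present in objects is not stored again.
    """
    objects.mkdir(parents=True, exist_ok=True)
    commit = {}
    for path in sorted(tree.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            obj = objects / digest
            if not obj.exists():
                obj.write_bytes(data)
            commit[path.relative_to(tree).as_posix()] = digest
    return commit


def checkout(commit: dict[str, str], objects: Path, deployment: Path) -> None:
    """Create a deployment directory whose files are hard links to objects."""
    for rel, digest in commit.items():
        dest = deployment / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        os.link(objects / digest, dest)
```

Checking out two commits that differ in a single file creates two deployment directories, but only the changed file occupies additional space; all other paths in both deployments point at the same inode in the object store.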