System Update

From Yocto Project
Revision as of 14:48, 9 December 2016 by Patrick Ohly (summary in table, details in sections)

Introduction

This page compares different system update mechanisms. Its purpose is to help the project pick a suitable mechanism, which the project will then support going forward. Users may also find this page useful when picking a mechanism that suits their specific needs.

A system update mechanism must ensure that a device running an older release of the operating system runs a more recent release once the update is done. This includes updating everything that defines the system (rootfs, kernel, bootloader, etc.), restarting running processes and potentially rebooting. An ideal mechanism:

  • never ends up in an inconsistent state (atomic update),
  • always keeps the device usable (fallback to previous state when there are problems, or at least supporting a recovery mode),
  • requires little additional resources (disk space, RAM),
  • minimizes downtime while updating,
  • works in combination with security technology (integrity protection),
  • is secure (does not install or execute software created by an attacker).

These are conflicting requirements. Different mechanisms will have different strengths and weaknesses. Therefore the first section defines the different aspects in more detail and contains a table comparing the mechanisms. The following sections then describe each mechanism in more detail.

A similar comparison was done for Automotive Grade Linux (AGL) here: https://lists.linuxfoundation.org/pipermail/automotive-discussions/2016-May/002061.html

Some talks at the Embedded Linux Conference presented an overview of the current mechanisms:

Comparison

Type
Block-based update mechanisms directly modify blocks in the partition(s) that they update, without going through the filesystem. This implies that the partition must be identical on all devices, down to exactly the same partition size. File-based update mechanisms modify files and directories. Therefore devices with different partition sizes can use the same update data, and it may be possible to update without a reboot.
Disk layout
Dependencies on the boot loader, number and kind of partitions, etc. Flexible mechanisms make few or no assumptions about the system, but then typically require additional work to integrate with a specific setup.
Rootfs
The partition which contains the OS. May be strictly read-only (block-based update mechanisms) or read/write (file-based). Some update mechanisms support installing and updating a subset of the full OS.
Updates from
Where the update mechanism gets the update from.
Updates what
Which parts of the overall system the mechanism updates.
Code stability
Based on how long the code has been in use, personal experience, security track record in existing deployments, etc.
OE/Yocto integration
Whether the mechanism is already available and who supports it.
Resource requirements on server
Affects both build time and long-term storage capacity. Likely to depend on the complexity of the changes.
Resource requirements on client
Amount of temporary disk space, CPU/network load, ..., again for different scenarios.
Failure resilience
Summarizes how the mechanism copes with potential problems.
Complexity
Some mechanisms are harder to use correctly than others (usability). This also includes how difficult it is to set up the mechanism because of its dependencies.
Downtime
How long normal operation of the device needs to be interrupted for an update.
Security
Compatibility with other technology, protection of the update mechanism itself.
Mechanism | swupd | sbabic's SWUpdate | Mender
Type | file-based | block-based | block-based
Disk layout | flexible | flexible | fixed
Rootfs | read/write | depends on setup | conceptually read-only
Updates from | HTTP(S) server, local media | local and remote (plain HTTP(S) or custom server) | remote
Updates what | depends on setup | depends on setup | complete rootfs, including kernel
Code stability | relatively stable, under active development | relatively stable, 3-month release cycle | in development
OE/Yocto integration | meta-swupd | meta-swupdate | meta-mender
Resource requirements on server | moderate, suitable for frequent updates | archives full image per build | archives full image per build
Resource requirements on client | minimal download, needs sufficient free space in rootfs | download and write full image | download and write full image
Failure resilience | favors fast updates over failure resilience | no built-in mechanism, to be added as part of system | integrated rollback
Complexity | some planning required | easy to use (but requires customization!?) | easy when using meta-mender
Downtime | minimal, reboot optional | reboot required | reboot required
Security | compatible with Linux IMA, Smack, SELinux; signed update data, HTTPS transfer protection | signed and encrypted images, HTTPS | HTTPS

TODO: add OSTree (https://bugzilla.yoctoproject.org/show_bug.cgi?id=10704)

swupd

Disk layout
swupd updates files in a single partition. Other than that, it makes no assumptions about disk layout, filesystem and boot mechanism.
Rootfs
Files provided by the OS are read-only, everything else is read/write (/etc, /var) and preserved during updates. OS can be split up into a core OS (always installed) and optional bundles which may or may not be installed.
Updates from
Uses libcurl to fetch update data and thus supports all URL schemes supported by libcurl, in particular http(s) and local files.
Updates what
swupd itself only updates files in the rootfs. Other components (boot loader, kernel, firmware, ...) need to be updated via custom plugins.
Code stability
Used and maintained in Clear Linux OS. Code relatively stable, but would benefit from a rewrite (evolved from a prototype). Code changes not written for Clear Linux OS tend to get merged only slowly.
OE/Yocto integration
Layer available, not part of Yocto releases, experimental. Supports building incremental updates (= only changes since last build are stored for new build) and deltas (= optimized update pack for specific previous builds, typically major milestones).
Resource requirements on server
Build time and storage for each update grow linearly with the total number of files (file system analysis, zero packs) plus linearly with the number of modified files (compression). When swupd is used purely as an update mechanism (i.e. without bundles), the space requirement on the server could be reduced to linear in the number of modified files by not creating the zero packs. Optionally, deltas can be prepared from certain previous builds; each delta is linear in the number of files modified since that build.
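Storing only an incremental update boils down to comparing per-file hashes of two builds. The following Python sketch illustrates that idea under the assumption of a hypothetical manifest that maps each rootfs path to the SHA-256 of its content; it is not swupd's actual manifest format or code. Hashing every file is the part that is linear in the total number of files, while only the added and modified files need to be compressed and stored.

```python
import hashlib
from typing import Dict, Set, Tuple

# Hypothetical manifest: rootfs path -> SHA-256 of the file content.
# This is NOT swupd's real manifest format, just the underlying idea.
Manifest = Dict[str, str]


def build_manifest(files: Dict[str, bytes]) -> Manifest:
    """Hash every file of a build (linear in the total number of files)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_manifests(old: Manifest, new: Manifest) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, modified, removed) paths between two builds."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {path for path in set(old) & set(new) if old[path] != new[path]}
    return added, modified, removed


# Two consecutive builds: only added and modified files need to be
# compressed and stored for the incremental update.
build_1 = build_manifest({"/usr/bin/app": b"v1", "/usr/lib/libfoo.so": b"foo"})
build_2 = build_manifest({
    "/usr/bin/app": b"v2",
    "/usr/lib/libfoo.so": b"foo",
    "/usr/share/app/data": b"new",
})
added, modified, removed = diff_manifests(build_1, build_2)
print(sorted(added), sorted(modified), sorted(removed))
# ['/usr/share/app/data'] ['/usr/bin/app'] []
```

A delta against an older milestone build works the same way, just with an older manifest on the left-hand side, which is why its cost depends on how many files changed since that build.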
Resource requirements on client
In the best case (delta prepared by the server), a single archive with just some file diffs gets downloaded, unpacked and applied. In other cases, each new or modified file gets downloaded and unpacked. Staging new content needs free space in the rootfs partition; to cope with an update that replaces every file, the partition must be at least twice as large as the base OS.
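The "twice as large" rule follows from a simple worst-case calculation: staged files sit next to the old ones until the switch, so the free space must hold every new or modified file. The sketch below assumes, for simplicity, that the rootfs contains nothing but the OS files (no user data in /etc or /var, no filesystem overhead); the sizes are made-up illustrative numbers.

```python
MIB = 1024 * 1024


def required_staging_bytes(changed_file_sizes):
    """New or modified files are staged next to the old ones before switching."""
    return sum(changed_file_sizes)


def can_stage(partition_bytes, os_bytes, changed_file_sizes):
    """True if the staged content fits into the free space of the rootfs.

    Simplification: the partition holds only the OS files (os_bytes).
    """
    free_bytes = partition_bytes - os_bytes
    return required_staging_bytes(changed_file_sizes) <= free_bytes


os_size = 400 * MIB
small_update = [2 * MIB, 5 * MIB]
full_rewrite = [os_size]  # worst case: every file of the OS changes

print(can_stage(600 * MIB, os_size, small_update))  # True
print(can_stage(600 * MIB, os_size, full_rewrite))  # False
print(can_stage(800 * MIB, os_size, full_rewrite))  # True: partition is 2x the OS
```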
Failure resilience
There is no recovery mechanism built into swupd itself. There is a short period of time during which an interrupted update may leave behind an inconsistent rootfs. No updates are possible when there is not enough free space left. Updating files while system services are running reduces downtime, but is also riskier if the services are not prepared for it. swupd could be extended to do updates without that risk, at the expense of increased downtime.
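Why a file-based update can be inconsistent even when each individual file is replaced safely can be seen in a minimal sketch of the usual write-to-temporary-file-then-rename technique. This is a generic POSIX pattern in Python, not swupd's code: each file is switched atomically, but an interruption between two files leaves old and new files side by side.

```python
import os
import tempfile
from pathlib import Path


def replace_file_atomically(path: str, data: bytes) -> None:
    """Replace one file so readers see either the old or the new content.

    Sketch only: a real updater must also restore ownership, permissions and
    extended attributes (e.g. IMA/Smack labels), and fsync the directory for
    the rename itself to be durable.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as root:
    app = os.path.join(root, "app")
    lib = os.path.join(root, "libfoo.so")
    for p in (app, lib):
        Path(p).write_bytes(b"old")
    replace_file_atomically(app, b"new")
    # Simulated power loss here: each file is intact, but the set is mixed.
    state = {os.path.basename(p): Path(p).read_bytes() for p in (app, lib)}

print(state)  # {'app': b'new', 'libfoo.so': b'old'}
```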
Complexity
Upgrade path must be considered as part of release process (deltas, incompatible changes).
Downtime
Downloading and staging happen in parallel to normal operation. Services are kept running until after the update, at which point the device admin needs to restart services or reboot, depending on what was updated (this is not automated in the meta-swupd layer at the moment).
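Since meta-swupd does not automate the decision between restarting services and rebooting, a device integrator has to supply it. The following Python sketch shows one possible, entirely hypothetical policy: the path prefixes and the file-to-service mapping are illustrative assumptions, not part of swupd or meta-swupd.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical policy tables; a real system would derive them from its own
# layout and service definitions.
REBOOT_PREFIXES = ("/boot/", "/lib/modules/")
SERVICE_FILES: Dict[str, str] = {
    "/usr/sbin/sshd": "sshd.service",
    "/usr/bin/app": "app.service",
}


def post_update_action(changed_paths: Iterable[str]) -> Tuple[bool, Set[str]]:
    """Return (reboot_needed, services_to_restart) for a set of updated files."""
    paths = list(changed_paths)
    if any(path.startswith(REBOOT_PREFIXES) for path in paths):
        # A reboot restarts everything, so no individual restarts are needed.
        return True, set()
    services = {SERVICE_FILES[path] for path in paths if path in SERVICE_FILES}
    return False, services


print(post_update_action(["/usr/bin/app", "/usr/share/doc/README"]))
# (False, {'app.service'})
print(post_update_action(["/boot/bzImage", "/usr/bin/app"]))
# (True, set())
```

A real policy would also have to consider shared libraries used by several services, which this sketch ignores.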
Security
Compatible with Linux IMA, Smack, SELinux. Conceptually incompatible with dm-verity. Relies on HTTPS and (optionally) signing to protect the integrity of update data (signing is not integrated into meta-swupd yet).


sbabic's SWUpdate

Disk layout
There is no constraint on how software is stored. SWUpdate supports raw flash (NOR, NAND), UBI volumes and disk partitions, and can also update files (provided in a tarball) in an existing filesystem. Each artifact can be stored on a different storage device.
Rootfs
There is no constraint on where software is stored. During an update, a single partition, multiple partitions or, more generally, multiple different storage devices can be updated.
SWUpdate is often used in one of the following setups:
  • rescue: the system reboots into maintenance mode and SWUpdate is started from a ramdisk. Only one copy of the software is stored on the system.
  • dual-copy: two copies of the software (rootfs, kernel) are stored on the system and SWUpdate installs into the stand-by copy.
Updates from
  • local provisioning: USB, SD, etc.
  • generic URL: HTTP(S), FTP. SWUpdate uses the libcurl library and supports whatever libcurl provides.
  • webserver: SWUpdate integrates a webserver (Mongoose).
  • external backend connector (suricatta mode) to bind to an external backend server. Currently the hawkBit server (https://github.com/eclipse/hawkbit) is supported; further backends can be added.
Updates what
  • bootloader (risky!)
  • kernel
  • interface to the bootloader (U-Boot): allows changing U-Boot variables; plugins can make changes to other bootloaders.
  • disk partitions
  • interface to update FPGAs, external microcontrollers, etc.
  • custom handlers: an interface allows adding custom installers written in C or Lua.
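The idea behind custom handlers is a registry that maps each artifact type to an installer. The Python sketch below only illustrates that dispatch pattern; the handler signature, the type names "raw" and "fpga" and the target names are hypothetical, and SWUpdate's real C and Lua handler interfaces look different.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical handler interface: (target, payload) -> None.
# SWUpdate's real C and Lua handler APIs look different.
Handler = Callable[[str, bytes], None]
HANDLERS: Dict[str, Handler] = {}
log: List[str] = []


def register_handler(kind: str) -> Callable[[Handler], Handler]:
    """Decorator that registers an installer for one artifact type."""
    def decorator(func: Handler) -> Handler:
        HANDLERS[kind] = func
        return func
    return decorator


@register_handler("raw")
def install_raw(target: str, payload: bytes) -> None:
    # A real handler would write the payload to the block device `target`.
    log.append(f"raw: {len(payload)} bytes -> {target}")


@register_handler("fpga")
def install_fpga(target: str, payload: bytes) -> None:
    # A real handler would program the FPGA through a driver interface.
    log.append(f"fpga: bitstream -> {target}")


def install(artifacts: Iterable[Tuple[str, str, bytes]]) -> None:
    """Dispatch each (type, target, payload) artifact to its handler."""
    for kind, target, payload in artifacts:
        handler = HANDLERS.get(kind)
        if handler is None:
            raise ValueError(f"no handler registered for artifact type {kind!r}")
        handler(target, payload)


install([("raw", "/dev/mmcblk0p2", b"\x00" * 512), ("fpga", "fpga0", b"bits")])
print(log)  # ['raw: 512 bytes -> /dev/mmcblk0p2', 'fpga: bitstream -> fpga0']
```

The design keeps the core installer independent of the hardware: supporting a new device only means registering one more handler.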
Resources on client
  • rescue: meta-swupdate provides a recipe to generate a compressed ramdisk with a small footprint. Including support for signed images, the whole rootfs is ~4 MB. A complete rescue system (bootloader, kernel for SWUpdate and ramdisk) needs at least 8 MB. This allows putting the rescue system into small storage such as SPI-NOR flash, while the software is stored on another, bigger device (NAND, eMMC).
  • dual-copy: needs active and passive rootfs partitions and bandwidth to download the compressed rootfs image. No additional space is required if the image is streamed directly to the stand-by copy.
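Streaming avoids extra space because the image is decompressed and written chunk by chunk, with its hash computed on the fly. The Python sketch below assumes a gzip-compressed image and uses in-memory streams in place of the HTTP response and the stand-by partition; it illustrates the technique and is not SWUpdate's implementation. If the hash does not match, only the stand-by copy is affected, so the device simply must not switch to it.

```python
import gzip
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024


def stream_image(source: BinaryIO, target: BinaryIO, expected_sha256: str) -> None:
    """Decompress `source` chunk by chunk into `target` while hashing it.

    No temporary copy of the image is kept; only one chunk is in memory at a time.
    """
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=source, mode="rb") as image:
        while chunk := image.read(CHUNK_SIZE):
            digest.update(chunk)
            target.write(chunk)
    target.flush()
    if digest.hexdigest() != expected_sha256:
        # The stand-by copy is now unusable, but the active copy is untouched:
        # the device must simply not switch to the stand-by copy.
        raise ValueError("image hash mismatch, not activating stand-by copy")


image = b"rootfs block " * 10_000
download = io.BytesIO(gzip.compress(image))  # stands in for an HTTP response body
standby = io.BytesIO()  # stands in for the stand-by partition
stream_image(download, standby, hashlib.sha256(image).hexdigest())
print(standby.getvalue() == image)  # True
```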
Failure resilience
There is no built-in recovery mechanism. If the bootloader can detect that a boot failed, it can boot into maintenance mode again.
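A common way for a bootloader to detect failed boots is a persistent boot counter that the running system resets once it is fully up; U-Boot's bootcount/bootlimit feature works along these lines. The Python sketch below only models that logic for the rescue setup; the variable names and the limit of 3 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BootEnv:
    # Hypothetical persistent variables, e.g. kept in the bootloader environment.
    bootcount: int = 0
    bootlimit: int = 3


def select_boot_target(env: BootEnv) -> str:
    """Bootloader side: called on every boot before starting a kernel."""
    env.bootcount += 1
    if env.bootcount > env.bootlimit:
        # Too many boots without success: start the SWUpdate rescue system.
        return "rescue"
    return "production"


def mark_boot_successful(env: BootEnv) -> None:
    """System side: called once the production system is fully up.

    The rescue system would also reset the counter after installing a new image.
    """
    env.bootcount = 0


env = BootEnv()
# The production image crashes every time before marking the boot successful.
targets = [select_boot_target(env) for _ in range(4)]
print(targets)  # ['production', 'production', 'production', 'rescue']
```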
Downtime
  • rescue: the device must reboot into maintenance mode and, once the image is installed, reboot again into the new production image.
  • dual-copy: the update is downloaded and applied during normal operation. Afterwards one reboot is required, with no other downtime.
Security
  • HTTPS connection to the external server.
  • Signed images: the images used for the update can be signed so that their integrity can be checked.
  • Privilege separation: the process handling the connection to the internet can run with a different user ID / group ID than the installer. The installer often runs with high privileges because it has to write to the hardware.
  • Support for encrypted artifacts.
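A common structure for signed updates is to sign only a small description that lists a hash for every artifact, and then check each artifact against that list before installing it. The Python sketch below shows only the hash-checking step and assumes the description (a hypothetical name-to-SHA-256 mapping) has already been authenticated; the signature check itself is not shown because Python's standard library has no public-key signature support.

```python
import hashlib
from typing import Dict


def verify_artifacts(description: Dict[str, str], artifacts: Dict[str, bytes]) -> None:
    """Check every artifact against the SHA-256 listed in an update description.

    Assumes `description` (artifact name -> hex digest) was already authenticated,
    e.g. by verifying a signature over it; that step is not shown here.
    """
    if set(artifacts) != set(description):
        raise ValueError("artifacts do not match the signed description")
    for name, payload in artifacts.items():
        if hashlib.sha256(payload).hexdigest() != description[name]:
            raise ValueError(f"hash mismatch for {name!r}, refusing to install")


kernel, rootfs = b"kernel image", b"rootfs image"
description = {
    "kernel": hashlib.sha256(kernel).hexdigest(),
    "rootfs": hashlib.sha256(rootfs).hexdigest(),
}
verify_artifacts(description, {"kernel": kernel, "rootfs": rootfs})  # passes
```

Rejecting artifacts that are not listed matters as much as checking the listed ones, otherwise an attacker could append extra content to an otherwise valid update.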


Mender

Disk layout
Two rootfs partitions plus two extra partitions: a boot partition and a data partition.
Rootfs
Stored in one active and one passive partition. The rootfs is read/write while in use, but modifications to it get lost during updates when switching partitions. The kernel is stored in the rootfs.
Updates from
Mender server and/or HTTPS server
Code stability
Master branch under development. There is a stable branch, but it lacks many basic features.
Resources on server
Needs to store one compressed rootfs image for each update, plus a small metadata section.
Resources on client
Needs active and passive rootfs partitions and bandwidth to download the compressed rootfs image. Beyond the partitions, no additional space is needed on the device except for the Mender binary and a tiny local database.
Failure resilience
Automatic rollback if the device either fails to boot, or the Mender daemon cannot connect to the Mender server afterwards.
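The rollback behaviour can be modelled as a small state machine over the two rootfs partitions. The Python sketch below assumes partitions named "A" and "B" and collapses the trial boot and the connectivity check into two boolean inputs; it illustrates the idea and is not Mender's implementation, which also involves the bootloader.

```python
from dataclasses import dataclass
from typing import Optional


def other(partition: str) -> str:
    return "B" if partition == "A" else "A"


@dataclass
class Device:
    active: str = "A"              # committed rootfs partition
    trial: Optional[str] = None    # freshly written partition awaiting a trial boot

    def install_update(self) -> None:
        # The new image has been written to the passive partition.
        self.trial = other(self.active)

    def reboot(self, boots: bool, server_reachable: bool) -> str:
        """Try the new partition once; return the partition the device ends up on."""
        if self.trial is None:
            return self.active
        candidate, self.trial = self.trial, None  # only one attempt
        if boots and server_reachable:
            self.active = candidate  # commit the update
        # Otherwise roll back: the old partition stays active.
        return self.active


ok = Device()
ok.install_update()
print(ok.reboot(boots=True, server_reachable=True))  # B

crash = Device()
crash.install_update()
print(crash.reboot(boots=False, server_reachable=True))  # A

offline = Device()
offline.install_update()
print(offline.reboot(boots=True, server_reachable=False))  # A
```

Committing only after the server has been reached guards against updates that boot fine but cut the device off from receiving any further update.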
Complexity
Relatively easy to build with Yocto. More complex if not using Yocto.
Downtime
Update is downloaded and applied during normal operation. Afterwards one reboot is required, no other downtime.
Security
Secure connection to the server (TLS). Signing not currently supported, but planned.