System Update
Revision as of 16:52, 14 June 2017
Introduction
This page compares different system update mechanisms. Its purpose is to help the project pick a suitable mechanism that the project will then support going forward. Users may also find this page useful when picking a mechanism that suits their specific needs.
A system update mechanism must ensure that a device running an older release of the operating system runs a more recent release once the update mechanism is done. This includes updating everything that defines the system (rootfs, kernel, bootloader, etc.), restarting running processes and potentially rebooting. An ideal mechanism:
- never ends up in an inconsistent state (atomic update),
- always keeps the device usable (fallback to previous state when there are problems, or at least supporting a recovery mode),
- requires little additional resources (disk space, RAM),
- minimizes downtime while updating,
- works in combination with security technology (integrity protection),
- is secure (does not install or execute software created by an attacker).
These are conflicting requirements. Different mechanisms will have different strengths and weaknesses. Therefore the first chapter provides a more detailed definition of the different aspects and has a table comparing the mechanisms. The following sections then describe each mechanism in more detail.
Several talks at the Embedded Linux Conference presented overviews of the current mechanisms:
- How do you update your embedded Linux devices? by Daniel Sangorrin / Keijiro Yano http://events.linuxfoundation.org/sites/events/files/slides/linuxcon-japan-2016-softwre-updates-sangorrin.pdf
- Comparison of Linux Software Update Technologies by Matt Porter http://events.linuxfoundation.org/sites/events/files/slides/Comparison%20of%20Linux%20Software%20Update%20Technologies_0.pdf. Video at https://youtu.be/pdHV9H9nZks?list=PLbzoR-pLrL6pRFP6SOywVJWdEHlmQE51q. Full paper done for Automotive Grade Linux (AGL) here: https://lists.linuxfoundation.org/pipermail/automotive-discussions/2016-May/002061.html
- Software update for IoT: the current state of play by Chris Simmonds http://de.slideshare.net/chrissimmonds/software-update-for-iot-the-current-state-of-play
- OSS Remote Firmware Updates for IoT-like Projects by Silvano Cirujano Cuesta http://events.linuxfoundation.org/sites/events/files/slides/OSS_Remote_Firmware_Updates_for_IoT-like_Projects.pdf
- "Surviving in the Wilderness: Integrity Protection and System Update" by Patrick Ohly: abstract and slides, video recording of talk
Comparison
- Type
- Block-based update mechanisms directly modify blocks in the partition(s) that they update, without going through the filesystem. This implies that the partition content has to be identical on all devices and that all devices must use exactly the same partition size. File-based update mechanisms modify files and directories. Therefore devices with different partition sizes can use the same update data, and it may be possible to update without a reboot.
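The distinction can be illustrated with a minimal Python sketch. It is not taken from any of the mechanisms on this page. The partition is modeled as a `bytearray` and the file tree as a dictionary, and the function names `block_update` and `file_update` are hypothetical. The block-based variant rejects an image whose size differs from the partition. The file-based variant touches only the paths it is given.

```python
from typing import Dict, Optional


def block_update(partition: bytearray, image: bytes) -> None:
    """Overwrite the whole partition with a raw image, bypassing any filesystem."""
    # Block-based: the image is a byte-for-byte copy of the partition,
    # so it only fits devices whose partition has exactly this size.
    if len(image) != len(partition):
        raise ValueError(
            f"image is {len(image)} bytes, partition is {len(partition)} bytes"
        )
    partition[:] = image


def file_update(tree: Dict[str, bytes], changes: Dict[str, Optional[bytes]]) -> None:
    """Apply per-file changes to a file tree; None marks a file to delete."""
    # File-based: only the listed paths change, so the partition size does not
    # matter as long as the new content fits.
    for path, content in changes.items():
        if content is None:
            tree.pop(path, None)
        else:
            tree[path] = content
```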
- Disk layout
- Dependencies on the boot loader, number and kind of partitions, etc. Flexible mechanisms make few or no assumptions about the system, but then typically require additional work to integrate with a specific setup.
- Rootfs
- The partition which contains the OS. May be strictly read-only (block-based update mechanisms) or read/write (file-based). Some update mechanisms support installing and updating a subset of the full OS.
- Updates from
- describes from where the update mechanism gets the update.
- Updates what
- describes which parts of the overall system the mechanism updates.
- Code stability
- Based on how long the code has been in use, personal experience, security track record in existing deployments, etc.
- OE/Yocto integration
- Whether the mechanism is already available and who supports it.
- Resource requirements on server
- These affect both build time and long-term storage capacity, and are likely to depend on the complexity of the changes.
- Resource requirements on client
- Amount of temporary disk space, CPU and network load, etc., again for different scenarios.
- Failure resilience
- Summarizes how the mechanism copes with potential problems.
- Complexity
- Some mechanisms are harder to use correctly than others (usability). Also includes how difficult it is to set up the mechanism because of its dependencies.
- Downtime
- How long normal operation of the device needs to be interrupted for an update.
- Security
- Compatibility with other technology, protection of the update mechanism itself.
Mechanism | Type | Disk layout | Rootfs | Updates from | Updates what | Code stability | OE/Yocto integration | Resource requirements on server | Resource requirements on client | Failure resilience | Complexity | Downtime | Security
---|---|---|---|---|---|---|---|---|---|---|---|---|---
swupd | file-based | flexible | read/write | HTTP(S) server, local media | depends on setup | relatively stable, under active development | meta-swupd | moderate, suitable for frequent updates | minimal download, needs sufficient free space in rootfs | favors fast updates over failure resilience | some planning required | minimal, reboot optional | Compatible with Linux IMA, Smack, SELinux. Signed update data, HTTPS transfer protection.
sbabic's SWUpdate | block-based / file-based | flexible | depends on setup | local and remote (plain HTTP(S) or custom server) | depends on setup | relatively stable, 3-month release cycle | meta-swupdate | archives full image per build | download and write full image | no built-in mechanism, to be added as part of system | easy to use (but requires customization!?) | reboot required | signed and encrypted images, HTTPS
Mender | block-based | flexible (minimum four partitions), U-Boot as boot loader | conceptually read-only | remote using Mender management server (managed mode) or local using CLI (standalone mode) | complete rootfs, including kernel | relatively stable, fully supported and tested upgrade path | meta-mender | compressed rootfs per build | download and write compressed rootfs | integrated rollback | easy when using meta-mender | reboot required | HTTPS enforced, signed images
OSTree | file-based | flexible, but supports only limited set of bootloaders | read/write, OS trees bind-mounted read-only, /etc and /var writable | local and remote repositories | kernel and filesystem | relatively stable, significant user base, under active development | meta-ostree (WIP), meta-updater (public) | generating commits based on new builds, storing them in repository | updating local repository, hard links for sharing unchanged content between deployments | rollback to a different deployed OS tree | some work required | reboot required | GPG-signed commits
RAUC | block-based / file-based (tar) | flexible (block device/MTD) | depends on setup (read-only supported) | depends on setup | depends on setup (any storage device) | relatively stable, under active development | meta-rauc | archives full (compressed) image per build | download and write full (compressed) image | integrated rollback (requires bootloader support) | some customization required | reboot required | X.509-signed update bundles
swupd
- Disk layout
- swupd updates files in a single partition. Other than that, it makes no assumptions about disk layout, filesystem and boot mechanism.
- Rootfs
- Files provided by the OS are read-only, everything else is read/write (/etc, /var) and preserved during updates. OS can be split up into a core OS (always installed) and optional bundles which may or may not be installed.
- Updates from
- Uses libcurl to fetch update data and thus supports all URL schemes supported by libcurl, in particular http(s) and local files.
- Updates what
- swupd itself updates only files in the rootfs. Other components (boot loader, kernel, firmware, ...) need to be updated via custom plugins.
- Code stability
- Used and maintained in Clear Linux OS. Code relatively stable, but would benefit from a rewrite (evolved from a prototype). Code changes not written for Clear Linux OS tend to get merged only slowly or not at all (examples: assumptions about the filesystem, error checking, path configuration).
- OE/Yocto integration
- Layer available, not part of Yocto releases, experimental. Supports building incremental updates (= only changes since last build are stored for new build) and deltas (= optimized update pack for specific previous builds, typically major milestones).
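An incremental update depends on knowing which files changed since the previous build. The following is a rough sketch that assumes each build is summarized by a manifest mapping paths to SHA-256 hashes. This is an illustration in the spirit of swupd's manifests, not their actual format. The names `build_manifest` and `diff_manifests` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def build_manifest(tree: Dict[str, bytes]) -> Dict[str, str]:
    """Map each file path of a build to the SHA-256 of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in tree.items()}


def diff_manifests(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, modified, removed) paths between two builds."""
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return added, modified, removed
```

A delta for a specific older build would use that build's manifest as `old`. Every such delta has to be computed and stored separately, which is why deltas are typically prepared only for selected milestones.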
- Resource requirements on server
- Build time and storage for each update scale linearly with the total number of files (file system analysis, zero packs) plus linearly with the number of modified files (compression). When using swupd purely as an update mechanism (i.e. no bundles), the space requirement on the server could be reduced to linear in the number of modified files by not creating the zero packs. Optionally, swupd can prepare deltas from certain previous builds, which is linear in the number of files modified since each of those builds.
- Resource requirements on client
- In the best case (delta prepared by server), a single archive with just some file diffs gets downloaded, unpacked and applied. In other cases, each new or modified file gets downloaded and unpacked. Staging new content needs free space in the rootfs partition, i.e. partition must be at least twice as large as the base OS.
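The space rule above is simple arithmetic. The following sketch assumes the worst case, in which every file changes, so a full second copy of the OS has to be staged next to the live one. The helper names and the optional `reserve_bytes` margin are hypothetical. `shutil.disk_usage` is the standard-library call for querying free space.

```python
import shutil


def min_partition_size(base_os_bytes: int) -> int:
    """Worst case: every file changes, so a full second copy is staged beside the live OS."""
    return 2 * base_os_bytes


def can_stage(free_bytes: int, staged_bytes: int, reserve_bytes: int = 0) -> bool:
    """True if the new content fits while keeping reserve_bytes free for the running system."""
    return free_bytes - reserve_bytes >= staged_bytes


def free_bytes_on(path: str) -> int:
    """Free space on the filesystem that contains path."""
    return shutil.disk_usage(path).free
```

For example, under this worst-case assumption an 800 MiB base OS implies a rootfs partition of at least 1600 MiB.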
- Failure resilience
- No recovery mechanism built into swupd itself. Short period of time where interrupted update may leave behind inconsistent rootfs. No updates possible when there is not enough free space left. Updating files while system services are running reduces downtime, but is also more risky if system services aren't prepared for it. Could be extended to do updates without that risk, at the expense of increased downtime.
- Complexity
- Upgrade path must be considered as part of release process (deltas, incompatible changes).
- Downtime
- Downloading and staging in parallel to normal operation. Services are kept running until after the update, at which point the device admin needs to restart services or reboot depending on what was updated (not automated at the moment in meta-swupd layer).
- Security
- Compatible with Linux IMA, Smack, SELinux. Conceptually incompatible with dm-verity. Relies on HTTPS and (optionally) signing to protect integrity of update data (not integrated into meta-swupd yet).
sbabic's SWUpdate
- Disk layout
- There is no constraint on how software is stored. SWUpdate supports raw flash (NOR, NAND), UBI volumes and disk partitions, or can update files (provided in a tarball) in an existing filesystem. Each artifact can be stored on a different storage device.
- Rootfs
- No constraint on where software is stored. During an update, a single partition, multiple partitions or, more generally, multiple different storage devices can be updated.
- SWUpdate is often used in one of the following setups:
- rescue: the system reboots into maintenance mode and SWUpdate is started from a ramdisk. Only one copy of the software is stored on the system.
- dual-copy: two copies of the software (rootfs, kernel) are stored on the system and SWUpdate installs to the stand-by copy.
- Updates from
- local provisioning: USB, SD, etc.
- generic URL: HTTP(S), FTP. It uses the libcurl library and supports whatever libcurl provides.
- web server: SWUpdate integrates a web server (Mongoose).
- external backend connector (suricatta mode) to bind with an external backend server. Currently, the hawkBit server is supported https://github.com/eclipse/hawkbit. Open to further backends.
- Updates what
- bootloader (risky!)
- kernel
- interface to the bootloader (U-Boot): allows changing U-Boot variables, and plugins can be used to make changes to other bootloaders.
- disk partitions
- interface to update FPGAs, external microcontrollers, etc.
- custom handlers: an interface allows adding your own installers written in C or Lua.
- Resources on client
- rescue: meta-swupdate provides a recipe to generate a compressed ramdisk with a small footprint. Including support for signed images, the whole rootfs is ~4 MB. The minimal requirement for a complete rescue system (bootloader, kernel for SWUpdate and ramdisk) is 8 MB. This allows putting the rescue system in small storage such as SPI-NOR, while the software is stored on another, bigger device (NAND, eMMC).
- dual-copy: needs active/passive rootfs partitions, and bandwidth to download the compressed rootfs image. No additional space is required if the image is streamed directly to the stand-by copy.
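Streaming avoids a temporary copy because each chunk is written to the stand-by copy as soon as it arrives. The following is a minimal sketch, not SWUpdate's implementation. It assumes that the update source and the stand-by partition are both exposed as Python binary file objects, and that an expected SHA-256 checksum is known in advance. The chunk size and the function name are arbitrary choices.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # arbitrary; small enough to keep RAM usage low


def stream_to_standby(source: BinaryIO, target: BinaryIO, expected_sha256: str) -> int:
    """Copy source into target chunk by chunk, hashing on the fly.

    Returns the number of bytes written; raises ValueError on checksum mismatch.
    """
    digest = hashlib.sha256()
    written = 0
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
        target.write(chunk)
        written += len(chunk)
    target.flush()
    if digest.hexdigest() != expected_sha256:
        # The stand-by copy is now incomplete or corrupt; it must not be activated.
        raise ValueError("checksum mismatch: keep booting the active copy")
    return written


if __name__ == "__main__":
    # In-memory stand-ins for the download stream and the stand-by partition.
    image = b"\x00" * 200_000
    standby = io.BytesIO()
    n = stream_to_standby(io.BytesIO(image), standby, hashlib.sha256(image).hexdigest())
    print(n)  # 200000
```

The checksum is only known after the last chunk has been written, so a failed check cannot undo the write. It can only keep the device from switching to the stand-by copy.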
- Failure resilience
- There is no built-in recovery mechanism. If the bootloader is capable of detecting a failed boot, it could boot into maintenance mode again.
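U-Boot's boot-count feature, which uses the `bootcount`, `bootlimit` and `altbootcmd` environment variables, is one way a bootloader can detect repeated failed boots. The following Python model only illustrates the logic. In a real system it runs inside the bootloader and the counter lives in persistent storage. The names `BootEnv`, `select_boot` and `mark_boot_successful` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BootEnv:
    """Persistent bootloader environment (hypothetical model)."""
    bootcount: int = 0
    bootlimit: int = 3


def select_boot(env: BootEnv) -> str:
    """Run by the bootloader on every boot attempt."""
    env.bootcount += 1
    if env.bootcount > env.bootlimit:
        # Too many attempts without a successful boot: fall back, like altbootcmd.
        return "maintenance"
    return "production"


def mark_boot_successful(env: BootEnv) -> None:
    """Run from the booted system once it is healthy; resets the counter."""
    env.bootcount = 0
```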
- Downtime
- rescue: the device needs to reboot into maintenance mode and, once the image is installed, reboot again into the new production image.
- dual-copy: the update is downloaded and applied during normal operation. Afterwards one reboot is required, no other downtime.
- Security
- HTTPS connection to the external server.
- Signed images: the images used for the update can be signed in order to check their integrity.
- Split into several processes: the process connecting to the internet can run with a different user ID / group ID than the installer. The installer often runs with high privileges because it has to write to the hardware.
- Support for encrypted artifacts.
Mender
- Disk layout
- No upper limit on partitions, minimum four. Dual rootfs partitions with two extra partitions, a boot and a data partition. Depends on U-Boot as boot loader for the automatic rollback.
- Rootfs
- Stored in one active and one passive partition, read/write while in use, but modifications to it get lost during updates when switching partitions, so persistent data should be stored in the data partition. Kernel is stored on rootfs.
- Updates from
- Mender Server over HTTPS (managed mode)
- manual provisioning : local file system/URL, e.g. USB, SD, HTTP(S) (standalone mode)
- Code stability
- Stable release. Upgrade path fully supported and tested.
- Resources on server
- Needs to store one compressed rootfs image for each update, plus a small metadata section.
- Resources on client
- Needs active/passive rootfs partitions, and bandwidth to download the compressed rootfs image. Beyond the partitions, only space for the Mender binary and a tiny local database is needed. The update itself needs no extra space because it is streamed.
- Failure resilience
- Automatic rollback if the device either fails to boot, or the Mender daemon cannot connect to the Mender server afterwards.
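The commit-or-rollback decision can be modeled as a small state machine over the active and passive partitions. This is a simplified sketch that loosely follows the description above and is not Mender's code. The names `ABState`, `install` and `after_reboot` are hypothetical, and the image write itself is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ABState:
    active: str = "A"              # partition the device currently runs from
    pending: Optional[str] = None  # newly written partition awaiting commit

    @property
    def passive(self) -> str:
        return "B" if self.active == "A" else "A"


def install(state: ABState) -> str:
    """Write the new rootfs to the passive partition and mark it for a trial boot."""
    target = state.passive
    # (streaming the compressed rootfs into `target` is omitted here)
    state.pending = target
    return target


def after_reboot(state: ABState, booted_ok: bool, server_reachable: bool) -> str:
    """Commit the trial partition, or roll back to the previous one; return the active partition."""
    if state.pending is None:
        return state.active
    if booted_ok and server_reachable:
        state.active = state.pending  # commit: the new partition becomes active
    # Otherwise roll back by keeping the old active partition.
    state.pending = None
    return state.active
```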
- Complexity
- Relatively easy to build with Yocto. More complex if not using Yocto.
- Downtime
- Update is downloaded and applied during normal operation. Afterwards one reboot is required, no other downtime.
- Security
- Secure connection to the server (TLS). Artifact signing with RSA or ECDSA.
OSTree
- Disk layout
- Only limited set of bootloaders supported (those that support The Boot Loader Specification and some exceptions). The root file system has to have the /boot directory managed by OSTree. The actual deployed OS trees have certain conventions and requirements. For example, /etc and /var need to be writable, while /usr is mounted read-only.
- Rootfs
- Files provided by OS are expected to be in /usr, which is mounted read-only. /etc and /var are writable.
- Updates from
- Local and remote repositories. OSTree works by replicating a repository to the target device and then "checking out" deployment directories from the repository. Local and HTTP methods for replicating the repository are available.
- Updates what
- OSTree (atomically) updates root file system trees and the kernel. If anything else needs to be updated, it needs to happen by running custom code on the target device.
- Code stability
- OSTree is relatively stable. It's used by many distributions and projects, such as Fedora Atomic and Flatpak. OSTree is under active development and has an open-source community around it.
- OE/Yocto integration
- The meta-ostree layer is work in progress.
- Resource requirements on server
- OSTree generates "commit objects" based on the filesystem changes between builds. These commit objects are stored in a commit chain, the same way git does it. The commit objects are transmitted over the network as binary diffs when the remote and local repositories are synchronized.
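The git-like commit chain can be sketched with content addressing. Each commit records the checksum of its tree and of its parent, and its own checksum is derived from both. This is a simplified model. The JSON serialization and the tree-hashing scheme below are assumptions for illustration and are not OSTree's actual object format.

```python
import hashlib
import json
from typing import Dict, List, Optional


def tree_checksum(files: Dict[str, bytes]) -> str:
    """Checksum of a filesystem tree: hash of sorted (path, content-hash) pairs."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def make_commit(store: Dict[str, dict], files: Dict[str, bytes], parent: Optional[str]) -> str:
    """Store a commit object pointing at its tree and parent; return its checksum."""
    commit = {"tree": tree_checksum(files), "parent": parent}
    checksum = hashlib.sha256(json.dumps(commit, sort_keys=True).encode()).hexdigest()
    store[checksum] = commit
    return checksum


def history(store: Dict[str, dict], head: Optional[str]) -> List[str]:
    """Walk parent links from head back to the first commit."""
    chain = []
    while head is not None:
        chain.append(head)
        head = store[head]["parent"]
    return chain
```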
- Resource requirements on client
- The client has a local copy of the repository. The repository contains the objects, which map to the files in the OS trees. The deployments of the OS trees ("checkouts" in git terminology) are made using hard links to the repository content. This means that the required space only grows when data changes: unchanged files are shared between deployments.
- OSTree allows deleting old data, so the full operating system history doesn't need to be stored in the repository. A typical OSTree-based system might have two deployed OS trees: the one currently in use and a fallback.
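The hard-link sharing and the deletion of old data described above can be sketched with the standard library. The sketch makes three assumptions. First, objects are stored as flat files named by the SHA-256 of their content. Second, the repository and the deployments are on the same filesystem, which hard links require. Third, pruning removes objects whose link count shows that no deployment uses them anymore. OSTree's own pruning works from commit reachability instead, and the function names here are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import Dict


def store_object(repo: str, content: bytes) -> str:
    """Write content into the repo under its SHA-256 name (only once) and return the name."""
    name = hashlib.sha256(content).hexdigest()
    path = os.path.join(repo, name)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(content)
    return name


def deploy(repo: str, deploy_dir: str, tree: Dict[str, bytes]) -> None:
    """Check out a tree by hard-linking repository objects into deploy_dir."""
    for rel_path, content in tree.items():
        obj = os.path.join(repo, store_object(repo, content))
        dest = os.path.join(deploy_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.link(obj, dest)  # no data copied: dest shares the object's inode


def prune(repo: str) -> int:
    """Delete objects that no deployment links to anymore (link count 1)."""
    removed = 0
    for name in os.listdir(repo):
        path = os.path.join(repo, name)
        if os.stat(path).st_nlink == 1:
            os.remove(path)
            removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        repo = os.path.join(base, "repo")
        os.makedirs(repo)
        deploy(repo, os.path.join(base, "deploy-1"), {"usr/bin/app": b"v1", "usr/lib/libfoo.so": b"lib"})
        deploy(repo, os.path.join(base, "deploy-2"), {"usr/bin/app": b"v2", "usr/lib/libfoo.so": b"lib"})
        print(len(os.listdir(repo)))  # 3 objects: two app versions, one shared library
```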
- Failure resilience
- OSTree controls the booting to the different deployed trees using bootloader entries. If booting a deployed OS tree fails, a different bootloader entry can be chosen for booting into a different OS tree.
- Complexity
- Some work is needed to fit the root filesystem into the conventions that OSTree expects. Some documentation of the needed steps is available here.
- Downtime
- Reboot is required after updates. Changing between deployed OS trees is done by selecting a different boot loader entry.
- Security
- The commits can be signed with GPG-based signatures. OSTree repositories store file extended attributes, which means that the security mechanisms using extended attributes should be functional with OSTree.
RAUC
- Type
- RAUC supports filesystems on block devices and MTD (NAND/NOR) as well as raw images (bootloaders, FPGA bitstreams).
- Disk layout
- RAUC does not depend on a fixed disk layout but allows configuring one flexibly. It supports A+B scenarios as well as A+recovery or more complex setups.
- Rootfs
- There are no limitations on the root file system, it can be read-only or read-write. It has to contain the RAUC system configuration file, a keyring for verification and the small RAUC binary itself.
- Updates from
- RAUC is the updating core that is meant to be integrated into a custom environment. It therefore installs bundles from local file paths, while another application is responsible for fetching the update.
- External storage (USB, SD): direct access, no copying required.
- Web server: an application downloads the bundle to local tmpfs or storage.
- Demo applications are planned.
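The division of labor between a download application and RAUC can be sketched as follows. The URL, the download directory and the bundle file name `update.raucb` are assumptions. `rauc install <bundle>` is RAUC's install command. The command runner is injectable, so the flow can be exercised without RAUC present. Error handling, retries and free-space checks are left out.

```python
import os
import shutil
import subprocess
import tempfile
import urllib.request
from typing import Callable, List

Runner = Callable[[List[str]], None]


def run_command(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def fetch_bundle(url: str, dest_dir: str) -> str:
    """Download the bundle into dest_dir (e.g. a tmpfs mount) and return its path."""
    path = os.path.join(dest_dir, "update.raucb")  # hypothetical file name
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


def fetch_and_install(url: str, dest_dir: str = tempfile.gettempdir(),
                      run: Runner = run_command) -> str:
    """Fetch the bundle, then hand the local path to RAUC for verification and installation."""
    bundle = fetch_bundle(url, dest_dir)
    run(["rauc", "install", bundle])
    return bundle
```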
- Updates what
- everything that is updatable
- default handler for common storage types and file systems (ext4, NAND, UBI, raw copy)
- allows custom handler for each update target (script or application, information passed as environment variables)
- Resources on client
- At least one production rootfs and a rescue system. Temporary storage for update Bundle (compressed) if provided by a webserver.
- Failure resilience
- An interface to the bootloader (Barebox, U-Boot, GRUB) allows atomic updates (correct fallback handling is up to the bootloader). An interface to mark updates good or bad after certain checks in the newly booted rootfs is available (must be supported by the bootloader).
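After booting into the new rootfs, a small service can run health checks and report the result through `rauc status mark-good booted` or `rauc status mark-bad booted`. The checks themselves are placeholders for application-specific tests, and the function names are hypothetical. Whether the mark actually triggers a fallback depends on the bootloader integration described above.

```python
import subprocess
from typing import Callable, Iterable, List

Runner = Callable[[List[str]], None]
HealthCheck = Callable[[], bool]


def run_command(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def confirm_boot(checks: Iterable[HealthCheck], run: Runner = run_command) -> bool:
    """Run post-boot health checks and mark the booted slot good or bad via RAUC."""
    # all() stops at the first failing check.
    healthy = all(check() for check in checks)
    run(["rauc", "status", "mark-good" if healthy else "mark-bad", "booted"])
    return healthy
```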
- Complexity
- Integration into the rootfs and generating update Bundles is relatively easy with Yocto (meta-rauc) or PTXdist. Manual integration described here.
- Downtime
- Normally, downloading and installing are performed as a background operation. A reboot is needed to make the new system active (it might be scheduled).
- Security
- Signed images: Signing images is mandatory. X.509 cryptography (CMS) is used to sign the bundle and a full PKI with revocations is supported. An example script allows creating key/cert/keychain for test purposes.
- Verified boot: when using image updates, compatible with IMA, SELinux, dm-verity, ... (as it does not require write access to the filesystems).