Yocto Build Failure Swat Team

From Yocto Project
Revision as of 13:13, 21 December 2020 by RossBurton

Overview

The role of the Bug Swat Team is to monitor the autobuilder and do preliminary investigation of failures, to ensure that they are logged and brought to the attention of the appropriate owner.

All builds that are run on the public autobuilder are important for the Yocto Project, whether they are routine validation runs (master or release branches) or pre-integration test builds (master-next, stable/*, and others). Random failures, if ignored, accumulate and can result in a significant number of builds failing.

Each week a different member of the team is on call. Unless the team is told otherwise, every build that fails on the autobuilder should be monitored. The rotation happens at the end of Friday (deliberately vague); any failures over the weekend should be triaged by the incoming member on Monday.

Importantly, the Swat Team isn't responsible for resolving issues encountered on the autobuilder, only for doing enough analysis that each issue can be logged and the appropriate owner notified.

The Swat Chairs are the primary contact for the Swat Team. The current Swat Chairs are Ross Burton, Armin Kuster and Richard Purdie. The Chairs are assisted by Stephen K. Jolley who handles the rotation process.

Process

The process is simply three steps:

  1. Identify build failures
  2. Report the build failures
  3. Update the build log

Identify

To be notified when a build fails, you can subscribe to the yocto-builds mailing list. The list receives a mail when an 'important' build fails; the mail includes direct links to the autobuilder job summary, the BuildLog, and the Error Reporting Service.

Alternatively, these services can be monitored periodically. The Autobuilder 'Yocto console view' is an overview of the top-level builds (a-full and a-quick) and all the sub-builds they create. The BuildLog is a wiki page that is updated when builds fail with links to the appropriate logs. The Error Reporting Service collates errors from the autobuilder.

Both the mail notification and the BuildLog include notes from the build owner, so check them for any useful context. For example, the notes may request that failures are reported directly to a specific person instead of being filed as bugs, or that Swat should ignore the build entirely.

Report

Unless told otherwise, the usual process is as follows:

For builds against master or a release branch, all issues observed should be filed in Bugzilla. Remember to search first to ensure that the issue isn't already filed. For example, many bugs that occur intermittently are already filed and have "AB-INT" in the whiteboard field.
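The search for an existing bug can be turned into a link that is quick to open. The following is a minimal sketch, not an official tool: the base URL and the helper name search_url are assumptions, and the query parameters are the standard Bugzilla buglist.cgi ones for summary and whiteboard substring searches.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Bugzilla instance; it is not given in this page.
BUGZILLA = "https://bugzilla.yoctoproject.org"


def search_url(summary_words, intermittent_only=False):
    """Build a Bugzilla search link (hypothetical helper).

    summary_words: words expected in the bug summary, e.g. "qemu timeout".
    intermittent_only: also require "AB-INT" in the whiteboard field.
    """
    params = {
        "short_desc": summary_words,
        "short_desc_type": "allwordssubstr",
    }
    if intermittent_only:
        params["status_whiteboard"] = "AB-INT"
        params["status_whiteboard_type"] = "allwordssubstr"
    return BUGZILLA + "/buglist.cgi?" + urlencode(params)


if __name__ == "__main__":
    print(search_url("qemu timeout", intermittent_only=True))
```

Opening the printed link in a browser lists candidate duplicates; an empty result is a hint, not proof, that the issue is new.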

For builds against staging branches (master-next, stable/dunfell-nut, etc.), attempt to identify which patch in the branch is likely responsible for the failure. For example, if wget fails with libgnutls errors and there is a GnuTLS upgrade in the branch, that is the likely candidate. If a patch can be identified, reply on the mailing list with the failure details. If it isn't obvious which patch is responsible for the failure, or a patch can be identified but it has been merged to the release branch, then file a bug and ensure the branch owner is either the assignee or on the CC list.

If in doubt, file a bug. All observed errors must be actioned unless a patch has already been sent for the issue, in which case please make note of this in the BuildLog.

If the issue is in the infrastructure or the autobuilder itself, file a bug against Infrastructure: Autobuilder. Infrastructure bugs should be assigned to Michael Halstead and autobuilder logic bugs to Richard Purdie.
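The reporting rules above can be summarised as a small decision function. This is only an illustrative sketch: the Failure record, its field names, and the returned strings are hypothetical, and the staging-branch check assumes that staging branches are master-next or start with "stable/".

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: staging branches are recognised by these name prefixes.
STAGING_PREFIXES = ("master-next", "stable/")


@dataclass
class Failure:
    """Hypothetical record of one observed build failure."""
    branch: str
    culprit_patch: Optional[str] = None   # patch believed responsible, if any
    patch_already_merged: bool = False    # culprit already in the release branch
    fix_already_sent: bool = False        # a patch for the issue was already sent
    is_infrastructure: bool = False       # infrastructure or autobuilder problem


def next_action(f: Failure) -> str:
    """Return the reporting action for a failure, following the rules above."""
    if f.fix_already_sent:
        return "note the existing patch in the BuildLog"
    if f.is_infrastructure:
        return "file a bug against Infrastructure: Autobuilder"
    if f.branch.startswith(STAGING_PREFIXES):
        if f.culprit_patch and not f.patch_already_merged:
            return "reply on the mailing list with the failure details"
        return "file a bug with the branch owner as assignee or on CC"
    # master, release branches, and anything in doubt: file a bug.
    return "file a bug in Bugzilla"


if __name__ == "__main__":
    print(next_action(Failure("master-next", culprit_patch="gnutls upgrade")))
```

Notes from the build owner take precedence over this default flow, so a real check would start by reading the BuildLog entry.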

The results of pre-triage for an issue should be added to the corresponding entry in the BuildLog, including a link to the resolution (patch name, bug link, etc.) and a brief summary of the issue. Every issue should be added to the BuildLog so that it acts as a build status report.

The net result is that all failures listed in the BuildLog should have outcomes listed against them from the person on call at the time.

Communication is key: if the build owner is on IRC then it's always worth discussing with them first before filing bugs. Also, if the build owner triages the build failures then they should update the BuildLog so that Swat doesn't duplicate the work.

Filing bugs

When filing the bug, several items must be included:

  • Relevant details about the build configuration. For example, did the failure happen just once, or in all PowerPC builds? Was it specific to multilib builds? Look across the entire build run and identify any patterns.
  • The error itself. Trim the log down to just the error and any relevant context in the bug description.
  • A link to the build failure. Ideally a link to the error reports page (such as http://errors.yoctoproject.org/Errors/Details/199667/) but a link to the autobuilder build log is acceptable (such as https://autobuilder.yoctoproject.org/typhoon/#/builders/34/builds/168). If referring to an autobuilder build log, also attach the complete build log as build logs are not kept forever.
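The three items above can be assembled mechanically once the log has been trimmed. The sketch below is one possible approach under stated assumptions: the helper names trim_log and bug_description are hypothetical, and errors are assumed to be marked by the word ERROR, as in BitBake's "ERROR:" lines.

```python
import re

# Assumption: lines of interest contain the word "ERROR".
ERROR_RE = re.compile(r"\bERROR\b")


def trim_log(log_text, context=3):
    """Keep only lines matching ERROR_RE plus `context` lines around each."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_RE.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    return "\n".join(lines[i] for i in sorted(keep))


def bug_description(config_notes, log_text, failure_url, context=3):
    """Assemble a bug description from the three required items."""
    return "\n\n".join([
        "Build configuration:\n" + config_notes,
        "Error:\n" + trim_log(log_text, context),
        "Build failure link:\n" + failure_url,
    ])


if __name__ == "__main__":
    sample = "step 1\nstep 2\nERROR: do_compile failed\nstep 4\nstep 5"
    print(bug_description(
        "Seen once, qemuppc only",
        sample,
        "http://errors.yoctoproject.org/Errors/Details/199667/",
        context=1,
    ))
```

The trimmed text goes in the description; when the link points at an autobuilder build log, the complete log still needs to be attached separately.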

Members

Ross Burton

Leo Sandoval

Anibal Limon

Saul Wold

Alejandro Hernandez Samaniego

Paul Eggleton

Naveen Saini

Armin Kuster (place me anywhere)

Christopher Larson

Lee Chee Yang

Jon Mason

Minjae Kim