Yocto Build Failure Swat Team: Difference between revisions

Revision as of 15:17, 16 November 2020

Overview

The role of the Bug Swat Team is to monitor the autobuilder and do preliminary investigation of failures, to ensure that they are logged and brought to the attention of the appropriate owner.

All builds run on the public autobuilder are important for the Yocto Project, whether they are routine validation runs (master or release branches) or pre-integration test builds (master-next, stable/*, and others). Random failures, if ignored, accumulate and can result in most builds failing.

Each week a different member of the team is on call. Every build that fails on the autobuilder should be monitored unless told otherwise.

Importantly, the Swat Team isn't responsible for resolving issues encountered on the autobuilder, only for doing enough analysis that each issue can be logged and the appropriate owner notified.

The Swat Chairs are the primary contact for the Swat Team. The current Swat Chairs are Ross Burton, Armin Kuster and Richard Purdie. The Chairs are assisted by Stephen K. Jolley, who handles the rotation process.

Process

The high-level overview of the process is simply two steps:

Identify build failures
Report the build failures

Identify

To get notified when a relevant build fails, subscribe to the yocto-builds mailing list. The list receives a mail whenever an 'important' build fails, and each mail includes direct links to the autobuilder job summary, the BuildLog, and the Error Reporting Service.

Alternatively, these services can be monitored periodically. The Autobuilder 'Yocto console view' is an overview of the top-level builds (a-full and a-quick) and all the sub-builds they create. Any failures are added, with links, to the BuildLog under the relevant section.

Both the mail notification and the BuildLog will include notes from the build owner, so check these for any useful context. For example, the notes may ask that failures be reported directly to a specific person instead of being filed as bugs, or that Swat ignore the build.

Report

Unless told otherwise, the usual process is as follows:

For builds against master or a release branch, all issues observed should be filed in Bugzilla. Search first to make sure the issue isn't already filed: many intermittent bugs, for example, are already filed and have "AB-INT" in the whiteboard field.
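
One way to run that search is to build a query URL for Bugzilla's REST interface. The sketch below only constructs the URL and makes no network request. It assumes that the Yocto Project Bugzilla at bugzilla.yoctoproject.org exposes the standard Bugzilla REST API under /rest/bug, which this page does not confirm; the function name and the choice of returned fields are illustrative.

```python
from urllib.parse import urlencode

# Assumption: the standard Bugzilla REST endpoint is available at this address.
BUGZILLA_REST = "https://bugzilla.yoctoproject.org/rest/bug"


def search_url(summary_text: str, intermittent_only: bool = False) -> str:
    """Build a Bugzilla REST search URL (hypothetical helper, no request is sent).

    summary_text is matched as a substring of the bug summary.
    intermittent_only restricts the search to bugs with "AB-INT" in the whiteboard.
    """
    params = {
        "summary": summary_text,
        # Limit the response to the fields useful for a quick duplicate check.
        "include_fields": "id,summary,status,whiteboard",
    }
    if intermittent_only:
        params["whiteboard"] = "AB-INT"
    return f"{BUGZILLA_REST}?{urlencode(params)}"


if __name__ == "__main__":
    # Example: look for known intermittent failures mentioning wget.
    print(search_url("wget", intermittent_only=True))
```

The resulting URL can be opened in a browser or fetched with any HTTP client. The ordinary Bugzilla search form gives the same results if the REST interface is not available.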

For builds against staging branches (master-next, stable/dunfell-nut, etc), attempt to identify which patch in the branch is likely responsible for the failure. For example, if wget fails with libgnutls errors and there is a GnuTLS upgrade in the branch, that is the likely candidate. If a patch can be identified, reply on the mailing list with the failure details. If it isn't obvious which patch is responsible for the failure, or a patch can be identified but it has been merged to the release branch, then file a bug and ensure the branch owner is either the assignee or on the CC list.

If in doubt, file a bug. All observed errors must be actioned unless a patch has already been sent for the issue, in which case please make note of this in the BuildLog.
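
The reporting rules above can be summarised as a small decision function. This is a minimal sketch, not a tool used by the team: the class, field, and function names are all hypothetical, and it deliberately ignores notes from the build owner, which can override the usual process.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FILE_BUG = "file a bug in Bugzilla"
    REPLY_ON_LIST = "reply on the mailing list with the failure details"
    NOTE_IN_BUILDLOG = "note the existing patch in the BuildLog"


@dataclass
class Failure:
    # Hypothetical fields that mirror the questions asked in the text.
    staging_branch: bool                     # master-next, stable/dunfell-nut, etc.
    patch_already_sent: bool = False         # a fix has already been posted
    culprit_identified: bool = False         # a likely responsible patch was found
    culprit_merged_to_release: bool = False  # that patch is already in the release branch


def triage(failure: Failure) -> Action:
    """Return the usual Swat action for a failure."""
    if failure.patch_already_sent:
        # The only case where no new report is needed.
        return Action.NOTE_IN_BUILDLOG
    if not failure.staging_branch:
        # master or a release branch: always file (after searching for duplicates).
        return Action.FILE_BUG
    if failure.culprit_identified and not failure.culprit_merged_to_release:
        return Action.REPLY_ON_LIST
    # Unclear culprit, or culprit already merged: if in doubt, file a bug.
    return Action.FILE_BUG
```

Every path other than an already-sent patch or a clearly identified staging patch ends in filing a bug, which matches the "if in doubt" rule.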

If the issue is in the infrastructure or the autobuilder itself, file a bug for it. Infrastructure bugs should be assigned to Michael Halstead and autobuilder logic bugs to Richard Purdie.

The results of pre-triage for an issue should be added to the corresponding entry in the BuildLog, including a link to the resolution (patch name, bug link, etc) and a brief summary of the issue. Every issue should be added to the BuildLog so that it acts as a build status report.

Note that some builds, in particular the "perf" builds, are not listed on the BuildLog unless the build fails (to reduce noise on the log). Failures in performance test builds should be handled like any other build.

The net result is that every failure listed in the BuildLog should have an outcome recorded against it by the person on Swat at the time.

Filing bugs

When filing the bug, several items must be included:

Relevant details about the build configuration. For example, did the failure happen just once, or in all PowerPC builds? Was it specific to multilib builds? Look across the entire build run and identify any patterns.
The error itself. Trim the log down to just the error and any relevant context in the bug description.
A link to the build failure. Ideally a link to the error reports page (such as http://errors.yoctoproject.org/Errors/Details/199667/) but a link to the autobuilder build log is acceptable (such as https://autobuilder.yoctoproject.org/typhoon/#/builders/34/builds/168). If referring to an autobuilder build log, also attach the complete build log as build logs are not kept forever.
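
The three items above can be assembled mechanically. The following is a rough sketch under stated assumptions: the helper names are hypothetical, error lines are assumed to contain the marker "ERROR:", and the sample log and configuration text are made up for illustration. Only the error report URL is taken from this page.

```python
from dataclasses import dataclass


def trim_log(log_text: str, context: int = 2, marker: str = "ERROR:") -> str:
    """Keep lines containing `marker`, plus `context` lines on either side."""
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if marker in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    return "\n".join(lines[i] for i in sorted(keep))


@dataclass
class BugReport:
    configuration: str   # where and how often the failure was seen
    error_excerpt: str   # the trimmed error with relevant context
    failure_link: str    # error report page, or autobuilder build log

    def description(self) -> str:
        """Format the three required items as a bug description."""
        return (
            f"Build configuration:\n{self.configuration}\n\n"
            f"Error:\n{self.error_excerpt}\n\n"
            f"Build failure: {self.failure_link}\n"
        )


if __name__ == "__main__":
    # Made-up log text, for illustration only.
    sample_log = "NOTE: step a\nNOTE: step b\nERROR: task failed\nNOTE: step c\nNOTE: step d"
    report = BugReport(
        configuration="Seen once, in a multilib build",
        error_excerpt=trim_log(sample_log, context=1),
        failure_link="http://errors.yoctoproject.org/Errors/Details/199667/",
    )
    print(report.description())
```

A trimmed excerpt does not replace the attachment: when the link points at an autobuilder build log, the complete log should still be attached to the bug.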

Process summary

Monitor builds via one or more of:
- the autobuilder: main page, console view. Orange builds have warnings, red builds failed with errors.
- the Error Reports web page, which has an entry for each recipe that fails. A link to the appropriate search is on the autobuilder's Console View.
- the BuildLog wiki page. This has links to each build log that wasn't successful.
- the yocto-builds mailing list. This gets a mail for every unsuccessful build.
Pre-triage each failure:
- File a bug, respond to a patch, or note known issues
- Update the BuildLog with the result of pre-triage, linking to the resolution where possible.

Members

Alejandro Hernandez Samaniego

Paul Eggleton

Naveen Saini

Armin Kuster (place me anywhere)

Christopher Larson

Lee Chee Yang

