Yocto Build Failure Swat Team

From Yocto Project
Revision as of 16:26, 22 December 2020

Overview

All builds that are run on the public autobuilder are important for the Yocto Project, whether they are routine validation runs (master or release branches) or pre-integration test builds (master-next, stable/*, and others). Random failures, if ignored, accumulate and can result in a significant number of builds failing.

The role of the Bug Swat Team is to monitor the autobuilder and do preliminary investigation of failures, to ensure that they are logged and brought to the attention of the appropriate owner.

Importantly, the Swat Team isn't responsible for resolving issues encountered on the autobuilder. It only does enough analysis that each issue can be logged and the appropriate owner notified.

Each week a different member of the team is on call. Every build that fails on the autobuilder should be monitored unless told otherwise. The rotation happens at the end of Friday (deliberately vague); any failures over the weekend should be triaged by the incoming member on Monday.

The Swat Chairs are the primary contact for the Swat Team. The current Swat Chairs are Ross Burton, Armin Kuster and Richard Purdie. The Chairs are assisted by Stephen K. Jolley who handles the rotation process. If the person currently on call, or about to be on call, can no longer perform their duties, then they should contact Stephen to arrange a replacement.

Process

The process is simply three steps:

  1. Identify build failures
  2. Report the build failures
  3. Update the build log

Identify

To be notified when a build fails, subscribe to the yocto-builds mailing list. A mail is sent to this list whenever a build fails, with direct links to the autobuilder job summary, the BuildLog, and the Error Reporting Service. The mail also states whether the build is expected to be triaged by Swat, so check this: if the build owner is taking full responsibility, the failures can be ignored.

There are several services that are used when monitoring the builds:

  • The Autobuilder 'Yocto Console View' is an overview of the top-level builds (a-full and a-quick) and the sub-builds they trigger.
  • The Error Reporting Service archives errors from the autobuilder.
  • The BuildLog is a wiki page that is updated automatically when builds fail with links to the appropriate logs, and is where Swat adds notes explaining the failures and resolution. For example, "glib upgrade causes multilib failures, replied on mailing list" or "new qemu hangs on PPC, filed bug #1234".


Both the mail notification and the BuildLog will include notes from the build owner, so check these for any useful context. For example, the notes may request that failures are reported directly to a specific person instead of being filed as bugs, or state that particular failures are expected.

Report

There are two categories of builds that Swat will be monitoring: official branches and staging branches. The official branches are the primary top-level branches in Poky, that is master and all of the release branches (gatesgarth, dunfell, etc). The staging branches are where patches are held for testing, such as master-next, stable/dunfell-nmut, or ross/mut.
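As a rough illustration of the two categories, the following sketch sorts a branch name into "official" or "staging". The function name and the set of branches are hypothetical: the set contains only the examples named above, not a complete or current list of releases.

```python
# Hypothetical helper. The names below are only the examples given in the
# text (master plus two release branches); a real list would need every
# supported release.
OFFICIAL_BRANCHES = {"master", "gatesgarth", "dunfell"}


def branch_category(branch: str) -> str:
    """Return 'official' for master and release branches, else 'staging'."""
    return "official" if branch in OFFICIAL_BRANCHES else "staging"


if __name__ == "__main__":
    for name in ("master", "dunfell", "master-next", "ross/mut"):
        print(name, "->", branch_category(name))
```

Anything not explicitly listed is treated as a staging branch, which matches the description of staging branches as the place where patches are held for testing.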

Communication is important: if the build owner is on IRC then it's always worth discussing issues with them first as they may have further context and directions. Also, if the build owner triages the build failures then they must update the BuildLog so that Swat doesn't duplicate the work.

Official Branches

For builds of official branches, that is master or a release branch, all failures or warnings are critical and must be filed in Bugzilla.

First search Bugzilla and check the BuildLog to see if the issue is already known. For example, there are some bugs that occur intermittently and are already filed with AB-INT in the whiteboard field.

Every failure must be in Bugzilla, and the BuildLog updated with a link to the relevant bug.

Staging Branches

For builds against staging branches which contain patches under test for integration into an official branch (such as master-next, stable/dunfell-nut, ross/mut, etc), first attempt to identify if there is a patch in the branch that is likely to be responsible for the failure. For example, if wget fails with libgnutls errors and there is a GnuTLS upgrade in the branch, then that is a likely candidate. If a patch can be identified that hasn't yet been merged into an official branch, then reply to the patch on the mailing list with the details. If it isn't obvious which patch is responsible for the failure, or a patch can be identified but it has already been merged to the release branch, then file a bug and ensure the branch maintainer (see the Releases page for names) is either the assignee or on the CC list. In both situations the BuildLog must be updated with details of what action was taken (mailed the list, filed a bug, and so on).

If in doubt, file a bug. All errors must be actioned unless a patch has already been sent for the issue, in which case please make note of this in the BuildLog.
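The reporting rules for official and staging branches can be summarised as a small decision function. This is only a sketch of the prose above, not a tool the Swat Team uses: the `Action` enum, the `triage` function, and all parameter names are hypothetical. It assumes the "patch has already been sent" exception applies to staging branches, since that is where the text states it.

```python
from enum import Enum


class Action(Enum):
    # Every action also requires updating the BuildLog.
    FILE_BUG = "file a bug in Bugzilla"
    LINK_EXISTING_BUG = "link the existing bug"
    REPLY_TO_PATCH = "reply to the patch on the mailing list"
    NOTE_EXISTING_PATCH = "note the already-sent patch"


def triage(category: str,
           known_bug: bool = False,
           fix_already_sent: bool = False,
           culprit_identified: bool = False,
           culprit_merged: bool = False) -> Action:
    """Pick the reporting action for one failure (hypothetical sketch)."""
    if category == "official":
        # Every failure on an official branch must be in Bugzilla.
        return Action.LINK_EXISTING_BUG if known_bug else Action.FILE_BUG
    # Staging branch from here on.
    if fix_already_sent:
        return Action.NOTE_EXISTING_PATCH
    if culprit_identified and not culprit_merged:
        return Action.REPLY_TO_PATCH
    # No obvious culprit, or it is already merged: file a bug and make sure
    # the branch maintainer is the assignee or on the CC list.
    return Action.FILE_BUG


if __name__ == "__main__":
    # The wget/GnuTLS example: a likely culprit that is not yet merged.
    print(triage("staging", culprit_identified=True).value)
```

The final `return Action.FILE_BUG` reflects the "if in doubt, file a bug" rule: any case that does not match an earlier branch falls through to a bug report.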

If the issue is in the infrastructure or autobuilder itself then file a bug against "Infrastructure: Autobuilder". Infrastructure bugs should be assigned to Michael Halstead and autobuilder logic bugs to Richard Purdie.

Filing bugs

When filing the bug, several items must be included:

Members

  • Armin Kuster
  • Lee Chee Yang