Yocto Build Failure Swat Team

From Yocto Project
Note: The SWAT process has changed. Please read the new process information (up to, and including, section 6). If you're already au fait with the new process, you may want to skip to the summary bullets.

Overview

The role of the SWAT team is to monitor the autobuilder and investigate all failures to ensure they are logged and brought to the attention of a suitable owner.

Scope

All builds run on the public autobuilder are important for the Yocto Project, whether they are a post-merge validation run (for master or a release branch) or a pre-merge test build (for master-next, ross/mut and others). The SWAT team should monitor every build unless the BuildLog entry for that build indicates otherwise. That is, SWAT is opt-out by whoever triggers a build on the Autobuilder, not opt-in.
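The opt-out rule amounts to a check that defaults to "monitor". The following is a minimal Python sketch only: the `BuildLogEntry` type, the `swat_should_monitor` function and the `no swat` marker are all hypothetical, because this page does not say how an opt-out is written in the BuildLog.

```python
from dataclasses import dataclass

# Hypothetical opt-out marker; the real BuildLog wording is not specified here.
OPT_OUT_MARKER = "no swat"


@dataclass
class BuildLogEntry:
    """Hypothetical, simplified BuildLog entry for one triggered build."""
    branch: str
    reason: str  # text entered in the "Reason" field when forcing the build


def swat_should_monitor(entry: BuildLogEntry) -> bool:
    """Opt-out, not opt-in: monitor unless the entry explicitly says otherwise."""
    return OPT_OUT_MARKER not in entry.reason.lower()


print(swat_should_monitor(BuildLogEntry("master-next", "Test pending patches")))  # True
print(swat_should_monitor(BuildLogEntry("ross/mut", "Experiment, no SWAT please")))  # False
```

The important property is the default: an entry with no annotation at all is still monitored.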

Pre-triage

SWAT isn't responsible for resolving issues encountered on the Autobuilder. Its focus is on performing minimal analysis of each failure so that it is logged and brought to the attention of a suitable owner, a process we'll refer to as pre-triage.

Rotation Process

The active member rotation takes place weekly, at the end of Friday. Usually the role passes in simple round-robin order through the members list. If the next person cannot take the role because of a tight schedule, vacation or some other reason, the role passes to the person after them.
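The hand-over rule can be sketched as a round-robin walk that skips unavailable members. This is an illustrative Python sketch, assuming the order of the Members list on this page is the rotation order; `next_active_member` is a hypothetical helper, not part of any Yocto tooling.

```python
from typing import Optional, Sequence, Set


def next_active_member(members: Sequence[str], current: str,
                       unavailable: Set[str]) -> Optional[str]:
    """Return the member who takes over from `current`.

    Walks the list in round-robin order (wrapping at the end) and skips
    anyone who cannot take the role that week. Returns None if nobody
    else is available.
    """
    start = members.index(current)
    for step in range(1, len(members)):
        candidate = members[(start + step) % len(members)]
        if candidate not in unavailable:
            return candidate
    return None


members = ["Ross Burton", "Juro Bystricky", "Paul Eggleton", "Tracy Graydon"]
print(next_active_member(members, "Ross Burton", {"Juro Bystricky"}))  # Paul Eggleton
print(next_active_member(members, "Tracy Graydon", set()))  # Ross Burton
```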

Roles

Active member: the currently active member of the SWAT team is expected to monitor the Autobuilder and pre-triage failures in a timely fashion. Team members are active for one week at a time.

SWAT Chair: the SWAT chair provides backup cover for the active member and is the first point of contact for SWAT. Tracy Graydon is the current SWAT Chair.

SWAT Facilitator: the SWAT facilitator is responsible for managing the rotation process. Stephen Jolley is the current SWAT Facilitator.

Process

The BuildLog wiki page is automatically updated by the autobuilder when a new build is triggered. An entry should include a reason for triggering the build (as entered in the "Reason" field of the autobuilder "Force build" page when triggering a build) and may also include detail on what the expectations for the build are. For each build failure that occurs, the active SWAT member is responsible for pre-triaging the failure.

The pre-triage of failures takes two forms:

  • for builds against the master branch or a release branch of the poky repo, any issues observed should be filed in bugzilla.
  • for builds against other branches (master-next, ross/mut, -next branches for stable releases, etc.), where an issue is caused by a patch not in the master branch, reply on the mailing list to the unmerged patch causing the problem.
    • When it isn't obvious which patch caused the failure, file an issue in bugzilla and alert the branch owner (CC or assignment on the bug should suffice).
    • If in doubt, file a bug: all observed errors must be actioned unless a patch has already been sent for the issue (in which case please note this in the BuildLog).
    • Infrastructure issues can be filed in bugzilla, where they will be assigned to Michael Halstead (halstead on irc) by default.
    • Autobuilder logic bugs also go into bugzilla, where they will be assigned to Joshua Lock by default.
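The rules above can be condensed into a small decision function. This is a Python sketch under stated assumptions: the function name, the action strings and the `POST_MERGE_BRANCHES` set are hypothetical, and the order in which the checks run is one reasonable reading of the list rather than an official procedure. Only the two default assignees come from this page.

```python
from typing import Optional, Tuple

# Hypothetical: extend with the poky release branch names currently in use.
POST_MERGE_BRANCHES = {"master"}

# Default bugzilla assignees for non-build issues, as listed on this page.
DEFAULT_ASSIGNEES = {
    "infrastructure": "Michael Halstead",
    "autobuilder": "Joshua Lock",
}


def pre_triage(branch: str, kind: str = "build", culprit_known: bool = False,
               fix_already_sent: bool = False) -> Tuple[str, Optional[str]]:
    """Return (action, default assignee or None) for one observed failure."""
    if kind in DEFAULT_ASSIGNEES:
        # Infrastructure and autobuilder logic bugs go to bugzilla.
        return ("file bug", DEFAULT_ASSIGNEES[kind])
    if fix_already_sent:
        # Nothing new to action; record the existing patch in the BuildLog.
        return ("note existing patch in BuildLog", None)
    if branch in POST_MERGE_BRANCHES:
        return ("file bug", None)
    if culprit_known:
        return ("reply to patch on mailing list", None)
    # Culprit unclear (or in doubt): file a bug and CC/assign the branch owner.
    return ("file bug and alert branch owner", None)


print(pre_triage("master"))                           # ('file bug', None)
print(pre_triage("master-next", culprit_known=True))  # ('reply to patch on mailing list', None)
print(pre_triage("ross/mut", kind="infrastructure"))  # ('file bug', 'Michael Halstead')
```

Note that the fallback is always to file a bug, mirroring "if in doubt, file a bug".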

The results of pre-triage for an issue should be added to the corresponding entry in the BuildLog, including a link to the pre-triage outcome (bugzilla entry, mailing list post in the archive, etc.) and a brief summary of the issue.

Every build failure should be addressed in the BuildLog. If it is a known issue, an entry with a single line containing "Known Issue" is sufficient (a link to further detail is, of course, much better). This assures others that the failure has been looked at and is being worked on.
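As an illustration of what a BuildLog note might contain, here is a hypothetical Python helper that formats one pre-triage line. The "summary: link" layout is an assumption, not a prescribed BuildLog format; the placeholder `<bug URL>` stands in for a real bugzilla or mailing list archive link.

```python
def buildlog_note(summary: str = "", link: str = "") -> str:
    """Format one pre-triage line for a BuildLog entry (hypothetical format).

    A new issue should carry a brief summary and a link to the pre-triage
    outcome (bug, mailing list archive post, ...). For a known issue the
    bare text "Known Issue" is acceptable, though a link is better.
    """
    text = summary or "Known Issue"
    return f"{text}: {link}" if link else text


print(buildlog_note())                            # Known Issue
print(buildlog_note("Known Issue", "<bug URL>"))  # Known Issue: <bug URL>
```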

Filing bugs

When filing the bug, please:

  • cut and paste the relevant error in the bug comment, and include the log file as an attachment
  • include the log from the CreateAutoConf step as an attachment (this ensures the assignee and triage team can quickly assess the issue)
  • include a pointer to the ErrorLog page associated with the failure (as ErrorLog)
Note: Autobuilder logs are non-persistent. Feel free to include a link to the log in a bug report, but be sure to also attach a copy of the log and paste the relevant sections into the bug.
Note: Sometimes failures occur on autobuilders on private company networks. Do not post links to these failures in bugzilla, as nobody else can access them.
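The filing checklist and the two notes can be expressed as a pre-submission check. This is a Python sketch only: `BugDraft`, `review_bug_draft` and the attachment file names are hypothetical and do not correspond to any bugzilla API; the function simply reports which checklist items a draft still misses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BugDraft:
    """Hypothetical in-memory draft of a bugzilla report."""
    comment: str
    attachments: List[str] = field(default_factory=list)  # attached file names
    log_links: List[str] = field(default_factory=list)    # URLs to autobuilder logs


def review_bug_draft(draft: BugDraft, error_excerpt: str, step_log: str,
                     autoconf_log: str, errorlog_ref: str,
                     private_autobuilder: bool = False) -> List[str]:
    """Return the checklist items the draft still misses (empty means ready)."""
    problems = []
    if error_excerpt not in draft.comment:
        problems.append("paste the relevant error into the comment")
    # Logs are non-persistent, so a copy must be attached, not just linked.
    if step_log not in draft.attachments:
        problems.append("attach the failing step's log")
    if autoconf_log not in draft.attachments:
        problems.append("attach the CreateAutoConf step log")
    if errorlog_ref not in draft.comment:
        problems.append("point to the associated ErrorLog page")
    # Links into private company networks are useless to everyone else.
    if private_autobuilder and draft.log_links:
        problems.append("remove links to logs on a private autobuilder")
    return problems


draft = BugDraft(comment="ERROR: do_compile failed\nErrorLog: <ErrorLog link>",
                 attachments=["step.log", "CreateAutoConf.log"])
print(review_bug_draft(draft, "ERROR: do_compile failed", "step.log",
                       "CreateAutoConf.log", "<ErrorLog link>"))  # []
```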

Process summary

  • Monitor builds via one (or more) of:
  • Pre-triage each failure:
    • File a bugzilla ticket OR respond to a patch OR note known issues
    • Update the BuildLog with the result of pre-triage, linking to issues/mail archives when possible

Questions / Contact

If you have queries about the SWAT process, you may reach out to the SWAT Facilitator, Stephen Jolley, or the SWAT Chair, Tracy Graydon.

Members

  • Ross Burton (UK)
  • Juro Bystricky (US)
  • Paul Eggleton (NZ)
  • Tracy Graydon (US) (SWAT Chair)
  • Stephano Cetola (US)
  • Maxin John (FI)
  • Rebecca Chang (Penang)
  • California (Cal) Sullivan (US)
  • Armin Kuster (California)