Profiling


Tips and tricks for profiling the build system

Using bitbake -P

This provides a trace of where the bitbake server is spending its time and has historically been a really useful way to profile all aspects of bitbake. When the -P option is added to any usual command line, bitbake saves out a profile.log file which contains the raw Python profiling information and a "profile.log.processed" file which contains a human-readable version of the trace information. The processed file contains four different views of the data, one after the other. One type of chart shows the time spent in functions; the second type shows the same information but includes callgraph information so you can see where the functions were called from. These two charts are printed twice, firstly sorted by internal function time and secondly by cumulative function time.
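
The processed file is usually enough, but if you want to slice the raw data differently, the following sketch assumes profile.log is a standard Python profiler dump and re-sorts it with pstats (the path and entry counts are just examples):

import pstats

# Load the raw profile data written by "bitbake -P"
p = pstats.Stats("profile.log")

# Top 20 functions by cumulative time, then by time spent inside each function itself
p.sort_stats("cumulative").print_stats(20)
p.sort_stats("time").print_stats(20)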

For the actual tasks themselves there is a patch, "build.py: Dump out performance data of individual tasks", available in the poky-contrib tree which enables each task to dump out performance information. Integrating this patch into the main tree in some form is still pending.

The other big area where profiling is useful is during parsing itself. Unfortunately this is a harder area to profile due to the parallelisation of parsing, and we don't currently have a good way to profile this section of code.

Using pybootchart

This tool allows visualisation of the build process and task execution. Firstly, you need to have enabled the buildstats class in your local build configuration (add "buildstats" to USER_CLASSES in local.conf). You can then run a command like:

<poky checkout>/scripts/pybootchartgui/pybootchartgui.py build/tmp/buildstats/bash-qemux86-64/201211190818/

where the argument is the path to the buildstats directory for the build you want to visualise. By default, tasks taking less than 8 seconds are not graphed, to remove short "noise" from the graph. The -m command-line option can be used to change the minimum task length used for filtering, with "0" meaning no filtering.

Common tasks are colour coded to aid visualisation. The graph can also be panned and scaled. The tool is good at spotting bottlenecks in builds and allowing the user to appreciate where time is being spent during the build.

If a directory is specified as a third argument, graph files are saved to that directory in SVG format. For large graphs, the output can be split into a number of files using the "-n" option.

Buildstats

The buildstats code writes out data about processor and memory usage per task into the tmp/buildstats directory if it is enabled. We currently haven't done a large amount of analysis of that data.
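
As a rough sketch of the kind of analysis that is possible, the snippet below walks a buildstats directory and ranks tasks by elapsed time. It assumes the per-task files are named after the task (do_compile, do_install, ...) and contain an "Elapsed time: <seconds> seconds" line; adjust the parsing if your version of the buildstats class writes a different format.

import os, re, sys

def task_times(buildstats_dir):
    times = []
    for root, dirs, files in os.walk(buildstats_dir):
        for name in files:
            if not name.startswith("do_"):
                continue
            with open(os.path.join(root, name)) as f:
                m = re.search(r"Elapsed time:\s*([\d.]+)", f.read())
            if m:
                times.append((float(m.group(1)), os.path.basename(root), name))
    return sorted(times, reverse=True)

# Usage: python3 tasktimes.py tmp/buildstats/<build timestamp>/
for elapsed, recipe, task in task_times(sys.argv[1])[:20]:
    print("%8.2fs  %s:%s" % (elapsed, recipe, task))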

Disk Usage

A simple "du" call can tell how much space a given build directory is taking up. Equally, the size of the sstate data or source downloads can also be measured. One thing to be aware of is we hardlink files to save disk space and increase performance.

Timing bitbake commands

Using commands in the form "time bitbake <options>" gives a good indication of how long a command takes and provides a simplistic way of measuring some aspects of performance.

Python memory usage

This is hard to measure, but we have made worthwhile improvements in this area in the past. Bitbake takes significant advantage of "copy-on-write" behaviour, both in the way it executes tasks and in the way the data store works internally.
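
The task execution side of this relies on standard fork() behaviour: build large, mostly read-only state in the parent before forking so that children share the same physical pages. A contrived illustration is below (CPython's reference counting still dirties some pages on access, but the bulk of the data stays shared):

import os

# Large structure built once in the parent, before any fork
big_table = {i: str(i) * 10 for i in range(500000)}

pid = os.fork()
if pid == 0:
    # Child: reading big_table costs little extra memory; writing to it would
    # trigger copy-on-write of the touched pages only.
    print(len(big_table))
    os._exit(0)
os.waitpid(pid, 0)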

So far the best results have been found using meliae. The best documentation of how to use this is in an email to bitbake-devel from Richard Purdie on 2012-02-18, subject "[bitbake-devel] Bitbake Memory Usage". http://lists.linuxtogo.org/pipermail/bitbake-devel/2012-February/001864.html
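
A minimal sketch of the meliae workflow, assuming the usual scanner/loader API (see the email above for the details of how it was applied to bitbake):

# In the running process, e.g. from a debug hook, dump every live object to a file
from meliae import scanner
scanner.dump_all_objects("/tmp/bitbake-objects.json")

# Later, analyse the dump offline
from meliae import loader
om = loader.load("/tmp/bitbake-objects.json")
om.compute_referrers()
print(om.summarize())   # table of object types by count and total size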

Patterns and notes from experience

The biggest speedups in the parser were in places where there were huge numbers of function calls, often in tight loops. By improving the algorithms, the same work could be done with much less overhead, with corresponding speedups to the system. These clearly show up when run under bitbake -P.

Equally, where we repeatedly compute particular values, caches have been shown to help massively and, for better or worse, bitbake is littered with caches. Some of these are simple, like the internal expansion cache in the datastore; some are more complex, like the codeparser cache, which has to deal with multithreading issues that make its design challenging.
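
The pattern itself is simple; an illustrative (not bitbake's actual) memoisation example is shown below. The hard part, as with the codeparser cache, is knowing when cached results become stale and must be invalidated.

import functools

@functools.lru_cache(maxsize=None)
def expand(value):
    # Stand-in for an expensive, repeatedly computed result such as variable expansion.
    # The function must depend only on its arguments, and the cache must be cleared
    # (expand.cache_clear()) whenever the underlying data changes.
    return value.replace("${TOPDIR}", "/home/user/build")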

Several significant improvements in disk space usage were found in places where we were making copies of system data; moving files into place instead, where possible, helped significantly. We now also hardlink files instead of copying them where we know they can't change.

For memory usage, we had issues where memory usage would spike after two bitbake runs. The issue was in the way the caches were pickled and unpickled, where strings would no longer be interned. This led to a spike in cache file size and then a spike in memory usage. We have a workaround in the code to ensure we keep strings interned.
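
The workaround amounts to re-interning strings as the caches are loaded; a simplified sketch of the idea (Python 3 spelling, not the actual bitbake code) is:

import sys

def intern_strings(obj):
    # Recursively replace strings in unpickled cache data with interned copies so that
    # identical strings share one object again instead of each holding a private copy.
    if isinstance(obj, str):
        return sys.intern(obj)
    if isinstance(obj, dict):
        return {intern_strings(k): intern_strings(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(intern_strings(i) for i in obj)
    return obj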

We've also made great improvements by simplifying dependencies. For example, having gettext-minimal-native available to provide the gettext macros and disabling NLS on native packages was a good simplification.

One tip to keep in mind is to watch the processor usage in top or in the buildstats. If the values are lower than you'd expect, it's worth seeing if there is some kind of "forkbomb" behaviour going on. We've seen cases where subprocesses were being rapidly forked in tight loops, which hurt performance badly. Reworking the code to avoid this led to performance improvements.
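
As a contrived illustration of the pattern to avoid, the first loop below forks an external process per file while the second does the same work in-process; under top the first shows exactly the low-CPU-utilisation, high-overhead behaviour described above:

import os, subprocess, tempfile

# Create some scratch files so the example is self-contained
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(200):
    path = os.path.join(tmpdir, "file%d" % i)
    with open(path, "w") as f:
        f.write("x" * i)
    paths.append(path)

# Fork-heavy: one external process per item in a tight loop
sizes = [int(subprocess.check_output(["stat", "-c", "%s", p])) for p in paths]

# The same work done in-process, with no forking at all
sizes = [os.stat(p).st_size for p in paths]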

We used to use psyco to speed up python. It is no longer maintained, only works on 32-bit systems and didn't seem to give much of a speedup, so we ended up dropping the support.

Usage of sync calls in the build

We'd wondered if sync calls in the build were causing any problem. To test this we tried two things: one was libeatmydata and the second was adding the same functionality to pseudo. The results were:

$ time bitbake core-image-sato

real  107m9.684s
user  517m13.090s
sys   54m38.850s

$ time eatmydata bitbake core-image-sato

real  105m5.167s
user  521m47.480s
sys   54m27.170s

(http://www.flamingspork.com/projects/libeatmydata/)

$ time bitbake core-image-sato (with pseudo sync hack patch applied)

real  105m45.113s
user  520m4.430s
sys   54m54.060s

So sync is perhaps worth about a 1% overall speed difference (thanks to Tom Zanussi for the numbers).

Areas we know we can potentially do better

Currently we do a copy operation for do_install -> do_populate_sysroot since the files may be customised, and we do a second copy operation at sstate install time since, again, the files may be edited. Since we can tell which files we edit, we could see whether we can hardlink by default and only copy the files we actually edit.
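
A sketch of the general hardlink-with-copy-fallback pattern (illustrative only, not the actual sstate code):

import errno, os, shutil

def link_or_copy(src, dst):
    # Prefer a hardlink (near-free in time and space); fall back to a real copy when
    # linking isn't possible, e.g. across filesystems or where the file may be edited.
    try:
        os.link(src, dst)
    except OSError as e:
        if e.errno not in (errno.EXDEV, errno.EPERM, errno.EMLINK):
            raise
        shutil.copy2(src, dst)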

If you clear the cache and parse using "bitbake -p -DD" you see a lot of messages about searching for files. Currently the file search paths are derived from the overrides; it would be useful to remove some of the overrides we know we'll never use for FILESDIR/FILESPATH, hence speeding up the search operations.

In tracing profiles, bb.debug features prominently for some workloads. We need a better event registration mechanism so handlers can filter events at source, hence removing execution overhead.

We build certain things when we don't need them such as doc generation in -native recipes. Adding PACKAGECONFIG[doc] options would be a good way to allow us to turn these off.

  • Move parts of build to tmpfs?
  • Split build over disks?
  • Disk mount options?
  • GL acceleration in QEMU? from Tizen work?
  • Multiple bitbakes in same directory
  • Psyco speedup for bitbake
  • sync calls?

Performance Metrics

This is a list of brainstormed potential performance metrics:

  • Boot performance with init systems (sysvinit vs. systemd)
  • How fast does image build (with/without sstate) (on which platforms? virtual machine?)
  • Extend build time test to -lsb image, not just sato. world-image?
  • Overall image boot time
  • Hob Profiling - (threading?)
  • Hob delays in user interaction
  • Postinstall scripts (offline vs first boot)
  • Distributed Compilation
  • Blocking screens waiting for response (eclipse)
  • Time taken to deploy into remote target during development
  • Automated tests on target for performance (compare with history and other distros)
  • Measure how IO bound we are? SSD vs. HDD?
  • Compare packaging backends (both creation and installing)
  • Document best known configs for performance?
  • Bitbake overhead for non-execution (bitbake <Idontexist>)
  • Track buildstats in csv?
  • Track sizes of images? Track time to build? Use to estimate future?
  • Time of debugging cycle during development
  • Virtual machine overheads
  • Track disk usage using existing disk monitoring code?