Qemuppc Boot Hangs

From Yocto Project
Revision as of 13:49, 16 November 2017 by Rpurdie (talk | contribs)

qemuppc is randomly hanging at boot during sanity testing. For example:

Despite initial suspicions, this was not due to entropy problems.

It is *not*:

  • Specific to a host OS (observed on centos7, debian8, ubuntu1710)
  • Specific to image recipe (happens with sato, sato-sdk, lsb, lsb-sdk, minimal, full-cmdline)
  • Specific to DISTRO (poky and poky-lsb)

The hang usually occurs after:

[    6.667445] udevd[104]: starting version 3.2.2
[    6.743262] udevd[105]: starting eudev-3.2.2

At that point udevadm settle is stuck in the poll() call in udevadm-settle.c:adm_settle(); it does not seem to return from the syscall. Further tracing confirms that it goes into poll_schedule_timeout() in fs/select.c:do_poll() and doesn't come out again. The time parameters all look sane, so it is likely that the kernel is locking up, or sleeping and failing to wake.
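
For reference, the behaviour expected from a poll() with a finite timeout can be sketched on the host as follows. This is only a minimal illustration, not the eudev code: the pipe, the 200 ms timeout and the helper names poll_with_timeout() and demo() are assumptions made for the example. With nothing to read, poll() should return once the timeout expires, which is the step that never seems to happen in the hung guest.

```python
import os
import select
import time


def poll_with_timeout(fd, timeout_ms):
    """Wait for fd to become readable; return (ready, elapsed_seconds)."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    start = time.monotonic()
    events = poller.poll(timeout_ms)  # blocks for at most timeout_ms
    elapsed = time.monotonic() - start
    return bool(events), elapsed


def demo():
    # A pipe that nobody writes to: poll() has nothing to report,
    # so the only way out of the call is the timeout expiring.
    read_fd, write_fd = os.pipe()
    try:
        return poll_with_timeout(read_fd, 200)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

On a healthy system demo() reports that nothing was ready and that roughly 0.2 seconds elapsed.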

In some cases the boot proceeds further and hangs in the dropbear key creation.

The timing of the usb-tablet device does vary in the boot logs; however, the hang has also been observed with the usb-tablet device removed.

It seems that running a build in parallel with testimage makes the issue occur more frequently.
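
One way to put a number on how often the hang occurs is to run the same test repeatedly with a timeout and count the runs that never finish. The sketch below is a hypothetical helper, not part of the autobuilder or of testimage; the name count_hangs() and any command or timeout passed to it are assumptions.

```python
import subprocess
import sys


def count_hangs(cmd, runs, timeout_s):
    """Run cmd `runs` times; return how many runs exceeded timeout_s."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(cmd, timeout=timeout_s,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the direct child before raising.
            hangs += 1
    return hangs
```

For example, count_hangs(["bitbake", "core-image-minimal", "-c", "testimage"], 20, 1800) would count how many of 20 testimage runs exceed 30 minutes; the image and the limit are placeholders. Only the direct child is killed on timeout, so a stuck qemu process may need to be cleaned up separately.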

I wondered about a connection with host cpufreq changes. To test this, I booted the host with "intel_pstate=disable" on the kernel command line and tried:

while true; do cpupower frequency-set -f 1200000 > /dev/null; sleep 0.25; cpupower frequency-set -f 2200000 > /dev/null; done

I also tried pinning the host to the maximum CPU frequency only and to the minimum CPU frequency only. I've observed hangs in all three configurations, although the likelihood of a hang seemed higher with the alternating frequency. I also tried the script:

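import random
import time

# Host CPUs to vary: cpu0..cpu86 (the end of range() is exclusive).
cpus = range(0, 87)
# Candidate frequencies in kHz: 1200000..2100000 in steps of 100000
# (2200000 itself is excluded by range()).
freqs = range(1200000, 2200000, 100000)

while True:
    # Give every CPU a new, randomly chosen frequency on each pass.
    for cpunum in cpus:
        f = random.choice(freqs)
        # scaling_setspeed only works with the "userspace" governor.
        with open("/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed" % cpunum, 'w') as speed:
            speed.write("%d\n" % f)

    time.sleep(0.05)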

which also seems to make hangs more likely somehow.
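
The script above writes to scaling_setspeed unconditionally, and that file is only functional when the "userspace" cpufreq governor is active. A minimal sketch of a guarded write is shown below, so that a run with the wrong governor is noticed rather than silently not varying the frequency. The helper name set_speed() and its directory argument are assumptions for illustration; on a real host the directory would be something like /sys/devices/system/cpu/cpu0/cpufreq.

```python
import os


def set_speed(cpufreq_dir, freq_khz):
    """Write freq_khz to scaling_setspeed if the governor allows it.

    Returns True if the frequency was written, False if the active
    governor is not "userspace" (in which case nothing is written).
    """
    with open(os.path.join(cpufreq_dir, "scaling_governor")) as gov:
        governor = gov.read().strip()
    if governor != "userspace":
        return False
    with open(os.path.join(cpufreq_dir, "scaling_setspeed"), "w") as speed:
        speed.write("%d\n" % freq_khz)
    return True
```

Taking the directory as a parameter also means the helper can be tried against a scratch directory before being pointed at sysfs as root.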