I often have folks asking how the text & video consoles work on OpenPOWER machines. Here's a bit of a rundown on how it's implemented, and what may seem a little different from x86 platforms that you may already be used to.
On POWER machines, we get the console up and working super early in the boot process. This means that we can get debug, error and state information out using text console with very little hardware initialisation, and in a human-readable format. So, we tend to use simpler devices for the console output - typically a serial UART - rather than graphical-type consoles, which require a GPU to be up and running. This keeps the initialisation code clean and simple.
However, we still want a facility for admins who are more used to a keyboard & monitor directly plugged-in to have a console facility too. More about that later though.
The majority of OpenPOWER platforms will rely on the attached management controller (BMC) to provide the UART console (as of November 2016: unless you've designed your own OpenPOWER hardware, this will be the case for you). This will be based on ASpeed's AST2400 or AST25xx system-on-chip devices, which provide a few methods of getting console data from the host to the BMC.
Between the host and the BMC, there's a LPC bus. The host is the master of the LPC bus, and the BMC the slave. One of the facilities that the BMC exposes over this bus is a set of UART devices. Each of these UARTs appear as a standard 16550A register set, so having the host interface to a UART is very simple.
As the host is booting, the host firmware will initialise the UART
console, and start outputting boot progress data. First, you'll see the
ISTEP
messages from hostboot, then skiboot's "msglog"
output, then the kernel output from the petitboot bootloader.
Because the UART is implemented by the BMC (rather than a real hardware UART), we have a bit of flexibility about what happens to the console data. On a typical machine, there are four ways of getting access to the console:
- Direct physical connection: using the DB-9 RS232 port on the back of the machine;
- Over network: using the BMC's serial-over-LAN interface, using
something like
ipmitool [...] sol activate
; - Local keyboard/video/mouse: connected to the VGA & USB ports on the back of the machine, or
- Remote keyboard/video/mouse: using "remote display" functionality provided by the BMC, over the network.
The first option is fairly simple: the RS232 port on the machine is actually controlled by the BMC, and not the host. Typically, the BMC firmware will just transfer data between this port and the LPC UART (which the host is interacting with). Figure 1 shows the path of the console data.
The second is similar, but instead of the BMC transferring data between the RS232 port and the host UART, it transfers data between a UDP serial-over-LAN session and the host UART. Figure 2 shows the redirection of the console data from the host over LAN.
The third and fourth options are a little more complex, but basically involve the BMC rendering the UART data into a graphical format, and displaying that on the VGA port, or sending over the network. However, there are some tricky details involved...
UART-to-VGA mirroring
Earlier, I mentioned that we start the console super-early. This happens way before any VGA devices can be initialised (in fact, we don't have PCI running; we don't even have memory running!). This means that it's not possible to get these super-early console messages out through the VGA device.
In order to be useful in deployments that use VGA-based management though, most OpenPOWER machines have functionality to mirror the super-early UART data out to the VGA port. During this process, it's the BMC that drives the VGA output, and renders the incoming UART text data to the VGA device. Figure 3 shows the flow for this, with the GPU rendering text console to the graphical output.
In the case of remote access to the VGA device, the BMC takes the contents of this rendered graphic and sends it over the network, to a BMC-provided web application. Figure 4 illustrates the redirection to the network.
This means we have console output, but no console input. That's okay though, as this is purely to report early boot messages, rather than provide any interaction from the user.
Once the host has booted to the point where it can initialise the VGA device itself, it takes ownership of the VGA device (and the BMC relinquishes it). The first software on the host to start interacting with the video device is the Linux driver in petitboot. From there on, video output is coming from the host, rather than the BMC. Because we may have user interaction now, we use the standard host-controlled USB stack for keyboard & mouse control.
Remote VGA console follows the same pattern - the BMC captures the video data that has been rendered by the GPU, and sends it over the network. In this case, the console input is implemented by virtual USB devices on the BMC, which appear as a USB keyboard and mouse to the operating system running on the host.
Typical console output during boot
Here's a few significant points of the boot process:
3.60212|ISTEP 6. 3 4.04696|ISTEP 6. 4 4.04771|ISTEP 6. 5 10.53612|HWAS|PRESENT> DIMM[03]=00000000AAAAAAAA 10.53612|HWAS|PRESENT> Membuf[04]=0C0C000000000000 10.53613|HWAS|PRESENT> Proc[05]=C000000000000000 10.62308|ISTEP 6. 6
- this is the initial output from hostboot, doing early hardware initialisation in discrete "ISTEP"s
41.62703|ISTEP 21. 1 55.22139|htmgt|OCCs are now running in ACTIVE state 63.34569|ISTEP 21. 2 63.33911|ISTEP 21. 3 [ 63.417465577,5] SkiBoot skiboot-5.4.0 starting... [ 63.417477129,5] initial console log level: memory 7, driver 5 [ 63.417480062,6] CPU: P8 generation processor(max 8 threads/core) [ 63.417482630,7] CPU: Boot CPU PIR is 0x0430 PVR is 0x004d0200 [ 63.417485544,7] CPU: Initial max PIR set to 0x1fff [ 63.417946027,5] OPAL table: 0x300c0940 .. 0x300c0e10, branch table: 0x30002000 [ 63.417951995,5] FDT: Parsing fdt @0xff00000
- here, hostboot has loaded the next firmware stage, skiboot, and we're now executing that.
[ 22.120063542,5] INIT: Waiting for kernel... [ 22.154090827,5] INIT: Kernel loaded, size: 15296856 bytes (0 = unknown preload) [ 22.197485684,5] INIT: 64-bit LE kernel discovered [ 22.218211630,5] INIT: 64-bit kernel entry at 0x20010000, size 0xe96958 [ 22.247596543,5] OCC: All Chip Rdy after 0 ms [ 22.296864319,5] Free space in HEAP memory regions: [ 22.304756431,5] Region ibm,firmware-heap free: 9b4b78 [ 22.322076546,5] Region ibm,firmware-allocs-memory@2000000000 free: 10cd70 [ 22.341542329,5] Region ibm,firmware-allocs-memory@0 free: afec0 [ 22.392470901,5] Total free: 11999144 [ 22.419746381,5] INIT: Starting kernel at 0x20010000, fdt at 0x305dbae8 (size 0x1d251)
next, the skiboot firmware has loaded the petitboot bootloader kernel (in zImage.epapr format), and is setting up memory regions in preparation for running Linux.
zImage starting: loaded at 0x0000000020010000 (sp: 0x0000000020e94ed8) Allocating 0x1545554 bytes for kernel ... gunzipping (0x0000000000000000 <- 0x000000002001d000:0x0000000020e9238b)...done 0x13c0300 bytes Linux/PowerPC load: Finalizing device tree... flat tree at 0x20ea1520 [ 24.074353446,5] OPAL: Switch to little-endian OS -> smp_release_cpus() spinning_secondaries = 159 <- smp_release_cpus() <- setup_system()
we then get the output from the zImage wrapper, which expands the actual kernel code and executes it. In recent firmware builds, the petitboot kernel will suppress most of the Linux boot messages, so we should only see high-priority warnings or error messages.
next up, the petitboot UI will be shown:
Petitboot (v1.2.3-a976d01) 8335-GCA 2108ECA ────────────────────────────────────────────────────────────────────────────── [Disk: sda1 / 590328e2-1095-4fe7-8278-0babaa9b9ca5] Ubuntu, with Linux 4.4.0-47-generic (recovery mode) Ubuntu, with Linux 4.4.0-47-generic Ubuntu [Network: enP3p3s0f3 / 98:be:94:67:c0:1b] Ubuntu 14.04.x installer Ubuntu 16.04 installer test kernel System information System configuration Language Rescan devices Retrieve config from URL *Exit to shell ────────────────────────────────────────────────────────────────────────────── Enter=accept, e=edit, n=new, x=exit, l=language, h=help
During Linux execution, skiboot will retain control of the UART
(rather than exposing the LPC registers directly to the host), and
provide a method for the Linux kernel to read and write to this console.
That facility is provided by the OPAL_CONSOLE_READ
and
OPAL_CONSOLE_WRITE
calls in the OPAL API.
Which one should we use?
We tend to prefer the text-based consoles for managing OpenPOWER machines - either the RS232 port on the machines for local access, or IPMI Serial over LAN (SOL) for remote access. This means that there's much less bandwidth and latency for console connections, and there is a simpler path for the console data. It's also more reliable during low-level debugging, as serial access involves fewer components of the hardware, software and firmware stacks.
That said, the VGA mirroring implementation should still work well, and is also accessible remotely by the current BMC firmware implementations. If your datacenter is not set up for local RS232 connections, you may want to use VGA for local access, and SoL for remote - or whatever works best in your situation.