Kernel Debugging

The kernel contains bugs; that is a fact of life. Some are minor annoyances, others are show stoppers. Fortunately, bugs in the latter category are usually easy to track down given enough information.

In this section I will try to explain what you need to report when you find a bug, so that others can fix it more easily. If you feel like looking into the problem yourself (please do!), you can use this as a brief guide to what to look for.

Interactive Debugging

Newer kernels contain a stub that makes it possible to debug the kernel with GDB over a serial line from another Linux host. This is the preferred way to debug, since you have access to registers and memory while the system is still alive, or at the moment of its death.

[Unfortunately, I don't know whether this interface works properly. I hope to test it soon, and I will add more text about it when I know more.]

Non-interactive Debugging

Sometimes interactive debugging is not an option and you have to rely on printk calls: you insert printks with distinct messages at the places in the kernel where you suspect the problem lies.

You can output variable values and thereby check that they are what you expect them to be. Output strings throughout the flow of a function to check that it takes the expected if-branches and performs the correct number of loop iterations, or to find out where it hangs. Use your imagination and be generous: each time you decide to insert more printks you have to rebuild the kernel and reboot, so you might as well put in plenty to start with.
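To give an idea of what this looks like, here is a sketch of a function instrumented in the way described above. find_zero is a made-up example, and so that the sketch can be compiled and run as an ordinary program, printk and KERN_DEBUG are replaced by user-space stand-ins (an assumption of this sketch only; in kernel code you would call printk itself).

```c
#include <stdio.h>

/* User-space stand-ins so this sketch runs outside the kernel.
 * In real kernel code, drop these two lines and use the kernel's
 * own printk() with KERN_DEBUG. */
#define KERN_DEBUG "<7>"
#define printk(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical function under suspicion: returns the index of the
 * first zero in v[0..n-1], or -1 if there is none. */
int find_zero(const int *v, int n)
{
    int i;

    /* Entry: check that the arguments are what the caller should pass. */
    printk(KERN_DEBUG "find_zero: enter v=%p n=%d\n", (const void *)v, n);

    for (i = 0; i < n; i++) {
        /* One line per iteration: shows the value and the count. */
        printk(KERN_DEBUG "find_zero: i=%d v[i]=%d\n", i, v[i]);
        if (v[i] == 0) {
            /* Tells you this branch was taken. */
            printk(KERN_DEBUG "find_zero: found zero at %d\n", i);
            return i;
        }
    }

    /* Reaching this line shows the loop ran to completion without a match. */
    printk(KERN_DEBUG "find_zero: no zero after %d iterations\n", i);
    return -1;
}
```

Reading the messages in order tells you which branch was taken and how many iterations ran; if the output stops after an iteration line, the function never got past that point.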

Debugging head.S is a bit trickier. I use a handful of serial-print assembly routines by Roman Zippel. They can be found in the file arch/ppc/kernel/debug.h: include it from head.S, call initserial somewhere near the top of the code, and use the other functions to output characters or values where you want to investigate something. Each routine comes in two variants: use the plain version (foo) while memory mapping is disabled and the version with a 2 suffix (foo2) once memory mapping is enabled.

Debug Options

The text from the printks goes to the console. Sometimes it makes sense to send the text to memory instead, so you can save it for later reference or analysis. Pass debug=mem as an option to bootstrap when booting; after resetting, run dmesg (from the 'misc' directory at SunSITE Denmark) under AmigaDOS. It should find the text.

You can also use the option debug=ser, which outputs the text over the built-in serial line at 9600 baud.

Standard Debug Output

If a user application causes an exception, it is terminated without affecting any other running application. However, if the kernel itself causes an exception, there is no way to handle it gracefully. In that situation the kernel dumps some information on the console which may help developers track the problem down. It may look like this:

	    NIP: C00D4EA4 XER: 00000000 LR: C00D6E64 REGS: c3c83bd0 TRAP: 0700
	    MSR: 00089972 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
	    TASK = c3c82000[130] 'mingetty' mm->pgd c2856000 Last syscall: 4
	    last math 00000000
	    GPR00: 00000000 C3C83CC0 C3C82000 C02C2FC0 00000002 00000000 00000019 00000050 
	    GPR08: 00000000 C01528B4 C02C2FC0 00000000 3004E158 0024BB94 C017C6E0 C017C6E0 
	    GPR16: C3EAC000 00000002 C0150000 C01704F8 00000001 00000000 00000000 C3C83D8A 
	    GPR24: 00203578 C3F5C4A0 00000001 00000000 C02C30D4 00000000 000007D0 00000000 
	    Call backtrace: 
	    C004D580 C00D6E64 C00D7C94 C00D8710 C00CA630 C00CC4F4 C00C7988 
	    C0026818 C000447C 00201570 00201D18 002010B8 
	    Kernel panic: Exception in kernel pc c00d4ea4 signal 4
	    Rebooting in 180 seconds..	  

The NIP (next instruction pointer) shows the address of the instruction that caused the exception, and the contents of the other registers can help explain why. The call backtrace shows the sequence of calls that led to the exception, allowing you to examine those functions in the source code and possibly set a breakpoint (GDB) or insert printks to help track down the problem.
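To make the register lines a little less opaque, here is a small decoder for two fields of the dump above. It is a sketch assuming a classic 32-bit PowerPC CPU: the masks are the architectural MSR bit positions (named as in the PPC kernel headers), and the vector list covers only the common exceptions. Given the MSR value from the example (00089972) it reproduces the EE/PR/FP/ME/IR/DR flags printed on the same line; PR: 0 tells you the CPU was in kernel (supervisor) mode. TRAP 0700 is the program check vector, which fits the "signal 4" (SIGILL, illegal instruction) in the panic message.

```c
#include <stdio.h>
#include <string.h>

/* MSR bits reported in the oops line (32-bit PowerPC positions). */
#define MSR_EE (1UL << 15) /* external interrupts enabled */
#define MSR_PR (1UL << 14) /* problem state: 1 = user mode, 0 = kernel */
#define MSR_FP (1UL << 13) /* floating point available */
#define MSR_ME (1UL << 12) /* machine check enabled */
#define MSR_IR (1UL << 5)  /* instruction address translation on */
#define MSR_DR (1UL << 4)  /* data address translation on */

static int msr_bit(unsigned long msr, unsigned long mask)
{
    return (msr & mask) ? 1 : 0;
}

/* Rebuild the "MSR: ... IR/DR: xy" line from a raw MSR value. */
void format_msr(unsigned long msr, char *buf, size_t len)
{
    snprintf(buf, len, "MSR: %08lx EE: %d PR: %d FP: %d ME: %d IR/DR: %d%d",
             msr, msr_bit(msr, MSR_EE), msr_bit(msr, MSR_PR),
             msr_bit(msr, MSR_FP), msr_bit(msr, MSR_ME),
             msr_bit(msr, MSR_IR), msr_bit(msr, MSR_DR));
}

/* Name the classic PowerPC exception vector shown in the TRAP field. */
const char *trap_name(unsigned long trap)
{
    switch (trap) {
    case 0x100: return "system reset";
    case 0x200: return "machine check";
    case 0x300: return "data storage (DSI)";
    case 0x400: return "instruction storage (ISI)";
    case 0x500: return "external interrupt";
    case 0x600: return "alignment";
    case 0x700: return "program check";
    case 0x800: return "floating-point unavailable";
    case 0x900: return "decrementer";
    case 0xc00: return "system call";
    case 0xd00: return "trace";
    default:    return "unknown";
    }
}
```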

However, since these addresses are specific to the kernel you are running, a dump like the one above is of little help to other kernel hackers on its own. To help them, run it through ksymoops (the source is in the scripts directory of the kernel sources), which uses the System.map file to translate the addresses into symbol names and produces a more usable dump you can send to the kernel list.

Once you have found out which function caused the exception (look up the address in the System.map file), you might be able to make a quick fix. If the function belongs to a driver that is not properly supported, you can disable that driver when running make config. If the driver accepts kernel options, you might want to try options other than the defaults or the ones you used when the exception occurred.
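Looking up an address in System.map by hand means finding the last symbol whose address is not greater than the one you are looking for; the difference is the offset into that function. The sketch below shows that search. It is only an illustration of the idea, not how ksymoops is implemented. It assumes System.map lines of the form "address type name", keeps only text (code) symbols of type T or t, and sorts them itself rather than relying on the file's order. The symbol names used with it below are made up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One code symbol from System.map, e.g. "c00d4e00 T foo_ioctl". */
struct sym {
    unsigned long addr;
    char name[64];
};

static int cmp_sym(const void *a, const void *b)
{
    unsigned long x = ((const struct sym *)a)->addr;
    unsigned long y = ((const struct sym *)b)->addr;
    return (x > y) - (x < y);
}

/* Parse System.map text held in memory into syms[0..max-1], keeping
 * only text (code) symbols, then sort them by address.
 * Returns the number of symbols stored. */
size_t parse_map(const char *text, struct sym *syms, size_t max)
{
    size_t n = 0;

    while (*text && n < max) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[128], name[64], type;
        unsigned long addr;

        /* Work on a copy of one line so a malformed line cannot make
         * sscanf read on into the next one. */
        if (len >= sizeof line)
            len = sizeof line - 1;
        memcpy(line, text, len);
        line[len] = '\0';

        if (sscanf(line, "%lx %c %63s", &addr, &type, name) == 3 &&
            (type == 'T' || type == 't')) {
            syms[n].addr = addr;
            strcpy(syms[n].name, name);
            n++;
        }
        if (!eol)
            break;
        text = eol + 1;
    }
    qsort(syms, n, sizeof *syms, cmp_sym);
    return n;
}

/* Find the symbol containing addr: the last one whose address is
 * <= addr, storing addr's offset into it in *offset. Returns NULL when
 * addr lies below every symbol (e.g. a user-space address). */
const struct sym *lookup(const struct sym *syms, size_t n,
                         unsigned long addr, unsigned long *offset)
{
    size_t lo = 0, hi = n;   /* binary search over [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = addr - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

With a map containing the hypothetical entries "c00d4e00 T foo_ioctl" and "c00d6d00 T bar_open", the NIP C00D4EA4 from the dump would resolve to foo_ioctl+0xa4 and the LR C00D6E64 to bar_open+0x164, while the low addresses at the end of the backtrace (00201570 and friends) lie below every kernel symbol and return NULL.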

Other Helpful Information

The exception you experience may have many possible causes, and kernel hackers may have an idea of which one is most likely the culprit. You can help them narrow down the list of potential problems by including the following information:

  • The bootstrap line you use to start the kernel (so we can see what options you used).

  • The output from bootstrap when using the -d option (so we can see what configuration you have).

  • The kernel debug output (use debug=mem and dmesg as described earlier).

  • If you have to do something in particular to trigger the exception, tell us so we can try to reproduce it.