Adventures in the transition from C to Cocoa.

Sunday, October 7, 2007

gdb-jutsu

gdb is a debugger used to find (and hopefully fix!) bugs in programs you're writing. However, it also works on programs you aren't writing (i.e. you don't have the source code, and thus probably can't fix bugs without some super heavy lifting). Since I've normally used it with C/C++ programs, I'm more familiar with those usages than with my current Objective-C usage, which has some different semantics.

The coolest thing ever that I discovered was this (brace yourself):
(gdb) call (char*)[0x3018 UTF8String]
2007-09-09 21:32:23.311 paramsTest[4759] *** _NSAutoreleaseNoPool(): Object 0x4081f0 of class NSCFData autoreleased with no pool in place - just leaking
$4 = 0x408210 "coooool!"


Ignoring the "2007-09... - just leaking" part (which is a message from the program itself, not from gdb), we have this:

(gdb) call (char*)[0x3018 UTF8String]
$4 = 0x408210 "coooool!"


The reason why this is incomprehensibly cool is this: the little 0x3018. Normally you'd have to do something like this: [someString UTF8String] to get useful information. But if you don't have the name of "someString", that's kind of difficult. But 0x3018, that's the address of the object, and anyone can get an object's address. Of course, internally they're the exact same thing, so it's not really doing anything magical. It's just the simplicity with which it works that interests me.

(The syntax here is something like this: [{object} {method}] where object is some object you're doing something with, and method is what the "something" you're doing. In this example, the object would be "someString", and the method would be "UTF8String" which provides us with the UTF8-encoded version of the contents of the string. In English, we're asking the string to tell us what it looks like if we printed it on screen.)

The coolness continues: what if we don't even have the address of the object, but know which argument it is to the current method? This is a pretty typical situation when we're breaking on a certain method's invocation. In this case, we have to use our trusty rusty stack pointer!

In Objective-C, our stack has extra data tossed in for the ObjectiveC runtime. In contrast, C/C++ arguments pretty much correspond 1-to-1 with items on the stack. In Objective-C, ebp+0 is our stack frame or something? I don't really know what it points to, but it's somewhere on the stack. ebp+4 is our return address, ebp+8 is our object's self pointer and ebp+12 is our selector, or method. Our first argument starts at ebp+16, and from there each argument is +4 more. This means arg2 would be at ebp+20, etc. In this example, we'll want to inspect the second argument. we also don't want to fiddle around with extracting the address manually. That's what the computer is there for anyway, isn't it? Let's see a demo:

(gdb) call (char*)[*(int)($ebp+20) UTF8String]
$37 = 0x407b10 "coooool!"


Woah, the same thing! you can see the ebp+20 in there too, if you look closely.

A few data points about that command. "(char*)" at the beginning is required to tell gdb how to handle the result. char* basically means "string" in this case. Our object's address is a nightmareish glob of mess: "*(int)($ebp+20)" the * at the beginning tells us to dereference the pointer we're going to feed it. This is used because our data on the stack is actually a pointer to another pointer.

Next we have "(int)" which you may recognize as a typecast. we do this to tell gdb to interpret the data as an integer (as opposed to a floating point value, a string, a byte, etc). Without this, gdb doesn't know how to calculate the address because it doesn't know how much data you're dealing with.

Finally, the understandable "($ebp+20)". In gdb, registers are prefixed with dollar-signs to keep them separate from gdb keywords and in-program variables. The whole thing is in parenthesis so that the typecast above applies to both, not just $ebp. Without it, we'll generate an incorrect address and get a fun warning like Cannot access memory at address 0x2f737467.

If you somehow forget which method you're in, or the object does something ugly with message passing, you can find out what selector you're in with this: (gdb) x/s *(int)($ebp+12) which will then tell you something like this:
0x2fac : "setStuff:otherThings:andStr:"setStuff:otherThings:andStr: is our terribly-named example selector for this exercise :)

So there you have it. Actually, there I have it too, since I'll probably use this for reference in the future (I always forget nuggets like this when I need them). Now we can manipulate objects at our whim in any running Objective-C program without having a single line of source code. Pretty dangerous :)

No comments:

Categories