Adventures in the transition from C to Cocoa.

Saturday, October 13, 2007

More gdb-jutsu, 20071013

Cocoa is already fairly well covered by a billion sources, including Apple's own examples in /Developer/Examples/, which give a decent introduction to most of the available technologies. So I guess I'll shift focus somewhat to the darker side of Cocoa development: debugging, reverse engineering, and modifying applications.

I've spent an embarrassing amount of time reverse engineering a couple of programs lately, and I've picked up a few tips that can help us in this adventure.

First, a handy list of tools:

  • ClassDump

  • OCDisasm

  • gdb

The first two are amazingly handy. ClassDump takes an existing framework and generates header files for the classes inside it, so you can interact with its objects. This includes the undocumented features Apple uses to make their applications spiffier than everyone else's. I'll probably do some exploring here as I transition from plugin stuff into GUI stuff.

OCDisasm is a tender little program. It's unfinished, and probably abandoned. However, it's still handy (at least until Leopard comes out, at which point it will be useless). OCDisasm takes an application and gives you the disassembled methods of its objects. The nice part is that, most of the time, it interprets common constructs with a bit of intelligence. The handiest feature I've grown to enjoy is the method names it automatically places inline in the code.

The unfortunate part is that OCDisasm only handles the PPC branch of an application. Since I know very little PPC assembly, it is of limited use in non-trivial code.

This is where tonight's gdb-jutsu comes into play.

When disassembling a function using x/i [address] in gdb, you get unadorned disassembled output on the screen. It usually looks something like this:

0x4f9cf47 <-[SomeObject someMethod:arg:arg2:]+652>: mov 268094972(%ebx),%eax
0x4f9cf4d <-[SomeObject someMethod:arg:arg2:]+658>: mov %eax,4(%esp)
0x4f9cf51 <-[SomeObject someMethod:arg:arg2:]+662>: mov %edi,(%esp)
0x4f9cf54 <-[SomeObject someMethod:arg:arg2:]+665>: call 0x14f45032

The first column is the address in memory. The second column, between the < and >, is the method plus an offset (how far into the function we are, in bytes). The last column is the disassembled instruction.
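If you'd rather not pick those columns apart by eye, here's a rough Python sketch that splits one line of that output into its pieces. The names (LINE_RE, parse_line) are my own and not part of gdb, and the pattern only assumes lines shaped like the ones above; real gdb output can vary, so treat it as a starting point.

```python
import re

# Matches one line of gdb `x/i` output in the shape shown above:
#   0xADDR <symbol+offset>: instruction
# Assumption: every line carries a <symbol+offset> part, as in the listing.
LINE_RE = re.compile(
    r"^(?P<addr>0x[0-9a-fA-F]+)\s+"
    r"<(?P<symbol>.+)\+(?P<offset>\d+)>:\s+"
    r"(?P<insn>.+)$"
)

def parse_line(line):
    """Split one disassembly line into address, symbol, offset and instruction.

    Returns None when the line does not look like the format above.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),   # address in memory
        "symbol": m.group("symbol"),        # method (or function) name
        "offset": int(m.group("offset")),   # bytes into the method
        "insn": m.group("insn"),            # the disassembled instruction
    }

sample = "0x4f9cf47 <-[SomeObject someMethod:arg:arg2:]+652>: mov 268094972(%ebx),%eax"
print(parse_line(sample))
```

Subtracting the offset from the address gives you the start of the method, which is handy when you want to set a breakpoint at its very first instruction.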

Messages are sent to objects using the dyld_stub_objc_msgSend function, which is the target of the call on the last line of that listing. While it looks nothing like its pretty Objective-C counterpart, that's how it works in low-level space.

The unfortunate part is that all messages look like that (and even some functions, particularly CoreGraphics stuff), so it's difficult to tell which message an object is getting.

Thankfully, figuring this information out is quite simple, and lies in the preceding instructions.

Objective-C messages have selectors, which are stored as strings in the executable file. When dyld_stub_objc_msgSend is called, it takes the receiving object as its first argument and the selector as its second. On 32-bit x86 the arguments are placed on the stack just before the call, so the receiver lands at (%esp) and whatever lands at 4(%esp) is our selector.

So, let's start looking at what gets put there. As you probably noticed, two instructions above the call, %eax is stored at 4(%esp). And one instruction before that, the value at 268094972(%ebx) is loaded into %eax. With just this knowledge, we can find our selector using this handy command:

(gdb) x/s *($ebx+268094972)
0x4ff0f04 <__FUNCTION__.100691+1516>: "CGLPixelFormatObj"

As you can see, we examine the data located where the program loads the selector from, and we can see our selector as text. Simple as that!
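Doing that walk backwards by hand gets old fast, so here's a minimal Python sketch of the same reasoning. It assumes the simple pattern from the listing above (a load into a register, then a store of that register to 4(%esp), then the call); the function name selector_command and both regexes are mine, and anything fancier than that pattern makes it give up and return None.

```python
import re

# AT&T-syntax patterns for the two instructions described above.
STORE_RE = re.compile(r"^mov\s+%(\w+),\s*4\(%esp\)$")         # mov %reg,4(%esp)
LOAD_RE = re.compile(r"^mov\s+(-?\d+)\(%(\w+)\),\s*%(\w+)$")  # mov disp(%base),%reg

def selector_command(insns):
    """Build the gdb x/s command for the selector of a message send.

    `insns` is a list of instruction strings ending in the call. We walk
    backwards from the call: first find which register was stored to
    4(%esp), then find the load that filled that register.
    Returns None if that two-step pattern is not present.
    """
    if not insns or not insns[-1].startswith("call"):
        return None
    wanted = None  # register that was stored to 4(%esp)
    for insn in reversed(insns[:-1]):
        if wanted is None:
            m = STORE_RE.match(insn)
            if m:
                wanted = m.group(1)
            continue
        m = LOAD_RE.match(insn)
        if m and m.group(3) == wanted:
            disp, base = int(m.group(1)), m.group(2)
            sign = "+" if disp >= 0 else "-"
            return "x/s *($%s%s%d)" % (base, sign, abs(disp))
    return None

insns = [
    "mov 268094972(%ebx),%eax",
    "mov %eax,4(%esp)",
    "mov %edi,(%esp)",
    "call 0x14f45032",
]
print(selector_command(insns))
```

Keep in mind that the generated command only makes sense while the program is stopped inside that same method, since it relies on %ebx still holding the value the code itself uses.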

With a bit of prodding, it's also possible to grab other data types using similar methods. However, there isn't a simple rule to extract data, so you might need to play with it some to get it to work. I'll try to explain this more in a future article.
