Adventures in the transition from C to Cocoa.

Wednesday, October 31, 2007

NSInvocation and version detection

After upgrading from Tiger to Leopard, we've had to deal with several interface changes (mostly because we're using undocumented API/SPI stuff). To keep code that works and compiles on both, we need to detect the OS version so we can pick the right objects and methods, and we need to send those messages dynamically so the compiler doesn't reject calls that only exist on one version.

Solving version detection is trivial. A few places document it, but they miss what I consider the simplest and most reliable way:


NSString *errorString = nil;
NSData *sysVerData = [[NSData alloc] initWithContentsOfFile:@"/System/Library/CoreServices/SystemVersion.plist"];
NSDictionary *sysVer = [NSPropertyListSerialization propertyListFromData:sysVerData
                                                        mutabilityOption:NSPropertyListImmutable
                                                                  format:NULL
                                                        errorDescription:&errorString];
// NSNumericSearch compares "10.10" and "10.5" number by number, not character by character
NSComparisonResult compare = [[sysVer objectForKey:@"ProductVersion"] compare:@"10.5" options:NSNumericSearch];
[sysVerData release];


That's right: we get our information from the same place as sw_vers. If you think your app will be used on OS X Server, you'll want to change "SystemVersion.plist" to "ServerVersion.plist".
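A caveat on the comparison: a plain character-by-character string compare would sort a hypothetical "10.10" before "10.5". To show what a numeric comparison does instead, here is a minimal C sketch. compare_versions is a hypothetical helper, not an Apple API, and it assumes plain dotted numeric strings like the ProductVersion values above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: compares dotted version strings such as "10.4.11"
   and "10.5" one numeric component at a time. Returns -1, 0, or 1.
   Missing trailing components count as 0, so "10.5" equals "10.5.0". */
int compare_versions(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' || *b != '\0') {
        char *end_a;
        char *end_b;
        long part_a = strtol(a, &end_a, 10);  /* 0 once a is exhausted */
        long part_b = strtol(b, &end_b, 10);
        if (part_a != part_b)
            return part_a < part_b ? -1 : 1;
        /* Step over the number and the one separator after it ('.'). */
        a = (*end_a != '\0') ? end_a + 1 : end_a;
        b = (*end_b != '\0') ? end_b + 1 : end_b;
    }
    return 0;
}
```

With it, compare_versions("10.10", "10.5") returns 1, where a character-by-character comparison would say the opposite.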

Then, on to NSInvocation.

NSInvocation is a way to invoke methods on objects dynamically. It's a bit tricky, and obviously a bit dangerous, since the compiler can't check the call for you. However, with proper runtime checks you can keep it perfectly safe.

Here's a trivial example of NSInvocation.


#import <Foundation/Foundation.h>

@interface anObject : NSObject
-(void) aMethod:(NSString*)arg;
@end

@implementation anObject
-(void) aMethod:(NSString*)arg
{
NSLog(@"aMethod with %p (%@) %@", arg, [arg className], arg);
}
@end

int main()
{
NSAutoreleasePool *p = [[NSAutoreleasePool alloc] init];
NSString *arg = @"normal";

anObject *a = [[anObject alloc] init];

[a aMethod:nil];
[a aMethod:arg];

NSInvocation *inv = [NSInvocation invocationWithMethodSignature:[anObject instanceMethodSignatureForSelector:@selector(aMethod:)]];
[inv setSelector:@selector(aMethod:)];
[inv setTarget: a];
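NSLog(@"built invocation %p", inv);

// setArgument: copies the value stored at the address you pass, so pass &arg.
// Indices 0 and 1 are the hidden self and _cmd, so the first real argument is index 2.
[inv setArgument:&arg atIndex:2];

NSLog(@"Set argument %p", (void*)arg);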

[inv invoke];

[a release];
[p release];
return 0;
}


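Important note that I ran into: [NSInvocation setArgument:atIndex:] takes a pointer to the argument's value, not the value itself, so you always pass the address of a variable that holds it, prefixed with an ampersand, "&". That goes for every type, not just objects: for an int argument you'd write int n = 5; [inv setArgument:&n atIndex:2];. If you leave the & off an object, NSInvocation reads from the object's own memory instead of from your variable, and you'll get wonky results.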
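The reason the & is always needed: setArgument:atIndex: only receives an address, and it copies as many bytes from that address as the method signature says the argument takes (getArgument:atIndex: copies them back out the same way). Here is a toy C model of that idea. arg_frame, set_argument, and get_argument are hypothetical names for illustration, not Foundation's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy argument storage: each slot can hold a pointer or an int. */
typedef union {
    void *pointer;
    int integer;
} arg_slot;

typedef struct {
    arg_slot slots[4];
} arg_frame;

/* Like setArgument:atIndex:, this only sees the address of the caller's
   variable and copies `size` bytes from it, so callers pass &variable
   for object pointers and ints alike. */
void set_argument(arg_frame *frame, const void *buffer, size_t size, int index)
{
    assert(index >= 0 && index < 4);
    assert(size <= sizeof frame->slots[index]);
    memcpy(&frame->slots[index], buffer, size);
}

/* The reverse copy, in the spirit of getArgument:atIndex:. */
void get_argument(const arg_frame *frame, void *buffer, size_t size, int index)
{
    assert(index >= 0 && index < 4);
    assert(size <= sizeof frame->slots[index]);
    memcpy(buffer, &frame->slots[index], size);
}
```

Passing a pointer variable itself instead of its address would make the copy read from whatever the pointer points at rather than from the variable, which is the same kind of wonky result described above.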

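There are a bunch of methods for checking things first, such as whether an object responds to a selector (NSObject's respondsToSelector:) and how many arguments a method takes and of what types (NSMethodSignature's numberOfArguments, which counts the hidden self and _cmd too, and getArgumentTypeAtIndex:). Also note that instanceMethodSignatureForSelector: returns nil for a selector the class doesn't implement, so check for that before building the invocation. You should really read Apple's NSInvocation and NSMethodSignature documentation to get a good handle on what's going on.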

Saturday, October 13, 2007

More gdb-jutsu, 20071013

I guess that, since Cocoa is already fairly well covered by a billion sources (including Apple's examples in /Developer/Examples/, which give a decent introduction to most of the available technologies), I'll shift focus somewhat to the darker side of Cocoa development: debugging, reverse engineering, and modifying applications.

I've spent an embarrassing amount of time reverse engineering a couple of programs lately, and I've discovered a few tips that can help us in this adventure.

First, a handy list of tools:


  • ClassDump

  • OCDisasm

  • gdb



The first two are amazingly handy. ClassDump takes an existing framework (or application) and generates header files for the Objective-C classes in it. This includes the undocumented goodies Apple uses to make its own applications spiffier than everyone else's. I'll probably do some exploring here as I move from plugin work into GUI work.

OCDisasm is a tender little program. It's unfinished, and probably abandoned. However, it's still handy (at least until Leopard comes out; then it will be useless). OCDisasm takes an application and gives you the disassembly of the methods on its objects. The nice part is that, most of the time, it interprets common constructs with a bit of intelligence. The feature I've grown to enjoy most is the method name it places inline in the code.

The unfortunate part is that OCDisasm only handles the PPC half of a universal application. Since I know very little PPC assembly, it is of limited use on non-trivial code.

This is where tonight's gdb-jutsu comes into play.

When disassembling a function using x/i [address] in gdb, you get unadorned disassembled output on the screen. It usually looks something like this:


0x4f9cf47 <-[SomeObject someMethod:arg:arg2:]+652>: mov 268094972(%ebx),%eax
0x4f9cf4d <-[SomeObject someMethod:arg:arg2:]+658>: mov %eax,4(%esp)
0x4f9cf51 <-[SomeObject someMethod:arg:arg2:]+662>: mov %edi,(%esp)
0x4f9cf54 <-[SomeObject someMethod:arg:arg2:]+665>: call 0x14f45032


The first column is the address in memory. The second, between the < and >, is the method plus an offset (how far into the function the instruction is, in bytes, written in decimal). Finally comes the disassembled instruction.
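Note that the addresses are in hex while the offsets are in decimal. Subtracting an offset from its address gives the start of the method, and all four lines above agree: 0x4f9cf47 - 652 = 0x4f9ccbb. The difference between consecutive offsets is the length of each instruction (6, 4, and 3 bytes here). A quick C sketch of that arithmetic, using the listing's own numbers (the listing_line type and both helpers are just for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One disassembly line: the hex address and the decimal "+offset". */
typedef struct {
    uint32_t address;
    uint32_t offset;
} listing_line;

/* The four lines from the gdb listing. */
static const listing_line listing[] = {
    { 0x4f9cf47, 652 },
    { 0x4f9cf4d, 658 },
    { 0x4f9cf51, 662 },
    { 0x4f9cf54, 665 },
};

/* Start of the method: instruction address minus its offset. */
uint32_t function_start(listing_line line)
{
    assert(line.offset <= line.address);
    return line.address - line.offset;
}

/* Length of an instruction: distance to the next line's offset. */
uint32_t instruction_length(listing_line line, listing_line next)
{
    assert(next.offset >= line.offset);
    return next.offset - line.offset;
}
```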

Messages are sent to objects through the runtime's objc_msgSend function, reached via the dyld_stub_objc_msgSend stub; that's what the call at the end of the gdb listing is. While it looks nothing like its pretty Objective-C counterpart, that's how messaging works in low-level space.

The unfortunate part is that all messages look like that (as do some plain function calls, particularly CoreGraphics ones), so it's difficult to tell which message an object is getting.

Thankfully, figuring this information out is quite simple, and lies in the preceding instructions.

Objective-C messages have selectors, which are stored as strings in the executable file. When objc_msgSend is called, it takes the receiving object as its first argument and the selector as its second. In this 32-bit code the arguments are written onto the stack, so whatever lands at (%esp) is the receiver, and whatever lands at 4(%esp) is our selector.

So, let's look at what gets put there. As you probably noticed, two instructions before the call, %eax is stored there. And one instruction before that, the value at 268094972(%ebx) is loaded into %eax. With just this knowledge, we can find our selector using this handy command:


(gdb) x/s *($ebx+268094972)
0x4ff0f04 <__FUNCTION__.100691+1516>: "CGLPixelFormatObj"


As you can see, we read the pointer stored where the program loads the selector from, and x/s shows us the selector as text. Simple as that!

With a bit of prodding, it's also possible to grab other data types using similar methods. However, there isn't a simple rule to extract data, so you might need to play with it some to get it to work. I'll try to explain this more in a future article.
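The reason the receiver lands at (%esp) and the selector at 4(%esp) is objc_msgSend's C signature, id objc_msgSend(id self, SEL op, ...): receiver first, selector second, then the method's own arguments. The toy C dispatcher below mimics that shape. Everything in it (toy_object, toy_msg_send, and friends) is hypothetical, and the real runtime is far more involved: it caches lookups, walks superclasses, and compares uniqued selector pointers instead of calling strcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_object toy_object;

/* A method implementation receives the receiver and selector first,
   mirroring the hidden self and _cmd arguments. */
typedef const char *(*toy_imp)(toy_object *self, const char *cmd);

typedef struct {
    const char *selector;  /* selector name, e.g. "UTF8String" */
    toy_imp imp;
} toy_method;

struct toy_object {
    const toy_method *methods;  /* this object's "class": a method table */
    size_t method_count;
    const char *text;
};

/* Toy dispatcher: receiver first, selector second, like objc_msgSend. */
const char *toy_msg_send(toy_object *self, const char *cmd)
{
    assert(self != NULL && cmd != NULL);
    for (size_t i = 0; i < self->method_count; i++) {
        if (strcmp(self->methods[i].selector, cmd) == 0)
            return self->methods[i].imp(self, cmd);
    }
    return NULL;  /* the real runtime would go on to message forwarding */
}

/* Example method returning the object's text, loosely like -UTF8String. */
static const char *toy_utf8_string(toy_object *self, const char *cmd)
{
    (void)cmd;  /* unused, but passed along like _cmd */
    return self->text;
}

static const toy_method toy_string_methods[] = {
    { "UTF8String", toy_utf8_string },
};
```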

Sunday, October 7, 2007

gdb-jutsu

gdb is a debugger used to find (and hopefully fix!) bugs in programs you're writing. However, it also works on programs you aren't writing (i.e. you don't have the source code, and thus probably can't fix bugs without some super heavy lifting). Since I've normally used it with C/C++ programs, I'm more familiar with those usages than with my current Objective-C usage, which has some different semantics.

The coolest thing ever that I discovered was this (brace yourself):
(gdb) call (char*)[0x3018 UTF8String]
2007-09-09 21:32:23.311 paramsTest[4759] *** _NSAutoreleaseNoPool(): Object 0x4081f0 of class NSCFData autoreleased with no pool in place - just leaking
$4 = 0x408210 "coooool!"


Ignoring the "2007-09... - just leaking" part (which is a message from the program itself, not from gdb), we have this:

(gdb) call (char*)[0x3018 UTF8String]
$4 = 0x408210 "coooool!"


The reason this is incomprehensibly cool is the little 0x3018. Normally you'd have to write something like [someString UTF8String] to get useful information. But if you don't have the name "someString" (no source code, no debugging symbols), that's kind of difficult. 0x3018, though, is the address of the object, and anyone can get an object's address. Of course, internally they're the exact same thing, since an object variable just holds the object's address, so it's not really doing anything magical. It's just the simplicity with which it works that interests me.

(The syntax here is [{object} {method}], where object is the object you're doing something with, and method is the "something" you're doing. In this example, the object would be "someString", and the method would be "UTF8String", which gives us the UTF8-encoded contents of the string. In plain English, we're asking the string what it would look like printed on screen.)

The coolness continues: what if we don't even have the address of the object, but know which argument it is to the current method? This is a pretty typical situation when we're breaking on a certain method's invocation. In this case, we turn to our trusty rusty frame pointer, $ebp (strictly the base pointer, not the stack pointer $esp)!

In Objective-C, the stack holds a little extra data for the Objective-C runtime: every method receives two hidden arguments before its own. In contrast, C/C++ arguments pretty much correspond 1-to-1 with items on the stack. On 32-bit x86, after the usual function prologue, $ebp+0 holds the caller's saved frame pointer, $ebp+4 is our return address, $ebp+8 is our object's self pointer and $ebp+12 is our selector (_cmd). Our first argument starts at $ebp+16, and each argument after that is 4 bytes further along (assuming each fits in 4 bytes), so arg2 is at $ebp+20, and so on. In this example, we want to inspect the second argument, and we don't want to fiddle around with extracting the address manually. That's what the computer is there for anyway, isn't it? Let's see a demo:

(gdb) call (char*)[*(int)($ebp+20) UTF8String]
$37 = 0x407b10 "coooool!"


Whoa, the same thing! You can see the $ebp+20 in there too, if you look closely.

A few data points about that command. "(char*)" at the beginning is required to tell gdb how to handle the result, since gdb doesn't know what the method returns; char* basically means "string" in this case. Our object's address is a nightmarish glob of mess: "*(int)($ebp+20)". The * at the beginning dereferences the address we feed it: $ebp+20 is the location of the argument's slot on the stack, and the value stored in that slot is the object pointer we actually want.

Next we have "(int)", which you may recognize as a typecast. It tells gdb to treat the address as an integer, and when gdb dereferences an integer it reads an int (4 bytes on 32-bit x86, the size of a pointer) from that address. Without this, gdb doesn't know how much data to read.

Finally, the understandable "($ebp+20)". In gdb, registers are prefixed with dollar signs to keep them separate from gdb keywords and in-program variables. The parentheses make the addition happen before the cast and the dereference. Without them, *(int)$ebp+20 parses as (*(int)$ebp)+20: gdb reads the saved frame pointer stored at $ebp and adds 20 to it, so the "object" address is garbage and you get a fun warning like Cannot access memory at address 0x2f737467.
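To make the stack arithmetic concrete, here is a small C sketch of the same reads against a fake frame. It assumes 32-bit x86 with the standard prologue and 4-byte arguments, as in the text; arg_offset and read_slot are hypothetical helpers. read_slot(frame, 20) corresponds to *(int)($ebp+20), and read_slot(frame, 0) + 20 is what the unparenthesized *(int)$ebp+20 computes instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets from $ebp in a 32-bit x86 Objective-C method, assuming the
   standard "push %ebp; mov %esp,%ebp" prologue and 4-byte arguments. */
enum {
    SAVED_EBP_OFFSET   = 0,   /* caller's frame pointer */
    RETURN_ADDR_OFFSET = 4,   /* where the method returns to */
    SELF_OFFSET        = 8,   /* hidden argument: the receiver */
    CMD_OFFSET         = 12   /* hidden argument: the selector (_cmd) */
};

/* Offset of the n-th visible argument, counting from 1: 16, 20, 24, ... */
int arg_offset(int n)
{
    assert(n >= 1);
    return CMD_OFFSET + 4 * n;
}

/* Read the 4-byte value stored at frame + offset, like gdb's
   *(int)($ebp+offset). memcpy sidesteps alignment and aliasing issues. */
uint32_t read_slot(const unsigned char *frame, int offset)
{
    uint32_t value;
    assert(offset >= 0);
    memcpy(&value, frame + offset, sizeof value);
    return value;
}
```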

If you somehow forget which method you're in, or the object does something ugly with message passing, you can find out which selector you're in with this:

(gdb) x/s *(int)($ebp+12)

which will then tell you something like this:

0x2fac : "setStuff:otherThings:andStr:"

setStuff:otherThings:andStr: is our terribly-named example selector for this exercise :)

So there you have it. Actually, there I have it too, since I'll probably use this for reference in the future (I always forget nuggets like this when I need them). Now we can manipulate objects at our whim in any running Objective-C program without having a single line of source code. Pretty dangerous :)
