C Macro Tips and Tricks
The year is almost over, but there's time for one last Friday Q&A before 2011 comes around. For today's post, fellow Amoeba Dan Wineman suggested that I discuss tricks for writing macros in C.
Preprocessor vs Compiler
The preprocessor runs first, as the name implies. It performs some simple textual manipulations, such as stripping comments, resolving #include directives, and expanding macros.
Note that the preprocessor largely has no understanding of the text that it processes. There are some exceptions to this. For example, it knows that this is a string, and so does not expand the macro inside it:
#define SOMETHING hello
char *str = "SOMETHING, world!"; // nope
#define ONEARG(x) NSLog x
ONEARG((@"hello, %@", @"world"));
Because the preprocessor does not understand the code it processes, you cannot use #if to check whether a type is defined or not:
// makes no sense
#ifndef MyInteger
typedef int MyInteger;
#endif
The #ifndef always comes out true even if the MyInteger type is already defined. Type definitions are evaluated as part of the compilation phase, which hasn't even happened yet.
Likewise, there is no need for the contents of a #define to be valid code on their own:
#define STARTLOG NSLog(@
#define ENDLOG , @"testing");
STARTLOG "just %@" ENDLOG
The preprocessor simply replaces STARTLOG and ENDLOG with their definitions. By the time the compiler comes along to try to make sense of this code, it actually does make sense, and so it compiles as valid code.
A Word of Warning
The C preprocessor is nearly Turing-complete. With a simple driver, you can compute any computable function using the preprocessor. However, the contortions required to do this are so bizarre and difficult that they make Turing-complete C++ templates look simple by comparison.
While macros are powerful, they are also very simple. Since macro expansion is a simple textual process, there are pitfalls. For example, operator precedence can be dangerous:
#define ADD(x, y) x+y
// produces 14, not 20
ADD(2, 3) * 4;
#define MULT(x, y) x*y
// produces 14, not 20
MULT(2 + 3, 4);
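The usual defense is to parenthesize every parameter and the whole replacement text. Here is a minimal sketch in plain C. The _UNSAFE names are mine, kept only so the two forms can sit side by side:

```c
/* Unparenthesized versions, kept only to show the failure. */
#define ADD_UNSAFE(x, y) x+y
#define MULT_UNSAFE(x, y) x*y

/* Parenthesize each parameter and the whole body. */
#define ADD(x, y) ((x) + (y))
#define MULT(x, y) ((x) * (y))

/* ADD_UNSAFE(2, 3) * 4   expands to  2+3 * 4            which is 14 */
/* ADD(2, 3) * 4          expands to  ((2) + (3)) * 4    which is 20 */
/* MULT_UNSAFE(2 + 3, 4)  expands to  2 + 3*4            which is 14 */
/* MULT(2 + 3, 4)         expands to  ((2 + 3) * (4))    which is 20 */
```

The inner parentheses protect against low-precedence operators inside an argument, and the outer ones protect against high-precedence operators around the macro call.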
Evaluating a macro argument multiple times can also lead to unexpected results:
#define MAX(x, y) ((x) > (y) ? (x) : (y))
int a = 0;
int b = 1;
int c = MAX(a++, b++);
// now a = 1, c = 2, and b = 3!
// (a++ > b++ ? a++ : b++)
// b++ gets evaluated twice
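When a function will do, it sidesteps the problem entirely, because function arguments are evaluated exactly once before the call. A minimal sketch, with max_int as a made-up name and the int-only signature as a simplifying assumption:

```c
/* A function evaluates each argument exactly once before the call. */
static inline int max_int(int x, int y)
{
    return x > y ? x : y;
}
```

With a = 0 and b = 1, max_int(a++, b++) returns 1 and leaves a = 1 and b = 2, which is what most people expect. The price is that the function is tied to one type, where the macro works on anything that supports the > operator.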
Macro Debugging
To reduce confusion, you'll want to look at the file as it appears after preprocessing. This means all of your macros are expanded, and you can see the raw C code that the compiler sees, rather than trying to expand the macro in your head. In Xcode you can do this by selecting Build->Preprocess. The resulting file will generally be very large due to all of the headers that get included.
Multi-Statement Macros
#define TIME(name, lastTimeVariable) NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime]; if(lastTimeVariable) NSLog(@"%s: %f seconds", name, now - lastTimeVariable); lastTimeVariable = now
- (void)calledALot
{
// do some work
// time it
TIME("calledALot", _calledALotLastTimeIvar);
}
A #define is terminated at the end of the line, but by putting \ at the end of the line, you can make the preprocessor continue the definition on the next line:
#define TIME(name, lastTimeVariable) \
    NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime]; \
    if(lastTimeVariable) \
        NSLog(@"%s: %f seconds", name, now - lastTimeVariable); \
    lastTimeVariable = now
- (void)calledALot
{
if(...) // only time some calls
TIME("calledALot", _calledALotLastTimeIvar);
}
- (void)calledALot
{
if(...) // only time some calls
NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime];
if(_calledALotLastTimeIvar)
NSLog(@"%s: %f seconds", "calledALot", now - _calledALotLastTimeIvar);
_calledALotLastTimeIvar = now;
}
Declaring NSTimeInterval now directly as the body of the if statement is illegal. Even if that worked, only the first statement is subject to the if, and the following lines would run regardless. Not what we wanted!
This can be solved by putting braces around the macro definition:
#define TIME(name, lastTimeVariable) \
{ \
    NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime]; \
    if(lastTimeVariable) \
        NSLog(@"%s: %f seconds", name, now - lastTimeVariable); \
    lastTimeVariable = now; \
}
- (void)calledALot
{
if(...) // only time some calls
{
NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime];
if(_calledALotLastTimeIvar)
NSLog(@"%s: %f seconds", "calledALot", now - _calledALotLastTimeIvar);
_calledALotLastTimeIvar = now;
};
}
In fact, this is a problem. Consider this code:
- (void)calledALot
{
if(...) // only time some calls
TIME("calledALot", _calledALotLastTimeIvar);
else // otherwise do something else
// stuff
}
- (void)calledALot
{
if(...) // only time some calls
{
NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime];
if(_calledALotLastTimeIvar)
NSLog(@"%s: %f seconds", "calledALot", now - _calledALotLastTimeIvar);
_calledALotLastTimeIvar = now;
};
else // otherwise do something else
// stuff
}
You could work around this by requiring the user of the macro not to put a semicolon at the end. However, this is highly unnatural and tends to mess with things like automatic code indenting. A better way to fix it is to wrap the macro body in a do ... while(0) construct:
#define TIME(name, lastTimeVariable) \
do { \
    NSTimeInterval now = [[NSProcessInfo processInfo] systemUptime]; \
    if(lastTimeVariable) \
        NSLog(@"%s: %f seconds", name, now - lastTimeVariable); \
    lastTimeVariable = now; \
} while(0)
This works correctly in an if statement and in all other situations. A multi-statement macro should always be wrapped in do ... while(0) for this reason.
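To see the do ... while(0) wrapper at work without any Cocoa dependencies, here is a small sketch in plain C. SWAP_INT and order_pair are hypothetical names invented for this illustration:

```c
#include <stdio.h>

/* Hypothetical example macro: swaps two ints using a prefixed temporary. */
#define SWAP_INT(a, b)          \
    do {                        \
        int swap_tmp_ = (a);    \
        (a) = (b);              \
        (b) = swap_tmp_;        \
    } while (0)

/* The macro call plus its semicolon forms a single statement,
   so the else still attaches to the if. */
static void order_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
    else
        puts("already ordered");
}
```

With plain braces instead of do ... while(0), the semicolon after SWAP_INT(...) would end the if statement and the else would fail to compile.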
This macro defines a variable called now, which can collide with a variable of the same name in the calling code:
NSTimeInterval now; // ivar
TIME("whatever", now);
Unfortunately, C does not have a good way to generate unique variable names for this use. The best thing to do is to add a prefix, like you do with Objective-C class names:
#define TIME(name, lastTimeVariable) \
do { \
    NSTimeInterval MA_now = [[NSProcessInfo processInfo] systemUptime]; \
    if(lastTimeVariable) \
        NSLog(@"%s: %f seconds", name, MA_now - lastTimeVariable); \
    lastTimeVariable = MA_now; \
} while(0)
String Concatenation
char *helloworld = "hello, " "world!";
// equivalent to "hello, world!"
NSString *helloworld = @"hello, " @"world!";
NSString *helloworld = @"hello, " "world!";
#define COM_URL(domain) [NSURL URLWithString: @"http://www." domain ".com"]
COM_URL("google"); // gives http://www.google.com
COM_URL("apple"); // gives http://www.apple.com
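The same splice works in plain C, where it is easy to check. This is a minimal sketch that reuses the COM_URL name from above but produces a C string rather than an NSURL; google_url is a variable made up for the example:

```c
/* Adjacent string literals are merged by the compiler, so a macro
   parameter that is a string literal can be spliced between fixed pieces. */
#define COM_URL(domain) "http://www." domain ".com"

/* The result is an ordinary string literal, usable as a static initializer. */
static const char *google_url = COM_URL("google");
```

This only works when the argument is itself a string literal. Passing a char * variable would not compile, since concatenation happens at compile time.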
By placing a # in front of a parameter name, the preprocessor will turn the contents of that parameter into a C string. For example:
#define TEST(condition) do { if(!(condition)) NSLog(@"Failed test: %s", #condition); } while(0)
TEST(1 == 2);
// logs: Failed test: 1 == 2
#define WITHIN(x, y, delta) (fabs((x) - (y)) < delta)
TEST(WITHIN(1.1, 1.2, 0.05));
// logs: Failed test: WITHIN(1.1, 1.2, 0.05)
#define STRINGIFY(x) #x
#define TEST(condition) do { if(!(condition)) NSLog(@"Failed test: %s", STRINGIFY(condition)); } while(0)
TEST(WITHIN(1.1, 1.2, 0.05));
// logs: Failed test: (fabs((1.1) - (1.2)) < 0.05)
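The difference between the direct and the indirect form is easiest to see when the two are called with a macro name as the argument. A minimal sketch in plain C, where STRINGIFY_RAW, STRINGIFY_EXPANDED and BUFFER_SIZE are names invented for the demonstration:

```c
/* # turns the argument's source text into a string literal. */
#define STRINGIFY_RAW(x) #x

/* The extra level lets x be macro-expanded before # is applied. */
#define STRINGIFY_EXPANDED(x) STRINGIFY_RAW(x)

#define BUFFER_SIZE 64  /* hypothetical constant for the demonstration */

static const char *raw_text = STRINGIFY_RAW(BUFFER_SIZE);           /* "BUFFER_SIZE" */
static const char *expanded_text = STRINGIFY_EXPANDED(BUFFER_SIZE); /* "64" */
```

An argument that appears next to # is never expanded. Routing it through one more macro gives the preprocessor a chance to expand it first.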
Token Pasting
#define NSify(x) NS ## x
NSify(String) *s; // gives NSString
For example, a macro can generate the indexed accessors for an NSMutableArray:
#define ARRAY_ACCESSORS(capsname, lowername) \
    - (NSUInteger)countOf ## capsname { \
        return [lowername count]; \
    } \
    \
    - (id)objectIn ## capsname ## AtIndex: (NSUInteger)index { \
        return [lowername objectAtIndex: index]; \
    } \
    \
    - (void)insertObject: (id)obj in ## capsname ## AtIndex: (NSUInteger)index { \
        [lowername insertObject: obj atIndex: index]; \
    } \
    \
    - (void)removeObjectFrom ## capsname ## AtIndex: (NSUInteger)index { \
        [lowername removeObjectAtIndex: index]; \
    }
// instance variable
NSMutableArray *thingies;
// in @implementation
ARRAY_ACCESSORS(Thingies, thingies)
Like the stringify operator, the concatenation operator won't evaluate macros passed to it without an extra level of indirection:
#define ARRAY_NAME thingies
#define ARRAY_NAME_CAPS Thingies
// incorrectly creates accessors for "ARRAY_NAME_CAPS"
ARRAY_ACCESSORS(ARRAY_NAME_CAPS, ARRAY_NAME)
#define CONCAT(x, y) x ## y
// define ARRAY_ACCESSORS using CONCAT, and the above works
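Here is a minimal plain-C sketch of the same indirection, small enough to verify by hand. PASTE_RAW, PASTE and FIELD are hypothetical names used only for this illustration:

```c
#define PASTE_RAW(x, y) x ## y
#define PASTE(x, y) PASTE_RAW(x, y)

#define FIELD total  /* hypothetical name used only for this demonstration */

/* PASTE expands FIELD first, so this defines get_total(). */
static int PASTE(get_, FIELD)(void) { return 7; }

/* PASTE_RAW pastes the unexpanded token, so this defines get_FIELD(). */
static int PASTE_RAW(get_, FIELD)(void) { return 9; }
```

The rule is the same as for #: a parameter that sits directly next to ## is substituted without being expanded, so the expansion has to happen one macro call earlier.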
Variable Argument Lists
#define LOG(string) do { if(gLoggingEnabled) NSLog(@"Conditional log: %s", string); } while(0)
LOG("hello");
Conditional log: hello
NSLog takes a format string and variable arguments. It would be really useful if LOG could do the same:
LOG("count: %d name: %s", count, name);
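If you place the magic ... at the end of the macro's parameter list, the macro accepts a variable number of arguments, and the magic __VA_ARGS__ identifier in the body is replaced by them:
#define LOG(...) do { if(gLoggingEnabled) NSLog(@"Conditional log: " __VA_ARGS__); } while(0)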
#define LOG(fmt, ...) do { if(gLoggingEnabled) NSLog(@"Conditional log: --- " fmt " ---", __VA_ARGS__); } while(0)
With LOG("hello"), the NSLog line expands to:
NSLog(@"Conditional log: --- " "hello" " ---", );
To avoid this problem in a completely portable way, you have to go back to taking one parameter, and do fancier tricks. For example, you might construct the user-provided string separately, then combine it into the log:
NSString *MA_logString = [NSString stringWithFormat: __VA_ARGS__];
NSLog(@"Conditional log: --- %@ ---", MA_logString);
If you place the ## operator between the trailing comma and __VA_ARGS__, the preprocessor will eliminate the trailing comma in the case that no variable arguments are provided. Note that this is a gcc extension, not standard C:
#define LOG(fmt, ...) do { if(gLoggingEnabled) NSLog(@"Conditional log: --- " fmt " ---", ## __VA_ARGS__); } while(0)
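For comparison, here is a sketch of the portable form in plain C, with the format string carried inside __VA_ARGS__ so that no comma can dangle. FORMAT_LOG is a made-up name, and writing into a caller-supplied buffer with snprintf is my choice to keep the result easy to inspect:

```c
#include <stdio.h>

/* Portable form: the format string travels inside __VA_ARGS__, so there is
   never a dangling comma. buf must be a char array, not a pointer,
   because sizeof(buf) is used as the buffer size. */
#define FORMAT_LOG(buf, ...) \
    snprintf((buf), sizeof(buf), "Conditional log: " __VA_ARGS__)
```

FORMAT_LOG(buf, "hello") and FORMAT_LOG(buf, "count: %d", 3) both compile without extensions. The limitation is the one discussed above: the first variable argument must be a string literal so it can be concatenated with the prefix, and nothing can be appended after the user's format.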
Magic Identifiers
As an example, consider this logging macro:
#define LOG(fmt, ...) NSLog(fmt, ## __VA_ARGS__)
#define LOG(fmt, ...) NSLog(@"%s:%d (%s): " fmt, __FILE__, __LINE__, __func__, ## __VA_ARGS__)
LOG("something happened");
MyFile.m:42 (MyFunction): something happened
You can scatter LOG statements throughout your code and the log output will automatically contain the file name, line number, and function name of where each log statement was placed.
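The identifiers behave the same way in plain C. The following is a rough sketch using fprintf instead of NSLog. LOG_HERE and the two helper functions are names invented for the example:

```c
#include <stdio.h>

/* Prefixes every message with file, line, and enclosing function.
   The format string rides inside __VA_ARGS__ to stay portable. */
#define LOG_HERE(...) \
    do { \
        fprintf(stderr, "%s:%d (%s): ", __FILE__, __LINE__, __func__); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)

/* __func__ names the function it appears in. */
static const char *name_of_this_function(void)
{
    return __func__;
}

/* __LINE__ expands at the point of use, so consecutive lines differ by one. */
static int line_gap(void)
{
    int first = __LINE__;
    int second = __LINE__;
    return second - first;
}
```

Because the identifiers are expanded where the macro is used, not where it is defined, they have to live in a macro. A logging function would only ever report its own file and line.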
Compound Literals
The syntax for compound literals is a bit odd, but not hard. It looks like this:
(type){ initializer }
// regular variable and initializer
NSPoint p = { 1, 100 };
DoSomething(p);
// compound literal
DoSomething((NSPoint){ 1, 100 });
NSArray *array = [NSArray arrayWithObjects: (id []){ @"one", @"two", @"three" } count: 3];
#define ARRAY(num, ...) [NSArray arrayWithObjects: (id []){ __VA_ARGS__ } count: num]
NSArray *array = ARRAY(3, @"one", @"two", @"three");
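As you probably know, the sizeof operator gives the size of an array in bytes, so the macro can compute the count itself:
#define ARRAY(...) [NSArray \
    arrayWithObjects: (id []){ __VA_ARGS__ } \
    count: sizeof((id []){ __VA_ARGS__ }) / sizeof(id)]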
NSArray *array = ARRAY(@"one", @"two", @"three");
#define IDARRAY(...) (id []){ __VA_ARGS__ }
#define IDCOUNT(...) (sizeof(IDARRAY(__VA_ARGS__)) / sizeof(id))
#define ARRAY(...) [NSArray arrayWithObjects: IDARRAY(__VA_ARGS__) count: IDCOUNT(__VA_ARGS__)]
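The same pair of helper macros can be tried out in plain C with int in place of id. This is a minimal sketch; INT_ARRAY, INT_COUNT, SUM and sum_ints are hypothetical names that mirror IDARRAY, IDCOUNT and ARRAY:

```c
#include <stddef.h>

/* Builds an anonymous int array from the arguments. */
#define INT_ARRAY(...) ((int[]){ __VA_ARGS__ })

/* sizeof does not evaluate its operand, so the arguments
   are not executed a second time here. */
#define INT_COUNT(...) (sizeof(INT_ARRAY(__VA_ARGS__)) / sizeof(int))

static int sum_ints(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Passes both the array and its computed length to the function. */
#define SUM(...) sum_ints(INT_ARRAY(__VA_ARGS__), INT_COUNT(__VA_ARGS__))
```

SUM(1, 2, 3) passes a three-element array and a count of 3, with no counting by hand.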
Let's make a similar one for dictionaries.
#define DICT(...) DictionaryWithIDArray(IDARRAY(__VA_ARGS__), IDCOUNT(__VA_ARGS__) / 2)
This relies on a small helper function that separates the keys from the objects and then calls NSDictionary to create the dictionary:
NSDictionary *DictionaryWithIDArray(id *array, NSUInteger count)
{
id keys[count];
id objs[count];
for(NSUInteger i = 0; i < count; i++)
{
keys[i] = array[i * 2];
objs[i] = array[i * 2 + 1];
}
return [NSDictionary dictionaryWithObjects: objs forKeys: keys count: count];
}
NSDictionary *d = DICT(@"key", @"value", @"key2", @"value2");
typeof
This is a gcc extension, not part of standard C, but it's extremely useful. It works like sizeof, except instead of providing the size, it provides the type. If you give it an expression, it evaluates to the type of that expression. If you give it a type, it just regurgitates that type.
Note that for maximum compatibility, it's best to write it as __typeof__.
Let's take a look at that faulty MAX macro again:
#define MAX(x, y) ((x) > (y) ? (x) : (y))
#define MAX(x, y) (^{ int my_localx = (x); int my_localy = (y); return my_localx > my_localy ? (my_localx) : (my_localy); }())
Because the local variables are declared as int, the macro doesn't work correctly for float, long long, or other types that don't quite fit.
Using __typeof__ fixes this:
#define MAX(x, y) (^{ __typeof__(x) my_localx = (x); __typeof__(y) my_localy = (y); return my_localx > my_localy ? (my_localx) : (my_localy); }())
Because __typeof__ is a purely compile-time construct, the extra use of the macro parameters does not cause them to be evaluated twice. You can use a similar trick to create a pointer to any value you want:
#define POINTERIZE(x) ((__typeof__(x) []){ x })
Building on that, a macro can box any value into an NSValue object:
#define BOX(x) [NSValue valueWithBytes: POINTERIZE(x) objCType: @encode(__typeof__(x))]
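For readers who want to experiment with __typeof__ outside of Objective-C, here is a sketch in plain C. Note that it swaps the block for the gcc ({ ... }) statement-expression extension mentioned in the comments, since blocks are not available in every C compiler. MAX_ONCE is a made-up name, and POINTERIZE is copied from above:

```c
/* Requires gcc or clang: __typeof__ and the ({ ... }) statement
   expression are both compiler extensions. */
#define MAX_ONCE(x, y) \
    ({ \
        __typeof__(x) max_once_x_ = (x); \
        __typeof__(y) max_once_y_ = (y); \
        max_once_x_ > max_once_y_ ? max_once_x_ : max_once_y_; \
    })

/* Same POINTERIZE as above: a one-element array of x's own type. */
#define POINTERIZE(x) ((__typeof__(x) []){ x })
```

With a = 0 and b = 1, MAX_ONCE(a++, b++) yields 1 and leaves a = 1 and b = 2, because each argument is evaluated once when the local is initialized. The __typeof__ operands are never evaluated.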
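gcc provides two built-in functions which can be useful for building macros.
The first is __builtin_types_compatible_p, which tests at compile time whether two types are the same. The second is __builtin_choose_expr, which works like the ?: operator except that the choice is made at compile time. This allows you to write macros which do different things depending on the type of the argument. As an example, here's a macro which turns an expression into an object:
// make the compiler treat x as the given type no matter what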
#define FORCETYPE(x, type) *(type *)(__typeof__(x) []){ x }
#define STRINGIFY(x) \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), NSRect), \
        NSStringFromRect(FORCETYPE(x, NSRect)), \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), NSSize), \
        NSStringFromSize(FORCETYPE(x, NSSize)), \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), NSPoint), \
        NSStringFromPoint(FORCETYPE(x, NSPoint)), \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), SEL), \
        NSStringFromSelector(FORCETYPE(x, SEL)), \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), NSRange), \
        NSStringFromRange(FORCETYPE(x, NSRange)), \
        [NSValue valueWithBytes: (__typeof__(x) []){ x } objCType: @encode(__typeof__(x))] \
    )))))
Note the FORCETYPE macro. Even though the code branch to follow is chosen at compile time, unused branches still have to be valid code. The compiler won't accept NSStringFromRect(42) even though that branch will never be chosen. By pointerizing the value and then casting it before dereferencing it, the macro ensures that the code will compile. The cast is invalid for everything but the one branch that is taken, but it doesn't need to be valid for any of the others anyway.
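A stripped-down sketch of the same dispatch in plain C may make the mechanics easier to follow. TYPE_NAME is a hypothetical macro that only distinguishes int and double. It assumes gcc or clang compiling C, since gcc documents both builtins as unavailable in C++:

```c
/* gcc/clang only: picks a string at compile time based on the
   static type of x. x itself is never evaluated. */
#define TYPE_NAME(x) \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), int), "int", \
    __builtin_choose_expr( \
        __builtin_types_compatible_p(__typeof__(x), double), "double", \
        "other"))
```

TYPE_NAME(42) selects "int" and TYPE_NAME(4.2) selects "double". A string literal argument has an array type, so it falls through to "other". Because every branch here is a valid expression regardless of x, this small version does not need the FORCETYPE trick.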
X-Macros
#define MY_ENUM \
    MY_ENUM_MEMBER(kStop) \
    MY_ENUM_MEMBER(kGo) \
    MY_ENUM_MEMBER(kYield)
// create the actual enum
enum MyEnum {
#define MY_ENUM_MEMBER(x) x,
MY_ENUM
#undef MY_ENUM_MEMBER
};
// stringification
const char *MyEnumToString(enum MyEnum value)
{
#define MY_ENUM_MEMBER(x) if(value == (x)) return #x;
MY_ENUM
#undef MY_ENUM_MEMBER
}
// destringification
enum MyEnum MyEnumFromString(const char *str)
{
#define MY_ENUM_MEMBER(x) if(strcmp(str, #x) == 0) return x;
MY_ENUM
#undef MY_ENUM_MEMBER
// default value
return -1;
}
Conclusion
That's it for 2010. Come back next year (in two weeks) for the next Friday Q&A. As always, if you have a topic that you would like to see covered here, send it in!
Comments: Jens Ayton at 2011-01-01 00:03:07:
Have you seen Conal Elliott’s argument that C programming is purely functional – it really consists of writing purely functional cpp code which expresses side-effectful operations in the C monad. http:///blog/posts/the-c-language-is-purely-functional/
I’ve been seeing X-macros a lot recently while trying to get SpiderMonkey building in a sane (by OS X standards) way. They’re used for obvious things like keywords and interpreter opcodes, but also for error message tables. A discussion of macros without mention of the GCC ({ ...statements...; expr; }) construct?
You could've used it for MAX and avoided the (less portable!) block, and improved your generated code at the same time. Jens Ayton: Well, that post is mildly amusing, I'll give him that.
OSC: I debated whether to include the ({}) construct, and ultimately decided not to. While portability for that construct is indeed better at the moment, it's incredibly special-purpose, not good for much else than writing macros. Semantically, the block technique does the exact same thing using a more general-purpose (and, I hope, longer-lived) construct. The poor generated code is unfortunate, but I hope that the compilers will eventually learn to optimize out such direct block invocations.
When you talked about X-Macros you could also talk about how you can use #define + #include to declare reusable terms into one separate file.
For your TIME() macro, why not use token pasting to create a more-unique variable name?
NSTimeInterval lastTimeVariable##_now = ... I find the
({}) construct rather useful even outside of macros, generally when I need a throwaway variable that I want to scope nicely (I use this often when tweaking the .frame property of views). I'm surprised you made no mention of __PRETTY_FUNCTION__ when talking about the similar identifiers. It's especially helpful for Obj-C.
Your MAX macros aren't quite right. It should be:
Also, I prefer to use another GCC extension (statements as expressions), which avoids using blocks:
Vasi: That's a good idea, and will make things more readable in the debugger too.
Kevin Ballard: Why can't you just use a completely standard block to get throwaway variables? Chris Suter: Thanks for the MAX tip, how embarrassing. Fixed now. For statement expressions, I discussed my reasons for preferring blocks in my previous comment, but it's not a particularly strong preference. I would like to mention another GCC (and clang) extension useful for people like me who never manage to remember the __VA_ARGS__ keyword correctly.
You can define a name for the '…' argument. For example, if you want to use 'args' instead of __VA_ARGS__, you can define your macro like this: #define IDARRAY(args...) (id []){ args } And something you should keep in mind when you want to write a macro: wherever you can use an inline function instead of a macro, choose the former. They are as fast as a macro (http://gcc./onlinedocs/gcc/Inline.html), but don't have all the pitfalls described in this article, and as they provide more information to the compiler (type of argument), they are easier to 'debug' too. The MIN, MAX, and ABS macro definitions on OS X don't have the multiple expression evaluation problem unless you've defined STRICT_ANSI, IIRC.
Psy beat me to the punch, but related to multi-line macros and X-Macros, you have the (to use with moderation!) #define + #include technique to create mega-macros: if the textual code you want to reuse is more than 20 lines long, creating a multi-line macro begins to become impractical, what with the backslash at the end of each line and lack of syntax highlighting, in this case it's better to put the code in a separate file (ending in .i or whatever you prefer) and include it multiple times.
I've used this exactly once, when I had code that took an array of booleans and (among other things) called a function on it and then deinterlaced the elements, and I wanted to reuse that code to process floats instead, with the function varying only by name (what could make me switch from bools to floats? Simple: going from hard decoding to soft decoding in a channel coding simulation). How to avoid duplicating that code (I wanted to have both available at runtime)? You can't have a function that can manipulate either bools or floats, the only way I found was to use a mega-macro. @Pierre: much as I hate to suggest such a thing, that sounds like a case where C++ templates can actually do something useful in a sane way. :-)
Jens: I know; this was in a pure C project. Though I think templates are a completely overblown solution (what with their linkage and various other semantics) for that and the other problems they're trying to solve.
Interesting writeup. One thing to watch out for in portable code, though: some versions of xlc on AIX *will* expand macros in double-quoted strings. It depends on your flags, I believe.
(followup to 19:15:01 post): This means, of course, that you shouldn't define macros with names like "a" :)
(followup to 19:15:01 post): This means, of course, that you shouldn't define macros with names like "a" or "the" :)
C macros had their place once, when machines were slow and the C language was lacking features. For me, I find that they are ugly and much more trouble than they are worth. I'd rather write a function, or use const.
The builtin macros such as __LINE__ and __FILE__ are useful for building assert statements. Of course, I wouldn't try to discourage someone else from using macros (unless I was going to be working with them). Hank G.: I agree that functions should be favored over macros, when it is possible to use both. However, macros can do a lot of things that functions simply cannot do at all, and in a case like that, they can be invaluable.
The DictionaryWithIDArray function has switched the roles of objects and keys. The lines in the for loop should be:
objs[i] = array[i * 2]; keys[i] = array[i * 2 + 1]; ... or at least if we are going with the language "dictionary with objects and keys", hence objects are listed before keys. I consider the
object, key ordering to be broken. It's backwards from how other modern languages do it and IMO confusing as a result. Switching the order for this macro was deliberate. Of course, if you prefer it the other way around, it's an easy change as you note.
One interesting fact about the C preprocessor is that it works between the lexer and the parser. It has knowledge of tokens (it knows a string literal is a string literal and a comment is a comment), and defines a very limited grammar.
#define MAX(x, y) ((x) > (y) ? (x) : (y))
int a = 0;
int b = 1;
int c = MAX(a++, b++);
// now a = 1, c = 1, and b = 3!
// (a++ > b++ ? a++ : b++)
// b++ gets evaluated twice
I think the value of c is 2, and I have verified this on my computer (Fedora 14, i686).
Good point. I think that the value of c is actually undefined, and could be either 1 or 2, depending on how your particular compiler decides to do things. In any case, your result of 2 does not surprise me now that you mention it.
Not sure why I didn't post this comment when I first read this entry, but better late than never!
For simple multi-statement macros I just use the comma operator instead of do {} while(0) - probably for no good reason other than it's shorter. Eg: #define DO_TWO_THINGS do_one_thing(), do_the_other_thing That works in most cases I think - have I missed anything obvious? You can't use the comma operator in cases where you need to use loops, if statements, variable declarations, etc., but I see no reason why you couldn't use it for simple cases like you describe.
an improvement to your LOG macro:
this will shorten the file name to just the file's name, without the path, and will get rid of all the NSLog prefix stuff