· Removing program comments by substituting a single white space for each comment.
· Performing the file inclusion (#include) and conditional compilation (#ifdef, etc.) commands as it encounters them.
· ‘Learning’ the macros introduced by #define. It compares these names against the identifiers in the program, and does a substitution when it finds a match.
The preprocessor performs very minimal error checking of the preprocessing instructions. Because it operates at a text level, it is unable to check for any sort of language-level syntax errors. This function is performed by the compiler.
Programmer instructions to the preprocessor (called directives) take the general form:
# directive tokens
The # symbol should be the first non-blank character on the line (i.e., only spaces and tabs may appear before it). Blank symbols may also appear between the # and directive. The following are therefore all valid and have exactly the same effect:
#define size 100
 #define size 100
#  define size 100
A directive usually occupies a single line. A line whose last non-blank character is \ is assumed to continue on the line following it, thus making it possible to define multi-line directives. For example, the following multi-line and single-line directives have exactly the same effect:
#define CheckError \
if (error) \
exit(1)
#define CheckError if (error) exit(1)
A directive line may also contain a comment; comments are simply ignored by the preprocessor. A # appearing on a line on its own is also ignored.
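The layout rules above can be tried out with a minimal sketch. The names GREETING, JOIN_WORDS, and greet are hypothetical and chosen only for illustration; the first directive is indented and has blanks after the #, and the second uses \ to continue over several lines.

```cpp
#include <string>

// Blanks before '#' and between '#' and the directive name are allowed.
  #  define GREETING "hello"

// A trailing backslash continues the directive on the next line.
#define JOIN_WORDS(a, b) \
    (std::string(a) + " " + \
     std::string(b))

// Hypothetical helper that exercises both macros.
std::string greet(const std::string& name) {
    return JOIN_WORDS(GREETING, name);
}
```

Calling greet("world") should yield the string "hello world", showing that both directives were accepted despite their unusual layout.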
Preprocessor directives.
| Directive | Explanation |
|---|---|
| #define | Defines a macro |
| #undef | Undefines a macro |
| #include | Textually includes the contents of a file |
| #ifdef | Makes compilation of code conditional on a macro being defined |
| #ifndef | Makes compilation of code conditional on a macro not being defined |
| #endif | Marks the end of a conditional compilation block |
| #if | Makes compilation of code conditional on an expression being nonzero |
| #else | Specifies an else part for a #ifdef, #ifndef, or #if directive |
| #elif | Combination of #else and #if |
| #line | Changes the current line number and file name |
| #error | Outputs an error message |
| #pragma | Is implementation-specific |
Macro Definition
Macros are defined using the #define directive, which takes two forms: plain and parameterized. A plain macro has the general form:
#define identifier tokens
It instructs the preprocessor to substitute tokens for every occurrence of identifier in the rest of the file (except for inside strings). The substitution tokens can be anything, even empty (which has the effect of removing identifier from the rest of the file).
Plain macros are used for defining symbolic constants. For example:
#define size 512
#define word long
#define bytes sizeof(word)
Because macro substitution is also applied to directive lines, an identifier defined by one macro can be used in a subsequent macro (e.g., use of word in bytes above). Given the above definitions, the code fragment
word n = size * bytes;
is macro-expanded to:
long n = 512 * sizeof(long);
Use of macros for defining symbolic constants has its origins in C, which had no language facility for defining constants. In C++, macros are less often used for this purpose, because consts can be used instead, with the added benefit of proper type checking.
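As a rough sketch of the const alternative mentioned above, the following places macro-based constants next to typed equivalents. The upper-case macro names and the identifiers bufferSize, Word, wordBytes, bufferBytesMacro, and bufferBytesConst are assumptions made for this example only.

```cpp
#include <cstddef>

// Macro versions, mirroring the definitions in the text.
#define SIZE  512
#define WORD  long
#define BYTES sizeof(WORD)

// Typed versions: the compiler knows their types and scopes.
const int bufferSize = 512;
typedef long Word;
const std::size_t wordBytes = sizeof(Word);

// Both functions compute the same value; only the second uses names
// that the compiler (rather than the preprocessor) understands.
std::size_t bufferBytesMacro() { return SIZE * BYTES; }
std::size_t bufferBytesConst() { return bufferSize * wordBytes; }
```

The two functions return the same number; the difference is that the typed constants obey scope rules and are visible to the compiler's type checker and to debuggers.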
A parameterized macro has the general form
#define identifier(parameters) tokens
where parameters is a list of one or more comma-separated identifiers. There should be no blanks between the identifier and (. Otherwise, the whole thing is interpreted as a plain macro whose substitution tokens part starts from (. For example,
#define Max(x,y) ((x) > (y) ? (x) : (y))
defines a parameterized macro for working out the maximum of two quantities.
A parameterized macro is matched against a call to it, which is syntactically very similar to a function call. A call must provide a matching number of arguments. As before, the tokens part of the macro is substituted for the call. Additionally, every occurrence of a parameter in the substituted tokens is substituted by the corresponding argument. This is called macro expansion. For example, the call
n = Max (n - 2, k + 6);
is macro-expanded to:
n = ((n - 2) > (k + 6) ? (n - 2) : (k + 6));
Note that the ( in a macro call may be separated from the macro identifier by blanks.
It is generally a good idea to place additional brackets around each occurrence of a parameter in the substitution tokens (as we have done for Max). This protects the macro against undesirable operator precedence effects after macro expansion.
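The precedence problem can be illustrated with a small sketch. DOUBLE_BAD, DOUBLE_GOOD, badResult, and goodResult are hypothetical names introduced here; the only difference between the two macros is the bracketing.

```cpp
// Unbracketed: the argument text is pasted into the surrounding
// expression exactly as written.
#define DOUBLE_BAD(x)  x + x

// Bracketed parameters and a bracketed whole, as recommended above.
#define DOUBLE_GOOD(x) ((x) + (x))

// Expands to 3 * 2 + 2, so the multiplication binds first.
int badResult()  { return 3 * DOUBLE_BAD(2); }

// Expands to 3 * ((2) + (2)), which is what the caller intended.
int goodResult() { return 3 * DOUBLE_GOOD(2); }
```

Here badResult() yields 8 while goodResult() yields 12, although both calls read as "three times double two".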
Overlooking the fundamental difference between macros and functions can lead to subtle programming errors. Because macros work at a textual level, the semantics of macro expansion are not necessarily equivalent to those of a function call. For example, the macro call
Max(++i, j)
is expanded to
((++i) > (j) ? (++i) : (j))
which means that i may end up being incremented twice, whereas a function version of Max would ensure that i is incremented only once.
Two facilities of C++ make the use of parameterized macros less attractive than in C. First, C++ inline functions provide the same level of code efficiency as macros, without the semantic pitfalls of the latter. Second, C++ templates provide the same kind of flexibility as macros for defining generic functions and classes, with the added benefit of proper syntax analysis and type checking.
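The double-increment pitfall, and the way an inline function template avoids it, might be sketched as follows. The template maxOf and the two helper functions are hypothetical names used only for this comparison.

```cpp
#define Max(x,y) ((x) > (y) ? (x) : (y))

// Function template alternative: each argument is evaluated exactly once.
template <typename T>
inline T maxOf(T x, T y) { return x > y ? x : y; }

// Returns the value of i after calling the macro with ++i.
int iAfterMacro() {
    int i = 5, j = 3;
    int m = Max(++i, j);    // ++i runs in the test and again in the result
    (void)m;
    return i;
}

// Returns the value of i after calling the template with ++i.
int iAfterFunction() {
    int i = 5, j = 3;
    int m = maxOf(++i, j);  // ++i runs once, before the call
    (void)m;
    return i;
}
```

Starting from i equal to 5, the macro version leaves i at 7, while the function version leaves it at 6.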
Macros can also be redefined. However, before a macro is redefined, it should be undefined using the #undef directive. For example:
#undef size
#define size 128
#undef Max
Use of #undef on an undefined identifier is harmless and has no effect.
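A minimal sketch of redefinition follows, using the hypothetical names TABLE_SIZE, NEVER_DEFINED, firstSize, and secondSize. Each constant captures the value of the macro that is in force at the point where the constant is defined.

```cpp
#define TABLE_SIZE 100
const int firstSize = TABLE_SIZE;    // uses the first definition

#undef TABLE_SIZE                    // forget the old definition
#define TABLE_SIZE 128
const int secondSize = TABLE_SIZE;   // uses the new definition

#undef TABLE_SIZE
#undef NEVER_DEFINED                 // harmless: the name was never defined
```

After this fragment firstSize holds 100 and secondSize holds 128, and TABLE_SIZE is no longer known to the preprocessor.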
Quote and Concatenation Operators
The preprocessor provides two special operators for manipulating macro parameters. The quote operator (#) is unary and takes a macro parameter operand. It transforms its operand into a string by putting double-quotes around it.
For example, consider a parameterized macro which checks for a pointer to be nonzero and outputs a warning message when it is zero:
#define CheckPtr(ptr) \
if ((ptr) == 0) cout << #ptr << " is zero!\n"
Use of the # operator allows the expression given as argument to CheckPtr to be literally printed as a part of the warning message. Therefore, the call
CheckPtr(tree->left);
is expanded as:
if ((tree->left) == 0) cout << "tree->left" << " is zero!\n";
Note that defining the macro as
#define CheckPtr(ptr) \
if ((ptr) == 0) cout << "ptr is zero!\n"
would not produce the desired effect, because macro substitution is not performed inside strings.
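To make the effect of # observable, here is a sketch of a variant of CheckPtr that writes to a caller-supplied stream instead of cout. The macro name CheckPtrTo, the Node structure, and the function warningFor are assumptions introduced for this example.

```cpp
#include <sstream>
#include <string>

// Like CheckPtr, but the output stream is a parameter so that the
// message can be captured and inspected.
#define CheckPtrTo(out, ptr) \
    if ((ptr) == 0) (out) << #ptr << " is zero!\n"

// Hypothetical tree node with a single child pointer.
struct Node { Node* left; };

// Returns the warning produced for tree->left, or "" if it is nonzero.
std::string warningFor(Node* tree) {
    std::ostringstream out;
    CheckPtrTo(out, tree->left);
    return out.str();
}
```

For a node whose left pointer is zero, warningFor returns the text "tree->left is zero!" followed by a newline; the argument expression appears in the message exactly as it was written in the call.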
The concatenation operator (##) is binary and is used for concatenating two tokens. For example, given the definition
#define internal(var) internal##var
the call
long internal(str);
expands to:
long internalstr;
This operator is rarely used for ordinary programs. It is very useful for writing translators and code generators, as it makes it easy to build an identifier out of fragments.
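The internal macro above can be exercised with a short sketch. The initial value 42 and the function readInternalStr are hypothetical; the point is only that the pasted identifier internalstr really exists after expansion.

```cpp
#define internal(var) internal##var

// Expands to: long internalstr = 42;
long internal(str) = 42;

// Refers to the generated identifier directly, without the macro.
long readInternalStr() { return internalstr; }
```

The fragment compiles only because internal(str) and internalstr name the same variable.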
File Inclusion
A file can be textually included in another file using the #include directive. For example, placing
#include "constants.h"
inside a file f causes the contents of constants.h to be included in f in exactly the position where the directive appears. The included file is usually expected to reside in the same directory as the program file. Otherwise, a full or relative path to it should be specified. For example:
#include "../file.h" // include from parent dir (UNIX)
#include "/usr/local/file.h" // full path (UNIX)
#include "..\file.h" // include from parent dir (DOS)
#include "\usr\local\file.h" // full path (DOS)
When including system header files for standard libraries, the file name should be enclosed in <> instead of double-quotes. For example:
#include <iostream>
When the preprocessor encounters this, it looks for the file in one or more prespecified locations on the system (e.g., the directory /usr/include/cpp on a UNIX system). On most systems the exact locations to be searched can be specified by the user, either as an argument to the compilation command or as a system environment variable.
File inclusions can be nested. For example, if a file f includes another file g which in turn includes another file h, then effectively f also includes h.
Although the preprocessor does not care about the ending of an included file (i.e., whether it is .h or .cpp or .cc, etc.), it is customary to only include header files in other files.
Multiple inclusion of files may or may not lead to compilation problems. For example, if a header file contains only macros and declarations then the compiler will not object to their reappearance. But if it contains a variable definition, for example, the compiler will flag it as an error. The next section describes a way of avoiding multiple inclusions of the same file.
Conditional Compilation
The conditional compilation directives allow sections of code to be selectively included for or excluded from compilation, depending on programmer-specified conditions being satisfied. They are usually used as a portability tool for tailoring the program code to specific hardware and software architectures.
General forms of conditional compilation directives.
| Form | Explanation |
|---|---|
| #ifdef identifier code #endif | If identifier is a #defined symbol then code is included in the compilation process. Otherwise, it is excluded. |
| #ifndef identifier code #endif | If identifier is not a #defined symbol then code is included in the compilation process. Otherwise, it is excluded. |
| #if expression code #endif | If expression evaluates to nonzero then code is included in the compilation process. Otherwise, it is excluded. |
| #ifdef identifier code1 #else code2 #endif | If identifier is a #defined symbol then code1 is included in the compilation process and code2 is excluded. Otherwise, code2 is included and code1 is excluded. Similarly, #else can be used with #ifndef and #if. |
| #if expression1 code1 #elif expression2 code2 #else code3 #endif | If expression1 evaluates to nonzero then only code1 is included in the compilation process. Otherwise, if expression2 evaluates to nonzero then only code2 is included. Otherwise, code3 is included. As before, the #else part is optional. Also, any number of #elif directives may appear after a #if directive. |
Here are two simple examples:
// Different application start-ups for beta and final version:
#ifdef BETA
DisplayBetaDialog();
#else
CheckRegistration();
#endif
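// Ensure Unit is at least 4 bytes wide. sizeof cannot appear in a #if
// expression, so the limits from <climits> are tested instead:
#include <climits>
#if INT_MAX >= 2147483647
typedef int Unit;
#elif LONG_MAX >= 2147483647
typedef long Unit;
#else
typedef char Unit[4];
#endif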
One of the common uses of #if is for temporarily omitting code. This is often done during testing and debugging when the programmer is experimenting with suspected areas of code. Although code may also be omitted by commenting it out (i.e., placing /* and */ around it), this approach does not work if the code already contains /*...*/ style comments, because such comments cannot be nested.
Code is omitted by giving #if an expression which always evaluates to zero:
#if 0
...code to be omitted
#endif
The preprocessor provides an operator called defined for use in expression arguments of #if and #elif. For example,
#if defined BETA
has the same effect as:
#ifdef BETA
However, use of defined makes it possible to write compound logical expressions. For example:
#if defined ALPHA || defined BETA
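The following sketch shows defined used in compound expressions and in an #elif chain. The macro BETA is defined by hand here to simulate a beta build, and the names isPreRelease and buildStage are hypothetical.

```cpp
#include <string>

#define BETA            // hypothetical: pretend this is a beta build

// A compound test that #ifdef alone could not express.
#if defined ALPHA || defined BETA
const bool isPreRelease = true;
#else
const bool isPreRelease = false;
#endif

// Exactly one return statement survives preprocessing.
std::string buildStage() {
#if defined(ALPHA) && defined(BETA)
    return "alpha+beta";
#elif defined(ALPHA)
    return "alpha";
#elif defined(BETA)
    return "beta";
#else
    return "final";
#endif
}
```

Both spellings, defined BETA and defined(BETA), are accepted. With only BETA defined, isPreRelease is true and buildStage() returns "beta".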
Conditional compilation directives can be used to avoid multiple inclusion of files. For example, given an include file called file.h, we can avoid multiple inclusions of file.h in any other file by adding the following to file.h:
#ifndef _file_h_
#define _file_h_
contents of file.h go here
#endif
When the preprocessor reads the first inclusion of file.h, the symbol _file_h_ is undefined, hence the contents are included, causing the symbol to be defined. Subsequent inclusions have no effect because the #ifndef directive causes the contents to be excluded.
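The guard can be simulated in a single file by writing out, by hand, what the preprocessor would see if a header were included twice. This is only a sketch: the guard name FILE_H_, the variable sharedCounter, and the function readCounter are hypothetical.

```cpp
// ---- text seen on the first inclusion of the hypothetical header ----
#ifndef FILE_H_
#define FILE_H_
int sharedCounter = 0;   // a definition: seeing it twice would be an error
#endif

// ---- text seen on the second inclusion of the same header ----
#ifndef FILE_H_          // FILE_H_ is now defined, so this block is skipped
#define FILE_H_
int sharedCounter = 0;
#endif

int readCounter() { return sharedCounter; }
```

Without the guard lines, the second definition of sharedCounter would be rejected by the compiler; with them, the fragment compiles because only the first copy survives preprocessing.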
Other Directives
The preprocessor provides three other, less-frequently-used directives. The #line directive is used to change the current line number and file name. It has the general form:
#line number file
where file is optional. For example,
#line 20 "file.h"
makes the compiler believe that the current line number is 20 and the current file name is file.h. The change remains effective until another #line directive is encountered. The directive is useful for translators which generate C++ code. It allows the line numbers and file name to be made consistent with the original input file, instead of any intermediate C++ file.
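The effect of #line can be observed through the predefined identifiers __LINE__ and __FILE__, as in the sketch below. The names lineSeen and fileSeen are hypothetical. Note that the number given in the directive applies to the source line that follows it.

```cpp
#include <string>

// The line after the directive is treated as line 20 of "file.h".
#line 20 "file.h"
const int lineSeen = __LINE__;          // this line is numbered 20
const std::string fileSeen = __FILE__;  // this line is numbered 21
```

Any diagnostics for the lines after the directive would also be reported against file.h, with numbering continuing from 20, until another #line directive appears.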
The #error directive is used for reporting errors by the preprocessor. It has the general form
#error error
where error may be any sequence of tokens. When the preprocessor encounters this, it outputs error and causes compilation to be aborted. It should therefore be used only for reporting errors which make further compilation pointless or impossible. For example:
#ifndef UNIX
#error This software requires the UNIX OS.
#endif
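Another plausible use of #error is to reject an unusable build configuration early. In the sketch below, BUFFER_KB and bufferBytes are hypothetical names; the value 64 lies inside the permitted range, so the directive is skipped and the fragment compiles.

```cpp
#define BUFFER_KB 64     // hypothetical build setting

// Stop the build immediately if the setting is out of range.
#if BUFFER_KB < 1 || BUFFER_KB > 1024
#error BUFFER_KB must be between 1 and 1024.
#endif

const int bufferBytes = BUFFER_KB * 1024;
```

Changing BUFFER_KB to a value such as 0 or 4096 should abort compilation with the given message instead of producing a program with a nonsensical buffer size.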
The #pragma directive is implementation-dependent. It is used by compiler vendors to introduce nonstandard preprocessor features, specific to their own implementation. Examples from the SUN C++ compiler include:
// align name and val starting addresses to multiples of 8 bytes:
#pragma align 8 (name, val)
char name[9];
double val;
// call MyFunction at the beginning of program execution:
#pragma init (MyFunction)
Predefined Identifiers
The preprocessor provides a small set of predefined identifiers which denote useful information.
Standard predefined identifiers.
| Identifier | Denotes |
|---|---|
| __FILE__ | Name of the file being processed |
| __LINE__ | Current line number of the file being processed |
| __DATE__ | Current date as a string (e.g., "Dec 25 1995") |
| __TIME__ | Current time as a string (e.g., "12:30:55") |
The predefined identifiers can be used in programs just like program constants. For example,
#define Assert(p) \
if (!(p)) cout << __FILE__ << ": assertion on line " \
<< __LINE__ << " failed.\n"
defines an assert macro for testing program invariants. Assuming that the sample call
Assert(ptr != 0);
appears in file prog.cpp on line 50, when the stated condition fails, the following message is displayed:
prog.cpp: assertion on line 50 failed.
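A slightly more defensive variant of Assert might look like the sketch below. It is not the macro from the text: the name AssertTo, the stream parameter, and the function checkPointer are assumptions. The body is wrapped in do { ... } while (0) so that the macro behaves as a single statement, for example when used before an else. A #line directive is used purely to reproduce the file name and line number of the scenario above.

```cpp
#include <sstream>
#include <string>

// Writes the failure message to a caller-supplied stream.
#define AssertTo(out, p) \
    do { \
        if (!(p)) (out) << __FILE__ << ": assertion on line " \
                        << __LINE__ << " failed.\n"; \
    } while (0)

// Returns the message produced for ptr, or "" if the assertion holds.
std::string checkPointer(const int* ptr) {
    std::ostringstream out;
#line 50 "prog.cpp"
    AssertTo(out, ptr != 0);
    return out.str();
}
```

Calling checkPointer with a zero pointer returns the message "prog.cpp: assertion on line 50 failed." followed by a newline; with a valid pointer it returns an empty string.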