Filter non-ASCII characters
As part of compiling the source program, the processor calls the C compiler. When your source code contains non-ASCII characters, the way that the C compiler handles those characters can determine whether compilation succeeds. Two situations in particular can cause problems:
- Multibyte characters might contain C-language tokens.
One byte of a multibyte character might be indistinguishable from a single-byte character such as the percent sign ( % ), comma ( , ), backslash ( \ ), or double quotation mark ( " ). If such a byte appears in a quoted string, the C compiler might interpret it as a C-language token, which can cause compilation errors or even lost characters.
- The C compiler might not be 8-bit clean.
If a code set contains non-ASCII characters (with code values that are greater than 127), the C compiler must be 8-bit clean to interpret the characters. To be 8-bit clean, a compiler must read the eighth bit as part of the code value; it must not ignore or put its own interpretation on the meaning of this eighth bit.
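The 8-bit-clean requirement can be illustrated with a minimal C sketch. The function names `has_non_ascii` and `strip_eighth_bit` are hypothetical and are not part of any product API. The first function reports whether a string contains a byte with a code value greater than 127, and the second models what a compiler that is not 8-bit clean would do to such a byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if any byte in s has its eighth (high)
 * bit set, that is, a code value greater than 127; otherwise returns 0. */
int has_non_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 127)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: models a compiler that is NOT 8-bit clean by
 * discarding the eighth bit and keeping only the low seven bits. */
unsigned char strip_eighth_bit(unsigned char byte)
{
    return (unsigned char)(byte & 0x7F);
}
```

Under this model, a byte with the octal value \244 loses its eighth bit and becomes octal \044, while a 7-bit ASCII byte such as the percent sign is unchanged.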
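For example, suppose that the multibyte character A1A2A3 consists of three bytes whose octal values are \160, \042, and \244, and that it appears in the following call:
stcopy("A1A2A3", dest);
When each byte of the character is written as its octal escape, the C compiler reads the string as intended:
stcopy("\160\042\244", dest); /* correct interpretation */
Without that conversion, the C compiler might read the A2 byte (octal \042) as an ASCII double quotation mark and parse the line as follows:
stcopy("A1"A3, dest); /* incorrect interpretation of A2 */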
The C compiler would generate an error for the preceding line because the string argument ends prematurely after A1, which leaves A3 outside the quotation marks. The esqlmf utility also handles the 8-bit-clean situation because it prevents the C compiler from ignoring the eighth bit of the A3 byte. If the compiler ignores the eighth bit, it incorrectly interprets A3 (octal \244) as octal \044, the ASCII dollar sign ( $ ).
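The byte-to-octal conversion in this example can be sketched in C as follows. This is an illustration only, not the esqlmf implementation: the function name `octal_escape` is hypothetical, and the sketch escapes every byte of the string it is given. A real filter would presumably need knowledge of the code set to decide which bytes belong to multibyte characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes src to dst with every byte replaced by a
 * three-digit octal escape such as \244. Returns the number of characters
 * written (excluding the terminating null), or -1 if dst is too small. */
int octal_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        /* Each escape needs 4 characters plus room for the null. */
        if (dst_size - used < 5)
            return -1;
        snprintf(dst + used, dst_size - used, "\\%03o",
                 (unsigned)(unsigned char)*src);
        used += 4;
    }
    if (used >= dst_size)
        return -1;
    dst[used] = '\0';
    return (int)used;
}
```

Given the three bytes \160, \042, and \244 as input, this sketch fills the destination buffer with the 12 characters \160\042\244, so the output contains only 7-bit ASCII characters and no bare double quotation mark.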