Filter non-ASCII characters

When it compiles a source program, the processor calls the C compiler. If your source code contains non-ASCII characters, the way that the C compiler handles those characters can determine whether compilation succeeds.

In particular, the following situations might affect compilation of your program:

Multibyte characters might contain C-language tokens.
A component of a multibyte character might be indistinguishable from some single-byte characters such as percent ( % ), comma ( , ), backslash ( \ ), and double quotation mark ( " ) characters. If such characters are included in a quoted string, the C compiler might interpret them as C-language tokens, which can cause compilation errors or even lost characters.
The C compiler might not be 8-bit clean.
If a code set contains non-ASCII characters (with code values that are greater than 127), the C compiler must be 8-bit clean to interpret the characters. To be 8-bit clean, a compiler must read the eighth bit as part of the code value; it must not ignore or put its own interpretation on the meaning of this eighth bit.

To filter a non-ASCII character, the esqlmf filter converts each byte of the character to its octal equivalent. For example, suppose the multibyte character A¹A²A³ has an octal representation of \160\042\244 and appears in the following stcopy() call:

stcopy("A¹A²A³", dest);

After esqlmf filters the source file, the C compiler sees this line as follows:

stcopy("\160\042\244", dest); /* correct interpretation */
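The byte-to-octal conversion described above can be sketched in a few lines of C. This is only an illustration, not the esqlmf implementation: the function name octal_escape() and its parameters are hypothetical, and the sketch assumes that the caller has already identified which bytes make up the multibyte character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of esqlmf): writes each of the n bytes in
 * src to dst as a three-digit octal escape of the form \ooo.
 * Returns the number of characters written, not counting the terminating
 * NUL, or -1 if dst is too small to hold the result.
 */
int octal_escape(const unsigned char *src, size_t n, char *dst, size_t dstsize)
{
    size_t i, used = 0;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return -1;
    dst[0] = '\0';

    for (i = 0; i < n; i++) {
        /* Each byte needs 4 characters ("\ooo") plus room for the NUL. */
        if (dstsize - used < 5)
            return -1;
        snprintf(dst + used, dstsize - used, "\\%03o", (unsigned int)src[i]);
        used += 4;
    }
    return (int)used;
}
```

Passing the three bytes 0160, 042, and 0244 from the example to this function produces the text \160\042\244, which contains only ASCII characters and is therefore safe for the C compiler to read inside a quoted string.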

To handle the C-language-token situation, the filter prevents the C compiler from interpreting the A² byte (octal \042) as an ASCII double quotation mark and incorrectly parsing the line as follows:

stcopy("A¹"A³, dest); /* incorrect interpretation of A² */

The C compiler would generate an error for the preceding line because the string argument is terminated prematurely. The esqlmf utility also handles the 8-bit-clean situation because it prevents the C compiler from ignoring the eighth bit of the A³ byte. If the compiler ignores the eighth bit, it incorrectly interprets A³ (octal \244) as octal \044, which is the ASCII dollar sign ( $ ).
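The effect of ignoring the eighth bit can be checked with simple bit masking. The following sketch is a model for illustration only: strip_eighth_bit() and is_non_ascii() are hypothetical names, and the masking is an assumption about how a compiler that is not 8-bit clean might treat a byte.

```c
/*
 * Hypothetical model of a compiler that is not 8-bit clean:
 * it keeps only the low seven bits of each byte.
 */
unsigned char strip_eighth_bit(unsigned char byte)
{
    return (unsigned char)(byte & 0177);   /* 0177 octal = binary 0111 1111 */
}

/* Returns nonzero if the eighth bit is set, that is, if the value exceeds 127. */
int is_non_ascii(unsigned char byte)
{
    return (byte & 0200) != 0;             /* 0200 octal = binary 1000 0000 */
}
```

With these definitions, strip_eighth_bit(0244) returns 044, which matches the misinterpretation described above. The is_non_ascii() check is true for the A³ byte (0244) but false for the A¹ and A² bytes (0160 and 042), which is why a filter cannot rely on the eighth bit alone to find every byte of a multibyte character.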