Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!ucsd!hub!eiffel!bertrand
From: bertrand@eiffel.UUCP (Bertrand Meyer)
Newsgroups: comp.lang.eiffel
Subject: Eiffel cleanup #5: The character set
Keywords: A somewhat less USA_English-centered perspective
Message-ID: <236@eiffel.UUCP>
Date: 21 Jan 90 03:39:13 GMT
Organization: Interactive Software Engineering, Santa Barbara CA
Lines: 194

A planned change for the character set of Eiffel
------------------------------------------------

This is the fifth of a sequence of postings describing the changes
planned for version 3 of Eiffel. These are cleanup changes and do not
affect anything fundamental.

The particular issue discussed here is the character set for the
language and the support for national variants.

Since the preparation of version 3 is clean-up time, I have finally
looked in some detail at these theoretically mundane but practically
important questions. I must confess that not much thought went into
these aspects when the language was designed, but now is the time to
get them right. It's great to have the proper solution for the big
things, but why not take care of the little ones as well?

I must further admit that I have no particular expertise in this
field, and won't be offended if it turns out that someone else has a
better answer.

I was considerably helped by the contribution made by Erland
Sommarskog last July (<101@enea.se>). In fact, the advantage I gained
from that posting is almost unfair; Mr. Sommarskog did all the hard
work, and I only had to draw the conclusions for Eiffel. (Of course,
he bears no responsibility for any deficiency in what follows.)
Anyone who wants to contribute comments or criticisms will be well
advised to look at Mr. Sommarskog's message first.

The solution described below only addresses one-byte codes (such as
those used for European languages other than English). No
consideration has been given to multi-byte languages.
(If we don't leave some work for the standardization committee, they
might get bored.)

----------------------------------------------------------------------
|WARNING: The change described here is planned for version 3 of the  |
|environment, not to be released until late 1990.                    |
|                                                                    |
|Any change in the language supported by Interactive's tools         |
|will be accompanied by CONVERSION TOOLS to translate ``old'' syntax |
|into new. Programmers will NOT need to perform any significant work |
|to update existing Eiffel software.                                 |
|                                                                    |
|This posting is made solely for the purpose of informing the Eiffel |
|community about ongoing developments. Although the posting has been |
|preceded by careful reflection and internal discussions within      |
|Interactive, we make no commitment at this point that the features  |
|described here will actually be included, and, if they are, that    |
|their final form will be the exact one shown below.                 |
----------------------------------------------------------------------

Purpose of the change.
----------------------

Several problems were raised by Mr. Sommarskog with respect to the
use of Eiffel on non-American keyboards/terminals.

1. - Characters such as

        @       At sign
        [       Opening bracket
        ]       Closing bracket
        {       Opening brace
        }       Closing brace
        |       Vertical bar
        \       Backslash
        ^       Circumflex
        `       Back quote
        ~       Tilde

are often pre-empted by national character set variants. For example,
on many French keyboards, [ and ] appear as e' (e with an acute
accent) and e` (e with a grave accent). Mr. Sommarskog cited further
examples with Swedish keyboards.

These character translations make it very unpleasant for programmers
working on such keyboards, who have to remember the correspondence
between the character in the language manual and their local keyboard
equivalents. Mr. Sommarskog went so far as to say that ``Any
programming language using any of [these] characters as an operator
or a delimiter is committing a crime in my eyes''.
(He did not say, however, that the language *designer* was committing
a crime, so I feel relatively safe even though I will be traveling to
Sweden soon.)

The problem does exist in Eiffel since all of the above except `
(back quote) are used in special symbols. (| and ~ will be used for
boolean operators in Eiffel 3.)

2. - The syntax of identifiers restricts them to letters, underscores
and digits. ``Letters'' here means unaccented letters of the English
alphabet. A French programmer would often like to use accented
letters in an identifier (e.g. e've'nement, with two acute accents),
and similarly for other languages.

3. - The backslash is particularly ``criminal'' since it has an
important role in strings and character constants as the ``escape''
for special characters, in the Unix-C tradition. For example a quote
in a character constant is \' (backslash-quote).

4. - As a less important but unpleasant point, special characters are
specified through a three-digit octal code, as in '\756'. Why force
octal? Also, why require exactly three digits, which imply leading
zeroes?

The language change
-------------------

The language change is simple.

First, an observation: brackets and braces are (fortunately) not
strictly needed syntactically in Eiffel: parentheses would do just as
well in the places where these characters are needed. (Brackets are
used for generic parameters; braces for selective exports.)

As a consequence, parentheses now become legal in those places,
although the forms using brackets and braces remain the standard ones
for publication of program texts. Brackets and braces will continue
to be used as the default form for text produced as output of tools
of the Eiffel environment such as ``short'', even if the original
class text uses parentheses. (Presumably, a decent
troff/TEX/Interleaf/Word adapted to Swedish, Polish or French will
still have those characters.)
Similarly, equivalents are defined for ^ (for which the equivalent is
**, as used in Fortran for exponentiation), ~ (not) etc.

Then, the backslash loses its special role as an escape character in
character and string constants. It is replaced by the exclamation
mark. For example, in a character or string constant:

        !!      means !
        !"      means "
        !'      means '
        !T      means tab
        !N      means new line
        !D(27)  means the character of decimal code 27
        !O(27)  means the character of octal code 27
        !X(27)  means the character of hexadecimal code 27

etc. The convention for character strings split over two or more
lines remains as before, with ! instead of backslash.

In all codes involving letters (!T, !D etc.), lower- and upper-case
are equivalent. For the last three codes in the above list, note that
the numerical value is parenthesized, so that the number of digits is
not fixed.

Finally, although the default alphabet for identifiers is still the
English letters plus digits and underscore, it becomes possible to
use others if they are specified in a special file (which could be
called ``.characters'' in the Unix implementation).

The idea of using a file rather than a compilation option is that if
you deliver classes to a customer (possibly in a different country)
you will deliver the .characters file as well, ensuring consistent
recompilation at the target site; with compilation options this
cannot be achieved. Furthermore, a file is more flexible.

Obviously, some restrictions are imposed on the characters that may
be specified in the .characters file: they may not conflict with
characters used in special symbols of the language, such as ``;'' or
``:'', unless these symbols have default substitutes (as with the
bracket ``['', whose substitute is ``('').
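
As a rough illustration of that restriction (this is not part of the
proposal, and not a description of how any Eiffel compiler works), the
check might look like the Python sketch below. The function name, the
two tables and the idea of passing the requested characters as a plain
string are all assumptions made for the example; the format of the
``.characters'' file itself is not specified in this posting, and the
tables list only the symbols that the posting names explicitly.

```python
# Hypothetical sketch: names and tables are illustrative only.

# Special-symbol characters that have a default substitute according
# to the posting: brackets and braces -> parentheses, ^ -> **.
# Deliberately partial; other substitutes are not spelled out there.
SUBSTITUTES = {"[": "(", "]": ")", "{": "(", "}": ")", "^": "**"}

# Special-symbol characters with no substitute. Only the two that the
# posting cites as examples are listed.
NO_SUBSTITUTE = {";", ":"}


def check_identifier_characters(requested):
    """Classify characters requested as extra identifier letters.

    Returns (accepted, rejected, forced):
      accepted -- characters that may be used in identifiers
      rejected -- characters refused because they clash with a
                  special symbol that has no substitute
      forced   -- for each accepted character that is also a special
                  symbol, the substitute to write in its place
    """
    accepted, rejected = [], []
    for ch in requested:
        if ch in NO_SUBSTITUTE:
            rejected.append(ch)
        else:
            accepted.append(ch)
    forced = {ch: SUBSTITUTES[ch] for ch in accepted if ch in SUBSTITUTES}
    return accepted, rejected, forced
```

For instance, requesting the opening bracket and the semicolon would
accept the first (with ``('' reported as its substitute) and reject
the second.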
Just as obviously, once a character has been selected for identifiers
through the .characters file, it cannot be used as a special symbol
any more; for example, if you accept the opening bracket in
identifiers because it shows up as e' on your keyboard, then you may
not use it as a bracket any more and must resort to parentheses.

Discussion
----------

The exclamation mark seems to be the least bad among universally
possible choices. Its use as an attention-getter in ordinary language
seems to fit well with its above use as a special character marker.

We have, of course, considered the obvious objection that a new
Eiffel programmer's first attempt may contain the instruction

        putstring ("Hello world!")

which will trigger a compilation error (because the exclamation mark
eats the following double quote). Tough luck. At least, we can try to
produce a decent error message.
-- 
-- Bertrand Meyer
bertrand@eiffel.com