Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!bionet!agate!ucbvax!hplabs!hp-ses!hp-ptp!toddp
From: toddp@hp-ptp.HP.COM (Todd_Poynor)
Newsgroups: comp.mail.misc
Subject: Tab Expansion in E-mail
Message-ID: <1960003@hp-ptp.HP.COM>
Date: 27 Feb 90 18:34:28 GMT
Organization: HP Pacific Technology Park - Sunnyvale, Ca.
Lines: 152

Message bodies containing the Horizontal Tab character (ASCII 9) pose quite a problem for Mail User Agents. There is no way to know how to correctly reproduce the original behavior of a tab on a recipient's display device, because the tab stops defined at the sending user's terminal may not correspond to those defined at the terminal of the destination user(s). On UNIX systems, tab stops every 8 character positions are fairly standard, but this is not true of other kinds of hosts that send and receive Internet mail. Indeed, on certain hosts and display devices the tab character is not normally understood as a horizontal tab at all. In such a situation users are constantly annoyed by misaligned columns in reports, which are often nearly incomprehensible.

Two obvious solutions present themselves. One is to avoid using that character in text messages that might be received by someone with different tab stops. The other is to include information with the message that tells the destination User Agent what the intended tab stops are.

Avoidance can be accomplished by simply not pressing the Tab key when entering messages, but creatures of habit that we are, we usually find this hard to remember. Messages can also be filtered through a process that expands tabs to blanks locally before the message is dispatched, but again this is difficult to remember unless it happens automatically. If the filtering is performed automatically, however, it corrupts certain verbatim-text uses, such as messages containing files to be transferred "as is".
A familiar UNIX example of such corruption is the expansion of tabs within messages containing "shar" archives, where the receiving process may detect that the received data does not match the data originally sent.

Automatic expansion of tabs may be feasible if some means of preventing unwanted expansion is provided. Mailing verbatim text is the unusual case, so it is perhaps not overly difficult to remember to include some sort of header information or text marker that inhibits modification of the message body. Taking a cue from the privacy enhancement RFCs, a text marker such as:

        -----TEXT PROTECTION BOUNDARY-----

could indicate the end of the text that is subject to detabbing or to any other conceivable text modification. The marker may have to be recognized even within encapsulated messages (messages within messages, as per RFC 934), where "- " would be prefixed to it. Archiving software could even generate the marker automatically at the top of the archive.

Many computer literates (including this author) find the text marker solution inelegant. Beyond that, some regard automatic modification of text as a gross violation of data communications protocol. One can argue that such tab expansion belongs with local character set translation among the approved cross-host translations. The general feeling, however, is that the original content of a message should be preserved to the greatest extent possible.

For this reason, a better method might be to preserve the tab characters within the message and include information in the message header that tells User Agents what the proper tab settings are. This information would normally correspond to the tab stops set at the sending user's terminal.
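The post goes on to prefer a header field, but the text-marker scheme described above can still be sketched as a sending-side filter. This Python sketch rests on several assumptions. The sender's terminal is taken to have tab stops every 8 columns. The first line that reads exactly "-----TEXT PROTECTION BOUNDARY-----", with or without one or more RFC 934 "- " prefixes, is treated as the start of protected text. The names MARKER, is_marker and detab_before_marker are hypothetical.

```python
# Marker text from the post; the marker line and everything after it are protected.
MARKER = "-----TEXT PROTECTION BOUNDARY-----"


def is_marker(line: str) -> bool:
    """True for the marker line, including RFC 934-style "- " prefixed copies."""
    line = line.rstrip("\r")
    while line.startswith("- "):  # each "- " is one level of encapsulation
        line = line[2:]
    return line == MARKER


def detab_before_marker(body: str, tabsize: int = 8) -> str:
    """Expand tabs to blanks up to the first marker line.

    Assumes uniform tab stops every `tabsize` columns at the sender's terminal.
    The marker line and all following lines pass through unchanged.
    """
    out = []
    protected = False
    for line in body.split("\n"):
        if not protected and is_marker(line):
            protected = True
        out.append(line if protected else line.expandtabs(tabsize))
    return "\n".join(out)
```

For example, a "shar" archive placed after the marker keeps its tabs, while any covering note above the marker is detabbed.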
Sometimes mail is sent by automatic means and no terminal can be associated with the creation of the message. In that case either a local default may be given, or the information may be omitted. Omitting it indicates that, as at present, the tab behavior is left to the destination's interpretation.

A new header field is probably in order for this purpose. In the absence of a standard, prototype implementations could follow the convention of prefixing user-defined field names with "X-". The proposed field syntax, in RFC 822 notation, is:

        tab-define = "X-Tab-Stops" ":" 1#(tab-posn / tab-incr)
        tab-incr   = "+" 1*DIGIT
        tab-posn   = 1*DIGIT

This syntax defines a field named "X-Tab-Stops" whose argument is a comma-separated list of numerical values, each optionally preceded by a plus sign. The list must contain at least one value, and each value is a string of at least one decimal digit. Each value defines the next tab stop, in left-to-right order across the destination display. The values are interpreted as follows:

   o  A tab-posn (a value not preceded by a plus sign) is the character position of the next tab stop, where the first character in the line is numbered one.

   o  A tab-incr (a value preceded by a plus sign) is the number of characters between the preceding tab stop and the next one. If no tab stop has been defined yet, meaning that this is the first item in the list, the increment is counted from character position 1. If a tab-incr is the last item in the list, it applies indefinitely: tab stops continue without limit from this position onward, all separated by the same increment.
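A User Agent following the rules above might parse the field and expand tabs at display time as in this Python sketch, which is not part of the proposal. The function names are hypothetical, and positions are 1-based as in the text. The sketch also makes several assumptions:

   o  Every non-tab character advances the position by one.
   o  Stops must increase from left to right.
   o  A tab beyond the last defined stop is rendered as a single blank. This can only happen when the list ends with a tab-posn, and the proposal leaves that case open.

```python
import re
from typing import List, Optional, Tuple

# One list item: an optional "+" (tab-incr) followed by decimal digits.
_ITEM = re.compile(r"(\+?)([0-9]+)")


def parse_tab_stops(value: str) -> Tuple[List[int], Optional[int]]:
    """Parse an X-Tab-Stops value such as "7, +3".

    Returns the explicit 1-based stop positions, and the increment of a
    trailing tab-incr (which repeats indefinitely), or None if there is none.
    """
    # RFC 822 "#" lists tolerate empty elements, so skip them.
    items = [s.strip() for s in value.split(",") if s.strip()]
    if not items:
        raise ValueError("X-Tab-Stops needs at least one item")
    stops: List[int] = []
    repeat: Optional[int] = None
    for i, item in enumerate(items):
        m = _ITEM.fullmatch(item)
        if m is None:
            raise ValueError(f"bad X-Tab-Stops item: {item!r}")
        n = int(m.group(2))
        if m.group(1):  # tab-incr
            if n < 1:
                raise ValueError("increment must be at least 1")
            prev = stops[-1] if stops else 1  # first increment counts from position 1
            if i == len(items) - 1:
                repeat = n  # last item: a stop every n positions from prev onward
            else:
                stops.append(prev + n)
        else:  # tab-posn: stops must move left to right
            if n < 1 or (stops and n <= stops[-1]):
                raise ValueError(f"tab stop {n} is out of order")
            stops.append(n)
    return stops, repeat


def next_stop(col: int, stops: List[int], repeat: Optional[int]) -> Optional[int]:
    """Return the first tab stop strictly to the right of 1-based position col."""
    for s in stops:
        if s > col:
            return s
    if repeat is None:
        return None
    last = stops[-1] if stops else 1
    # Smallest last + k*repeat (k >= 1) that lies beyond col.
    return last + ((col - last) // repeat + 1) * repeat


def expand_tabs(text: str, field_value: str) -> str:
    """Expand tabs to blanks for display, per an X-Tab-Stops field value."""
    stops, repeat = parse_tab_stops(field_value)
    lines = []
    for line in text.split("\n"):
        out, col = [], 1  # col: position the next character will occupy
        for ch in line:
            if ch == "\t":
                target = next_stop(col, stops, repeat)
                # Assumption: a tab past the last stop becomes one blank.
                width = 1 if target is None else target - col
                out.append(" " * width)
                col += width
            else:
                out.append(ch)
                col += 1
        lines.append("".join(out))
    return "\n".join(lines)
```

With the FORTRAN-style setting "7, +3", expand_tabs("\tX = 1", "7, +3") yields six blanks followed by "X = 1", so the statement begins at position 7. For short lines, both "+8" and the explicit list "9, 17, ..., 73" reproduce Python's own str.expandtabs(8).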
For UNIX users with tab stops every 8 characters, the field might read:

        X-Tab-Stops: 9, 17, 25, 33, 41, 49, 57, 65, 73

or, more succinctly (and with no limit after position 73):

        X-Tab-Stops: +8

A typical setting for FORTRAN programmers might be:

        X-Tab-Stops: 7, +3

This sets the first tab stop at position 7, the start of the statement field, and then one every 3 positions.

One possible change to the argument syntax is to drop the commas and separate items with blanks, for efficiency. (RFC 822 favors the syntax shown for lists.)

Mail User Agents are intended to interpret this header field when the message is viewed: tab characters in the body are expanded to blanks according to the tab stops defined in the field. Of course, other control characters, or characters outside the standard printing subset, may give the User Agent an incorrect notion of the current character position during expansion. This is not expected to be a problem for most text messages of the sort normally exchanged between hosts.

This solution addresses the reading of tabbed messages at the presentation level, which many feel is appropriate. It does not specifically address saving message text in the local file system. There, detabbing a particular message may be required or prohibited, depending on the intended use of the file. Conceivably, the software that displays messages on terminals could also convert them into files, and the user could decide whether to run it for file storage.

Digests may require presentation software to interpret the "X-Tab-Stops" field in each encapsulated message header. Alternatively, digestification software may convert encapsulated messages to a common tabbing convention.

A subject of controversy is whether gateways to foreign mail systems that do not support any such tab stop representation should expand the tabs in the body according to the "X-Tab-Stops" field during transfer.
Bearing in mind the earlier warnings about data corruption, this author recommends that the data be passed through untouched. If the foreign community considers such a capability important enough, it can demand it in its own mailer.

Obviously, the problem of tab representation is difficult to solve, perhaps more difficult than its relatively minor consequences warrant. If you have thoughts on a simpler solution, I welcome your suggestions.

Submitted for your approval,

Todd Poynor
HP Data Systems Operation
todd@hpepoc.hp.com
408/746-5185