Path: utzoo!attcan!uunet!world!decwrl!bacchus.pa.dec.com!deccrl!shlump.nac.dec.com!galvia.enet.dec.com!killian
From: killian@galvia.enet.dec.com
Newsgroups: comp.text.sgml
Subject: Re: Umlaut vs. Diaresis (?)
Message-ID: <1990Oct9.163803@galvia.enet.dec.com>
Date: 9 Oct 90 15:41:42 GMT
Sender: newsdaemon@shlump.nac.dec.com
Reply-To: killian@galvia.enet.dec.com ()
Organization: Digital Equipment Corporation
Lines: 60

To: manfred@swi.psy.uva.nl ()

> Does SGML provide a way of specifying an 'Umlaut', as is used in German, in
> contrast with a 'diaeresis', used in many languages, such as the Dutch language.

In general, yes! There are two possible ways of doing this, and the
choice between them depends on whether you want to enter the accented
character in text (i.e. content) or in markup.

In the more common case, where you want to enter the accented character
in text, SGML would advise the use of an SDATA entity (a specific
character data entity, whose replacement text is system-specific). For
example, &ouml; could be used to enter a small 'o' with an umlaut, and
&odiar; could be used to enter a small 'o' with a diaeresis. Of course,
these entities need to be declared in the scope of the Document Type
Definition. This is normally done for a complete collection of such
entities, and transportability is enhanced when user communities can
agree on, and standardise, such entity sets. 'ouml' is present in the
ISO Latin 1 entity set published as an informative annex to the SGML
standard, but its description there also covers 'o' diaeresis. That one
entity stands for both marks, so this particular entity set will not
solve your problem. In addition, an SGML system must be able to
correctly interpret the 'o' diaeresis entity reference when the parser
finds it in the text. For example, your SGML typesetting system must be
able to translate the declared value of the 'odiar' entity to the
correct glyph shape.
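
As a rough sketch of the entity approach, a private entity set that
keeps the two marks apart might contain declarations along the lines
below. The bracketed replacement strings and the comments are
assumptions made for illustration, and 'odiar' is a private name, not
one taken from a published entity set. What each string finally maps to
is up to the receiving system.

```sgml
<!-- Hypothetical private entity set. Names and replacement strings are assumptions. -->
<!ENTITY ouml  SDATA "[ouml  ]" -- small o with umlaut (German) -->
<!ENTITY odiar SDATA "[odiar ]" -- small o with diaeresis (e.g. Dutch) -->
```

In text these would then be referenced as, say, sch&ouml;n and
co&odiar;rdinatie, and the formatter would need a rule that maps each
replacement string to its own glyph.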

The other solution, which would also allow the use of the accented
character in markup (e.g. tag names), is to use a character set that
has a code position for the accented character (a different code
position from that of the similar 'o' umlaut). SGML is character set
independent, in that the SGML declaration (which comes before the
Document Type Definition, but is absent from most SGML documents)
allows the identification or definition of the document character set.
Of course, the SGML parser must be able to accept the SGML declaration
(not every parser does) and the SGML system must be able to accept
(e.g. typeset) text in that character set. Defining your own character
set is not always a smart thing to do.

I have also seen some unconventional solutions to your problem. One
such solution involved defining a special element (tag) that was used
to enter accented characters. For example: odiar. Again, this element
would have to be defined in the scope of the Document Type Definition,
and the SGML system would have to be capable of translating the 'odiar'
text into the required accented character.

My recommendation is to use the SDATA entity solution if the accented
character is not required in markup.

Regards,
Aidan