Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!uunet!husc6!necntc!ncoast!allbery From: dietz@zhmti.UUCP (Dieter H. Zebbedies) Newsgroups: comp.sources.misc Subject: "Producer" translates Smalltalk to Objective-C (Part 1 of 5) Message-ID: <4217@ncoast.UUCP> Date: Wed, 19-Aug-87 21:54:37 EDT Article-I.D.: ncoast.4217 Posted: Wed Aug 19 21:54:37 1987 Date-Received: Sat, 22-Aug-87 06:35:34 EDT Sender: allbery@ncoast.UUCP Organization: Zebb-Hoff Machine Tool Inc's Automated Mfg. Project, Cleve., OH Lines: 1162 Approved: allbery@ncoast.UUCP X-Archive: comp.sources.misc/8708/30 "Producer", A package to translate Smalltalk-80 code to your favorite object oriented language, Objective-C. #!/bin/sh # to extract, remove the header and type "sh filename" if `test ! -s ./Makefile` then echo "writting ./Makefile" cat > ./Makefile << '\Rogue\Monster\' DOC= README producer.f .SUFFIXES: .me .i .f all: README README: readme.f ; mv readme.f README .me.i: ; itroff -me mac.me $< .me.f: ; nroff -me mac.me $< >$$$$.f && mv $$$$.f $*.f \Rogue\Monster\ else echo "will not over write ./Makefile" fi if `test ! -s ./mac.me` then echo "writting ./mac.me" cat > ./mac.me << '\Rogue\Monster\' .\" Must be defined externally .\" \nN Chapter Number .po .5i .ll 7i .nr % 1 .ds NM \\nN .de H . tm .H0 \\*(NM-\\n% "\\$1" . sp .(l C \\s12\\fB\\$1\\fP\\s0 .)l . pp .. .de H1 . tm .H1 \\*(NM-\\n% "\\$1" . uh "\\$1" . pp .. .de H2 . tm .H2 \\*(NM-\\n% "\\$1" . uh "\\$1" .. .de M . tm .M \\*(NM-\\n% "\\$1" . uh "\\$1" .. .de MC . tm .MC \\*(NM-\\n% "\\$1" . uh "\\$1" .. .de (C . (l . sz -2n . ls 1 . ta .25i +.25i +.25i +.25i +.25i +.25i +.25i +.25i +.25i +.25i +.25i +.25i .. .de )C . ta . ls 2 . sz . )l .. .nr fn 1 .ds F \*(NM-1 .de (F . ds F \\*(NM-\\n(fn . tm .F \\*(NM-\\n% "Figure \\*F: \\$1" . (l M F . hl . sp . ce \fIFigure \\*F:\fP \\$1 .. .de )F . hl . )l . nr fn +1 . ds F \\*(NM-\\n(fn .. .de (N . (q \\fINOTE IN DRAFT: \\$1\\fP .. .de )N . )q .. .de NT . 
(q
\\fINOTE IN DRAFT: \\$1\\fP
. )q
. tm .NT \\*(NM-\\n% "NOTE: \\$1"
..
.de C
.nr N \\$1
.ds NA \\$1
.he ''\\*(NA''
.fo 'Brad Cox'%'\\*(td'
.(l C
\\s14\\fB\\$2\\fP\\s0
.sp 2
.)l
.he ''\\*(NA''
.tm .C \\nN-\\n% "\\$2"
.pp
..
\Rogue\Monster\
else
  echo "will not over write ./mac.me"
fi
if `test ! -s ./README`
then
echo "writting ./README"
cat > ./README << '\Rogue\Monster\'

Producer: Smalltalk-80 to Objective-C Translator

Brad J. Cox
Productivity Products International
75 Glen Road
Sandy Hook, CT 06482
(203) 426 1875.

Smalltalk-80 is a tool for turning raw concepts into working software prototypes. Objective-C is a tool for turning proven concepts into fast, commercial-quality, production systems. Producer is a tool for bridging the gap between prototyping and production by automatically translating Smalltalk-80 sources into Objective-C sources. The translation is guided by a rule base in which the programmer describes how differences between the Smalltalk-80 prototyping environment and the Objective-C production environment should be resolved when translating the code.

At SIGGRAPH-87, PPI will announce a library of user interface components from which programmers build applications with iconic user interfaces. The library and applications built using it are portable across diverse window systems, initially X-Windows, SunWindows and Hewlett Packard's window system. While the Objective-C user interface classes are different from Smalltalk's, they are similar enough that Producer can usually bridge the differences with some hand-tuning of the translated output. We confidently hope that Objective-C, this library and Producer will make automatic translation of Smalltalk-80 prototypes a routine part of many companies' software development lifecycle.

I'm distributing Producer to enlist your help in testing the practicality of this notion.

Disclaimer

Producer is not a mature software product but an embryo that could grow to maturity someday.
Specifically, it is not supported or warranted in any way. It was written by me, an individual employed by PPI, and has been released prior to maturity by me as an individual with the consent of the company. This document will make its strengths and some of its present shortcomings clear.

However, even in its present state, Producer demonstrates that automatic translation is technically feasible, and its present implementation provides a capable foundation on which to build. Since the market for Smalltalk-80 translators is insufficient for PPI to pursue presently, we've released Producer for you to make what use of it you can.

I do ask that you keep me informed of your experiences in using it in its current state, and PPI requests that you feed back any improvements so that we can offer a fully supported translation product in the future. PPI retains the copyright and all other applicable rights. For example, you may not sell products that contain any part of the Producer distribution without PPI's permission.

Brad Cox 1 June 22, 1987

How it works

The following is a brief description of how Producer works internally. This was written from my recollection of how I left the code over a year ago. It may be inaccurate in places.

Producer is basically a compiler. Its lexical analyzer (written in lex) divides Smalltalk-80 text into lexemes, and its parser (written in yacc) recognizes valid lexeme sequences and constructs an abstract representation of the program as an expression tree. The expression tree consists of instances of Objective-C classes, e.g. Method, Statement, Expression, Message, and Variable. The grammar was derived from the syntax diagrams in Goldberg and Robson; _Smalltalk-80: The Language and its Implementation_; Addison Wesley; 1986.

The grammar was extended to recognize rules that may also appear in the lexeme stream.
Rules are enclosed in { braces } to help fend off shift-reduce conflicts from yacc. The parser stores the rules in separate data structures for use during code generation.

At certain points, the parser sends the top of the expression tree a gen message to trigger code generation[1]. Recall that Smalltalk-80 is an extremely simple language with basically two components: data references (variables, literals, etc.) and messages. Rules may influence how each case is treated during code generation.

____________________
[1] I now regard this as a major architectural flaw whenever I see it in any application. It represents a key departure from an important but often ignored rule of object-oriented design. The expression tree classes should be abstract so that they could be reused in other tools. But their code generation methods pollute the abstraction with knowledge about a particular concrete interface: Objective-C. The code generation methods should have been provided in a separate hierarchy of classes that know how to connect the abstract classes to one of many potential concrete interfaces. This rule is simply a generalization of the model/view/controller paradigm to apply to interfaces of any kind, not just user interfaces.

Code generation proceeds in two passes. The first pass collects typing information for each symbol and message by examining the expression tree from the bottom up. The bottom-most nodes are either literals whose type is immediately obvious (e.g. 1, 2.3, or 'string'), or they are symbols whose type can be known or unknown. Symbol types become known either as the result of a previous type inferencing operation or because their type was specified in a rule. Unknown symbols default to id when first referenced.

Most of the internal nodes are messages.
Message typing is slightly more complicated because any message can have multiple translations, depending on how the message is used: different rules may specify different translations for different receiver and argument types. The diverse translations may each compute a different type. Since we assign types bottom up, types have already been assigned for the arguments and the receiver, so a translation for that selector is chosen by searching a table of possible translations for one matching the receiver and argument types.

In all cases, unless overridden by a specific rule, default translations are used. These amount to a fairly literal translation from Smalltalk-80 syntax to Objective-C syntax. However, exceptions are made for Smalltalk literal constants, which translate to C literal constants. In other words, 2+2 translates to [2 plus:2], which is _guaranteed_ to fail catastrophically in Objective-C. The integer 2 is an object only in Smalltalk!

The moral: _Never_ believe the translator. _Always_ monitor it closely. Remember the 90-10 rule. The automatic translation concept is capable, with suitable rules, of automatically translating only 90% of an application correctly; the other 10% (where the bugs will have congregated) is still up to you.

Implementation Status

Producer currently represents about three man-weeks of effort, spent in two intensive bursts separated by about a year. The most recent burst was nearly a year and a half ago. The first burst was to demonstrate the feasibility and practicality of the translation concept. The second burst was in the course of preparing a paper that, coauthored with Kurt Schmucker, will appear in the OOPSLA-87 proceedings. A (very) early draft is provided with this distribution.

Considering how quickly it was developed, the translator does an effective job of translation. I refer you to the paper for discussions of the strengths and limitations of the translation concept.
This section discusses the current implementation of this concept: the items on my own must-do list for the planned, but not yet completed, third stage of Producer's evolution.

(1) Smalltalk-80 fileout format uses '!' delimiters in a fashion that I was never able to formalize correctly in Producer's yacc grammar. The symptom is that the translator will generate syntax errors in nearly every translated file for certain of these delimiters. I'm told that fileout format has been documented in a paper somewhere, but I've never worked the repairs back into the code. The fix should be local to gram.y.

(2) The translator loads its rule base by reading files of rules as if they were concatenated with the sources to be translated. The rule-specification syntax is abysmal, primarily because it was chosen to minimize the amount of time I spent struggling with shift-reduce conflicts from yacc, rather than to make the rules intelligible to users. Smalltalk's formal grammar seemed unreasonably difficult for yacc to swallow, and I suspect the problem may lie in some mistake I've made in translating Smalltalk-80 syntax diagrams into yacc specifications.

(3) The program contains extensive provisions for reporting its cogitations in type inferencing. The various error, warning, logging, and debugging messages need to be tuned for greater utility.

(4) The code was based on an as yet unreleased library (phylum) called "Substrate", which supports features that are not yet in our standard product set, like Blocks, Coroutining, and exception handling. I made a fast editing pass to remove any dependencies on these nonstandard library features. I also added a file, Substrate.h, that defines stylistic conventions that I adhere to in all my work. See USE, IMPORT, EXPORT, etc. in the sources.

The preceding problems are superficial and easily repaired.
The following ones are somewhat more substantial in that they involve design work in addition to coding work.

(1) The type inferencing machinery infers types of newly-encountered (unknown) messages and variables by seeing how they are combined with variables and messages whose types are known a priori or else determined earlier through inferencing. The only types that are known a priori are literals like 1, 2.3, or 'string'. This generally provides insufficient typing information from which to infer anything useful, so you should generally provide variable rules to pin down types for key instance variables and method arguments. You do this with rules that state, in effect, that `the type of the Smalltalk variable named foo is int, and the variable is called foobar in Objective-C'. Presently rules have global scope. If different Smalltalk classes use the name foo in ways that should be translated differently, different rule sets must be provided manually to the translator. Creating and managing these application-specific rule sets adds to the translation effort and tends to make rules non-reusable across translations. The rules should be organized with a scoping mechanism, ideally one based on inheritance.

(2) The inferencing logic is ad-hoc and quite possibly slow. However, the main bottleneck seems to be loading the rule-base; translation speed has never been a real problem. Inferencing is presently deductive, and a more inductive scheme based on both forwards and backwards reasoning might produce higher quality translations. At present, the translation of a given message expression is determined exclusively by whatever information can be inferred about the types of the receiver and arguments to that message (forward reasoning). Backward reasoning would also consider how the results of the expression are used in other expressions.
(3) Producer does not presently handle non-trivial uses of Blocks correctly; i.e. Block expressions that cannot be translated directly into C control statements like if, while, or for, which Producer already handles just fine.

Nearly all occurrences of Smalltalk-80 Blocks could be handled without changing the Objective-C language by adding a trivially simple Block class to the library. A named instance variable holds a pointer to a static function and indexed instance variables hold _copies of_ any variables that the block accesses in the instantiation site[2]. This copy could be taken entirely automatically by copying the instantiation site's stack frame. However, I prefer to have more control over space than that. So I've been using a scheme that requires the programmer (and someday the compiler) to specify which variables are really accessed by the block as arguments to the message that instantiates the block, like this:

    ...
    {
        IMPORT void aStaticFunction();
        id var1 = something, var2 = something;
        aBlock = [Block function:aStaticFunction args:2, var1, var2];
        [anyObject do:aBlock];
        ...
    }
    LOCAL void aStaticFunction(instantiationSiteVariables, value1, value2)
        struct { id var1, var2; } *instantiationSiteVariables;
        id value1, value2;
    {
        if ([instantiationSiteVariables->var1 someMessage])
            ...
    }

The block will call the function when anyObject sends the block one of several evaluation messages (value:arg1 or value:arg1 value:arg2 or ...). The first argument is a _pointer_ to the block's copy of the instantiation site's variables. The trailing arguments contain the arguments that the invocation site passed in the value: message. I've used this approach extensively by writing the static functions by hand, and am trying to get our staff to extend the language to provide some kind of language-level support to make the syntax simpler. This approach could be, but has not yet been, taken by Producer.
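
The function-plus-copies scheme can be sketched in plain C. This is only an illustration under stated assumptions: ints stand in for ids, the block is a struct rather than an Objective-C object, and the names Block, block_make, block_value and add_step_and_count are hypothetical, not taken from Producer or from any PPI library.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 4   /* arbitrary limit chosen for this sketch */

/* Signature of the "static function": first argument points at the
   block's private copy of the instantiation site's variables. */
typedef int (*BlockFn)(int *siteVars, int value1);

/* Hypothetical stand-in for the Block class. */
typedef struct Block {
    BlockFn fn;              /* pointer to the static function */
    int vars[MAX_VARS];      /* copies of the site's variables */
} Block;

/* Build a block, copying nvars variables named by the programmer. */
static Block block_make(BlockFn fn, int nvars, const int *vars)
{
    Block b;
    assert(nvars >= 0 && nvars <= MAX_VARS);
    b.fn = fn;
    memset(b.vars, 0, sizeof b.vars);
    memcpy(b.vars, vars, (size_t)nvars * sizeof vars[0]);
    return b;
}

/* Counterpart of sending value: to the block. */
static int block_value(Block *b, int value1)
{
    return b->fn(b->vars, value1);
}

/* Plays the role of aStaticFunction: adds a captured step to the
   argument and counts calls in the block's own copy. */
static int add_step_and_count(int *siteVars, int value1)
{
    siteVars[1] += 1;        /* changes the copy, not the site */
    return value1 + siteVars[0];
}
```

If a site holding the pair {10, 0} builds a block with block_make(add_step_and_count, 2, site) and evaluates it twice, the block's own counter advances to 2 while the site's second variable stays 0. That is the behavior footnote [2] describes: the block cannot use its copies to communicate with the instantiation site.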
The inferencing machinery's primary current virtue is that it can be made to work for selected test cases. It leaves lots to be desired. Call me if you decide to extend it so that I can prevent unnecessary duplication of effort.

About the distribution

The top level of the distribution consists of

    total 88
    -rw-r--r--  1 cox        181 Jun 22 14:32 Makefile
    -rw-r--r--  1 cox      26592 Jun 22 14:30 README
    drwxr-xr-x  2 cox        512 Jun 22 14:19 example
    -rw-r--r--  1 cox        166 Jun 16 13:18 log
    -rw-r--r--  1 cox        997 Jun 15 11:09 mac.me
    -rw-r--r--  1 cox      26751 Jun 15 11:02 producer.me
    -rw-r--r--  1 cox      21444 Jun 22 14:29 readme.me
    drwxr-xr-x  2 cox        512 Jun 12 10:22 rules
    drwxr-xr-x  2 cox       3072 Jun 22 14:31 src

The Makefile governs formatting of the two documents: this README (from readme.me) and the draft of the OOPSLA-87 paper (from Producer.me). The mac.me file contains text formatting macros that are common to both papers, used like this:

    nroff -me mac.me Producer.me >Producer.f

The rules directory contains a single file, generic.ru, that represents a first pass at an application-independent rules base. This set of rules translates Smalltalk to the conventions used in my prototype version of the user interface library. For example, it translates Smalltalk Integer operations to C int operations, and it translates Smalltalk Point operations to C macros that manage points as type PT: a pair of 16-bit coordinates in a 32-bit C int. For example, pt(x,y) invokes a C macro that trims and shifts two ints, x and y, to fit side by side in a 32-bit integer; ptPlus(p,q) invokes a macro that computes the vector sum of two points, p and q; etc.

    rules:
    total 35

____________________
[2] In Smalltalk-80, the block seems to have access to the instantiation site's variables, so that the block can change variables in the instantiation site.
In Objective-C the block receives a copy of the variables and cannot use them to communicate with the instantiation site. I believe that this is the sole functional difference between the two schemes.
____________________

    -rw-r--r--  1 cox      35567 Jun 12 10:22 generic.ru

The example directory contains a fragment from the video animation program that appears at the end of the Smalltalk-80 video tape. BounceInBoxNode.st is the Smalltalk-80 source file, animation.ru contains the application-specific rule set, and BounceInBoxNode.m is the translated version built by Producer as invoked by Makefile[3].

    example:
    total 7
    -rw-r--r--  1 cox       1730 Jun 16 10:24 BounceInBoxNode.m
    -rw-r--r--  1 cox        868 Jun 16 10:18 BounceInBoxNode.st
    -rw-r--r--  1 cox        394 Jun 16 10:20 Makefile
    -rw-r--r--  1 cox       2178 Jun 16 10:18 animation.ru
    -rw-r--r--  1 cox        185 Jun 16 10:24 log
    -rw-r--r--  1 cox        239 Jun 16 10:18 st80.h

The log file records the results of the translation session. The syntax error is innocuous, the result of the aforementioned problem in the grammar in handling '!' delimiters.

    Producer -c ../rules/generic.ru animation.ru BounceInBoxNode.st >BounceInBoxNode.m
    error 7:BounceInBoxNode.st: tegory:'Graphics-Animation'!! : syntax error
    *** Error code 1 (ignored)

The src directory contains the sources for Producer, with its own Makefile. The Substrate.h header file, which is automatically included by the Producer.h header file, is technically a part of an internal lower level library, Substrate, on which Producer was originally developed. Substrate.h was copied and changed superficially so that Producer compiles correctly without the Substrate library.
____________________
[3] The full source for the animation program is not provided. My copyright paranoia argued against providing even this fragment.

    src:
    total 70
    -rw-r--r--  1 cox        483 Jun 12 10:21 AbstractTranslation.m
    -rw-r--r--  1 cox        282 Jun 12 10:21 ArgumentList.m
    -rw-r--r--  1 cox        897 Jun 12 10:21 Block.m
    -rw-r--r--  1 cox        143 Jun 12 10:21 CharConstant.m
    -rw-r--r--  1 cox       2205 Jun 12 10:21 Class.m
    -rw-r--r--  1 cox        630 Jun 12 10:21 Comment.m
    -rw-r--r--  1 cox        176 Jun 12 10:21 Constant.m
    -rw-r--r--  1 cox       2032 Jun 12 10:21 Expr.m
    -rw-r--r--  1 cox       1243 Jun 12 10:21 FunctionTranslation.m
    -rw-r--r--  1 cox       1484 Jun 12 10:21 Identifier.m
    -rw-r--r--  1 cox       1248 Jun 12 10:21 IdentifierTranslation.m
    -rw-r--r--  1 cox        105 Jun 12 10:21 List.m
    -rw-r--r--  1 cox       1985 Jun 15 11:55 METHODDECLS.m
    -rw-r--r--  1 cox       1384 Jun 15 11:51 Makefile
    -rw-r--r--  1 cox       4302 Jun 12 10:21 Method.m
    -rw-r--r--  1 cox       3136 Jun 12 10:21 Msg.m
    -rw-r--r--  1 cox        583 Jun 12 10:21 MsgArgPattern.m
    -rw-r--r--  1 cox        828 Jun 12 10:21 MsgNamePattern.m
    -rw-r--r--  1 cox       1280 Jun 12 10:21 MsgTranslation.m
    -rw-r--r--  1 cox        775 Jun 12 10:21 MsgTranslator.m
    -rw-r--r--  1 cox       1868 Jun 12 10:21 Node.m
    -rw-r--r--  1 cox        229 Jun 12 10:21 NumberConstant.m
    -rw-r--r--  1 cox       1402 Jun 15 11:27 Producer.h
    -rw-r--r--  1 cox        306 Jun 12 10:21 Return.m
    -rw-r--r--  1 cox        825 Jun 12 10:21 Scope.m
    -rw-r--r--  1 cox       3157 Jun 12 10:21 Selector.m
    -rw-r--r--  1 cox        253 Jun 12 10:21 SelectorConstant.m
    -rw-r--r--  1 cox        457 Jun 12 10:21 StArray.m
    -rw-r--r--  1 cox        492 Jun 12 10:21 Stmt.m
    -rw-r--r--  1 cox        381 Jun 12 10:21 StringConstant.m
    -rw-r--r--  1 cox       1268 Jun 12 10:21 StringTranslation.m
    -rw-r--r--  1 cox       2140 Jun 15 11:38 Substrate.h
    -rw-r--r--  1 cox       1405 Jun 15 11:53 Symbol.m
    -rw-r--r--  1 cox        452 Jun 12 10:21 Template.m
    -rw-r--r--  1 cox        901 Jun 12 10:21 Type.m
    -rw-r--r--  1 cox       1800 Jun 12 10:21 design.me
    -rw-r--r--  1 cox       3271 Jun 12 10:21 gen.m
    -rw-r--r--  1 cox       9007
Jun 12 10:21 gram.y
    -rw-r--r--  1 cox       3601 Jun 12 10:21 lex.l
    -rw-r--r--  1 cox       2212 Jun 12 10:21 main.m
    -rw-r--r--  1 cox        260 Jun 12 10:21 st80.h
    -rw-r--r--  1 cox        259 Jun 15 11:59 y.tab.h

The files are exactly as I left them nearly a year and a half ago, except for:

(1) The addition of this README document. An early draft of the OOPSLA-87 paper, sadly prior to Kurt Schmucker's improvements, is in Producer.me.

(2) One recompilation pass to remove any obvious dependencies on my private Substrate library and to verify that Producer compiles and runs correctly on the standard Foundation library. I tested the changes by verifying that the Makefile in the example directory ran to completion, but this is hardly an ironclad guarantee.

Using Producer

Flags controlling the translation process, source files, and rules files are provided on the command line and are processed in the order they appear. The flags are[4]

-d:  Enable debugging functions (dbg()) scattered throughout the code. Seldom useful.

-m:  Enables the Objective-C Foundation library message tracing feature. Seldom useful in Producer.

-a:  Enables the Objective-C Foundation library allocation tracing feature. Seldom useful in Producer.

-l:  Enables printing of each lexical token as produced by lex. Useful only for debugging lex.l.

-g:  Enables automatic redirection of each class into a separate file based on the class name parsed from the input file. Automatically puts class Foobar into file Foobar.m. CAREFUL! This puts at risk other files whose name might coincide with a Smalltalk-80 class name!

-s:  Generate Smalltalk-80 sources in the output file as Objective-C comments (the default).

-c:  Don't generate Smalltalk-80 sources in the output file.

-i:  Generate information that was thought at one time to be useful when debugging rules.

-M:  Send storeOn: to the message rule dictionary just before terminating as a debugging aid.
-I:  Send storeOn: to the variable rule dictionary just before terminating as a debugging aid.

Typically, the generic rules in rules/generic.ru are specified first, then any application-specific rules, then a single Smalltalk-80 source file. Unless -g is set, the translated output appears on stdout. The various creaks, groans and mumbles that can be elicited about the translation process itself appear on stderr.

For the syntax for writing new rules, refer to the examples in generic.ru and animation.ru, and if necessary, the rules section of the grammar in gram.y.

And good luck! Let me know how you fare...

____________________
[4] I'm working from memory about what these flags mean. Some may be nonfunctional.
\Rogue\Monster\
else
  echo "will not over write ./README"
fi
if `test ! -s ./readme.me`
then
echo "writting ./readme.me"
cat > ./readme.me << '\Rogue\Monster\'
.C "Producer: Smalltalk-80 to Objective-C Translator"
.(l C
Brad J. Cox
Productivity Products International
75 Glen Road
Sandy Hook, CT 06482
(203) 426 1875.
.)l
.pp
Smalltalk-80 is a tool for turning raw concepts into working software prototypes. Objective-C is a tool for turning proven concepts into fast, commercial-quality, production systems. Producer is a tool for bridging the gap between prototyping and production by automatically translating Smalltalk-80 sources into Objective-C sources. The translation is guided by a rule base in which the programmer describes how differences between the Smalltalk-80 prototyping environment and the Objective-C production environment should be resolved when translating the code.
.pp
At SIGGRAPH-87, PPI will announce a library of user interface components from which programmers build applications with iconic user interfaces. The library and applications built using it are portable across diverse window systems, initially X-Windows, SunWindows and Hewlett Packard's window system.
While the Objective-C user interface classes are different from Smalltalk's, they are similar enough that Producer can usually bridge the differences with some hand-tuning of the translated output. We confidently hope that Objective-C, this library and Producer will make automatic translation of Smalltalk-80 prototypes a routine part of many companies' software development lifecycle.
.pp
I'm distributing Producer to enlist your help in testing the practicality of this notion.
.H "Disclaimer"
.pp
Producer is not a mature software product but an embryo that could grow to maturity someday. Specifically, it is not supported or warranted in any way. It was written by me, an individual employed by PPI, and has been released prior to maturity by me as an individual with the consent of the company. This document will make its strengths and some of its present shortcomings clear.
.pp
However, even in its present state, Producer demonstrates that automatic translation is technically feasible, and its present implementation provides a capable foundation on which to build. Since the market for Smalltalk-80 translators is insufficient for PPI to pursue presently, we've released Producer for you to make what use of it you can.
.pp
I do ask that you keep me informed of your experiences in using it in its current state, and PPI requests that you feed back any improvements so that we can offer a fully supported translation product in the future. PPI retains the copyright and all other applicable rights. For example, you may not sell products that contain any part of the Producer distribution without PPI's permission.
.H "How it works"
.pp
The following is a brief description of how Producer works internally. This was written from my recollection of how I left the code over a year ago. It may be inaccurate in places.
.pp
Producer is basically a compiler.
Its lexical analyzer (written in lex) divides Smalltalk-80 text into lexemes, and its parser (written in yacc) recognizes valid lexeme sequences and constructs an abstract representation of the program as an expression tree. The expression tree consists of instances of Objective-C classes, e.g. Method, Statement, Expression, Message, and Variable. The grammar was derived from the syntax diagrams in Goldberg and Robson; \fISmalltalk-80: The Language and its Implementation\fP; Addison Wesley; 1986.
.pp
The grammar was extended to recognize rules that may also appear in the lexeme stream. Rules are enclosed in { braces } to help fend off shift-reduce conflicts from yacc. The parser stores the rules in separate data structures for use during code generation.
.pp
At certain points, the parser sends the top of the expression tree a gen message to trigger code generation\**. Recall that Smalltalk-80 is an extremely simple language with basically two components: data references (variables, literals, etc.) and messages. Rules may influence how each case is treated during code generation.
.(f
\** I now regard this as a major architectural flaw whenever I see it in any application. It represents a key departure from an important but often ignored rule of object-oriented design. The expression tree classes should be abstract so that they could be reused in other tools. But their code generation methods pollute the abstraction with knowledge about a particular concrete interface: Objective-C. The code generation methods should have been provided in a separate hierarchy of classes that know how to connect the abstract classes to one of many potential concrete interfaces. This rule is simply a generalization of the model/view/controller paradigm to apply to interfaces of any kind, not just user interfaces.
.)f
.pp
Code generation proceeds in two passes. The first pass collects typing information for each symbol and message by examining the expression tree from the bottom up.
The bottom-most nodes are either literals whose type is immediately obvious (e.g. 1, 2.3, or 'string'), or they are symbols whose type can be known or unknown. Symbol types become known either as the result of a previous type inferencing operation or because their type was specified in a rule. Unknown symbols default to id when first referenced.
.pp
Most of the internal nodes are messages. Message typing is slightly more complicated because any message can have multiple translations, depending on how the message is used: different rules may specify different translations for different receiver and argument types. The diverse translations may each compute a different type. Since we assign types bottom up, types have already been assigned for the arguments and the receiver, so a translation for that selector is chosen by searching a table of possible translations for one matching the receiver and argument types.
.pp
In all cases, unless overridden by a specific rule, default translations are used. These amount to a fairly literal translation from Smalltalk-80 syntax to Objective-C syntax. However, exceptions are made for Smalltalk literal constants, which translate to C literal constants. In other words, 2+2 translates to [2 plus:2], which is \fIguaranteed\fP to fail catastrophically in Objective-C. The integer 2 is an object only in Smalltalk!
.pp
The moral: \fINever\fP believe the translator. \fIAlways\fP monitor it closely. Remember the 90-10 rule. The automatic translation concept is capable, with suitable rules, of automatically translating only 90% of an application correctly; the other 10% (where the bugs will have congregated) is still up to you.
.H "Implementation Status"
.pp
Producer currently represents about three man-weeks of effort, spent in two intensive bursts separated by about a year. The most recent burst was nearly a year and a half ago. The first burst was to demonstrate the feasibility and practicality of the translation concept.
The second burst was in the course of preparing a paper that, coauthored with Kurt Schmucker, will appear in the OOPSLA-87 proceedings. A (very) early draft is provided with this distribution.
.pp
Considering how quickly it was developed, the translator does an effective job of translation. I refer you to the paper for discussions of the strengths and limitations of the translation concept. This section discusses the current implementation of this concept: the items on my own must-do list for the planned, but not yet completed, third stage of Producer's evolution.
.np
Smalltalk-80 fileout format uses '!' delimiters in a fashion that I was never able to formalize correctly in Producer's yacc grammar. The symptom is that the translator will generate syntax errors in nearly every translated file for certain of these delimiters. I'm told that fileout format has been documented in a paper somewhere, but I've never worked the repairs back into the code. The fix should be local to gram.y.
.np
The translator loads its rule base by reading files of rules as if they were concatenated with the sources to be translated. The rule-specification syntax is abysmal, primarily because it was chosen to minimize the amount of time I spent struggling with shift-reduce conflicts from yacc, rather than to make the rules intelligible to users. Smalltalk's formal grammar seemed unreasonably difficult for yacc to swallow, and I suspect the problem may lie in some mistake I've made in translating Smalltalk-80 syntax diagrams into yacc specifications.
.np
The program contains extensive provisions for reporting its cogitations in type inferencing. The various error, warning, logging, and debugging messages need to be tuned for greater utility.
.np
The code was based on an as yet unreleased library (phylum) called "Substrate", which supports features that are not yet in our standard product set, like Blocks, Coroutining, and exception handling.
I made a fast editing pass to remove any dependencies on these nonstandard library features. I also added a file, Substrate.h, that defines stylistic conventions that I adhere to in all my work. See USE, IMPORT, EXPORT, etc. in the sources.
.pp
The preceding problems are superficial and easily repaired. The following ones are somewhat more substantial in that they involve design work in addition to coding work.
.np
The type inferencing machinery infers types of newly-encountered (unknown) messages and variables by seeing how they are combined with variables and messages whose types are known a priori or else determined earlier through inferencing. The only types known a priori are those of literals like 1, 2.3, or 'string'. This generally provides insufficient typing information from which to infer anything useful, so you should generally provide variable rules to pin down types for key instance variables and method arguments. You do this with rules that state, in effect, that `the type of the Smalltalk variable named foo is int, and the variable is called foobar in Objective-C'. Presently rules have global scope. If different Smalltalk classes use the name foo in ways that should be translated differently, different rule sets must be provided manually to the translator. Creating and managing these application-specific rule sets adds to the translation effort and tends to make rules non-reusable across translations. The rules should be organized with a scoping mechanism, ideally one based on inheritance.
.np
The inferencing logic is ad hoc and quite possibly slow. However, the main bottleneck seems to be loading the rule base; translation speed has never been a real problem. Inferencing is presently deductive, and a more inductive scheme based on both forwards and backwards reasoning might produce higher quality translations.
In other words, the translation of a given message expression is determined exclusively by whatever information can be inferred about the types of the receiver and arguments to that message (forward reasoning). Backward reasoning would also consider how the results of the expression are used in other expressions.
.np
Producer does not presently handle non-trivial uses of Blocks correctly; i.e., Block expressions that cannot be translated directly into C control statements like if, while, or for, which Producer handles just fine already. Nearly all occurrences of Smalltalk-80 Blocks could be handled without changing the Objective-C language by adding a trivially simple Block class to the library. A named instance variable holds a pointer to a static function and indexed instance variables hold \fIcopies of\fP any variables that the block accesses in the instantiation site\**. This copy could be taken entirely automatically by copying the instantiation site's stack frame. However, I prefer to have more control over space than that. So I've been using a scheme that requires the programmer (and someday the compiler) to specify which variables are really accessed by the block, as arguments to the message that instantiates the block, like this:
.(C
    ...
    {
        IMPORT void aStaticFunction();
        id var1 = something, var2 = something;

        /* name the variables the block accesses; the block keeps copies */
        aBlock = [Block function:aStaticFunction args:2, var1, var2];
        [anyObject do:aBlock];
        ...
    }

    /* first argument points to the block's copy of var1 and var2 */
    LOCAL void aStaticFunction(instantiationSiteVariables, value1, value2)
        struct { id var1, var2; } *instantiationSiteVariables;
        id value1, value2;      /* arguments of the value: message */
    {
        if ([instantiationSiteVariables->var1 someMessage])
            ...
    }
.)C
.ip
The block will call the function when anyObject sends the block one of several evaluation messages (value:arg1 or value:arg1 value:arg2 or ...). The first argument is a \fIpointer\fP to the block's copy of the instantiation site's variables. The trailing arguments contain the arguments that the invocation site passed in the value: message.
I've used this approach extensively by writing the static functions by hand, and am trying to get our staff to extend the language to provide some kind of language-level support to make the syntax simpler. This approach could be, but has not yet been, taken by Producer.
.(f
\** In Smalltalk-80, the block seems to have access to the instantiation site's variables, so that the block can change variables in the instantiation site. In Objective-C the block receives a copy of the variables and cannot use them to communicate with the instantiation site. I believe that this is the sole functional difference between the two schemes.
.)f
.pp
The inferencing machinery's primary current virtue is that it can be made to work for selected test cases. It leaves lots to be desired. Call me if you decide to extend it so that I can prevent unnecessary duplication of effort.
.H "About the distribution"
.pp
The top level of the distribution consists of
.(C
total 88
-rw-r--r-- 1 cox   181 Jun 22 14:32 Makefile
-rw-r--r-- 1 cox 26592 Jun 22 14:30 README
drwxr-xr-x 2 cox   512 Jun 22 14:19 example
-rw-r--r-- 1 cox   166 Jun 16 13:18 log
-rw-r--r-- 1 cox   997 Jun 15 11:09 mac.me
-rw-r--r-- 1 cox 26751 Jun 15 11:02 producer.me
-rw-r--r-- 1 cox 21444 Jun 22 14:29 readme.me
drwxr-xr-x 2 cox   512 Jun 12 10:22 rules
drwxr-xr-x 2 cox  3072 Jun 22 14:31 src
.)C
The Makefile governs formatting of the two documents: this README (from readme.me) and the draft of the OOPSLA-87 paper (from producer.me). The mac.me file contains text formatting macros that are common to both papers, used like this:
.(C
nroff -me mac.me producer.me >producer.f
.)C
.pp
The rules directory contains a single file, generic.ru, that represents a first pass at an application-independent rule base. This set of rules translates Smalltalk to the conventions used in my prototype version of the user interface library.
.pp
For example, it translates Smalltalk Integer operations to C int operations, and it translates Smalltalk Point operations to C macros that manage points as type PT: a pair of 16-bit coordinates in a 32-bit C int. Thus pt(x,y) invokes a C macro that trims and shifts two ints, x and y, to fit side by side in a 32-bit integer; ptPlus(p,q) invokes a macro that computes the vector sum of two points, p and q; etc.
.(C
rules:
total 35
-rw-r--r-- 1 cox 35567 Jun 12 10:22 generic.ru
.)C
.pp
The example directory contains a fragment from the video animation program that appears at the end of the Smalltalk-80 video tape. BounceInBoxNode.st is the Smalltalk-80 source file, animation.ru contains the application-specific rule set, and BounceInBoxNode.m is the translated version built by Producer as invoked by Makefile\**.
.(f
\** The full source for the animation program is not provided. My copyright paranoia argued against providing even this fragment.
.)f
.(C
example:
total 7
-rw-r--r-- 1 cox 1730 Jun 16 10:24 BounceInBoxNode.m
-rw-r--r-- 1 cox  868 Jun 16 10:18 BounceInBoxNode.st
-rw-r--r-- 1 cox  394 Jun 16 10:20 Makefile
-rw-r--r-- 1 cox 2178 Jun 16 10:18 animation.ru
-rw-r--r-- 1 cox  185 Jun 16 10:24 log
-rw-r--r-- 1 cox  239 Jun 16 10:18 st80.h
.)C
.pp
The log file records the results of the translation session. The syntax error is innocuous, the result of the aforementioned problem in the grammar in handling '!' delimiters.
.(C
Producer -c ../rules/generic.ru animation.ru BounceInBoxNode.st >BounceInBoxNode.m
error 7:BounceInBoxNode.st: tegory:'Graphics-Animation'!! : syntax error
*** Error code 1 (ignored)
.)C
.pp
The src directory contains the sources for Producer, with its own Makefile. The Substrate.h header file, which is automatically included by the Producer.h header file, is technically a part of an internal lower-level library, Substrate, on which Producer was originally developed.
Substrate.h was copied and changed superficially so that Producer compiles correctly without the Substrate library.
.(C
src:
total 70
-rw-r--r-- 1 cox  483 Jun 12 10:21 AbstractTranslation.m
-rw-r--r-- 1 cox  282 Jun 12 10:21 ArgumentList.m
-rw-r--r-- 1 cox  897 Jun 12 10:21 Block.m
-rw-r--r-- 1 cox  143 Jun 12 10:21 CharConstant.m
-rw-r--r-- 1 cox 2205 Jun 12 10:21 Class.m
-rw-r--r-- 1 cox  630 Jun 12 10:21 Comment.m
-rw-r--r-- 1 cox  176 Jun 12 10:21 Constant.m
-rw-r--r-- 1 cox 2032 Jun 12 10:21 Expr.m
-rw-r--r-- 1 cox 1243 Jun 12 10:21 FunctionTranslation.m
-rw-r--r-- 1 cox 1484 Jun 12 10:21 Identifier.m
-rw-r--r-- 1 cox 1248 Jun 12 10:21 IdentifierTranslation.m
-rw-r--r-- 1 cox  105 Jun 12 10:21 List.m
-rw-r--r-- 1 cox 1985 Jun 15 11:55 METHODDECLS.m
-rw-r--r-- 1 cox 1384 Jun 15 11:51 Makefile
-rw-r--r-- 1 cox 4302 Jun 12 10:21 Method.m
-rw-r--r-- 1 cox 3136 Jun 12 10:21 Msg.m
-rw-r--r-- 1 cox  583 Jun 12 10:21 MsgArgPattern.m
-rw-r--r-- 1 cox  828 Jun 12 10:21 MsgNamePattern.m
-rw-r--r-- 1 cox 1280 Jun 12 10:21 MsgTranslation.m
-rw-r--r-- 1 cox  775 Jun 12 10:21 MsgTranslator.m
-rw-r--r-- 1 cox 1868 Jun 12 10:21 Node.m
-rw-r--r-- 1 cox  229 Jun 12 10:21 NumberConstant.m
-rw-r--r-- 1 cox 1402 Jun 15 11:27 Producer.h
-rw-r--r-- 1 cox  306 Jun 12 10:21 Return.m
-rw-r--r-- 1 cox  825 Jun 12 10:21 Scope.m
-rw-r--r-- 1 cox 3157 Jun 12 10:21 Selector.m
-rw-r--r-- 1 cox  253 Jun 12 10:21 SelectorConstant.m
-rw-r--r-- 1 cox  457 Jun 12 10:21 StArray.m
-rw-r--r-- 1 cox  492 Jun 12 10:21 Stmt.m
-rw-r--r-- 1 cox  381 Jun 12 10:21 StringConstant.m
-rw-r--r-- 1 cox 1268 Jun 12 10:21 StringTranslation.m
-rw-r--r-- 1 cox 2140 Jun 15 11:38 Substrate.h
-rw-r--r-- 1 cox 1405 Jun 15 11:53 Symbol.m
-rw-r--r-- 1 cox  452 Jun 12 10:21 Template.m
-rw-r--r-- 1 cox  901 Jun 12 10:21 Type.m
-rw-r--r-- 1 cox 1800 Jun 12 10:21 design.me
-rw-r--r-- 1 cox 3271 Jun 12 10:21 gen.m
-rw-r--r-- 1 cox 9007 Jun 12 10:21 gram.y
-rw-r--r-- 1 cox 3601 Jun 12 10:21 lex.l
-rw-r--r-- 1 cox 2212 Jun 12 10:21 main.m
-rw-r--r-- 1 cox  260 Jun 12 10:21 st80.h
-rw-r--r-- 1 cox  259 Jun 15 11:59 y.tab.h
.)C
.pp
The files are exactly as I left them nearly a year and a half ago, except for:
.np
The addition of this README document. An early draft of the OOPSLA-87 paper, sadly prior to Kurt Schmucker's improvements, is in producer.me.
.np
One recompilation pass to remove any obvious dependencies on my private Substrate library and to verify that Producer compiles and runs correctly on the standard Foundation library. I tested the changes by verifying that the Makefile in the example directory ran to completion, but this is hardly an ironclad guarantee.
.H "Using Producer"
.pp
Flags controlling the translation process, source files, and rules files are provided on the command line and are processed in the order they appear. The flags are:\**
.(f
\** I'm working from memory about what these flags mean. Some may be nonfunctional.
.)f
.ip -d:
Enables debugging functions (dbg()) scattered throughout the code. Seldom useful.
.ip -m:
Enables the Objective-C Foundation library message tracing feature. Seldom useful in Producer.
.ip -a:
Enables the Objective-C Foundation library allocation tracing feature. Seldom useful in Producer.
.ip -l:
Enables printing of each lexical token as produced by lex. Useful only for debugging lex.l.
.ip -g:
Enables automatic redirection of each class into a separate file based on the class name parsed from the input file. Automatically puts class Foobar into file Foobar.m.
.(q
CAREFUL! This puts at risk other files whose name might coincide with a Smalltalk-80 class name!
.)q
.ip -s:
Generate Smalltalk-80 sources in the output file as Objective-C comments (the default).
.ip -c:
Don't generate Smalltalk-80 sources in the output file.
.ip -i:
Generate information that was thought at one time to be useful when debugging rules.
.ip -M:
Send storeOn: to the message rule dictionary just before terminating as a debugging aid.
.ip -I:
Send storeOn: to the variable rule dictionary just before terminating as a debugging aid.
.pp
Typically, the generic rules in rules/generic.ru are specified first, then any application-specific rules, then a single Smalltalk-80 source file. Unless -g is set, the translated output appears on stdout. The various creaks, groans and mumbles that can be elicited about the translation process itself appear on stderr.
.pp
For the syntax for writing new rules, refer to the examples in generic.ru and animation.ru, and if necessary, the rules section of the grammar in gram.y.
.pp
And good luck! Let me know how you fare...
\Rogue\Monster\
else
  echo "will not over write ./readme.me"
fi
echo "Finished archive 1 of 5"
exit
----
Dieter H. Zebbedies ('dee-ter ayech 'zeb-ed-eez)
Zebb-Hoff Mach. Tool's Automated Manufacturing Project Cleveland, OH
(USnail): 9535 Clinton Rd, Cleveland, OH 44144 (+216 631 6100) (+216 741-5994)
(UUCP): ...{decvax,sun,cbosgd}!cwruecmp!zhmti!dieter
(CSNET/ARPA/BITNET): dieter@CWRU.Ewil