Path: utzoo!mnetor!uunet!husc6!cmcl2!brl-adm!umd5!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!uiucdcs!uxc.cso.uiuc.edu!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: comp.lang.c
Subject: Re: Variable function names
Message-ID: <28700022@ccvaxa>
Date: 24 Dec 87 06:01:00 GMT
References: <973@russell.STANFORD.EDU>
Lines: 125
Nf-ID: #R:russell.STANFORD.EDU:973:ccvaxa:28700022:000:4885
Nf-From: ccvaxa.UUCP!aglew Dec 24 00:01:00 1987

..> Executing data.

There are many architectures where executing data cannot be done
easily. But, since the loader (getxfile) has to read in data and
execute it at some point, a similar facility should be provided to the
user (without having to play tricks with temporary files).

In this posting, I discuss why casting an array to a function pointer
is NOT the way, I discuss the main architectural impediments, and I
suggest an interface for converting data to code that might be portable
to a lot of architectures (UNIX would have to be modified).

Casts are not the way
---------------------

Casts, however, are *NOT* the way. Using a cast to function makes it
much too tempting to do little bit twiddles, like changing an ADD to a
MULTIPLY, while you are executing the code. Almost *NO* modern
architectures permit this sort of thing to be done safely, without some
sort of possibly expensive synchronization of the instruction prefetch
buffer and memory. This would create hell for an optimizing compiler.

Casts are also inappropriate because there are many architectures where
I and D are separate. You can't make D into I. You probably can,
however, copy between the two.

Finally, casts are inappropriate because they do not indicate HOW MUCH
data is going to be made into code.
Knowing how much is important, because, as mentioned above, systems
with separate I and D, where movement is permitted between the two,
have to do some sort of synchronization - and the synchronization may
be made more efficient if the amount of data is known (page flush
instead of entire cache flush).

Architectural Impediments to Data->Code
---------------------------------------

As discussed above, duty cycle - many architectures cannot execute
writes into the instruction stream immediately. Some form of
synchronization must be done so that data written can be made into
code.

Instructions and data may be truly in different spaces. However, it may
be possible to copy between them.

Entry point registry: Advanced architectures may register entry points
for security reasons.

Examples of Applications That Can Use Data->Code Conversion
-----------------------------------------------------------

Incremental compilers: although these are inherently machine dependent,
you can isolate much of the dependence in per-machine files. It is
obviously desirable to be able to compile without going through
headstands to read the compiled code from a file. (A standard routine
"Compile", converting a string to compiled code, is obviously useful.)

Numerical Work: many large numerical packages actually used to compile
and load parts of their algorithms for efficiency. Interpretation isn't
even in the ballpark, and even running compiled code with ifs is too
expensive...

Overlay Systems: there are still some systems with small address
spaces.

Almost-Portable Interface for Data->Code Conversion
---------------------------------------------------

Completely separate I/D spaces

  Data Type

    Since some systems have truly disjoint I/D spaces, it is necessary
    to have a data type that is "uninitialized code". Suggested syntax:

        int f()[SIZE]

    where SIZE is in the same units as sizeof().
    This is not to imply that code is measured in bytes; it is just to
    facilitate the description of sizes.

    Providing a prototype at declaration time may be appropriate for
    architectures that do entry point control.

  Dynamic Allocation

        funcptr = codealloc(size);

    This loses in that C doesn't have a "mode" type; but, it'll handle
    most architectures.

  Movement Between Data and Code Spaces

        codecpy( (char *)frombuf, (int (*)())tofunc, SIZE )

    (i)   Is legal only to correctly sized function buffers. Otherwise
          undefined.
    (ii)  Gives you a locus for doing all the sorts of synchronization
          that your architecture requires.
    (iii) Identifies tofunc as an entry point to the machine.

    And, obviously, you would want a vice versa function.

Simple interface

    The above lets you explicitly manage the code address space at a
    high level. A simpler interface might be:

        funcptr = mkexecutable((char *)buf, size)

    where you basically say that it is not safe to modify buf while
    funcptr may be executing. In this case, it would be possible for
    funcptr to be simply a cast, but mkexecutable() might very well
    manage the I address space, allocate, and copy, returning a pointer
    to the newly allocated space.

I think that an interface like this would be portable to many systems.

Andy "Krazy" Glew. Gould CSD-Urbana. 1101 E. University, Urbana, IL 61801
aglew@mycroft.gould.com ihnp4!uiucdcs!ccvaxa!aglew aglew@gswd-vms.arpa

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
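
A minimal sketch of how the mkexecutable() interface suggested above
might look on a present-day POSIX system; it is an illustration under
stated assumptions, not the interface as specified in the post. Only
the name mkexecutable comes from the post. The typedef funcptr_t and
the helpers freeexecutable() and mkexecutable_demo() are hypothetical.
mmap(), mprotect() and munmap() are POSIX calls, and
__builtin___clear_cache is a GCC/Clang builtin assumed to be available.
The sketch takes the "allocate, copy, return a pointer to the new
space" route, so the size argument tells the system how much to
synchronize. Some systems forbid making anonymous memory executable, in
which case the function simply returns NULL.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*funcptr_t)(void);  /* hypothetical name for "funcptr" */

/* Round size up to a whole number of pages; returns 0 on failure. */
static size_t round_to_pages(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0)
        return 0;
    return (size + (size_t)page - 1) / (size_t)page * (size_t)page;
}

/* Copy size bytes from buf into fresh memory, then make that memory
 * executable. The caller's buf is never executed directly, so it may
 * be modified afterwards without affecting the returned function. */
funcptr_t mkexecutable(const char *buf, size_t size)
{
    size_t len = round_to_pages(size);
    if (buf == NULL || len == 0)
        return NULL;

    void *code = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return NULL;

    memcpy(code, buf, size);

    /* Drop write permission before granting execute permission. */
    if (mprotect(code, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(code, len);
        return NULL;
    }

    /* The synchronization step: only the range just written. */
    __builtin___clear_cache((char *)code, (char *)code + size);

    /* Object-to-function pointer conversion: accepted on POSIX
     * systems, not guaranteed by ISO C. */
    return (funcptr_t)code;
}

/* Hypothetical counterpart that releases the code space again. */
void freeexecutable(funcptr_t f, size_t size)
{
    size_t len = round_to_pages(size);
    if (f != NULL && len != 0)
        munmap((void *)f, len);
}

/* Usage sketch. Returns 0 if the copy landed intact; on x86-64 only,
 * it also calls the code and checks the result. */
int mkexecutable_demo(void)
{
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] =
        { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    funcptr_t f = mkexecutable((const char *)code, sizeof code);
    if (f == NULL)
        return -1;

    int ok = memcmp((const void *)f, code, sizeof code) == 0;
#if defined(__x86_64__)
    ok = ok && f() == 42;
#endif
    freeexecutable(f, sizeof code);
    return ok ? 0 : -1;
}
```

The cast to a function pointer still appears, but only inside
mkexecutable(), after the copy and the cache synchronization; user code
never casts a data buffer itself.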