Path: utzoo!mnetor!uunet!husc6!cmcl2!brl-adm!umd5!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!uiucdcs!uxc.cso.uiuc.edu!ccvaxa!aglew
From: aglew@ccvaxa.UUCP
Newsgroups: comp.lang.c
Subject: Re: Variable function names
Message-ID: <28700022@ccvaxa>
Date: 24 Dec 87 06:01:00 GMT
References: <973@russell.STANFORD.EDU>
Lines: 125
Nf-ID: #R:russell.STANFORD.EDU:973:ccvaxa:28700022:000:4885
Nf-From: ccvaxa.UUCP!aglew    Dec 24 00:01:00 1987


..> Executing data.

There are many architectures where executing data cannot be done 
easily. But, since the loader (getxfile) has to read in data and
execute it at some point, a similar facility should be provided 
to the user (without having to play tricks with temporary files).

In this posting, I discuss why casting an array to a function pointer
is NOT the way, I discuss the main architectural impediments,
and I suggest an interface for converting data to
code that might be portable to a lot of architectures (UNIX would
have to be modified).

Casts are not the way
---------------------

Casts, however, are *NOT* the way. Using a cast to function makes it 
much too tempting to do little bit twiddles, like changing an ADD to
a MULTIPLY, while you are executing the code. Almost *NO* modern 
architecture permits this sort of thing to be done safely, without 
some sort of possibly expensive synchronization of the instruction
prefetch buffer and memory. This would create hell for an optimizing 
compiler.

Casts are also inappropriate because there are many architectures
where I and D are separate. You can't make D into I. You probably can,
however, copy between the two.

Finally, casts are inappropriate because they do not indicate HOW MUCH
data is going to be made into code. Knowing how much is important,
because, as mentioned above, systems with separate I and D, where 
movement is permitted between the two, have to do some sort of
synchronization - and the synchronization may be made more efficient
if the amount of data is known (page flush instead of entire cache flush).
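
To make the size argument concrete, here is a minimal sketch of the
arithmetic a synchronization routine can do once it is told the length.
The name pages_spanned and the page_size parameter are assumptions made
for illustration; they are not part of any real system's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + size).
   page_size is a parameter because it is machine dependent. */
size_t pages_spanned(uintptr_t addr, size_t size, size_t page_size)
{
    assert(page_size != 0);
    if (size == 0)
        return 0;                       /* nothing to synchronize */
    uintptr_t first = addr / page_size;
    uintptr_t last  = (addr + size - 1) / page_size;
    return (size_t)(last - first + 1);
}
```

With 4096-byte pages, a 2-byte range starting at address 4095 touches
2 pages, so only those 2 need flushing. A bare cast supplies addr but
no size, which leaves the system little choice but to flush everything.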


Architectural Impediments to Data->Code
---------------------------------------

Duty cycle: as discussed above, many architectures cannot execute 
writes into the instruction stream immediately. Some form of synchronization
must be done before the data written can be executed as code.

Separate I and D: instructions and data may truly be in different spaces.
However, it may be possible to copy between them.

Entry point registry: Advanced architectures may register entry points
for security reasons.


Examples of Applications That Can Use Data->Code Conversion
-----------------------------------------------------------

Incremental compilers: although these are inherently machine dependent,
 	you can isolate much of the dependence in per-machine files.
	It is obviously desirable to be able to compile without going
	through headstands to read the compiled code from a file.
	(A standard routine "Compile", converting a string to executable
	format, obviously is useful).

Numerical Work: many large numerical packages actually used to compile
	and load parts of their algorithms for efficiency. Interpretation
	isn't even in the ballpark, and even running compiled code 
	with ifs is too expensive...

Overlay Systems: there are still some systems with small address spaces.


Almost-Portable Interface for Data->Code Conversion
---------------------------------------------------

Completely separate I/D spaces
    Data Type
	Since some systems have truly disjoint I/D spaces, it is necessary
	to have a data type that is "uninitialized code".

	Suggested syntax:
		int f()[SIZE]
	where SIZE is in the same units as sizeof(). This is not to imply
	that code is measured in bytes; it is just to facilitate the 
	description of sizes.

	Providing a prototype at declaration time may be appropriate for
	architectures that do entry point control.

    Dynamic Allocation
		funcptr = codealloc(size);
	This loses in that C doesn't have a "mode" type; but, it'll handle
	most architectures.

    Movement Between Data and Code Spaces
		codecpy((char *)frombuf, (int (*)())tofunc, SIZE)
	(i) Is legal only for correctly sized function buffers. Otherwise
	    the behaviour is undefined.
	(ii) Gives you a locus for doing all the sorts of synchronization
	     that your architecture requires.
	(iii) Identifies tofunc as an entry point to the machine.
	And, obviously, you would want a function for the reverse
	direction, code to data.
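
The three properties above can be tried out in a toy model. This is
only a sketch under loud assumptions: the "code space" is an ordinary
static array that nothing ever executes, a struct code handle stands in
for the function pointer, and the names ispace, struct code, datacpy and
the int return values are all invented for illustration. Only codealloc
and codecpy are names taken from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ISPACE_SIZE 256   /* hypothetical size of the toy code space */
#define MAX_FUNCS   8     /* hypothetical limit on allocations */

/* Toy "instruction space": touched only by the functions below, standing
   in for a machine whose I space cannot be addressed as data. */
static unsigned char ispace[ISPACE_SIZE];
static size_t ispace_used;

/* Handle playing the role of the function pointer. */
struct code {
    size_t offset;   /* start of the region inside ispace */
    size_t size;     /* size requested from codealloc() */
    int    entry;    /* nonzero once registered as an entry point */
};

static struct code table[MAX_FUNCS];
static size_t nfuncs;

/* Allocate size units of uninitialized code; NULL if no room. */
struct code *codealloc(size_t size)
{
    if (size == 0 || nfuncs == MAX_FUNCS || size > ISPACE_SIZE - ispace_used)
        return NULL;
    struct code *f = &table[nfuncs++];
    f->offset = ispace_used;
    f->size = size;
    f->entry = 0;
    ispace_used += size;
    return f;
}

/* Data -> code. Returns 0 on success, -1 on a size mismatch (the
   proposal leaves that case undefined; the toy chooses to reject it). */
int codecpy(const char *frombuf, struct code *tofunc, size_t size)
{
    assert(frombuf != NULL && tofunc != NULL);
    if (size != tofunc->size)
        return -1;                      /* (i) correctly sized buffers only */
    memcpy(ispace + tofunc->offset, frombuf, size);
    /* (ii) a real port would synchronize caches and prefetch here */
    tofunc->entry = 1;                  /* (iii) register the entry point */
    return 0;
}

/* Code -> data, the reverse direction. */
int datacpy(const struct code *fromfunc, char *tobuf, size_t size)
{
    assert(fromfunc != NULL && tobuf != NULL);
    if (size != fromfunc->size)
        return -1;
    memcpy(tobuf, ispace + fromfunc->offset, size);
    return 0;
}
```

In this model, codecpy("abc", f, 3) on a handle from codealloc(4) is
refused and leaves the entry point unregistered, while a 4-byte copy
succeeds and can be read back with datacpy.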

    Simple interface
	The above lets you explicitly manage the code address space at a 
	high level. A simpler interface might be:

		funcptr = mkexecutable((char*)buf,size)

	where you basically say that it is not safe to modify buf while
	funcptr may be executing.
		In this case, it would be possible for funcptr to be
	simply a cast of buf, but mkexecutable() might very well manage the I
	address space, allocate, and copy, returning a pointer to the
	newly allocated space.
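
The allocate-and-copy variant might look roughly like the sketch below
on a present-day POSIX system. Assumptions, none of them from the
proposal itself: mmap() and mprotect() do the allocation and the
permission change, MAP_ANONYMOUS is available (it is widespread but was
long an extension), the GCC/Clang builtin __builtin___clear_cache does
the synchronization, and the result is returned as void * for the
caller to cast to the right function pointer type (a conversion ISO C
does not guarantee, though POSIX systems support it). A system that
forbids executable anonymous memory will make this return NULL.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copy size bytes of machine code from buf into freshly allocated
   memory, make the copy executable, and return it (NULL on failure).
   buf itself stays ordinary data and may be modified afterwards. */
void *mkexecutable(const char *buf, size_t size)
{
    assert(buf != NULL);
    if (size == 0)
        return NULL;

    /* Allocate writable (not yet executable) memory. */
    void *code = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return NULL;

    memcpy(code, buf, size);

    /* Switch the copy from writable to executable. */
    if (mprotect(code, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(code, size);
        return NULL;
    }

#if defined(__GNUC__)
    /* Synchronize instruction fetch with the bytes just written;
       the known size lets this cover only the affected range. */
    __builtin___clear_cache((char *)code, (char *)code + size);
#endif
    return code;
}
```

Because the copy is never writable and executable at the same time, the
bit twiddling that a plain cast invites is not possible on the result.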

I think that an interface like this would be portable to many systems.


Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa
   
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.