Xref: utzoo comp.lang.eiffel:327 comp.lang.c++:4069 comp.lang.misc:3102 Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!mcvax!kth!draken!tut!santra!hila.hut.fi!enuutila From: enuutila@hila.hut.fi (Esko Nuutila) Newsgroups: comp.lang.eiffel,comp.lang.c++,comp.lang.misc Subject: Re: Record layout for multiple inheritance Summary: A packing mechanism for the dope vectors Keywords: multiple inheritance Message-ID: <23877@santra.UUCP> Date: 21 Jul 89 11:39:46 GMT References: <18498@mimsy.UUCP> <2129@randvax.UUCP> Sender: news@santra.UUCP Reply-To: enuutila@hila.hut.fi (Esko Nuutila) Organization: Helsinki University of Technology, Finland Lines: 172 In article <2129@randvax.UUCP> florman@rand-unix.UUCP (Bruce Florman) writes: [The description of the record layout mechanism deleted] > As I said before, I don't know for sure what techniques are used in >Eiffel or C++. This is how I've done it. If anyone sees any glaring >inefficiencies in this system, I'd really like to hear about them. One >obvious place to look for improvement is in the storage of the dope >vectors. When the inheritance hierarchy is very wide but shallow, the >dope vectors will have a whole lot of empty space in them. I imagine >that someone can find a clever scheme to pack them into a smaller space. >Any suggestions? Here is my proposal. The basic assumption here is that the empty positions in the dope vectors are not needed in any way. They might be used, for example, for run-time type checking, and in that case the following packing mechanims cannot be used. I am going to explain the mechanism by giving an example. Let us have classes A, B, C, D, E, F. class A {var a1, a2, a3: integer} class B {var b1, b2: integer} class C(A, B) {var c1, c2, c3, c4: integer} class D(C) {} class E {var e1: integer} class F(A, E) {var f1, f2: integer} The run-time object layouts are (this is one possibility): instance of A: [A, a1, a2, a3] instance of B: [B, b1, b2] instance of C: [C, a1, a2, a3, b1, b2, c1, c2, c3, c4] instance of D: [D, a1, a2, a3, b1, b2, c1, c2, c3, c4] instance of E: [E, e1] instance of F: [F, a1, a2, a3, e1, f1, f2] (In position 0, there is the class identifier) The dope vectors are: A B C D E F A 1 - - - - - B - 1 - - - - C 1 4 6 - - - D 1 4 6 10 - - E - - - - 1 - F 1 - - - 4 5 First, we find out that the D column is unnecessary. Class D has no local variable declarations, thus they cannot be accessed. All columns corresponding to a class with no local variable declarations can be deleted. The result is here: The dope vectors are: A B C E F A 1 - - - - B - 1 - - - C 1 4 6 - - D 1 4 6 - - E - - - 1 - F 1 - - 4 5 This saves some space, but we can do better. First, we notice that those positions in dope vectors that are empty are never going to be accessed. Then we forget the implicit assumption that we should have different dope vectors for different classes and try to merge the dope vectors into a single dope vector that contains all offsets. In our case this goes as follows: A 1 - - - - B - 1 - - - C 1 4 6 - - D 1 4 6 - - E - - - 1 - F 1 - - 4 5 Result 1 4 6 4 5 The class descriptor of class X then contains an offset to the starting point of the dope vector for class X. In our case the offsets would be A : 0, B : -1, C : 0, D : 0, E : -3, and F : 0. In order to achieve a faster access it would be better to store a memory address in the class descriptor instead of the offset. In our case the addresses would be A : i, B : i-1, C : i, D : i, E : i-3, and F : i, where i is the address of the merged dope vector. The fields are accessed in the same way as in the original mechanism, only the dope vector access is different. Computing an optimal merge is very likely an NP-complete problem. However, a simple algorithm that merges the dope vectors one after another with the result vector works well. When merging a dope vector with the result vector, we first find out the left-most position of the dope vector that contains an offset. Then we match the dope vector with the result vector so that the left-most position of the dope vector that contains an offset is compared with the left-most position of the result vector etc. Comparison is successful if at every position the result vector and the dope vector have the same offset, or one of them contains no offset at that position. If the comparison succeeds, the dope vector is merged with the result vector yielding a new result vector. Else we shift the dope vector one position to the right and try again. For example, if the result vector is currently 1 3 - 4 - 2 and the next dope vector is - 1 4 - - 5 the first comparison is 1 3 - 4 - 2 - 1 4 - - 5 No match, we shift dope vector to the right: 1 3 - 4 - 2 - 1 4 - - 5 No match, shift right 1 3 - 4 - 2 - 1 4 - - 5 Match, the new result vector is 1 3 1 4 - 2 5 What about the savings? In the example problem the unpacked dope vectors take 6*6=36 positions. After merging, only 5 positions are needed, that is about 14%. One might wonder whether this is only a well chosen example. To get a more realistic picture about the savings I implemented a Lisp program in the Symbolics that computes the dope vectors for Flavors and merges them. As a test input data I used sys:*all-flavor-names*, i.e. a list that contains all flavors that are defined in the system. In our case (length sys:*all-flavor-names*)=2680. 1504 flavors contained no instance variables. With no packing the dope vectors would have taken 7182400 positions. The result vector of the merge takes 4027 positions (0.056%)! I have also tried with smaller examples, e.g., all flavors in package TCP-IP and their component flavors. There were 215 flavors, and the result vector size was 477 = 1% of the unpacked size. Bruce Florman told me that the actual implementation of his system utilizes the fact that the "dope vector matrix" is triangular (see the example above) and does not allocate space for the dope vectors beyond the main diagonal. So the savings compared to his system are slightly different. For example, in the sys:*all-flavor-names* example the figures would be 3592540 instead of 7182400 compared to 4027, i.e. 0.112% instead of 0.056% of the unpacked size. Similar techniques may have been used in compilers for multiple inheritance languages, but I don't have any references. I got the article by Krogdahl in Bit vol 25. 1985, that was mentioned in comp.lang.misc, but the scheme used there was different. He claims that he has a solution to the problem that takes no extra space, but it is described in a technical report that I have not found yet. > Bruce Florman > Associate Computer Scientist > The RAND Corporation > florman@rand.org Esko Nuutila Helsinki University of Technology Department of Computer Science Internet: enuutila@hila.hut.fi Internet: enuutila@hila.hut.fi