Path: utzoo!utgpu!water!watmath!clyde!att!pacbell!lll-tis!mordor!joyce!sri-unix!quintus!ok From: ok@quintus Newsgroups: comp.lang.prolog Subject: Parallel: lists or trees? Message-ID: <185@quintus.UUCP> Date: 21 Jul 88 22:57:16 GMT Sender: news@quintus.UUCP Reply-To: ok@quintus () Organization: Quintus Computer Systems, Inc. Lines: 98 I was looking at some FCP code recently, and I was irritated by the way it was using lists all over the place. If you have a collection of N elements represented as a list, it is going to take you N reductions to visit them all, do what you will. Of course, the N visits can then proceed in parallel, but it takes so long to get started. So I decided to try a small experiment. The task isn't very interesting: it is simply to make a collection of the integers from 1 to N and determine how many of them are divisible by 3. Here's the version using lists: three_list(N, Count) :- make_list(N, Numbers), remainder_list(Numbers?, Remainders), zero_count_list(Remainders?, 0, Count). make_list(0, []). make_list(N, [N|Rest]) :- N > 0, M := N-1 | make_list(M, Rest). remainder_list([], []). remainder_list([N|Ns], [R|Rs]) :- R := N \ 3 | remainder_list(Ns?, Rs). zero_count_list([], C, C). zero_count_list([R|Rs], C0, C) :- C1 := (3-R)/3 + C0 | zero_count_list(Rs?, C1, C). The make_list, remainder_list, and zero_count list processes run in parallel, but make_list is sequential because it is counting N down, zero_count_list is sequential because it is adding the 1s (R=0) and 0s (R!=0) up in order, and remainder_list is sequential because it is waiting for make_list. For the other representation, I used trees of the form {SingleItem} or [LeftSon | RightSon] as this seemed to minimise the space requirements. I surmise that a collection represented this way takes about twice the space of a collection represented as a list. The code is: three_tree(N, Count) :- make_tree(1, N, Numbers), remainder_tree(Numbers?, Remainders), zero_count_tree(Remainders?, Count). make_tree(N, N, {N}). make_tree(L, U, [LM|NU]) :- L < U, M := (L+U)/2, N := (L+U)/2 + 1 | make_tree(L, M, LM), make_tree(N, U, NU). remainder_tree({N}, {R}) :- R := N \ 3 | true. remainder_tree([N1|N2], [R1|R2]) :- remainder_tree(N1?, R1), remainder_tree(N2?, R2). zero_count_tree({R}, C) :- C := (3-R)/3 | true. zero_count_tree([T1|T2], C) :- zero_count_tree(T1?, C1), zero_count_tree(T2?, C2), plus(C1?, C2?, C). In this scheme, the depth between the top level call of make_tree/3 and one of its "leaves" is only lg(N) rather than N, e.g. the generation of 1..N/2 can proceed in parallel with the generation of N/2+1..N. The next step was to actually measure these things. Fortunately, we have a(n old) copy of Logix. I used the "time" and "rcp" builtins to make the measurements, and compiled the code in mode(failsafe). three_list(300,_) cputime = 0.860 sec, reductions = 1201, creations = 74 reductions = 993, cycles = 30, reductions_per_cycle = 33 three_tree(300,_) cputime = 1.500 sec, reductions = 2355, creations = 1266 reductions = 2186, cycles = 27, reductions_per_cycle = 80 three_list(1000,_) cputime = 2.160 sec, reductions = 3108, creations = 19 reductions = 3093, cycles = 57, reductions_per_cycle = 54 three_tree(1000,_) cputime = 4.280 sec, reductions = 7101, reductions = 7086, cycles = 28, reductions_per_cycle = 253 The tree version runs slower on a uniprocessor. Oddly enough, the time ratio is about the same as my estimate of the space ratio for the data structures, and the same ratio is obtained in plain Prolog. Since the tree version appears to have more parallelism available, it would be very interesting to run this code on a multiprocessor Logix. (By the way, I am only a beginner with FCP, and would welcome suggestions for improving my code.) -- quintus!ok@SUN.COM; The proper study of man is *everything* -- C.S.Lewis