Xref: utzoo soc.culture.japan:5441 comp.sys.super:237
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!ncar!noao!arizona!rick
From: rick@cs.arizona.edu (Rick Schlichting)
Newsgroups: soc.culture.japan,comp.sys.super
Subject: Kahaner report -- 4th ISR Supercomputing Workshop, Hakone Japan.
Message-ID: <102@saguaro.cs.arizona.edu>
Date: 2 Oct 90 03:54:07 GMT
Followup-To: soc.culture.japan
Organization: U of Arizona CS Dept, Tucson
Lines: 404

[Dr. David Kahaner is a numerical analyst visiting Japan for two years under the auspices of the Office of Naval Research-Far East (ONRFE). The following is the professional opinion of David Kahaner and in no way has the blessing of the US Government or any agency of it. All information is dated and of limited lifetime. This disclaimer should be noted on ANY attribution.]

To: Distribution
From: David K. Kahaner, ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
      Tony F. Chan, UCLA [chan@math.ucla.edu]
Re: 4th ISR Supercomputing Workshop, 29-31 August 1990, Hakone, Japan.
Date: 27 Sept 1990

ABSTRACT. This report describes the 4th ISR Supercomputing Workshop, "The Road to Parallel Applications", held from August 29 to 31, 1990 in Hakone, Japan. It also presents some observations on the trends and characteristics of parallel supercomputing research in Japan. Most of the text of this report was prepared by Professor T. F. Chan, Dept. of Mathematics, Univ. of Calif. at Los Angeles, CA 90024. In some places I have inserted references to earlier reports of mine (DKK) where these supplement Chan's comments. Chan's travel expenses were supported by ISR, and some local expenses were supported by my office, ONRFE.

INTRODUCTION. The Institute for Supercomputing Research (ISR) is a private non-profit research institute established in 1987 to "conduct research on issues in supercomputing and parallel processing, ... , and to strengthen ties with universities and research centers in Japan".
It is funded by the Recruit Corporation, a multi-billion-dollar Japanese company whose main business is recruiting college graduates for major corporations, but which also has a division that sells computer services. The director is Dr. Raul Mendez, who received his Ph.D. from U.C. Berkeley under Alexander Chorin and who is well known for some of the earliest benchmark tests of Japanese supercomputers in the early 1980s. ISR has been organizing a series of annual workshops on various topics in supercomputing, to which both Japanese and US researchers are typically invited. Last summer's workshop was held in Hawaii; this year the venue was Hakone, a resort about two hours from Tokyo, famous for its hot springs and its view of Mt. Fuji. There were about 40 registered participants, mostly Japanese, with three speakers from the US: Olaf Lubeck of Los Alamos, John Levesque of Pacific-Sierra Research, and myself. There were 13 talks in total and a panel discussion on "The future and evolution of scientific computing". A program is attached, and informal proceedings were available at the conference. The atmosphere was relaxed and intimate, and there were many lively discussions both during and after the formal lectures.

LECTURES. Four main themes of the conference can be identified: parallel algorithms (with emphasis on PDEs), hardware (both general and special purpose) for scientific computing, dataflow, and computing environments (languages, networks, programming tools). This reflects the organizers' attempt to cover the main issues in parallel supercomputing, and it largely succeeded, judging by the many discussions during the workshop on how these areas should interact.

Algorithms. The numerical solution of partial differential equations (PDEs) accounts for a major share of the demand for supercomputing resources. PDEs are used throughout science and engineering because most physical laws are expressed mathematically as PDEs.
It therefore makes sense to look more carefully at some of the basic PDE algorithms, especially in view of the advent of parallel computing. Several speakers addressed this issue. Prof. Toshio Kawai of Keio University tried to convince the audience that nature is the best parallel supercomputer and that it also provides a very powerful class of algorithms for these machines. He calls these "natural algorithms": algorithms that are explicit in time and based on local interactions in space. He has produced a programming system called DISTRAN (written in PROLOG and publicly available), an ELLPACK-like system that allows the user to specify a PDE easily and obtain reliable results quickly. (See also my report of 11 April 1990, in which this topic is also mentioned. At that time I thought the idea was too good to be true. Perhaps someone can request the program and perform a critical evaluation. DKK)

On the other hand, Chan's talk argued that the most appropriate class of algorithms for massively parallel computers is the hierarchical (multilevel) class. He based his argument on the observation that many problems in nature are hierarchical (e.g. they involve many different scales in time and space), so that the most efficient algorithms require some form of global communication. Hierarchical algorithms are a reasonable compromise between explicit algorithms, which are highly parallelizable but converge slowly, and fully implicit algorithms, which converge quickly but are difficult to parallelize. Moreover, they can be implemented efficiently on hierarchical parallel computers, such as the CM-2, hypercubes, and clustered hierarchical shared memory systems.

Very often, existing algorithms for a particular problem are not naturally parallelizable, and one has to devise novel parallel algorithms. Prof.
Yoshizo Takahashi of Tokushima University presented several such algorithms for an automated wire-routing problem, specifically adapted to the Coral parallel computer, a binary-tree distributed memory MIMD machine based on the MC68000 chip. These algorithms are particularly interesting because they are true MIMD algorithms for a realistic unstructured problem running on a real parallel machine, and they outperform the best commercial software running on a SUN 3/260.

A central issue in the design of parallel algorithms for MIMD computers is how to map the data onto the processors so as to minimize data communication. George Abe of ISR presented results comparing a ring mapping with a 2D mapping for a semiconductor device modelling problem on the iPSC/1, together with comparisons against similar results on an Alliant FX/8-4. He concluded that in two dimensions the difference in performance between the two mappings can be large, with the two-dimensional mapping being more efficient.

Hardware. With the advent of multiprocessor systems built from relatively large numbers of inexpensive off-the-shelf processors, it has become increasingly easy and cost-effective to build special purpose hardware for particular applications, as an alternative to conventional general purpose mainframe supercomputers. Prof. Yoshio Oyanagi of the University of Tsukuba calls these "multi-purpose" computers. Japan, long recognized for its manufacturing prowess, especially in electronics and computers, is well positioned to follow this approach. Physics seems to be the primary field for which special purpose computers have been built, and three machines of this kind were discussed at the conference. The first is QCDPAX, built for lattice QCD simulations. The worldwide physics community appears to have recognized the potential of parallel computing, and several countries (including Italy, the USA and Japan) have initiated projects to build special purpose hardware for this application.
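Returning briefly to George Abe's ring-versus-2D comparison: a first-order way to see why the two-dimensional mapping can win is to count how many boundary values each processor must exchange per sweep of a nearest-neighbour (five-point) stencil on an n-by-n grid. The Python sketch below is our own illustration, not Abe's code or performance model. It assumes a square grid, one value exchanged per boundary point, and, for the 2D case, a processor count p that is a perfect square with sqrt(p) dividing n; the function names are hypothetical.

```python
import math

def ring_halo(n: int, p: int) -> int:
    """Boundary values an interior processor exchanges per sweep (ring/strip mapping).

    Each processor owns n/p complete rows of the n x n grid, so it sends one
    row of n values to each of its two ring neighbours, whatever p is.
    """
    if n % p != 0:
        raise ValueError("assumes p divides n")
    return 2 * n

def grid2d_halo(n: int, p: int) -> int:
    """Boundary values an interior processor exchanges per sweep (2D block mapping).

    Processors form a sqrt(p) x sqrt(p) mesh; each owns an
    (n/sqrt(p)) x (n/sqrt(p)) block and sends one block edge to each of
    its four mesh neighbours.
    """
    q = math.isqrt(p)
    if q * q != p or n % q != 0:
        raise ValueError("assumes p is a perfect square and sqrt(p) divides n")
    return 4 * (n // q)

if __name__ == "__main__":
    n = 256
    for p in (4, 16, 64):
        print(f"p={p:3d}  ring={ring_halo(n, p):4d}  2D={grid2d_halo(n, p):4d}")
```

With n = 256 the two mappings tie at p = 4 (512 values each), but at p = 16 the block mapping exchanges 256 values per processor and at p = 64 only 128, while the strip mapping stays at 512: strip cost is 2n regardless of p, whereas block cost falls as 4n/sqrt(p). Latency, message counts and the iPSC/1's hypercube topology are ignored here.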
QCDPAX is a MIMD machine with 432 processing units, connected through a 2D nearest-neighbor grid and a common bus. Each processing element consists of a 32-bit MC68020 microprocessor, an L64133 floating-point chip, an LSI for vector operations, 2 MB of fast memory and 4 MB of slow memory. The measured peak performance is 12.25 Gflops; for matrix-vector multiplies, 5 Gflops is attainable. For the QCD problem, a preconditioned conjugate gradient method is used. The project was funded at about two million US dollars for FY87 to FY89. A commercial product is now being marketed by the Anritsu Corporation (model DSV 6450; 4 sold). (See also reports on PAX and Anritsu, 11 and 12 April 1990, and 28 April 1990, DKK.)

Another special purpose machine, discussed by J. Makino of the Dept. of Earth Sciences and Astronomy of the Univ. of Tokyo and ISR, is the GRAPE-1 (GRAvitational PipE), developed at the University of Tokyo for gravitational N-body problems. It is not really a computer in the usual sense, because it is not programmable; instead it serves as a back-end processor that performs only the N-body force computations. An effective performance of 120 Mflops has been achieved. The high performance derives from the use of three arithmetic pipelines, one for each of the three spatial coordinates. An interesting feature is the use of variable precision: 8 bits for force calculations, 16 bits for positional data, and 48 bits for force additions. A General Purpose Interface Bus (GPIB) connects the GRAPE-1 to its host, a Sony workstation. The project is most impressive for its speed of completion: the design started in March 1989, the hardware was ready by September 1989, and production runs began at that time. A follow-up GRAPE-2 project, with parallel pipelines and improved precision (64/32 bits), is now in progress. Makino estimates that a 50-board, 15-Gflops system can be built for US $100,000 and a 500-board, 150-Gflops system for US $300,000.
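The kernel that GRAPE-1 hard-wires is direct summation: for each particle, add up the gravitational pull of every other particle, with the x, y and z components computed independently. The Python sketch below shows that computation in ordinary double precision. The Plummer softening length `eps`, the units with G = 1, and the function name are our assumptions for illustration; the report does not give GRAPE's exact force law, and the sketch does not model its reduced-precision arithmetic.

```python
import math

def direct_forces(pos, mass, eps=0.01):
    """O(N^2) direct-summation gravitational accelerations, with G = 1 (assumed units).

    pos  : list of (x, y, z) tuples
    mass : list of particle masses
    eps  : Plummer softening length (assumed; avoids the r -> 0 singularity)
    Returns a list of (ax, ay, az) tuples, one per particle.
    """
    n = len(pos)
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            if j == i:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # The three components are independent of one another, which is
            # what lets hardware give each spatial coordinate its own pipeline.
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
            az += mass[j] * dz * inv_r3
        acc.append((ax, ay, az))
    return acc
```

For two unit masses one unit apart with `eps=0.0`, the sketch returns accelerations of (1, 0, 0) and (-1, 0, 0). The work grows as N^2 pairwise interactions, which is what makes this inner loop, and nothing else, an attractive target for dedicated pipelines while the host handles the rest of the simulation.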
A GRAPE-3 system is also being designed. Following Makino, Junichi Ebisuzaki (Dept. of Earth Sciences and Astronomy, Univ. of Tokyo) talked about adapting other many-body simulations to the GRAPE system. The basic modification needed is to accommodate different forms of the force law. He discussed applications in plasma physics and molecular dynamics.

Prof. Nobuyasu Ito of the Department of Physics at the University of Tokyo gave what seemed to be an exciting and entertaining talk (judging only from the reaction of the audience, since it was given in Japanese!), in which he described the m-TIS (Mega spin per second University of Tokyo Ising Spin) computer for simulating the many-body problem arising from Ising systems. A successor, m-TIS II, has also been built.

Lest you think the Japanese supercomputer field is producing only special purpose hardware, rest assured that the really big boys have also been doing their homework. Akihiro Iwaya of NEC described the NEC SX-3, which has been widely reported in the US press as the fastest general purpose supercomputer available today. He reported that its performance ranges from 0.68 to 22 Gflops, depending on the particular computation performed. The machine has a SIMD architecture (which he estimated is sufficient for more than 80% of all applications), with shared memory (because "FORTRAN is based on shared memory") and up to four processors (he estimated that 16-32 such processors are within practical limits), each with multiple pipelined arithmetic processors. He also discussed several system issues such as synchronization primitives, ParallelDo and ParallelCase statements, and micro/macro-tasking. All in all, a very Cray-like machine with blazingly fast peak performance. (See also reports on the SX-3, 25 April 1990 and 19 Sept 1990, DKK.)
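Returning to Ito's m-TIS machine: special purpose Ising computers exist to perform spin updates as fast as possible. The Python sketch below shows a standard Metropolis sweep for the two-dimensional Ising model with periodic boundaries, coupling J = 1 and no external field. The lattice dimension, update rule, random-number generator and function names are our assumptions; the report does not say how m-TIS performs its updates.

```python
import math
import random

def metropolis_sweep(spins, beta, rng):
    """One Metropolis sweep (L*L single-spin updates at random sites) on an
    L x L lattice of +1/-1 spins, with J = 1 and periodic boundaries."""
    L = len(spins)
    for _ in range(L * L):
        i = rng.randrange(L)
        j = rng.randrange(L)
        # Sum of the four nearest neighbours, wrapping around the edges.
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2 * spins[i][j] * nb  # energy change if spin (i, j) is flipped
        # Accept downhill moves always, uphill moves with probability exp(-beta*dE).
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i][j] = -spins[i][j]

def magnetization(spins):
    """Average spin per site."""
    L = len(spins)
    return sum(map(sum, spins)) / (L * L)

if __name__ == "__main__":
    rng = random.Random(1)
    L = 16
    spins = [[1] * L for _ in range(L)]  # fully ordered starting state
    for _ in range(200):
        metropolis_sweep(spins, beta=0.6, rng=rng)
    print("magnetization:", magnetization(spins))
```

At beta = 0.6, above the critical value of about 0.4407 (i.e. below the critical temperature), the fully ordered start should remain strongly magnetized. Each update reads only four neighbouring spins, so updates on non-adjacent sites (for example, one colour of a checkerboard) can proceed in parallel; this locality is what makes such computations well suited to dedicated hardware.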

Finally, Shin Hashimoto of Fujitsu described the High Speed Parallel Processor (HPP), developed from 1981 to 1990 in a joint project between MITI and six computer companies (including Fujitsu, NEC and Hitachi). The main idea is to connect several conventional supercomputers (e.g. the Fujitsu VP2000) via a Common Storage Unit (CSU) and a Large High-Speed Storage (LHS). The data transfer rate between the HPP and the LHS is 1.5 Gbytes/sec, and the peak performance is over 10 Gflops. It comes with its own parallel language, Phil, which has the usual parallel-do, lock and barrier statements, and a very user-friendly programming environment with execution viewers, a cost analyzer and a parallel verifier. Surprisingly, there are as yet no plans to turn it into a commercial product. (See report on the high-speed project, 3 July 1990, DKK.)

Dataflow. One of the most difficult tasks in designing parallel programming systems is the automatic detection and extraction of parallelism in programs. The dataflow approach has long been advocated as one model for achieving this goal, and in a fundamental way it is very attractive because it works at the most basic level of computation. While the dataflow approach has not yet been shown to be competitive in practice (practical dataflow machines are not exactly proliferating at the moment), we should nonetheless aim for the ideal, as Olaf Lubeck of the Computing Division at the Los Alamos National Laboratory implored us to do in his talk. He has been working closely with both the group led by Arvind at MIT and the SIGMA-1 group at the Electrotechnical Laboratory (ETL) of Japan. He claims that the main advantages of dataflow are that it produces deterministic computations and that it extracts maximum parallelism. In addition to some general comments about dataflow, he discussed a more technical problem: how to "throttle" loop activations so that loop statements do not place a large demand on system resources (i.e.
memory) in the early iterations under a dataflow model. (See also reports on ETL projects, 2 July 1990 and 16 August 1990, DKK.)

Toshio Sekiguchi, also of ETL, described his work on designing the parallel dataflow language DFC II for the SIGMA-1 dataflow computer currently being developed. The SIGMA-1 is an instruction-level dataflow machine with 128 processing elements, 640 MIPS, 427 Mflops and 330 Mbytes of memory. DFC II is C-based (functional languages were deliberately not chosen because the designers want the language to be useful "for practical problems") and allows synchronization, global variables and, of course, automatic detection of parallelism. The motto is: "sequential description, parallel execution". Applications that have been run include QCD, PIC, Keno and LINPACK.

Environment. It is widely recognized that one of the potential stumbling blocks on the road to the utopia of parallel computing for the masses is that parallel programming is an order of magnitude more difficult than vector programming, not to mention sequential programming. Without user-friendly yet powerful programming environments, parallel computing may never reach the promised land. Environments were one of the main themes of the workshop, and John Levesque of Pacific-Sierra Research Corp. (PSR) was the main speaker on this topic. John is one of the leaders in this field and has just published a book on optimization techniques for supercomputers. He described the philosophy behind the FORGE and MIMDizer systems developed at PSR. FORGE is an integrated environment consisting of program development modules, static and dynamic performance monitors, sequential and parallel debugging, memory mapping modules, automatic optimization and a menu-driven interface. John stressed the importance of building a database of information about the program and collecting both static and runtime statistics in order to optimize performance.
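Returning to Lubeck's loop-throttling problem: if a dataflow machine starts every independent loop iteration as soon as its inputs are ready, all iterations become active at once and each holds its own storage. The toy simulation below is our own illustration, not Lubeck's, MIT's or ETL's mechanism. It assumes every iteration occupies one storage frame for a fixed number of time steps and that a new iteration is admitted only while fewer than `k` are active; `run_loop` is a hypothetical name.

```python
def run_loop(n_iters, work, k=None):
    """Simulate activation of n_iters independent loop iterations (toy model).

    Each active iteration occupies one storage frame for `work` (>= 1) time steps.
    If k is None, every iteration is activated immediately (unthrottled);
    otherwise at most k iterations may be active at the same time.
    Returns (peak_frames, elapsed_steps).
    """
    limit = n_iters if k is None else k
    waiting = n_iters
    active = []  # remaining work of each active iteration
    peak = 0
    steps = 0
    while waiting or active:
        # Admit new iterations up to the throttle limit.
        while waiting and len(active) < limit:
            active.append(work)
            waiting -= 1
        peak = max(peak, len(active))
        # Advance every active iteration by one step; finished ones free their frame.
        active = [w - 1 for w in active if w > 1]
        steps += 1
    return peak, steps
```

Unthrottled, 100 three-step iterations finish in 3 steps but need 100 frames at once; with k = 4 they need only 4 frames but take 75 steps. A throttling policy chooses a bound between these extremes, trading exposed parallelism against memory.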

MIMDizer is a brand-new system scheduled for delivery this October. As the name suggests, it is designed to ease the porting of programs to distributed memory MIMD machines. The key idea is "array decomposition": the user specifies the mapping of data arrays, and MIMDizer automatically handles all communication interfaces. This appears to be a very practical middle ground between fully automatic parallelizing compilers and explicit data mapping and message passing by the user.

Any of us who use electronic mail realize the importance of networks. But according to Raul Mendez in his banquet talk, networks may also play a critical role in the computing environment for supercomputing in the near future. His dream is "supercomputing from a laptop", and the way to achieve it is through networks. He discussed the existing networks in the US and Europe, as well as several networks being developed in Japan and across the Pacific.

PANEL DISCUSSION. The liveliest discussions of the whole workshop occurred during the panel discussion, which should come as no surprise given its theme, "The Future and Evolution of Scientific Computing", a subject very dear to every participant's heart. The panelists were Genki Yagawa (Dept. of Nuclear Eng., Univ. of Tokyo), Katsunobu Nishihara (Inst. of Laser Eng., Osaka Univ.), Kida (Kyoto Univ.), D. Sugimoto (Univ. of Tokyo), and four of the speakers: Lubeck, Levesque, Chan, and Oyanagi. Mendez led off with three main topics for discussion:

1. What will computational requirements be like in the next decade?
2. What is the outlook for SIMD and MIMD architectures? Shared versus distributed memory?
3. What other trends will come to play a significant role: dedicated machines, dataflow architectures, microprocessors, etc.?

Concerning Question 1, it was clear from the discussion that everyone believes there is no foreseeable upper bound on the computational requirements for supercomputers; in fact, at any given moment demand is limited only by the capability of the current supercomputers. Even with a teraflop machine, practical engineering computations (100^3 grids, with 3 variables per point) could still require one hour of CPU time, and they will require an enormous amount of memory. Indeed, the cost of memory may be a major barrier to building a teraflop machine: assuming a scaling law of 1 Mbyte per Mflops, a teraflop machine would today require about 20 billion dollars just for the memory! Algorithm design will also have to keep pace with hardware and architectural advances, as it has throughout the history of computing.

Concerning Question 2, an interesting consensus emerged. While some panelists think that SIMD architecture is sufficient for many problems (e.g. QCD), many personally prefer MIMD machines for their flexibility. The most likely trend will be hybrid (cluster, or hierarchical) architectures, with MIMD at the higher levels and SIMD at the lower levels. Concerning memory architecture (shared or distributed), many believe that hiding the storage structure of the data will inevitably degrade performance, and that some user input is therefore essential. No one believes we will see automatic and efficient compilers for parallel machines in the foreseeable future.

Concerning Question 3, our representative from the dataflow camp (Lubeck) said that ignoring dataflow would be settling for second best and that we should be "going for the gold", even though that may take some time. Someone also pointed out that while current research has focused primarily on solution techniques, other aspects of the scientific computing process, such as mesh generation and visualization, will play a more important role in the future.
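The memory and run-time figures quoted under Question 1 can be unpacked with a little arithmetic. This is our own working on the numbers given at the panel (1 Mbyte per Mflops, a $20 billion memory bill, a 100^3 grid with 3 variables per point, one hour on a teraflop machine); the per-Mbyte price and the operations-per-unknown figure are derived here, not quoted by the panelists.

$$
\begin{aligned}
1\ \text{Tflops} &= 10^{6}\ \text{Mflops} \;\Rightarrow\; 10^{6}\ \text{Mbytes} \approx 1\ \text{Tbyte of memory},\\
\frac{\$20 \times 10^{9}}{10^{6}\ \text{Mbytes}} &= \$20{,}000\ \text{per Mbyte (implied)},\\
10^{12}\ \text{flop/s} \times 3600\ \text{s} &= 3.6 \times 10^{15}\ \text{operations per run},\\
\frac{3.6 \times 10^{15}}{100^{3} \times 3} &= 1.2 \times 10^{9}\ \text{operations per unknown}.
\end{aligned}
$$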

Finally, while parallel machines are much more difficult to use than vector machines, users are willing to plunge in when given sufficient incentive (e.g. the cost-effectiveness of the CM-2).

OBSERVATIONS (Chan). As someone who works on parallel algorithms, I was most struck by the small number of talks on this topic. I realize that this could be just a feature of this particular workshop, but in general I am not aware of an active research community in parallel algorithm development in Japan. On the other hand, hardware development in Japan has been truly impressive, both in terms of raw power and in the speed and low cost with which special purpose machines are built. However, I did not see much architectural innovation; most of the designs follow trends already established in the industry. During the banquet, a Fujitsu engineer informed me that the company is building Japan's first commercial distributed memory MIMD machine; from the terse description, it resembles several US hypercubes (1K processors, SPARC chips, grid connection topology and "wormhole" routing).

Another observation is that many of the talks were based on work by interdisciplinary teams, consisting of physical scientists who have real problems to solve together with hardware and software designers. In fact, Japanese physicists seem to play a very active role in parallel computing: all the special purpose machines mentioned were built for physics problems. Even though there were several academic engineers on the panel, I could not tell how much influence they have had on this field in Japan.

Overall, attending this workshop was a very pleasant experience for me. I met many interesting people (everyone was very friendly and open), and my hosts Raul Mendez and Chris Eoyang were most gracious. I only wish my knowledge of Japanese went beyond reading kanji, so that I could have understood all the jokes during the few talks delivered in Japanese!

(The observations above are Chan's. Nevertheless, they mostly echo my own feelings, and I have often made similar remarks in my reports. Readers should also note that many of the presentations describe work very close to that published or presented elsewhere. However, I do not entirely agree with the comment about architectural innovation. There are only a few really different computer organizations. Innovation (as opposed to revelation) comes from figuring out how to design a machine so that all the pieces work harmoniously, and Japanese researchers seem at least as capable as those in the West at finding ways to do this. DKK)

PROGRAM: 4th ISR Supercomputing Workshop

Raul Mendez (ISR)
    Opening Remarks
Toshio Kawai (Keio University)
    "Standard Solution to Partial Differential Equations on Supercomputers"
Yoshizo Takahashi (Tokushima University)
    "Parallel Automated Wire-Routing With a Number of Competing Processors"
George Abe (ISR)
    "Partial Differential Equation Solvers and Architectures for Parallel Scientific Computing"
Toshio Sekiguchi (Electrotechnical Laboratory)
    "The Design of the Practical Language DFC II and its Data Structures"
Olaf Lubeck (Los Alamos National Laboratory)
    "Resource Management in Dataflow: A Case Study Using Two Numerical Applications"
Yoshio Oyanagi (University of Tsukuba)
    "QCD Lattice Simulations With the QCDPAX"
Daiichiro Sugimoto & Junichi Ebisuzaki (University of Tokyo)
    "Project GRAPE and The Development of a Specialized Computer for the N-body Problem"
Nobuyasu Ito (University of Tokyo)
    "A Trial to Break Through the Many Body Problem With a Computer"
Panel: G. Yagawa, K. Nishihara, D. Sugimoto, Y. Oyanagi, O. Lubeck, J. Levesque, R. Mendez (moderator)
Akihiro Iwaya (NEC Corp)
    "Parallel Processing on the NEC SX-3 Supercomputer"
Shin Hashimoto (Fujitsu Ltd)
    "Parallel Application Development on the HPP"
John Levesque (Pacific Sierra Research)
    "An Advanced Programming Environment"
Raul Mendez (ISR)
    Closing Remarks

----------------------END REPORT-----------------------------------------