Xref: utzoo soc.culture.japan:5844 comp.sys.super:262
Path: utzoo!attcan!uunet!wuarchive!zaphod.mps.ohio-state.edu!ncar!noao!arizona!rick
From: rick@cs.arizona.edu (Rick Schlichting)
Newsgroups: soc.culture.japan,comp.sys.super
Subject: Kahaner Report: Parallel Computing in Japan (Part 4)
Message-ID: <121@saguaro.cs.arizona.edu>
Date: 6 Nov 90 01:58:46 GMT
Followup-To: soc.culture.japan
Organization: U of Arizona CS Dept, Tucson
Lines: 464

[Dr. David Kahaner is a numerical analyst visiting Japan for two years under the auspices of the Office of Naval Research-Far East (ONRFE). The following is the professional opinion of David Kahaner and in no way has the blessing of the US Government or any agency of it. All information is dated and of limited life time. This disclaimer should be noted on ANY attribution.]

[Copies of previous reports written by Kahaner can be obtained from host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
      H.T. Kung CMU [ht.kung@cs.cmu.edu]
Re: Aspects of Parallel Computing Research in Japan---Kyushu & Tsukuba Univ., ETL, Sanyo, New Info Proc Technology project.
Date: 6 Nov 1990

ABSTRACT. Some aspects of parallel computing research in Japan are analyzed, based on the authors' visits to a number of Japanese universities and industrial laboratories in October 1990. This portion of the report deals with parallel computing at Kyushu and Tsukuba Universities, Electrotechnical Laboratory, Sanyo Electric, and the New Information Processing Technology project.

PART 4.

The following outline describes the topics that are discussed in the various parts of this report.
PART 1 OUTLINE-------------------------------------------------------------
  INTRODUCTION
  SUMMARY
  RECOMMENDATIONS

PART 2 OUTLINE-------------------------------------------------------------
  FUJITSU
    OVERVIEW
      Company profile and computer R&D activities
      VP2000 series supercomputer organization and performance
    PARALLEL PROCESSING ACTIVITIES
      SP (Logic Simulation Engine)
      AP1000 (Cellular Array Processor)
      RP (Routing Processor)
      ATM (Asynchronous Transfer Mode) Switch
    MISCELLANEOUS FUJITSU ACTIVITIES
      Neurocomputing
      HEMT
  NEC
      SX-3 series supercomputer organization and performance
      Benchmark data for SX-3, VP2000, and Cray.
      Comments
    MISCELLANEOUS NEC PARALLEL PROCESSING ACTIVITIES

PART 3 OUTLINE-------------------------------------------------------------
  HITACHI CENTRAL RESEARCH LABORATORY
    HDTV
    PARALLEL AND VECTOR PROCESSING
      Hyper crossbar parallel processor, H2P
      Parallel Inference Machine, PIM/C
      Josephson-Junctions
      Molecular Dynamics
  JAPAN ELECTRONICS SHOW, 1990
      HDTV
      Flat Panel Displays
  MATSUSHITA ELECTRIC
      Company profile and computer R&D activities
      ADENA Parallel Processor
    MISCELLANEOUS ACTIVITIES
      HDTV
      Comments about Japanese industry

PART 4 (this part) OUTLINE-------------------------------------------------
  KYUSHU UNIVERSITY
      Profile of Information Science Department
      Reconfigurable Parallel Processor
      Superscalar Processor
      FIFO Vector Processor
      Comments
  ELECTROTECHNICAL LABORATORY
      Sigma-1 Dataflow Computer and EM-4
      Dataflow Comments
      CODA Multiprocessor
  NEW INFORMATION PROCESSING TECHNOLOGY
      Summary
      Comments
  UNIVERSITY OF TSUKUBA
      PAX
  SANYO ELECTRIC
      Company profile and computer R&D activities
      HDTV

END OF OUTLINE-------------------------------------------------------------

KYUSHU UNIVERSITY.

Kyushu University is in the city of Fukuoka, the largest city on the island of Kyushu, Japan's southernmost large island. Kyushu is the closest part of Japan to mainland Asia (Korea) and was the route for Kublai Khan's unsuccessful invasion attempt in the 13th century.
His fleet was destroyed by a storm, dubbed the divine wind, or kamikaze. Fukuoka is about an hour and a quarter by air from Tokyo.

Our host for this visit was

     Prof. Shinji Tomita
     Department of Information Systems
     Interdisciplinary Graduate School of Engineering Sciences
     Kyushu University
     6-1 Kasuga-Koen, Kasuga-shi, Fukuoka 816 Japan
     Tel: (92) 573-9611 Ext. 411
     Email: tomita@is.kyushu-u.ac.jp

Professor Tomita was formerly with Kyoto University, where Kung first met him during a 1982 visit to Japan sponsored by IBM.

Tomita explained to us that the Information Science Department is composed of seven labs: Information Recognition, Information Transmission, Information Organization, Computational Linguistics, Information Models, Information Retrieval, and Device Physics. These labs are also associated with the engineering, math and physics departments. (By lab, we mean a professor and his associated research assistants and students.) Tomita's lab is Information Organization. We spent most of our time hearing about its activities, which are described briefly below.

(1) Reconfigurable parallel processor. The effort here is to develop a testbed for research in parallel computer architecture, operating systems, and parallel programming languages. The hardware system consists of processing elements (PEs) and a crossbar network that can be reconfigured to fit the communication patterns of different applications. Each PE consists of a SPARC processor, a home-made MMU, and Weitek floating-point chips, and is a complete processor supporting virtual memory and cache. Each PE has a peak performance of 10 MIPS and 1.6 MFLOPS, and has 8 MBytes of local memory. The system is intended to support a wide range of usage models, including tightly coupled (shared memory) and loosely coupled (distributed memory) computation models. A thrust of this effort is therefore in the operating systems area.
They are planning to build a 128 by 128 crossbar network, supporting both static and dynamic routing. The system clock is a modest 16.6 MHz. The 128 by 128 crossbar will need 32 boards, each 15" x 20". Currently they have built a subset of the crossbar. Hardware construction is limited by available funds, and the 128-processor system will take three years to complete. The following reference gives more details.

     "The Kyushu University Reconfigurable Parallel Processor-Design Philosophy and Architecture", Info. Proc. 89, Proc of IFIP 11th World Computer Congress, San Francisco USA (Aug 1989), G.X. Ritter (ed), Elsevier Science Publishers B.V. (North Holland), pp 995-1000.

(2) Superscalar processor. In this kind of machine the instruction word is often quite long and can contain several instructions that can be decoded and executed in parallel by multiple instruction pipelines. Performance gains in such a system depend crucially on the run-time method of resolving data dependencies and control dependencies, and on the capabilities of the compiler. Thus there is a symbiosis between hardware and software support, and this research project is studying both architecture and compiler development. The hardware supports four simultaneous instruction issues, eager execution of predicted program branches, and shadow registers for recovery when a branch prediction is incorrect.

(3) A vector processor based on streaming/FIFO architecture. The goal of this project is to do something different from conventional vector supercomputers, which use vector registers to feed the arithmetic pipes. The researchers here propose to use a set of FIFOs instead of vector registers. Since the FIFOs can be made much larger than registers, the proposed approach could sustain much higher throughput in the arithmetic pipes by using chaining. However, to make chaining easy, virtual ALU and load/store pipelines are needed.
So this is a project involving very challenging issues and with real-world implications. The researchers promise a "blueprint" of the architecture by April 1991.

(4) Special purpose machine for high-speed ray tracing. This project studies the parallelism available at different processing levels of a ray tracing computation.

Kyushu is one of a few Japanese universities where research is addressing mainstream computer systems issues. In the U.S., there are probably no more than ten universities which are able to do similar kinds of research. Professor Tomita and his two junior project members all have systems building experience. One, Dr. Akira Fukuda, a graduate of Kyoto University, worked at NTT, and the other worked three years on mainframes at Fujitsu. We believe that this kind of industrial expertise is unusual at Japanese universities. The faculty members and Ph.D. students we talked to seemed capable. However, these projects have ambitious goals, and their resources are limited. The entire group, including undergraduates, is about 20 people, and funds are also very tight. It is hard to predict whether the four systems, or even any one of them, will be sufficiently finished in time to support the planned research. But even if their research goals are not completely accomplished, they will have gained valuable experience for real systems of the future.

We also had the opportunity to meet Professor Masaaki Shimasaki, who has recently moved to Kyushu U. from the Computer Center of Kyoto University.

     Prof Masaaki Shimasaki
     Computer Center, Kyushu University
     Fukuoka 812 Japan
     Tel: (092) 641-1101, ext 2507, Fax: (092) 631-3196
     Email: simasaki@sun4.cc.kyushu-u.ac.jp

In the past Professor Shimasaki worked on finite element methods for various kinds of mixed boundary value problems. More recently he has been studying performance analysis of vector supercomputers and techniques used in vectorizing and parallelizing compilers.
In particular he has applied Hockney's model to NEC SX-2 and Fujitsu Facom VP-400 supercomputers. (Hockney proposes that an estimate of the total time for a vector operation, t, can be given by t = (n + nhalf)/rinf, where n is the vector length, rinf is the peak speed, and nhalf is the vector length at which half the maximum speed is obtained.) Shimasaki's results match observed data extremely well. He is going to apply this technique to newer systems and we will be eager to see the results.

ELECTROTECHNICAL LABORATORY.

Kahaner wrote about ETL earlier (see the 2 July 1990 file "etl"), so here we summarize only our latest impressions, based on Kung's recent visit to ETL. The main interest in this visit was the Sigma-1 Dataflow computer and its follow-on, the EM-4.

To review, Sigma-1 now has an operational 128-PE system, in 32 clusters each composed of 4 processors. A single processor can compute at 3.3 MFLOPS (32 bit arithmetic) and 5 MIPS. Each processor requires two boards, one for the processor and one for memory. Connections between processors and between clusters each run at 100 MBytes/second. Applications developed on this machine have not been very significant yet. They demonstrated a trapezoidal integration of sin(x) with 30K mesh points, for which the calculation rate is 170 MFLOPS. It might be interesting to try an adaptive integration, which could exhibit the run-time capability of a dataflow architecture. They said that they would try this.

ETL researchers claim that Sigma-1 is the first and likely the last pure dataflow machine. The follow-up project, EM-4, suggests that traditional optimization techniques are being used to improve the performance of dataflow architectures. (We saw a similar effort at Kyushu University.) The new aspects of these dataflow machines are not much different from those of any advanced high-performance machine. It is very clear that distinguishing dataflow architectures from others is no longer an interesting issue.
However, Japanese researchers working in the area are making every effort to emphasize that they are still working on dataflow architectures.

It is worthwhile to repeat some of the essential issues here. Every calculation can be thought of as being described by a set of tasks. Some tasks can be done in parallel, others sequentially. Most tasks need data that will be computed in another task. Tasks may be large, such as a subroutine, or as small as an arithmetic assignment statement. It is relatively easy to generate large tasks, but then the amount of parallelism is limited. A task graph (or dataflow graph) indicates which tasks need to be done first, how much time each takes, where data goes, etc. In principle, using this graph one can determine the absolute lower bound on the execution time for the problem. The important problem for any parallel processor is to allocate a set of tasks having different execution times and precedence constraints onto a number of processors. In practice, tasks cannot be matched perfectly to processors, and there are overhead and other delays. Further, the execution time for large tasks depends on how their subtasks are broken up. Thus the actual execution time will always be greater than the lower bound.

In "real" dataflow, the tasks are low level. If a dataflow computer can organize processors to execute tasks exactly as they are presented in the task graph, the possibility exists for a computation to be done in almost the minimum possible time. The difficulty with pure dataflow computers has been that the various overheads are tremendous. These include the difficulty of controlling the sequence of execution, memory overhead because of contention for data, and communication overhead.

There is a great deal of dataflow work going on both in Japan and in the West. But, as we have pointed out above, current research seems to involve compromising the pure dataflow concept to bring it back to practical realization.
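
The lower bound on execution time discussed above can be made concrete with a small sketch. This is only an illustration and is not taken from any of the projects visited: the function names, task names, and durations are hypothetical, and the model assumes an idealized machine with no communication or scheduling overhead. It computes two standard bounds, the length of the longest dependency chain (the critical path) and the total work divided by the number of processors, and takes the larger of the two.

```python
from collections import defaultdict, deque


def critical_path_length(durations, edges):
    """Length of the longest dependency chain in a task graph.

    durations: dict mapping task name -> execution time (hypothetical units)
    edges:     iterable of (before, after) precedence constraints
    """
    successors = defaultdict(list)
    indegree = {task: 0 for task in durations}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Process tasks in topological order, tracking the earliest finish time.
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    finish = {task: durations[task] for task in ready}
    visited = 0
    while ready:
        task = ready.popleft()
        visited += 1
        for nxt in successors[task]:
            candidate = finish[task] + durations[nxt]
            finish[nxt] = max(finish.get(nxt, 0), candidate)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if visited != len(durations):
        raise ValueError("task graph contains a cycle")
    return max(finish.values(), default=0)


def lower_bound(durations, edges, processors):
    """Idealized lower bound on parallel execution time (no overheads)."""
    work_bound = sum(durations.values()) / processors
    return max(critical_path_length(durations, edges), work_bound)


if __name__ == "__main__":
    # Hypothetical four-task graph: a feeds b and c, which both feed d.
    durations = {"a": 2, "b": 3, "c": 4, "d": 1}
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    print(critical_path_length(durations, edges))
    print(lower_bound(durations, edges, processors=2))
```

For this hypothetical graph the longest chain is a, c, d, with length 7. With two processors the total work of 10 units gives a work bound of 5, so the critical path is the binding limit. Any real schedule, with its allocation mismatches and overheads, can only take longer.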
The EM-4 project is one example of such a compromise; another is the Harray project at Waseda University, in which large tasks are done using more conventional control flow and, within these tasks, computations are done using data flow. The problem of allocating processors to tasks has been studied for many years and is known to be a highly intractable scheduling problem (it is strongly NP-hard). Thus various approximate algorithms are used. One of these has been shown to be near optimal by H. Kasahara, also of Waseda University.

Kung was given a briefing on ETL's CODA multiprocessor project. The goal of the project is to study scalable prioritized multi-stage networks which have a predictable delay for communication. These kinds of networks are important for sensor fusion in real-time applications such as process control. A novel idea of "priority forwarding" is proposed, so that the part of a packet that contains its priority information will never be blocked. This will guarantee predictable communication delay for packets with the highest priority.

Our overall host for this visit to ETL was:

     Toshio Shimada
     Chief Scientist
     Computer Architecture Section
     Computer Science Division
     Electrotechnical Laboratory
     1-1-4 Umezono
     Tsukuba, Ibaraki 305
     Tel: 0298-54-5443 FAX: 0298-58-5882
     Email: shimada@etl.go.jp

NEW INFORMATION PROCESSING TECHNOLOGY.

This is the follow-on to MITI's Future Information Technology Project, which began in 1986. Some parts ended this year, others end in 1992. The New Information Processing Technology project is MITI's new initiative for the 1990s. Kahaner reported on aspects of this earlier; see the 3 July 1990 file "highspd" and the 26 June 1990 file "nipt". Recent additional information was provided by Mr. T. Yuba of ETL.

The best information we have is that this new follow-on MITI project is still not officially decided.
For the past two years specialists from Japanese government, academic, and industrial organizations, in fields such as mathematics, physiology, psychology, and computer science, have organized three subcommittees and six working groups in order to make a comprehensive study to define and set project goals. The working groups meet about once a month and have produced many preliminary reports. A final report is due soon.

The new project deals with the following fundamental issues.

(1) The capabilities of traditional (Turing) computers have increased dramatically, but there are still many kinds of information processing that are easy for living organisms and for which conventional computers perform poorly.

(2) In the latter areas, work of the "fifth generation project" has focused on inference, language, understanding and other logical processing.

(3) Other areas, such as pattern recognition, intuitive information processing, and autonomous and cooperative control involving systems having many degrees of freedom, seem to be less suitable to sequential processing.

(4) Physiology, cognitive psychology, and other brain research have produced a great deal of insight into how the brain learns and processes information.

(5) Technologies such as optical and molecular devices are being developed that may make large scale parallel processing possible.

While not yet officially set, the project will probably focus on the following two kinds of research.

(1) Basic principles of very highly parallel and highly distributed information processing, learning, optical technology and other new devices.

(2) Three dimensional information, visual and auditory recognition and understanding, and autonomous and cooperative functions as seen in living organisms.

Thus there will be research on something related to "soft logic" supported by massively parallel processors. The goal is to handle ambiguous or incomplete information using a new set of information processing methods.
These include, but are not limited to, neural nets, and also include the idea of intelligent databases.

The project will probably be of the same scale as the 5th Generation Computer Project, and follow the same organization and setting as ICOT. The project planners have expressed a strong interest in international cooperation. One exciting possibility discussed by Kung is to establish a research facility containing some massively parallel hardware of at least 1 million programmable processors. This could be an international testbed for applications in massively parallel processing.

The contact on this subject is:

     Mr. Toshitsugu Yuba
     Director
     Intelligent Systems Division
     Electrotechnical Laboratory
     1-1-4 Umezono
     Tsukuba, Ibaraki 305
     Tel: (0298) 54-5412

A project to build a reliable computer with a million or more processors is the kind of basic research thrust that a great nation could feel very proud about embarking on. There would be difficult problems in designing and building it. But the challenges and the opportunities would draw the best research minds like a powerful magnet. It is impossible to say what will really come out of this, but every scientist should be excited about the possibilities.

UNIVERSITY OF TSUKUBA.

Kung made a short visit to the University of Tsukuba after his visit to ETL. The purpose of this visit was to see the 14 GFLOPS, 488-processor MIMD machine, QCDPAX. The machine was designed by the University of Tsukuba and manufactured by Anritsu Corporation. Kahaner reported on this machine earlier; see the April 12, 1990 file "pax".

The machine has started to produce interesting results in physics. One paper reporting these results has just been presented at a recent physics conference in the U.S. According to Professor Hoshino, the next generation machine will be 100 GFLOPS and will probably be built by physicists.

It is quite an achievement to have built a machine of this scale by any standard.
This project is an interesting and successful example of collaboration between physicists and computer scientists.

Contacts are:

     Professor Tsutomu Hoshino
     Institute of Engineering Mechanics
     University of Tsukuba
     Tsukuba-Shi, Ibaraki-Ken
     Tel: (0298) 53-5255 FAX: (0298) 53-5207
     Email: hoshino@kz.tsukuba.ac.jp

     Professor Yoshio Oyanagi
     Institute of Information Sciences
     University of Tsukuba
     Tennodai 1-1-1, Tsukuba 305
     Tel: +81 298-53-5518 FAX: +81 298-53-5206
     Email: oranagi@is.tsukuba.ac.jp

SANYO ELECTRIC CO.

We made a brief visit to Sanyo's Osaka R&D facility to discuss the possibility of using the CMU-Intel iWarp in HDTV applications. We were given a briefing on Sanyo's research activities. Our host for this visit was

     Mr. Yasuhiro Ishii
     Senior Manager
     Sanyo Electric Co. Ltd
     Information & Communication Systems Research Center
     Optoelectronics Dept.
     180 Ohmori, Anpachi-Cho
     Anpachi-Gun, Gifu, Japan
     Tel: (0584) 64-3996, Fax: (0584) 64-4754.

Sanyo is primarily a consumer products corporation, but they have also made significant advances in amorphous silicon and are very proud of their research in amorphous silicon solar cells. The R&D organization works with a budget of about $500 million U.S., divided roughly as follows.

     R&D Administrative Hq.
       Tsukuba Research Center                                100 people (Basic research)
       Functional Materials Res. Center                       200        (Fundamental res.)
       Semiconductor Res. Center                              200             "
       ULSI Research Center                                   200             "
       Control and Systems Res. Center                        200             "
       Product Engineering Laboratory                         200        (Applied research)
       Audio-Video Research Center                            200             "
       Information and Communication System Research Center   200             "

The research staff we met were associated with the last three groups. Most of the work is centered in Osaka. The exceptions are the basic research in Tsukuba, where the most interesting computer applications have to do with intelligent systems such as robots, neurocomputers, and biocomputers, and the Information and Communication Center, which is in Nagoya.
The latter works on parallel processing for display and image processing, AI, expert systems, natural language processing, optical disks, digital communications, and research in reliability for functional and electromechanical components.

Our comments here are not about research in general but only about the specific interactions we had.

The HDTV research group we met differed from broadly similar groups that we visited elsewhere in that the scientists (and managers) did not speak much English. We were accompanied by Mr. T. W. Kang of Intel Japan, who provided a translation into Japanese, and this was absolutely necessary.

The major interest here was how to compress HDTV images in order to write them on a CD-ROM. This is the same problem that was raised at Hitachi and Matsushita. Much better compression algorithms are needed. Sanyo is hoping for compression ratios of 150 times. This is an ideal application for parallel processing. It currently takes about eight hours to compress an image, and of course Sanyo would like to do it in real time to prepare for future writeable CD technology. There are about 1.7 TeraFLOPS computations. Only parallel machines can deal with this in any practical way. Special-purpose parallel hardware cannot really do the job because it lacks the flexibility needed to implement high-quality compression algorithms. New programmable parallel systems such as iWarp can potentially provide the required power and flexibility.

---------------END OF PART 4-----------------------------------------------
---------------END OF REPORT-----------------------------------------------