Xref: utzoo soc.culture.japan:5844 comp.sys.super:262
Path: utzoo!attcan!uunet!wuarchive!zaphod.mps.ohio-state.edu!ncar!noao!arizona!rick
From: rick@cs.arizona.edu (Rick Schlichting)
Newsgroups: soc.culture.japan,comp.sys.super
Subject: Kahaner Report: Parallel Computing in Japan (Part 4)
Message-ID: <121@saguaro.cs.arizona.edu>
Date: 6 Nov 90 01:58:46 GMT
Followup-To: soc.culture.japan
Organization: U of Arizona CS Dept, Tucson
Lines: 464


  [Dr. David Kahaner is a numerical analyst visiting Japan for two years
   under the auspices of the Office of Naval Research-Far East (ONRFE).  
   The following is the professional opinion of David Kahaner and in no 
   way has the blessing of the US Government or any agency of it.  All 
   information is dated and of limited life time.  This disclaimer should 
   be noted on ANY attribution.]

  [Copies of previous reports written by Kahaner can be obtained from
   host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
      H.T. Kung CMU [ht.kung@cs.cmu.edu]
Re: Aspects of Parallel Computing Research in Japan---Kyushu & Tsukuba
                  Univ., ETL, Sanyo, New Info Proc Technology project. 
Date: 6 Nov 1990

ABSTRACT. Some aspects of parallel computing research in Japan are
analyzed, based on authors' visits to a number of Japanese universities
and industrial laboratories in October 1990. This portion of the report
deals with parallel computing at Kyushu and Tsukuba Universities,
Electrotechnical Laboratory, Sanyo Electric, and the New Information
Processing Technology project.

PART 4.

The following outline describes the topics that are discussed in the various 
parts of this report.

PART 1 OUTLINE-------------------------------------------------------------
  INTRODUCTION
  SUMMARY
  RECOMMENDATIONS

PART 2 OUTLINE-------------------------------------------------------------
  FUJITSU OVERVIEW
    Company profile and computer R&D activities
    VP2000 series supercomputer organization and performance
    PARALLEL PROCESSING ACTIVITIES
     SP (Logic Simulation Engine)
     AP1000 (Cellular Array Processor)
     RP (Routing Processor)
     ATM (Asynchronous Transfer Mode) Switch
    MISCELLANEOUS FUJITSU ACTIVITIES
     Neurocomputing
     HMET 

  NEC
    SX-3 series supercomputer organization and performance
      Benchmark data for SX-3, VP2000, and Cray.
      Comments
    MISCELLANEOUS NEC PARALLEL PROCESSING ACTIVITIES

PART 3 OUTLINE------------------------------------------------------------
  HITACHI CENTRAL RESEARCH LABORATORY
    HDTV
    PARALLEL AND VECTOR PROCESSING
      Hyper crossbar parallel processor, H2P
      Parallel Inference Machine, PIM/C
      Josephson-Junctions
      Molecular Dynamics

   JAPAN ELECTRONICS SHOW, 1990
     HDTV
     Flat Panel Displays

   MATSUSHITA ELECTRIC
     Company profile and computer R&D activities
     ADENA Parallel Processor
     MISCELLANEOUS ACTIVITIES
       HDTV
     Comments about Japanese industry

PART 4 (this part) OUTLINE--------------------------------------------------
    KYUSHU UNIVERSITY
      Profile of Information Science Department
      Reconfigurable Parallel Processor
      Superscalar Processor
      FIFO Vector Processor
      Comments

    ELECTROTECHNICAL LABORATORY
      Sigma-1 Dataflow Computer and EM-4
      Dataflow Comments
      CODA Multiprocessor

    NEW INFORMATION PROCESSING TECHNOLOGY
      Summary
      Comments

    UNIVERSITY OF TSUKUBA
      PAX

    SANYO ELECTRIC
      Company profile and computer R&D activities
      HDTV
END OF OUTLINE-----------------------------------------------------------


KYUSHU UNIVERSITY.
Kyushu University is in the city of Fukuoka, the largest city on the
island of Kyushu, Japan's southernmost large island. Kyushu is the
closest part of Japan to mainland Asia (Korea) and was the route for
Kublai Khan's unsuccessful invasion attempt in the 13th century.  His
fleet was destroyed by a storm, dubbed the divine wind, or kamikaze.
Fukuoka is about an hour and a quarter by air from Tokyo.

Our host for this visit was
  Prof. Shinji Tomita
  Department of Information Systems
  Interdisciplinary Graduate School of Engineering Sciences
  Kyushu University
  6-1 Kasuga-Koen, Kasuga-shi, Fukuoka 816 Japan
  Tel: (92) 573-9611 Ext. 411
  Email: tomita@is.kyushu-u.ac.jp

Professor Tomita was formerly at Kyoto University, where Kung first met
him during a 1982 visit to Japan sponsored by IBM.  Tomita explained to
us that the Information Science Department is composed of seven labs:
Information Recognition, Information Transmission, Information
Organization, Computational Linguistics, Information Models,
Information Retrieval, and Device Physics. These labs are also
associated with the engineering, math and physics departments. (By lab,
we mean a professor and his associated research assistants and
students.) Tomita's lab is Information Organization.  We spent most of
our time hearing about its activities, which are described briefly
below.

(1) Reconfigurable parallel processor. The effort here is to develop a
testbed for parallel computer architecture, operating systems and
parallel programming languages research.  The hardware system consists
of processing elements (PEs) and a crossbar network that can be
reconfigured to fit the communication patterns of different
applications. Built from a SPARC processor, a home-made MMU and Weitek
floating-point chips, each PE is a complete processor supporting
virtual memory and cache.  Each PE has a peak performance of 10 MIPS and
1.6 MFLOPS, and has 8 MBytes of local memory.   The system is intended
to support all sorts of usage models including tightly coupled (shared
memory) computation models and loosely coupled (distributed memory)
computation models.  A thrust of this effort is therefore in the
operating systems area.   They are planning to build a 128 by 128
crossbar network, supporting both static and dynamic routing.  The
system clock is a modest 16.6 MHz.  The 128 by 128 crossbar will need 32
15"x20" boards.  Currently they have built a subset of the crossbar.
Hardware construction is limited by available funds, and the
128-processor system will take three years to complete. The following
reference gives more details.
   "The Kyushu University Reconfigurable Parallel Processor-Design
Philosophy and Architecture", Info. Proc. 89, Proc of IFIP 11th World
Computer Congress, San Francisco USA (Aug 1989), G.X. Ritter (ed),
Elsevier Science Publishers B.V. (North Holland), pp 995-1000.
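
As a rough illustration of what reconfiguring a crossbar to fit an
application's communication pattern means, the following Python sketch
models a static configuration as a one-to-one mapping from input ports
to output ports. The class name Crossbar, the helper ring(), and the
ring pattern itself are illustrative assumptions; the report does not
describe the actual configuration interface of the Kyushu machine.

```python
class Crossbar:
    """Toy model of an n-by-n crossbar with a static configuration."""

    def __init__(self, n):
        self.n = n
        self.out_for_in = {}  # input port -> output port

    def configure(self, mapping):
        """Install a new static configuration (input port -> output port)."""
        outputs = list(mapping.values())
        if any(not 0 <= p < self.n for p in list(mapping) + outputs):
            raise ValueError("port out of range")
        if len(set(outputs)) != len(outputs):
            raise ValueError("two inputs drive the same output")
        self.out_for_in = dict(mapping)

    def route(self, src):
        """Return the output port currently connected to input port src."""
        return self.out_for_in[src]


def ring(n):
    """Hypothetical communication pattern: PE i sends to PE (i + 1) mod n."""
    return {i: (i + 1) % n for i in range(n)}


xbar = Crossbar(128)        # 128 by 128, as planned in the text
xbar.configure(ring(128))   # reconfigure for a ring-structured application
print(xbar.route(127))      # the last PE wraps around to port 0
```

Dynamic routing, which the planned network is also said to support,
would set up connections per message rather than per application and
is not modeled here.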

(2) Superscalar processor. In this kind of machine the instruction
word is often quite long and can contain several instructions that can
be decoded and executed in parallel by multiple instruction pipelines.
Performance gains in such a system depend crucially on the run-time
method of resolving data and control dependencies and on the
capabilities of the compiler, so hardware and software support are
symbiotic. This research project is therefore studying both
architecture and compiler development. The hardware supports four
simultaneous instruction issues, eager execution of predicted program
branches, and shadow registers for recovery when a branch prediction
is incorrect.

(3) A vector processor based on streaming/FIFO architecture.  The goal
of this project is to do something different from conventional vector
supercomputers, which use vector registers to feed the arithmetic pipes.
The researchers here propose to use a set of FIFOs instead of vector
registers.  Since the FIFOs can be made much larger than registers, the
proposed approach has the potential advantage of sustaining much higher
throughput in the arithmetic pipes by using chaining.   However, to make
chaining easy, virtual ALU and load/store pipelines are needed.  So this
is a project involving very challenging issues with real-world
implications.  The researchers promise a "blueprint" of the architecture
by April 1991.

(4) Special purpose machine for high-speed ray tracing.  This project
studies parallelism available at different processing levels of a ray
tracing computation.

Kyushu is one of a few Japanese universities where research is
addressing mainstream computer systems issues. In the U.S., there are
probably no more than ten universities which are able to do similar
kinds of research. Professor Tomita and his two junior project members
all have systems-building experience.  One, Dr. Akira Fukuda, a
graduate of Kyoto University, worked at NTT, and the other worked three
years on mainframes at Fujitsu.  We believe that this kind of industrial
expertise is unusual at Japanese universities.

The faculty members and Ph.D students we talked to seemed capable.
However, these projects have ambitious goals, and their resources are
limited.  The entire group, including undergraduates, is about 20
people, and funds are also very tight.  It is hard to predict whether
the four systems, or even any one of them, will be sufficiently finished
in time to support the planned research.  But even if their research
goals are not completely accomplished, they will have gained valuable
experience for building real systems in the future.

We also had the opportunity to meet Professor Masaaki Shimasaki, who has
recently moved to Kyushu U. from the Computer Center of Kyoto
University.
   Prof Masaaki Shimasaki
   Computer Center, Kyushu University
   Fukuoka 812 Japan
   Tel: (092) 641-1101, ext 2507, Fax: (092) 631-3196
   Email: simasaki@sun4.cc.kyushu-u.ac.jp
In the past Professor Shimasaki worked on finite element methods for
various kinds of mixed boundary value problems. More recently he has been
studying performance analysis of vector supercomputers and techniques
used in vectorizing and parallelizing compilers. In particular he has
applied Hockney's model to NEC SX-2 and Fujitsu Facom VP-400
supercomputers.  (Hockney proposes that an estimate of the total time
for a vector operation, t, can be given by t=(n+nhalf)/rinf, where n is
the vector length, rinf is the peak speed, and nhalf is the vector
length at which half the maximum speed is obtained). Shimasaki's results
match observed data extremely well. He is going to apply this technique
to newer systems and we will be anxious to see the results.
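
Hockney's formula quoted above is easy to explore numerically. The
short Python sketch below evaluates t=(n+nhalf)/rinf and the resulting
achieved rate n/t. The parameter values R_INF and N_HALF are made-up
placeholders chosen only for illustration; they are not Shimasaki's
measurements for the SX-2 or the VP-400.

```python
def hockney_time(n, r_inf, n_half):
    """Estimated time in seconds for a vector operation of length n.

    r_inf  : peak (asymptotic) rate, in results per second
    n_half : vector length at which half of r_inf is achieved
    """
    return (n + n_half) / r_inf


def hockney_rate(n, r_inf, n_half):
    """Achieved rate n / t, which equals r_inf * n / (n + n_half)."""
    return n / hockney_time(n, r_inf, n_half)


# Hypothetical machine parameters (assumptions, not measured values).
R_INF = 1.0e9    # results per second
N_HALF = 100     # elements

for n in (10, 100, 1000, 10000):
    fraction = hockney_rate(n, R_INF, N_HALF) / R_INF
    print(f"n = {n:6d}  fraction of peak = {fraction:.3f}")
```

At n = nhalf the model gives exactly half of the peak rate, and the
rate approaches rinf only when n is much larger than nhalf, which is
why a small nhalf matters for codes with short vectors.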

ELECTROTECHNICAL LABORATORY.
Kahaner has written about ETL before (see the 2 July 1990 file "etl"),
so here we summarize only our latest impressions, based on Kung's recent
visit to ETL.  The main interest in this visit was the Sigma-1 dataflow
computer and its follow-on, the EM-4. To review, Sigma-1 now has an
operational 128-PE system, in 32 clusters of 4 processors each. A single
processor can compute at 3.3 MFLOPS (32 bit arithmetic) and 5 MIPS. Each
processor requires two boards, one for the processor and one for memory.
Connections between processors and clusters are each 100 MBytes/second.
Applications developed on this machine have not been very significant
yet. They demonstrated a trapezoidal integration of sin(x) with 30K mesh
points, for which the calculation rate is 170 MFLOPS. It might be
interesting to try an adaptive integration which could exhibit the
run-time capability of a dataflow architecture. They said that they
would try this.
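
For reference, the demonstration problem is small. The Python sketch
below is an ordinary sequential trapezoidal rule, not the Sigma-1
dataflow program, which we did not see in source form. The integration
interval [0, pi] is an assumption, since only the integrand and the
number of mesh points were given. The last lines compare the quoted
170 MFLOPS with the aggregate peak implied by 128 processors at
3.3 MFLOPS each.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for f on [a, b] with n equal intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        # Each function evaluation is independent of the others, so the
        # evaluations could in principle be spread over many processors.
        total += f(a + i * h)
    return total * h


N_POINTS = 30_000                      # "30K mesh points" from the text
approx = trapezoid(math.sin, 0.0, math.pi, N_POINTS)  # interval assumed
print(approx)                          # the exact integral on [0, pi] is 2

# Figures quoted in the text: 128 PEs, 3.3 MFLOPS each, 170 MFLOPS observed.
peak_mflops = 128 * 3.3
fraction_of_peak = 170 / peak_mflops
print(f"{fraction_of_peak:.2f} of aggregate peak")
```

On these figures the demonstration runs at roughly 40 percent of the
aggregate peak. An adaptive method would make the amount of work per
subinterval depend on the data, which is the run-time behavior the
regular mesh does not exercise.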

ETL researchers claim that Sigma-1 is the first and likely the last pure
dataflow machine. The follow-up project, EM-4, suggests that traditional
optimization techniques are being used to improve performance of
dataflow architectures. (We saw a similar effort at Kyushu University.)
The new aspects of these dataflow machines are not much different from
those of any advanced high-performance machines.  It is very clear that
distinguishing data flow architectures is no longer an interesting
issue.  However, Japanese researchers working in the area are making
every effort to emphasize that they are still working on dataflow
architectures.

It is worthwhile to repeat some of the essential issues here. Every
calculation can be thought of as being described by a set of tasks. Some
tasks can be done in parallel, others sequentially. Most tasks need data
that will be computed in another task. Tasks may be large, such as a
subroutine, or as small as an arithmetic assignment statement. It is
relatively easy to generate large tasks, but then the amount of
parallelism is limited.  A task graph (or dataflow graph) indicates
which tasks need to be done first, how much time each takes, where data
goes, etc. In principle, using this graph one can determine the absolute
lower bound on the execution time for the problem.  The important
problem for any parallel processor is to allocate a set of tasks having
different execution times and precedence constraints onto a number of
processors.  In practice, tasks cannot be matched perfectly to
processors, and there are overhead and other delays. Further the
execution time for large tasks depends on how their subtasks are broken
up.  Thus the actual execution time will always be greater than the
lower bound. In "real" dataflow, the tasks are low level. If a dataflow
computer can organize processors to execute tasks exactly as they are
presented in the task graph, the possibility exists for a computation to
be done in almost the minimum possible time.  The difficulty with pure
dataflow computers has been that various overheads have been tremendous.
These include the difficulty of controlling the sequence of execution,
memory overhead because of contention for data, and communication
overhead.  There is a great deal of dataflow work going on both in Japan
and in the West. But, as we have pointed out above, current research
seems to involve compromising the pure dataflow concept to bring it back
to practical realization. The EM-4 project is one example; another is
the Harray project at Waseda University, in which large tasks are done
using more conventional control flow and within these tasks computations
are done using data flow.  The problem of allocating processors to tasks
has been studied for many years and is known to be a highly intractable
scheduling problem (strongly NP-hard). Thus various approximate
algorithms are used. One of these has been shown to be near optimal by
H. Kasahara, also of Waseda University.
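
The ideas in this paragraph can be made concrete with a small example.
The Python sketch below computes the lower bound given by the longest
dependency chain in a task graph, and then runs a simple greedy list
scheduler on a fixed number of processors. The task graph TASKS is
invented for illustration, task times are assumed positive, and the
scheduler is a generic textbook heuristic; it is not Kasahara's
algorithm and ignores communication and other overheads.

```python
# Hypothetical task graph: task -> (execution time, predecessor tasks).
TASKS = {
    "a": (2, []),
    "b": (3, ["a"]),
    "c": (1, ["a"]),
    "d": (4, ["b", "c"]),
    "e": (2, ["c"]),
}


def critical_path(tasks):
    """Longest dependency chain: a lower bound on execution time
    no matter how many processors are available."""
    finish = {}

    def earliest_finish(t):
        if t not in finish:
            cost, preds = tasks[t]
            finish[t] = cost + max((earliest_finish(p) for p in preds), default=0)
        return finish[t]

    return max(earliest_finish(t) for t in tasks)


def list_schedule(tasks, num_procs):
    """Greedy list scheduling; returns the finishing time of the last task.

    Tasks are taken in order of decreasing "bottom level" (longest path
    from the task to the end of the graph) and each is placed on the
    processor that becomes free earliest.
    """
    succs = {t: [] for t in tasks}
    for t, (_, preds) in tasks.items():
        for p in preds:
            succs[p].append(t)

    level = {}

    def bottom_level(t):
        if t not in level:
            level[t] = tasks[t][0] + max((bottom_level(s) for s in succs[t]), default=0)
        return level[t]

    # With positive task times, this order places predecessors first.
    order = sorted(tasks, key=bottom_level, reverse=True)

    proc_free = [0] * num_procs
    finish = {}
    for t in order:
        cost, preds = tasks[t]
        ready = max((finish[p] for p in preds), default=0)
        i = min(range(num_procs), key=lambda k: proc_free[k])
        finish[t] = max(ready, proc_free[i]) + cost
        proc_free[i] = finish[t]
    return max(finish.values())


print(critical_path(TASKS))      # lower bound for any processor count
print(list_schedule(TASKS, 1))   # one processor: the sum of all task times
print(list_schedule(TASKS, 2))   # two processors
```

In this small graph two processors already reach the lower bound. In
general a heuristic of this kind gives no such guarantee, and the gap
widens once overheads are taken into account.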

Kung was given a briefing on the ETL's CODA multiprocessor project.  The
goal of the project is to study scalable prioritized multi-stage
networks which have a predictable delay for communication.  These kinds of
networks are important for sensor fusion in real-time applications such
as process control.  A novel idea of "priority forwarding" is proposed
so that the part of a packet that contains its priority information will
never be blocked.  This will guarantee predictable communication delay
for packets with the highest priority.

Our overall host for this visit to ETL was:
  Toshio Shimada
  Chief Scientist
  Computer Architecture Section
  Computer Science Division
  Electrotechnical Laboratory
  1-1-4 Umezono
  Tsukuba, Ibaraki 305
  Tel: 0298-54-5443
  FAX: 0298-58-5882
  Email: shimada@etl.go.jp


NEW INFORMATION PROCESSING TECHNOLOGY. 
This is the follow-on to MITI's Future Information Technology Project,
which began in 1986. Some parts ended this year, others end in 1992.
The New Information Processing Technology project is MITI's new
initiative for the 1990s.  Kahaner reported on aspects of this earlier;
see the 3 July 1990 file "highspd" and the 26 June 1990 file "nipt".
Recent additional information was provided by Mr. T. Yuba of ETL.

The best information we have is that this new follow-on MITI project is
still not officially decided.  For the past two years specialists from
the Japanese government, academic, and industrial organizations in
fields such as mathematics, physiology, psychology, and computer science
have organized three subcommittees and six working groups in order to
make a comprehensive study to define and set project goals. The working
groups meet about once a month and have produced many preliminary
reports. A final report is due soon.

The new project deals with the following fundamental issues. (1)
The capabilities of traditional (Turing) computers have increased
dramatically, but there are still many kinds of information processing
that are easy for living organisms but at which conventional computers
perform poorly. (2) In the latter areas, work of the "fifth generation
project" has focused on inference, language, understanding and other
logical processing. (3) Other areas such as pattern recognition,
intuitive information processing, and autonomous and cooperative control
involving systems having many degrees of freedom, seem to be less
suitable to sequential processing. (4) Physiology, cognitive psychology,
and other brain research have produced a great deal of insight into how
the brain learns and processes information. (5) Technologies such as
optical and molecular devices are being developed that may make possible
large-scale parallel processing.

While not yet officially set, the project will probably focus on the
following two kinds of research. (1) Basic principles of very highly
parallel and highly distributed information processing, learning,
optical technology and other new devices. (2) Three dimensional
information, visual and auditory recognition and understanding, and
autonomous and cooperative functions as seen in living organisms. Thus
there will be research on something related to "soft logic" supported by
massively parallel processors.  The goal is to handle ambiguous or
incomplete information using a new set of information processing
methods.  These include, but are not limited to, neural nets and the
idea of intelligent databases.  The project will probably
be of the same scale as the 5th Generation Computer Project, and follow
the same organization and setting as ICOT.

The project planners have expressed a strong interest in international
cooperation.   One exciting possibility discussed by Kung is to
establish a research facility containing some massively parallel
hardware of at least 1 million programmable processors.  This can be an
international testbed for applications in massively parallel processing.
Contact on this subject is:
  Mr. Toshitsugu Yuba
  Director
  Intelligent Systems Division
  Electrotechnical Laboratory
  1-1-4 Umezono
  Tsukuba, Ibaraki 305
  Tel: (0298) 54-5412

A project to build a reliable computer with a million or more processors
is the kind of basic research thrust that a great nation could feel very
proud about embarking on.  There would be difficult problems in
designing and building it. But the challenges and the opportunities
would draw the best research minds like a powerful magnet. It is
impossible to say what will really come out of this but every scientist
should be excited about the possibilities.

UNIVERSITY OF TSUKUBA.
Kung made a short visit to University of Tsukuba after his visit to ETL.
The purpose of this visit was to see the 14 GFLOPS, 488-processor MIMD
QCDPAX machine.  The machine was designed by University of Tsukuba and
manufactured by Anritsu Corporation.  Kahaner reported on this machine
earlier; see the April 12, 1990 file "pax".  The machine has started to
produce interesting results in physics.  One paper reporting these
results has just been presented at a recent physics conference in the
U.S.  According to Professor Hoshino, the next generation machine will
be 100 GFLOPS and will probably be built by physicists.

It is quite an achievement to have built a machine of this scale by any
standard.  This project is an interesting and successful example of
collaboration between physicists and computer scientists.
  Contacts are:
  Professor Tsutomu Hoshino
  Institute of Engineering Mechanics
  University of Tsukuba
  Tsukuba-Shi, Ibaraki-Ken
  Tel: (0298) 53-5255
  FAX: (0298) 53-5207
  Email: hoshino@kz.tsukuba.ac.jp

  Professor Yoshio Oyanagi
  Institute of Information Sciences
  University of Tsukuba
  Tennodai 1-1-1, Tsukuba 305
  Tel: +81 298-53-5518
  FAX: +81 298-53-5206
  Email: oranagi@is.tsukuba.ac.jp

SANYO ELECTRIC CO.
We made a brief visit to Sanyo's Osaka R&D facility to discuss the
possibility of using the CMU-Intel iWarp in HDTV applications.  We were
given a briefing on Sanyo's research activities.  Our host for this
visit was
  Mr. Yasuhiro Ishii
  Senior Manager
  Sanyo Electric Co. Ltd
  Information & Communication Systems Research Center
  Optoelectronics Dept.
  180 Ohmori, Anpachi-Cho 
  Anpachi-Gun, Gifu, Japan
   Tel: (0584) 64-3996, Fax: (0584) 64-4754.

Sanyo is primarily a consumer products corporation but they have also
made significant advances in amorphous silicon and are very proud of
their research in amorphous silicon solar cells. The R&D organization
works with a budget of about $500 million U.S. divided roughly as
follows.
    Unit                                 Staff       Type of research
    R&D Administrative Hq.
    Tsukuba Research Center              100 people  Basic research
    Functional Materials Res. Center     200         Fundamental research
    Semiconductor Res. Center            200         Fundamental research
    ULSI Research Center                 200         Fundamental research
    Control and Systems Res. Center      200         Fundamental research
    Product Engineering Laboratory       200         Applied research
    Audio-Video Research Center          200         Applied research
    Information and Communication
       System Research Center            200         Applied research
The research staff we met were associated with the last three groups.

Most of the work is centered in Osaka. The exceptions are the basic
research in Tsukuba, where the most interesting computer applications
have to do with intelligent systems such as robots, neurocomputers, and
biocomputers, and the Information and Communication Center, which is in
Nagoya. The latter works on parallel processing for display and image
processing, AI, expert systems, natural language processing, optical
disks, digital communications, and research in reliability for
functional and electromechanical components.

Our comments here are not about research in general but only about the
specific interactions we had. The HDTV research group we met was quite
different from the broadly similar groups that we visited elsewhere, in
that the scientists (and managers) did not speak much English.  We were
accompanied by Mr. T. W. Kang of Intel Japan, who provided a translation
into Japanese, and this was absolutely necessary.

The major interest here was how to compress HDTV images in order to
write them on a CD-ROM. This is the same problem that was raised at
Hitachi and Matsushita. Much better compression algorithms are needed.
Sanyo is hoping for compression ratios of 150 times. This is an ideal
application for parallel processing. It currently takes about eight
hours to compress an image, and of course Sanyo would like to do it in
real time to prepare for future writeable CD technology. There are about
1.7 TeraFLOPS computations. Only parallel machines can deal with this in
any practical way.  Special-purpose parallel hardware cannot really do
the job because it lacks the flexibility needed to implement
high-quality compression algorithms.  New programmable parallel systems
such as iWarp can potentially provide the required power and
flexibility.

---------------END OF PART 4-----------------------------------------------
---------------END OF REPORT-----------------------------------------------