
Hitachi SR2201

Article snapshot taken from Wikipedia, available under the Creative Commons Attribution-ShareAlike license.

The Hitachi SR2201 was a distributed-memory parallel computer system introduced by Hitachi in March 1996. Its processor, the 150 MHz HARP-1E based on the PA-RISC 1.1 architecture, addressed the cache-miss penalty with pseudo vector processing (PVP): data was prefetched into a special register bank, bypassing the cache. Each processor had a peak performance of 300 MFLOPS, giving the SR2201 a peak performance of 600 GFLOPS. Up to 2048 RISC processors could be connected via a high-speed three-dimensional crossbar network, which was able to transfer data at 300 MB/s over each link.

In February 1996, two 1024-node systems were installed at the University of Tokyo and the University of Tsukuba. The latter was extended into the non-commercial CP-PACS system. An upgrade to a 2048-node system, which reached a peak speed of 614 GFLOPS, was completed at the end of September 1996. The CP-PACS was run by the Center for Computational Physics, which was formed for that purpose. The 1024-processor SR2201 achieved 220.4 GFLOPS on the LINPACK benchmark, corresponding to 72% of its peak performance.
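For scale, 1024 processors at 300 MFLOPS each give a theoretical peak of 1024 × 0.3 GFLOPS = 307.2 GFLOPS, and 220.4 / 307.2 ≈ 0.72, consistent with the quoted 72% figure.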

Distributed memory

In computer science, distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can operate only on local data; if remote data are required, the computational task must communicate with one or more remote processors. In contrast, a shared memory multiprocessor offers a single memory space used by all processors. Processors do not have to be aware of where data resides, except that there may be performance penalties and that race conditions are to be avoided.

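As a minimal sketch of this model (assuming the MPI message-passing interface, which the article does not name), each process owns a private value and can obtain a remote value only through explicit communication:

```c
/* Minimal distributed-memory sketch using MPI (an assumption; the article
 * does not name a particular library). Each rank owns a private value;
 * rank 1 can only obtain rank 0's value through explicit communication. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank * 10.0;   /* data private to this processor */

    if (rank == 0) {
        /* Rank 0 sends its local value to rank 1. */
        MPI_Send(&local, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double remote;
        /* Rank 1 cannot read rank 0's memory directly; it must receive it. */
        MPI_Recv(&remote, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %f from rank 0\n", remote);
    }

    MPI_Finalize();
    return 0;
}
```

Launched with at least two processes (for example, mpirun -np 2 ./a.out), the program makes the remote access explicit as a send/receive pair.
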
In a distributed memory system there is typically a processor, a memory, and some form of interconnection that allows programs on each processor to interact with one another. The interconnect can be organised with point-to-point links, or separate hardware can provide a switching network. The network topology is a key factor in determining how the multiprocessor machine scales. The links between nodes can be implemented using a standard network protocol (for example Ethernet), using bespoke network links (as used, for example, in the transputer), or using dual-ported memories.

The key issue in programming distributed memory systems is how to distribute the data over the memories. Depending on the problem being solved, the data can be distributed statically, or it can be moved through the nodes. Data can be moved on demand, or it can be pushed to the new nodes in advance.

As an example, if a problem can be described as a pipeline in which data x is processed successively through functions f, g, h, and so on (the result is h(g(f(x)))), then it can be expressed as a distributed memory problem in which the data is transmitted first to the node that performs f, which passes the result on to the second node that computes g, and finally to the third node that computes h. This is also known as systolic computation.

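A hedged sketch of such a pipeline over three MPI ranks (again an assumption; f, g, and h below are placeholder stages), where each rank applies one function and forwards the result:

```c
/* Pipeline sketch over three ranks: rank 0 applies f, rank 1 applies g,
 * rank 2 applies h, so rank 2 ends up with h(g(f(x))).
 * MPI and the stage functions are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>

static double f(double x) { return x + 1.0; }  /* placeholder stage */
static double g(double x) { return x * 2.0; }  /* placeholder stage */
static double h(double x) { return x - 3.0; }  /* placeholder stage */

int main(int argc, char **argv) {
    int rank;
    double v;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        v = f(5.0);                               /* first stage on the input x */
        MPI_Send(&v, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&v, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        v = g(v);                                 /* second stage */
        MPI_Send(&v, 1, MPI_DOUBLE, 2, 0, MPI_COMM_WORLD);
    } else if (rank == 2) {
        MPI_Recv(&v, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        v = h(v);                                 /* final stage: h(g(f(x))) */
        printf("result = %f\n", v);
    }

    MPI_Finalize();
    return 0;
}
```

With a stream of inputs rather than a single x, all three ranks work concurrently on different items, which is what gives the systolic arrangement its throughput.
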
Data can be kept statically in nodes if most computations happen locally and only changes on the edges have to be reported to other nodes. An example of this is a simulation in which data is modeled using a grid and each node simulates a small part of the larger grid. On every iteration, nodes inform all neighboring nodes of the new edge data.

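A sketch of that per-iteration edge exchange, under an assumed one-dimensional strip decomposition with MPI; each rank keeps ghost cells holding its neighbours' edge values:

```c
/* Halo-exchange sketch: each rank owns a strip of the grid plus two ghost
 * cells; on every iteration it swaps edge values with its neighbours.
 * The 1-D decomposition and MPI are illustrative assumptions. */
#include <mpi.h>

#define N 8          /* interior cells per rank */

int main(int argc, char **argv) {
    int rank, size;
    double u[N + 2] = {0};                 /* u[0] and u[N+1] are ghost cells */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int iter = 0; iter < 10; ++iter) {
        /* Send my left edge to the left neighbour, receive its right edge. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send my right edge to the right neighbour, receive its left edge. */
        MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                     &u[0], 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... local update of u[1..N] using the refreshed ghost cells ... */
    }

    MPI_Finalize();
    return 0;
}
```

Only the edge values cross the network each iteration; the bulk of the grid stays static in each node's private memory.
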
Similarly, in distributed shared memory each node of a cluster has access to a large shared memory in addition to each node's limited non-shared private memory. Distributed shared memory hides the mechanism of communication; it does not hide the latency of communication.
