A Bit About Supercomputer Performance

What are supercomputing and HPC actually about? Dr. Ole Saastad, Chief Engineer at the University Center for Information Technology, tries to give you a quick answer.

Supercomputers used to (from the 1950s to about 1994) be built around super fast custom processing units (CPUs). A milestone was the Cray-1 in 1976. However, due to major improvements in microprocessors through the 1980s and early 1990s, some scientists started to look to microprocessors for smaller workloads.

In 1994, Sterling and Becker built the first cluster of microprocessor-based office computers: the first Beowulf cluster, which was able to attack real-size problems. The rest is history. Fox is such a cluster, built from servers with 128 AMD processor cores each and a state-of-the-art interconnect, InfiniBand from NVIDIA.

Many laptops with low core count processors (4 to 8 cores) have faster individual cores; each core might outperform a single compute node core. However, the compute node has many more cores (128).

Your task should be to run in parallel. Most real applications are written in C/C++/Fortran (for good reasons), which are regarded as the classical languages, but Python and Julia are strong contenders.
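
As a minimal sketch of what running in parallel can look like in the classical style, here is the parallel "hello world" in C using MPI (assuming an MPI library such as Open MPI is installed; compile with mpicc and launch with mpirun):

    /* Minimal MPI example: every core (rank) introduces itself.
       Compile with: mpicc hello_mpi.c -o hello_mpi
       Run with:     mpirun -np 4 ./hello_mpi                     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the MPI runtime     */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process' id (rank)   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                        /* shut down the MPI runtime */
        return 0;
    }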

The only thing for sure (in the short run) is that each individual core is not getting significantly faster. Actually, you have no choice but to start running in parallel. As an example, the Betzy supercomputer in Norway has 172,032 cores.

Once you have mastered the art of parallel programming, you need to think about scaling. In most cases, scaling from one to two or four cores gives excellent scaling. The challenge becomes harder when trying to use far more cores. There are two paradigms in parallel computing: shared memory and distributed memory. The first is, for natural reasons, confined to a single system and hence a limited number of cores. In the distributed case there is no theoretical limit, as communication between the systems is done over an interconnect.
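
A minimal sketch of the shared memory paradigm, using OpenMP in C (compile with, e.g., gcc -fopenmp; the problem size and values are arbitrary choices for illustration):

    /* Shared memory parallelism with OpenMP: all threads work on
       the same arrays in the memory shared by the cores.
       Compile with: gcc -fopenmp axpy.c -o axpy                  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        long n = 10000000;                 /* arbitrary problem size */
        double *x = malloc(n * sizeof *x);
        double *y = malloc(n * sizeof *y);
        double a = 2.0;

        for (long i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* The loop iterations are split between the threads. */
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];

        printf("y[0] = %.1f, using up to %d threads\n",
               y[0], omp_get_max_threads());
        free(x); free(y);
        return 0;
    }

The distributed memory counterpart is typically programmed with MPI, as in the example above: each process has its own private memory, and data moves between the systems explicitly over the interconnect.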

Before venturing into production runs, the scaling must be checked. Plotting speedup against the number of cores is quite easy. Initially, for most applications, the speedup at 2, 4 or even 8 cores tends to lie on a straight line. In most cases, the speedup starts to level off at some core count (or even drops).
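
Speedup is simply the run time on one core divided by the run time on n cores, S(n) = T(1)/T(n). A small sketch in C that turns measured wall times into such a table (the timings below are made-up numbers, purely for illustration):

    /* Compute speedup S(n) = T(1)/T(n) from measured wall times.
       The timings below are made-up numbers for illustration.   */
    #include <stdio.h>

    int main(void)
    {
        int    cores[]  = {1, 2, 4, 8, 16, 32};
        double time_s[] = {100.0, 51.0, 26.0, 14.0, 8.5, 6.0};
        int n = sizeof cores / sizeof cores[0];

        printf("cores  time(s)  speedup\n");
        for (int i = 0; i < n; i++)
            printf("%5d  %7.1f  %7.2f\n",
                   cores[i], time_s[i], time_s[0] / time_s[i]);
        return 0;
    }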

The theoretical framework for speedup is given by two laws: Amdahl's law (pessimistic) and Gustafson's law (optimistic); regard learning them as homework. The latter law tells us that, given a big enough problem, it will scale better than a very small one.
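
For the homework, a small sketch in C of what the two laws predict (the parallel fraction p = 0.95 is an assumed example value, not a measurement):

    /* Predicted speedup on n cores for a program whose parallel
       fraction is p.
       Amdahl    : S(n) = 1 / ((1 - p) + p / n)  (fixed problem size)
       Gustafson : S(n) = (1 - p) + p * n        (problem grows with n) */
    #include <stdio.h>

    int main(void)
    {
        double p = 0.95;   /* assumed parallel fraction, for illustration */

        printf("cores   Amdahl  Gustafson\n");
        for (int n = 1; n <= 1024; n *= 2)
            printf("%5d  %7.2f  %9.2f\n",
                   n,
                   1.0 / ((1.0 - p) + p / n),
                   (1.0 - p) + p * n);
        return 0;
    }

With p = 0.95, Amdahl's law caps the speedup at 20 no matter how many cores you add, while Gustafson's prediction keeps climbing because the problem is allowed to grow with the machine.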

By Ole Saastad
Published Sep. 29, 2022 4:06 PM