At the time it was installed in the summer of 2018, Tetralith was more than just the fastest of the six traditional supercomputers in the National Supercomputer Centre (NSC) at Linköping University. It was the most powerful supercomputer in the Nordic region.
But just three years later, it was necessary to complement Tetralith with a new system – one that would be specifically designed to meet the requirements of fast-evolving artificial intelligence (AI) and machine learning (ML) algorithms. Tetralith wasn’t designed for machine learning – it didn’t have the parallel processing power that would be needed to handle the increasingly large datasets used to train artificial intelligence algorithms.
To support research programmes in Sweden that rely on AI, the Knut and Alice Wallenberg Foundation donated €29.5m to fund a bigger supercomputer. Berzelius was delivered in 2021 and went into operation that summer. The supercomputer, which has more than twice the computing power of Tetralith, takes its name from the renowned scientist Jacob Berzelius, who came from Östergötland, the region of Sweden where the NSC is located.
Atos delivered and installed Berzelius, which includes 60 of Nvidia’s latest and most powerful servers – DGX systems with eight graphics processing units (GPUs) each. Nvidia networking connects the servers to one another – and to 1.5PB (petabytes) of storage hardware. Atos also delivered its Codex AI Suite, an application toolset to support researchers. The entire system is housed in 17 racks which, placed side by side, extend to about 10 metres.
The system will be used for AI research – not only the large programmes funded by the Knut and Alice Wallenberg Foundation, but also other academic users who apply for time on the system. Most of the users will be in Sweden, but some will be researchers in other parts of the world who cooperate with Swedish scientists. The biggest areas of Swedish research that will use the system in the near future are autonomous systems and data-driven life sciences. Both cases involve a lot of machine learning on enormous datasets.
NSC intends to hire staff to support users – not so much core programmers as specialists who can help researchers put together parts that already exist. There are a lot of software libraries for AI, and they have to be understood and used correctly. The researchers using the system typically either do their own programming, have it done by assistants, or simply adapt good open source projects to their needs.
“So far, around 50 projects have been granted time on Berzelius,” says Niclas Andresson, technology manager at NSC. “The system is not yet fully utilised, but utilisation is rising. Some problems use a large part of the system. For instance, we had a hackathon on NLP [natural language processing], and that used the system quite well. Nvidia provided a toolbox for NLP that scales up to the big machine.”
In fact, one of the biggest challenges now is for researchers to scale the software they’ve been using to match the new computing power. Many of them have been working with a single GPU, or a handful, in their desktop computers. Scaling their algorithms to a system with hundreds of GPUs is another matter.
Now Swedish researchers have the opportunity to think big.
Autonomous systems
AI researchers in Sweden have been using supercomputer resources for several years. In the early days, they used systems based on CPUs. But in more recent years, as GPUs evolved out of the gaming industry and into supercomputing, their massively parallel structures have taken number crunching to a new level. The earlier GPUs were designed for image rendering, but now they are being tailored to other applications, such as machine learning, where they have already become essential tools for researchers.
“Without the availability of supercomputing resources for machine learning we couldn’t be successful in our experiments,” says Michael Felsberg, professor at the Computer Vision Laboratory at Linköping University. “Just having the supercomputer doesn’t solve our problems, but it’s an essential ingredient. Without the supercomputer, we couldn’t get anywhere. It would be like a chemist without a Petri dish, or a physicist without a clock.”
“Without the availability of supercomputing resources for machine learning we couldn’t be successful in our experiments. Just having the supercomputer doesn’t solve our problems, but without it, we couldn’t get anywhere”
Michael Felsberg, Linköping University
Felsberg was part of the group that helped define the requirements for Berzelius. He is also part of the allocation committee that decides which projects get time on the cluster, how time is allocated, and how usage is counted.
He insists that not only is it necessary to have a big supercomputer, but it must be the right type of supercomputer. “We have enormous amounts of data – terabytes – and we need to process it thousands of times. In all the processing steps, we have a very coherent computational structure, which means we can use a single instruction to process multiple data elements, and that is the typical scenario where GPUs are very strong,” says Felsberg.
“It’s not just the sheer number of calculations that matters – it’s also necessary to look at the way the calculations are structured. Here too, modern GPUs do exactly what’s needed – they easily perform calculations of huge matrix products,” he says. “GPU-based systems were introduced in Sweden a few years ago, but in the beginning, they were relatively small, and it was difficult to gain access to them. Now we have what we need.”
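In practice, that single-instruction, multiple-data pattern is what tensor libraries hand off to the GPU. The sketch below – which assumes PyTorch, a framework the article does not name – shows the kind of huge matrix product Felsberg refers to being dispatched to a GPU in one call:

```python
# A minimal sketch of the single-instruction, multiple-data pattern Felsberg
# describes: one matrix product dispatched over large blocks of data.
# PyTorch is an assumed framework; the article does not name one.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Two large matrices; every output element follows the same
# multiply-and-accumulate recipe, so the work spreads naturally
# across thousands of GPU cores.
a = torch.randn(8192, 8192, device=device)
b = torch.randn(8192, 8192, device=device)

c = a @ b  # a single call hands the whole product to the GPU

if device == "cuda":
    torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to finish
print(c.shape)
```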
Massive parallel processing and huge data transfers
“Our research does not require just a single run that lasts over a month. Instead, we might have as many as 100 runs, each lasting two days. During those two days, enormous memory bandwidth is used, and local filesystems are essential,” says Felsberg.
“When machine learning algorithms run on modern supercomputers with GPUs, a very high number of calculations are performed. But an enormous amount of data is also transferred. The bandwidth and throughput from the storage system to the computational node must be very high. Machine learning requires terabyte datasets and a given dataset needs to be read up to 1,000 times during one run, over a period of two days. So all the nodes and the memory have to be on the same bus.
“Modern GPUs have thousands of cores,” adds Felsberg. “They all run in parallel on different data but with the same instruction. So that is the single-instruction, multiple-data concept. That’s what we have on each chip. And then you have sets of chips on the same boards and you have sets of boards in the same machine so that you have enormous resources on the same bus. And that is what we need because we often split our machine learning onto multiple nodes.
“We use a large number of GPUs at the same time, and we share the data and the learning among all of these resources. This gives you a real speed-up. Just imagine if you ran this on a single chip – it would take over a month. But if you split it across a massively parallel architecture – let’s say, 128 chips – you get the result of the machine learning much, much faster, which means you can analyse the result, see the outcome and, based on the outcome, run the next experiment,” he says.
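The article does not describe the researchers’ actual training code, but one common way to implement the splitting Felsberg describes is data-parallel training: each GPU holds a replica of the model, works on its own shard of the dataset, and gradients are averaged across replicas after every step. A minimal sketch using PyTorch’s DistributedDataParallel (an assumed implementation choice, with a placeholder model and dataset):

```python
# Sketch of data-parallel training of the kind Felsberg describes: the same
# model is replicated on many GPUs, each replica sees a different slice of
# the data, and gradients are averaged across replicas after every step.
# PyTorch DistributedDataParallel is an assumed implementation choice; the
# model and dataset below are placeholders, not the researchers' code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group("nccl")              # one process per GPU
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder dataset; in practice this is a terabyte-scale dataset
    # streamed repeatedly from fast local storage.
    data = TensorDataset(torch.randn(10_000, 512), torch.randint(0, 10, (10_000,)))
    sampler = DistributedSampler(data)           # each rank gets its own shard
    loader = DataLoader(data, batch_size=256, sampler=sampler, num_workers=4)

    model = torch.nn.Linear(512, 10).cuda(local_rank)  # stand-in for a real network
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                      # gradients are all-reduced here
            opt.step()
        if rank == 0:
            print(f"epoch {epoch} done, last loss {loss.item():.3f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with a tool such as torchrun – for example, `torchrun --nproc_per_node=8 train.py` – the script starts one process per GPU, roughly matching the eight GPUs in each DGX server.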
“One other challenge is that the parameter spaces are so large that we cannot afford to cover the whole thing in our experiments. Instead, we have to use smarter search strategies in the parameter space and rely on heuristics to find what we need. This often requires knowing the outcome of the previous runs, which makes this a chain of experiments rather than a set of experiments that you can run in parallel. Therefore, it’s very important that each run be as short as possible to squeeze out as many runs as possible, one after the other.”
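The article does not say which search heuristic the researchers use, but the chained structure Felsberg describes – each run’s outcome steering the next – can be illustrated with a deliberately simple sketch, in which every new candidate is a perturbation of the best parameters found so far:

```python
# Minimal sketch of the chained experiments Felsberg describes: each run's
# outcome guides the choice of parameters for the next run, rather than
# sweeping the whole parameter space in parallel. The heuristic below
# (random perturbation around the best result so far) is purely illustrative.
import random

def run_experiment(params):
    # Stand-in for a two-day training run; returns a validation score.
    return -(params["lr"] - 0.01) ** 2 - (params["depth"] - 12) ** 2 * 1e-5

best_params = {"lr": 0.1, "depth": 8}
best_score = run_experiment(best_params)

for step in range(20):                        # a chain of runs, one after another
    candidate = {
        "lr": max(1e-5, best_params["lr"] * random.uniform(0.5, 2.0)),
        "depth": max(1, best_params["depth"] + random.choice([-2, 0, 2])),
    }
    score = run_experiment(candidate)
    if score > best_score:                    # keep the candidate only if it improves
        best_params, best_score = candidate, score

print(best_params, best_score)
```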
“Now, with Berzelius in place, this is the first time in the 20 years I’ve been working on machine learning for computer vision that we really have sufficient resources in Sweden to do our experiments,” says Felsberg. “Before, the computer was always a bottleneck. Now, the bottleneck is somewhere else – a bug in the code, a flawed algorithm, or a problem with the dataset.”
The beginning of a new era in life sciences research
“We do research in structural biology,” says Bjorn Wallner, professor at Linköping University and head of the bioinformatics division. “That involves trying to find out how the different elements that make up a molecule are arranged in three-dimensional space. Once you understand that, you can develop drugs to target specific molecules and bind to them.”
Most of the time, research is coupled to a disease, because that’s when you can solve an immediate problem. But sometimes the bioinformatics division at Linköping also conducts pure research to try to get a better understanding of biological structures and their mechanisms.
The group uses AI to help make predictions about specific protein structures. DeepMind, a Google-owned company, has done work that has given rise to a revolution in structural biology – and it relies on supercomputers.
DeepMind developed AlphaFold, an AI algorithm it trained using very large datasets from biological experiments. The supervised training resulted in “weights” – a trained neural network that can then be used to make predictions. AlphaFold is now open source and available to research organisations such as Bjorn Wallner’s team at Linköping University.
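As a generic illustration of the “weights” idea – not AlphaFold’s actual code – the sketch below saves the parameters a training run would produce and then loads them back into the same network architecture to make a prediction on new data (PyTorch is assumed, and the model is a placeholder):

```python
# A generic illustration of training "weights": learned parameters are saved
# to a file and later loaded into the same network architecture to make
# predictions on new inputs. The model and file name are placeholders,
# not AlphaFold's actual code.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 64),
)

torch.save(model.state_dict(), "weights.pt")      # what a training run would publish

model.load_state_dict(torch.load("weights.pt"))   # reuse the published weights
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 128))       # inference on new data
print(prediction.shape)
```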
“We can use Berzelius to get a lot more throughput and break new ground in our research. Google has a lot of resources and can do a lot of big things, but now we can maybe compete a little bit”
Bjorn Wallner, Linköping University
There is still a vast amount of uncharted territory in structural biology. While AlphaFold offers a new way of finding the 3D structure of proteins, it’s only the tip of the iceberg – and digging deeper will also require supercomputing power. It’s one thing to understand a protein in isolation, or a protein in a static state. But it’s an entirely different thing to figure out how different proteins interact and what happens when they move.
A human cell draws on a repertoire of around 20,000 proteins – and they interact. They are also flexible. Swapping one molecule out, or binding a protein to something else, are actions that regulate the machinery of the cell. Proteins are also manufactured in cells. Understanding this basic machinery is important and can lead to breakthroughs.
“Now we can use Berzelius to get a lot more throughput and break new ground in our research,” says Wallner. “The new supercomputer even gives us the potential to retrain the AlphaFold algorithm. Google has a lot of resources and can do a lot of big things, but now we can maybe compete a little bit.
“We have just started using the new supercomputer and need to adapt our algorithms to this huge machine to use it optimally. We need to develop new methods, new software, new libraries, new training data, so we can actually use the machine optimally,” he says.
“Researchers will expand on what DeepMind has done and train new models to make predictions. We can move into protein interactions, beyond just single proteins and on to how proteins interact and how they change.”