Another day, another high-energy proton beam collision. Ever since the Large Hadron Collider (LHC) at CERN began operating in 2008, discoveries of elusive particles have frequently made the headlines, and announcements ranging from the Higgs boson to the pentaquark have given us a glamorous image of the research being done there. But for the scientists who work at the LHC, beams of sub-atomic particles barrelling into each other at close to the speed of light are part of a daily reality, and dazzling discoveries of new, exotic particles don’t come easily.


In fact, despite the LHC being one of the most complex and impressive feats of engineering in history, there’s one simple yet fundamental limit on how much data its scientists can analyse: CERN’s computer storage. Each collision has the potential to tell us more about what our Universe is made from, but to prevent CERN from becoming overwhelmed with information, the vast majority of the data needs to be thrown away. So does this mean that the LHC’s scientists could be missing out on new discoveries?

One hundred metres below the Swiss town of Meyrin, home of the CERN particle physics laboratory, lies the LHC’s largest particle detector: the ATLAS experiment. As beams of protons collide head-on inside ATLAS, it’s the detector’s job to pick up the debris of the collisions, which comes in the form of smaller, high-energy particles and structures that are created as the protons smash each other apart.

“It’s not a dramatic oversimplification to describe the detector as a massive, inward-looking digital camera”, says Dr Ben Wynne, a researcher at the University of Edinburgh who works with the data gathered by ATLAS. But whereas an ordinary camera builds its images from pixels, each registering flashes of relatively low-energy visible-light photons hitting its sensor, the camera Dr Wynne describes is quite different.

“The innermost layers for the ATLAS detector are called the pixel detector, made of pixelated silicon; except we’re looking for much higher energy particles than the photons that you might use to compose an ordinary photo.” So each high-energy particle resulting from the collisions forms a small part of a unique photo. ATLAS is able to take continuous, 3D images of the aftermath of the proton beam collisions as they happen, which then get stored in CERN’s computer system for physicists to analyse.


Dr Wynne’s role in handling these images is crucial. As the LHC’s detectors pick up the particles resulting from individual proton-proton collisions, an overwhelming amount of data comes flooding into CERN’s computers. This means a system needs to be in place to record only the most important collision data, and throw out the rest. “ATLAS is detecting 40 million collisions per second, but we can only record 1,000 per second. A single event requires about 1.3 megabytes of storage, so we can’t store all of that raw data”, Dr Wynne explains. With such a low recording rate (just around 1 in 40,000 collisions), it’s therefore the job of Dr Wynne, and hundreds of other scientists, to develop the software behind ATLAS’s extremely picky data recording system.
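The arithmetic behind that storage pressure is worth spelling out. A back-of-the-envelope calculation, using only the figures Dr Wynne quotes and assuming every event were the same 1.3 megabytes, shows the scale of the problem:

```python
# Back-of-the-envelope data rates for the ATLAS trigger,
# using the figures quoted in the article.
collisions_per_second = 40_000_000   # collisions the detector sees
recorded_per_second = 1_000          # events the trigger keeps
event_size_mb = 1.3                  # storage per recorded event

# Rate if every collision were stored in full, in terabytes per second
raw_rate_tb_s = collisions_per_second * event_size_mb / 1_000_000
# Rate actually written to storage, in megabytes per second
recorded_rate_mb_s = recorded_per_second * event_size_mb
# Fraction of collisions that survive the trigger
acceptance = recorded_per_second / collisions_per_second

print(f"If everything were kept: ~{raw_rate_tb_s:.0f} TB every second")
print(f"Actually recorded: ~{recorded_rate_mb_s:.0f} MB per second")
print(f"Acceptance rate: 1 in {round(1 / acceptance):,}")
```

Keeping everything would mean writing roughly 52 terabytes to disk every second, which is why the trigger system has to be so ruthless.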

“We have a large system of computers just in the next cave over from the detector itself, which will read out the detector, and very rapidly make decisions based on whether a particular event is interesting or not. We call it the trigger system.” Luckily, it isn’t too difficult for the trigger system to decide what happens to a lot of the data.

“Most of the data coming out of the machine isn’t very interesting, and we don’t need to record it. That’s either because the proton collision wasn’t particularly central, and so not a lot of energy was actually transferred, or it was just a very common process, so it’s nothing that would surprise us”. So despite each individual collision having a chance of telling us something new, it’s extremely unlikely. With that in mind, the trigger system’s data acceptance rate of 1 in 40,000 isn’t quite as alarmingly low as it first seems. But is it enough to tell scientists at the LHC everything they want to know?

By diverse contributors; mashup by User:Zykure [CC BY-SA 2.0], via Wikimedia Commons

On the opposite side of the LHC’s accelerator ring from ATLAS, nine kilometres away beneath Cessy in France, lies the CMS experiment. CMS has a similar setup to ATLAS, and one of the main purposes of the two detectors is to cross-check each other’s results, verifying any of interest. But sometimes, CMS is needed for different research, leaving the two detectors looking for different kinds of collision data.

Dr Wynne recalls an instance when the differences in objectives between the two experiments at the time caused a headache for some scientists. “CMS was producing results which were looking at some generally uninteresting collisions within the LHC. But they’d gone looking for a particular structure that might appear within them, and they’d set up their trigger system to specifically go searching for this, which produced a very interesting result.”

That meant that as CMS analysed their discovery, they unfortunately couldn’t cross-check their results with ATLAS. “At ATLAS, our trigger system was set up in a completely different way, and we didn’t have the data at the lower end of the energy spectrum that CMS needed. So not only did we not have this result, but, having seen CMS’s result, we still couldn’t go back and analyse our data to try and confirm it.”

Part of the problem is that as the LHC’s researchers make progressively more interesting discoveries, they typically need to program the LHC’s detectors to look for increasingly high-energy particles. This ultimately assumes that lower-energy discoveries have already been made, but this isn’t always the case. In this instance, CMS had gone back to analyse lower-energy particles that ATLAS had decided to pay less attention to, meaning ATLAS couldn’t match CMS’s results.

As a result, LHC researchers sometimes need to take calculated risks when they change the objectives of the two detectors, such as when deciding what energy range to set the detectors to, and those gambles don’t always pay off. “There could be rare processes that we just hadn’t seen at lower energy experiments because they hadn’t produced enough data,” says Wynne. “CMS did have to spend an awful lot of the output bandwidth from their trigger system on this analysis, so you can argue whether it was worthwhile for them or not.”

A certain trigger

But the issues caused by detectors looking in different places aren’t the only problems caused by CERN’s limited data storage. Restrictions on computer space mean that different groups of scientists frequently need to debate who gets the resources to gather the data they need for their research. “We do have meetings where people will argue at great length, saying this particular physics signal that I’m interested in requires this figure”, Dr Wynne says.

“There’s a total amount of output that we are limited by, so some people will have to lose out where other people’s analysis is seen as more important”, he adds. For the scientists who lose these debates, research can become much more difficult. “If you have less hard data, you can sometimes do some quite clever analysis on a fairly limited amount of data, but it’s much harder to get it done.”

The worldwide LHC computing grid servers, part of the CERN LHC experiment in Geneva, Switzerland © Harold Cunningham/Getty Images

Ultimately, the type of data that the LHC produces is largely decided by the entirely human choice of what each detector’s trigger system is programmed to detect. Already, this process sacrifices data that could be useful to many scientists in place of data deemed most likely to be important. But even for the scientists who have won these debates and secured the resources for their research, it isn’t entirely certain that every important event will be picked up by the trigger system.

“You have to make a decision about what’s interesting and what’s not, and there’s definitely plenty of discussion there, not just about how you justify one thing over another. But at a philosophical level, you’re relying on your existing knowledge of the experiment and of physics, and there are a lot of assumptions built in there.” The LHC’s scientists are like explorers in uncharted territory, relying on their own expertise and intuition as they mark out the criteria for the events the trigger system records. There’s no guarantee that even the brightest minds in physics will stumble into the right place.

The phenomena scientists want to analyse can only really be decided based on current scientific theories, and if something completely new to science happens in the detector, the trigger system could decide to overlook it completely. “We have to make a choice about what we set the system up to look for, and if we’re taken by surprise – there’s something unexpected there, then we might miss it”, Dr Wynne adds.

However, the LHC’s trigger systems aren’t completely blind to unexpected events. “We do have ways of trying to account for this. If the trigger system can’t make the decision, so it doesn’t understand the event for whatever reason, then it can get saved anyway in case there’s something particularly interesting or weird going on.” This clever programming has lowered the risk of scientists missing out on new discoveries.
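The decision flow Dr Wynne describes, discarding glancing or well-understood collisions but keeping anything the system can’t make sense of, can be pictured with a heavily simplified sketch. This is not real ATLAS software: the event fields, threshold and decision rules below are all invented for illustration, and genuine trigger decisions draw on far richer detector information.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    # Hypothetical one-line summary of a collision.
    # None means the event couldn't be reconstructed at all.
    transferred_energy_gev: Optional[float]
    is_common_process: bool

# Invented threshold, purely for illustration
ENERGY_THRESHOLD_GEV = 100.0

def trigger_decision(event: Event) -> bool:
    """Return True if the event should be recorded."""
    # If the system can't understand the event, save it anyway:
    # something particularly interesting or weird might be going on.
    if event.transferred_energy_gev is None:
        return True
    # Glancing (non-central) collisions transfer little energy: discard.
    if event.transferred_energy_gev < ENERGY_THRESHOLD_GEV:
        return False
    # Well-understood, common processes wouldn't surprise us: discard.
    if event.is_common_process:
        return False
    # High-energy and unusual: record it.
    return True
```

Under this toy logic, an unreadable event (`Event(None, False)`) is kept as a precaution, a low-energy one (`Event(50.0, False)`) is discarded, and only high-energy, uncommon events make it to storage.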

“There are also other approaches”, Dr Wynne adds. “LHCb, one of the other experiments at CERN, try and do as much of their actual data analysis in the trigger system itself, so as the data is being read out, a lot of analysis is just being done in real time. So they don’t have to store the raw events when they can’t afford to, but they can still get quite a lot of the physics they need from it.”
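The LHCb approach can be pictured as trading bulky raw events for tiny analysed summaries. Again, this is a toy sketch: the field names and quantities are invented, and the real system computes far more sophisticated physics quantities in real time.

```python
# Toy illustration of "analysis in the trigger": instead of saving
# a large raw event, save only the small set of derived quantities
# the physics analysis actually needs.

def summarise(raw_event: dict) -> dict:
    # raw_event stands in for ~1.3 MB of detector readout;
    # the returned summary is only a few dozen bytes.
    tracks = raw_event["tracks"]
    return {
        "n_tracks": len(tracks),
        "total_momentum": sum(t["momentum"] for t in tracks),
        "max_momentum": max(t["momentum"] for t in tracks),
    }

# A pretend raw event with three reconstructed particle tracks
raw = {"tracks": [{"momentum": 12.5}, {"momentum": 47.0}, {"momentum": 3.1}]}
summary = summarise(raw)
print(summary)
```

The raw event can then be thrown away while the summary, which carries the quantities the analysis cares about, is kept at a tiny fraction of the storage cost.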

Rest assured, Dr Wynne and the rest of the LHC’s trigger software engineers are prepared for almost every eventuality as potential discoveries come in from unexpected places. But still, the trigger system’s meticulous selection process means that unexpected events are never completely guaranteed to be recorded ahead of the more predictable events that scientists are actually looking for.

Despite all the methods used by the teams working at the Large Hadron Collider, there is a fundamental constraint on how much data it can gather, and there isn’t really any way around it. “The more data you save, the more you spend on storage and networking bandwidth. So there are some fairly major limits on the size of our data sets – computing time costs money. We’re pretty much at our maximum at the moment”, Dr Wynne concludes.

This constraint has created an environment for research at the LHC that we rarely hear about: one where the risk of missing out on new discoveries is ever present. It is why CERN accepts only the brightest physicists in the world, and requires them to continually debate each other over who gets sufficient resources to carry out their research. It means the LHC must use the pinnacle of modern engineering and technology to ensure that as few new discoveries as possible slip through the cracks. But ultimately, given the chaotic, unpredictable nature of particle physics, the risk of missing discoveries is inevitable, and can only be alleviated to the extent that the laws of physics allow.

