Businesses and governments have been urged to take action to protect themselves against hacking attacks that are capable of injecting invisible backdoors into source code written in widely used programming languages.
Trojan Source attacks can be used by hackers or hostile states to launch powerful attacks against software supply chains by depositing doctored code in libraries and software repositories such as GitHub.
The hacking technique, disclosed today by researchers at the University of Cambridge, can be used to insert backdoors into source code written in almost any programming language.
The attacks exploit standard Unicode control characters to insert malicious code that appears innocuous to humans reviewing it for potential security risks.
Nicholas Boucher and Ross Anderson of Cambridge University’s Computer Science Laboratory demonstrated that C, C++, JavaScript, Java, Rust, Go and Python are vulnerable to Trojan Source attacks.
They warned in a research paper published today (1 November) that the same attacks could work against almost any programming language whose compilers and interpreters accept Unicode, the international standard for encoding text and scripts.
The Cambridge researchers have spent the past three months coordinating a complex disclosure programme to allow suppliers of software tools, such as compilers, interpreters, code editors and code repositories, to put defences in place.
Half of the organisations contacted by the researchers during the disclosure process are either working on patches or have committed to doing so, but others, say the researchers, are “dragging their feet”.
Anderson said bad actors were likely to use the “Trojan Source trick” to spread software vulnerabilities through compilers that have not been patched.
“We recommend that governments and firms that rely on critical software should identify their suppliers’ posture, exert pressure on them to implement adequate defences and ensure that any gaps are covered by controls elsewhere in their toolchain,” the academics said.
“Any entity whose security relies on the integrity of software supply chains should be concerned,” they warned.
Copy and paste
Many developers are happy to copy and paste insecure source code from unofficial online sources. This makes it likely that attackers will post malicious code containing invisible vulnerabilities in the hope that it ends up in production code.
There is a financial incentive for them to do so, the researchers argue, as there is a lucrative market for security vulnerabilities, with the most valuable commanding seven-figure sums.
Attackers have a strong incentive to use Trojan Source attacks to add backdoors to authenticated code, where they could persist in the wild for a long time.
Attacking open-source software components that are used by many other applications would give any attack “a large blast radius”.
Such vulnerabilities would be difficult or impossible for security specialists to detect when reviewing the uncompiled source code.
“Trojan Source attacks introduce the possibility of inserting such vulnerabilities into source code invisibly, thus completely circumventing the current principal control against them, namely human source code review,” the researchers said.
Supply chain attacks
Supply chain attacks have gained urgent attention from governments, including the US, which in May 2021 issued an executive order to improve the security of the software supply chain.
In one of the largest supply chain attacks, disclosed by FireEye in December 2020, nation-state hackers compromised SolarWinds Orion, a widely used IT performance-monitoring platform, to attack governments and enterprises worldwide.
According to the University of Cambridge research, once vulnerable code has been published, supply chain vulnerabilities are likely to persist in the affected ecosystem even if patches are later released.
Bidi control characters
Trojan Source attacks exploit Unicode's bidirectional (Bidi) control characters, which are used to switch between languages written left to right, such as English, and those written right to left, such as Arabic or Hebrew.
Attackers can use these control characters, including the so-called Bidi override characters, to reorder how source code is displayed, so that malicious code appears unsuspicious to a human reviewer. The compiler, however, still processes the characters in the order in which they are stored, not the order in which they are displayed.
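To illustrate the gap between stored and displayed text, the Python sketch below stores the letters of "hello" in reverse order and wraps them in a RIGHT-TO-LEFT OVERRIDE (U+202E) and a POP DIRECTIONAL FORMATTING (U+202C) character. A Bidi-aware editor or terminal should display the result as "hello", but the code points a program actually processes are different. The variable names are illustrative only.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE: display what follows right to left
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: end the override

# Stored (logical) order is o-l-l-e-h; a Bidi-aware renderer shows "hello".
disguised = RLO + "olleh" + PDF

print(disguised)             # in a Bidi-aware terminal this looks like: hello
print(disguised == "hello")  # False: the stored code points differ from the display
print(len(disguised))        # 7: two invisible control characters plus five letters

# List the code points a compiler or other program sees, in stored order.
for ch in disguised:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

How the first line renders depends on the terminal; many terminals do not implement the Unicode Bidirectional Algorithm, in which case the letters appear in stored order.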
The control characters can be hidden inside comments or string literals in a program's source code. “Any developer who copies code from an untrusted source into a protected code base may inadvertently introduce an invisible vulnerability,” the researchers warn.
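Comments and string literals are the natural hiding places because compilers typically reject these characters elsewhere. A minimal check with CPython 3's built-in `compile()` shows this; the helper name `compiles` is hypothetical, and the exact `SyntaxError` message varies between Python versions.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

def compiles(source: str) -> bool:
    """Return True if CPython accepts `source` as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

in_comment = f"x = 1  # note{RLO}\n"  # control character inside a comment
in_string = f"s = 'abc{RLO}def'\n"    # control character inside a string literal
in_code = f"{RLO}x = 1\n"             # control character in bare code

print(compiles(in_comment))  # True
print(compiles(in_string))   # True
print(compiles(in_code))     # False
```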
There is an “immediate” need for organisations to build defences into the code repositories and text editors used for writing code, the authors said.
One way to do this is to scan code for the presence of Bidi override characters.
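A minimal sketch of such a scan in Python, assuming UTF-8 source files and Python 3.9 or later. It flags the nine Unicode embedding, override and isolate controls (LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI). The functions `find_bidi_controls` and `scan_tree`, and the default list of file suffixes, are hypothetical and not part of any existing tool.

```python
import unicodedata
from pathlib import Path

# The nine Unicode Bidi embedding, override and isolate control characters.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)

def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for every Bidi control in text.

    Line and column numbers are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits

def scan_tree(root: str, suffixes=(".c", ".cpp", ".go", ".java", ".js", ".py", ".rs")) -> bool:
    """Print every hit in files under root; return True if any were found."""
    found = False
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Undecodable bytes become U+FFFD so one bad file does not stop the scan.
            text = path.read_text(encoding="utf-8", errors="replace")
            for line_no, col, name in find_bidi_controls(text):
                print(f"{path}:{line_no}:{col}: {name}")
                found = True
    return found
```

For example, `find_bidi_controls("x = 1\n# ok\u202e\n")` returns `[(2, 5, 'RIGHT-TO-LEFT OVERRIDE')]`. A CI job could fail the build whenever `scan_tree` returns `True`. The check is deliberately blunt: it also flags legitimate uses, such as Arabic or Hebrew text in string literals, so each hit still needs a human decision.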
The researchers found some evidence that techniques similar to Trojan Source attacks had already been used, although they did not discover any malicious attacks.
In the longer term, Unicode attacks against natural language processing systems will be a bigger problem, said Anderson.
Here, attackers could use Trojan Source-style attacks to disrupt machine translation services, according to another paper published by University of Cambridge researchers.
That could hamper the work of journalists or intelligence services monitoring events overseas, said Anderson.
“If journalists rely on machine translation to monitor hate speech by the Burmese army against the Rohingya, for example, then the army propagandists could use coding tricks to stop their stuff being translated, so it’s much less available to foreigners,” said Anderson.
The same techniques could also be used in business email compromise attacks, to subvert search engine optimisation algorithms or to disable hate speech detection filters on social media services.
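The same display trick can blind a text filter. In the toy Python example below, which is far simpler than any real hate speech detector, a plain substring filter misses a word that a reader would see on screen, and a crude mitigation flags any text containing Bidi controls. The blocklist and both filter functions are hypothetical.

```python
# The nine Unicode Bidi embedding, override and isolate control characters.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
BLOCKED_WORDS = {"attack"}  # hypothetical blocklist, for illustration only

def naive_filter(text: str) -> bool:
    """Return True if the text contains a blocked word (plain substring match)."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)

def hardened_filter(text: str) -> bool:
    """Also treat any Bidi control character as grounds for review."""
    return naive_filter(text) or any(ch in BIDI_CONTROLS for ch in text)

# "kcatta" is stored inside a right-to-left override, so it displays as "attack".
disguised = "we will \u202ekcatta\u202c at dawn"

print(naive_filter(disguised))     # False: the stored characters never spell "attack"
print(hardened_filter(disguised))  # True: flagged because of the override
```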