Bugzilla – Bug 1191820
VUL-0: CVE-2021-42574,CVE-2021-42694: Trojan Source attack
Last modified: 2022-01-12 08:32:41 UTC
Michael / Richard, could you please check if this could be mitigated via compiler flags etc.?
(In reply to Alexander Bergmann from comment #13)
> Michael / Richard, could you please check if this could be mitigated via
> compiler flags etc.?
For the homoglyph attack I would suggest building all packages with -fno-extended-identifiers (note that this is a C-family-only option, so it
causes spurious diagnostics when used with languages other than C or C++).
For the Bidi encodings there is talk of simply not allowing them at all, because
using them doesn't appear to make sense.
GCC upstream devs from Red Hat are working on changes to GCC to diagnose
the homoglyph cases and reject the Bidi encodings.
Note the issue is likely more universal and affects other compilers and
interpreters that accept non-ASCII input.
CVE-2021-42574 will be used for tracking the primary attack pattern which uses Bidi control characters.
CVE-2021-42694 will be used for tracking the homoglyph variant of the attack.
Note also that I don't think the homoglyph "problem" should be included in the
security issues. It's like noting that in some fonts '1' and 'l' are rendered
similarly and then making variable names like 'O01l' vs '0Ol1', which is an
obfuscation "technique" known since the dawn of time.
(I bet disabling extended identifiers as suggested will break some packages,
I certainly remember seeing non-ASCII comments already many years ago. I would
bet some programs are meanwhile using non-ASCII identifiers as well. But it will
indeed catch the majority of cases, except, of course, the above example of
homoglyphs (in some fonts)).
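To make the point about pure-ASCII look-alikes concrete, here is a minimal Python sketch. The look-alike table and the names `skeleton` and `visually_collide` are hypothetical and only cover the characters from the example above; which characters really look alike depends on the font.

```python
# Hypothetical table of ASCII characters that some fonts render alike.
LOOKALIKES = str.maketrans({"0": "O", "1": "l", "I": "l"})


def skeleton(identifier):
    """Collapse look-alike ASCII characters to one representative."""
    return identifier.translate(LOOKALIKES)


def visually_collide(a, b):
    """True when two distinct identifiers share the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)
```

Both 'O01l' and '0Ol1' consist only of ASCII characters, so a check that merely rejects extended identifiers would accept them, yet `visually_collide("O01l", "0Ol1")` reports a collision under this table.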
I do see the issue with Bidi control characters, and as said, warnings from GCC
for that are currently being worked on.
From the oss-security mailing list:
OSS Security teams,
We have identified an issue affecting all compilers and interpreters that support Unicode. We believe that the techniques described hereafter can be used to generate adversarial encodings of source code files that can be used to craft targeted attacks against source code that cannot be seen by human reviewers in rendered text. This is of concern to the open source community because, absent defenses, supply chain attacks can be imperceptibly mounted against the ecosystem.
This vulnerability has undergone a coordinated disclosure process that has concluded today. The security advisory can be found at https://trojansource.codes.
Multiple organizations will be releasing parallel security advisories, such as Rust's advisory at https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html, Red Hat's advisory at https://access.redhat.com/security/vulnerabilities/RHSB-2021-007, and GitHub's advisory at https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/.
The attached paper describes an attack paradigm -- which we believe to be novel -- discovered by security researchers at the University of Cambridge. There are two techniques for attack, both of which exploit Unicode's high expressiveness to craft source code files for which rendered text displays divergent logic from the underlying encoded bytes seen by compilers.
The first and primary technique, which we dub the Trojan Source attack, uses Unicode Bidirectional (Bidi) control characters embedded in comments and string literals to produce visually deceptive source code files. This technique enables an adversary to encode constructs that visually appear to be comments or string literals but execute as code, or vice versa. Complete details, as well as recommended mitigations, can be found in attachment 001 (Trojan Source.pdf). This vulnerability is tracked under CVE-2021-42574.
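As a rough illustration of the kind of check a reviewer or CI job could run for this first technique, the following Python sketch scans text for Bidi control characters. The function name `find_bidi_controls` is hypothetical and not taken from the paper or from any compiler; the code point set is assumed to be the Unicode embedding, override and isolate controls.

```python
import unicodedata

# Unicode Bidi embedding, override and isolate controls
# (the set assumed for this sketch).
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(text):
    """Return (line, column, character name) for each Bidi control in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits
```

Calling `find_bidi_controls(open(path, encoding="utf-8").read())` on a source file returns an empty list when none of these characters are present. The sketch flags every occurrence, including legitimate ones in right-to-left text inside comments or strings, so it is a review aid rather than a verdict.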
The second technique, to which we refer as the homoglyph variant, uses homoglyphs (characters that render to the same glyph but are represented by different Unicode values) to define adversarial identifiers. In this technique, an adversary defines an identifier such as a function name that appears visually identical to a target function, but is defined using Unicode homoglyphs. This adversarial function then performs some malicious action, then optionally calls the original function it is impersonating. When defined in upstream dependencies such as open source software, these adversarial functions can be imported into downstream software and invoked without visual indication of malicious code. Complete details, as well as recommended mitigations, can also be found in attachment 001 (Trojan Source.pdf). This vulnerability is tracked under CVE-2021-42694.
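A minimal sketch of how such identifiers might be surfaced during review, assuming that any non-ASCII character in an identifier is worth flagging. That is a coarser policy than real confusable detection, closer in spirit to rejecting extended identifiers outright. The function name `non_ascii_identifiers` is hypothetical.

```python
import re
import unicodedata

# Identifier-like tokens: a letter or underscore followed by word characters.
# In Python 3, \w matches Unicode letters and digits by default.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def non_ascii_identifiers(source):
    """Map each token containing non-ASCII characters to those characters' names."""
    flagged = {}
    for match in IDENTIFIER.finditer(source):
        token = match.group()
        odd = [unicodedata.name(ch, "UNNAMED") for ch in token if ord(ch) > 127]
        if odd:
            flagged[token] = odd
    return flagged
```

For a function name in which the Latin "H" has been replaced by the Cyrillic letter U+041D, the result names the offending character, which makes the substitution visible even when the two glyphs render identically. The sketch does not parse the language, so it also reports non-ASCII words in comments and string literals.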
Proofs-of-concept can be found at https://github.com/nickboucher/trojan-source.
We hope that this information proves useful in building and applying defenses where applicable.
University of Cambridge