Bug 1191820 - (CVE-2021-42574) VUL-0: CVE-2021-42574,CVE-2021-42694: Trojan Source attack
Status: NEW
Classification: Novell Products
Product: SUSE Security Incidents
Component: Incidents
Hardware: Other Other
Priority: P3 - Medium, Severity: Normal
Assigned To: Michael Matz
QA Contact: Security Team bot
Reported: 2021-10-19 10:06 UTC by Alexander Bergmann
Modified: 2022-01-12 08:32 UTC

Comment 13 Alexander Bergmann 2021-10-19 10:14:27 UTC
Michael / Richard, could you please check if this could be mitigated via compiler flags etc.?
Comment 14 Richard Biener 2021-10-19 11:21:48 UTC
(In reply to Alexander Bergmann from comment #13)
> Michael / Richard, could you please check if this could be mitigated via
> compiler flags etc.?

For the homoglyph attack I would suggest building all packages with
-fno-extended-identifiers (note that this is a C-family-only option, so it
causes spurious diagnostics when used with languages other than C or C++).

For the Bidi encodings there is talk of simply not allowing them at all,
because using them doesn't appear to make sense.

GCC upstream devs from Red Hat are working on changes to GCC to diagnose
the homoglyph cases and reject the Bidi encodings.

Note the issue is likely more universal and affects other compilers and
interpreters that accept non-ASCII input.
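
As a stopgap until compilers diagnose this themselves, the "reject the Bidi encodings" idea can be approximated by a pre-build scan. The following is only a rough sketch, not what GCC does: the function name find_bidi_controls is made up for illustration, and the set of code points is assumed to be the nine explicit Bidi formatting characters (embeddings, overrides, isolates and their terminators).

```python
import unicodedata

# Assumed set: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(text):
    """Return (line, column, description) for every Bidi control in text.

    Lines are 1-based, columns are 0-based code point offsets.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch in BIDI_CONTROLS:
                description = f"U+{ord(ch):04X} {unicodedata.name(ch)}"
                hits.append((lineno, col, description))
    return hits
```

A build wrapper could run this over every source file and fail when the returned list is non-empty. Legitimate right-to-left text in comments or strings would then need an explicit allow-list, which this sketch does not provide.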
Comment 15 Alexander Bergmann 2021-10-19 12:18:28 UTC
CVE-2021-42574 will be used for tracking the primary attack pattern which uses Bidi control characters.

CVE-2021-42694 will be used for tracking the homoglyph variant of the attack.
Comment 16 Michael Matz 2021-10-19 12:33:50 UTC
Note also that I don't think the homoglyph "problem" should be included in the
security issues.  It's like noting that in some fonts '1' and 'l' are rendered
similarly and then making variable names like 'O01l' vs '0Ol1', which is an
obfuscation "technique" known since the dawn of time.

(I bet disabling extended identifiers as suggested will break some packages.
I certainly remember seeing non-ASCII comments many years ago already, and I would
bet some programs are meanwhile using non-ASCII identifiers as well.  But it will
indeed catch the majority of cases, except, of course, the above example of
homoglyphs (in some fonts).)
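
To gauge how many packages would actually break, one could first count non-ASCII identifiers separately from non-ASCII comments. The sketch below illustrates that distinction only; it is an assumption-laden stand-in, since it tokenizes Python source with the standard tokenize module rather than C or C++, and the helper name non_ascii_identifiers is hypothetical.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line, column, name) for identifiers with non-ASCII characters.

    Comments and string literals are separate token types, so non-ASCII
    text inside them is not reported.
    """
    hits = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


# Sample input: the comment is ignored, the two uses of the name are reported.
SAMPLE = "# caf\u00e9 in a comment is fine\nprix_caf\u00e9 = 3\nprint(prix_caf\u00e9)\n"
```

As noted above, a check like this says nothing about look-alikes within ASCII such as 'O01l' vs '0Ol1'.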

I do see the issue with Bidi control characters, and as said, warnings from GCC
for that are currently being worked on.
Comment 17 Gianluca Gabrielli 2021-11-02 10:31:15 UTC
From the oss-security mailing list:

OSS Security teams,

We have identified an issue affecting all compilers and interpreters that support Unicode. We believe that the techniques described below can be used to generate adversarial encodings of source code files, enabling targeted attacks on source code that human reviewers cannot see in the rendered text. This is of concern to the open source community because, absent defenses, supply chain attacks can be mounted imperceptibly against the ecosystem.

This vulnerability has undergone a coordinated disclosure process that has concluded today. The security advisory can be found at https://trojansource.codes.

Multiple organizations will be releasing parallel security advisories, such as Rust's advisory at https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html, Red Hat's advisory at https://access.redhat.com/security/vulnerabilities/RHSB-2021-007, and GitHub's advisory at https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/.

The attached paper describes an attack paradigm -- which we believe to be novel -- discovered by security researchers at the University of Cambridge. There are two techniques for attack, both of which exploit Unicode's high expressiveness to craft source code files for which rendered text displays divergent logic from the underlying encoded bytes seen by compilers.

The first and primary technique, which we dub the Trojan Source attack, uses Unicode Bidirectional (Bidi) control characters embedded in comments and string literals to produce visually deceptive source code files. This technique enables an adversary to encode constructs that visually appear to be comments or string literals but execute as code, or vice versa. Complete details, as well as recommended mitigations, can be found in the attached paper (Trojan Source.pdf). This vulnerability is tracked under CVE-2021-42574.
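
One way to picture a narrower defense than banning the characters outright is to flag only lines that leave a Bidi embedding, override or isolate open at the end of the line, since that is what lets the reordering spill past a comment or string boundary. The following is a simplified sketch under that assumption; it does not implement the full Unicode Bidirectional Algorithm, and the function name unterminated_bidi_lines is invented for illustration.

```python
EMBED_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}          # LRI, RLI, FSI
PDF = "\u202c"  # closes the most recent embedding or override
PDI = "\u2069"  # closes the most recent isolate


def unterminated_bidi_lines(text):
    """Return the 1-based numbers of lines that end with open Bidi controls."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stack = []
        for ch in line:
            if ch in EMBED_OPENERS:
                stack.append("embed")
            elif ch in ISOLATE_OPENERS:
                stack.append("isolate")
            elif ch == PDF:
                if stack and stack[-1] == "embed":
                    stack.pop()
            elif ch == PDI:
                # Simplification: a PDI also discards anything opened
                # inside the isolate it closes.
                if "isolate" in stack:
                    while stack.pop() != "isolate":
                        pass
        if stack:
            flagged.append(lineno)
    return flagged
```

Compared with rejecting every Bidi control, this keeps properly terminated right-to-left text usable, at the cost of trusting that balanced sequences are benign.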

The second technique, to which we refer as the homoglyph variant, uses homoglyphs (characters that render to the same glyph but are represented by different Unicode values) to define adversarial identifiers. In this technique, an adversary defines an identifier such as a function name that appears visually identical to a target function, but is defined using Unicode homoglyphs. This adversarial function then performs some malicious action, then optionally calls the original function it is impersonating. When defined in upstream dependencies such as open source software, these adversarial functions can be imported into downstream software and invoked without visual indication of malicious code. Complete details, as well as recommended mitigations, can also be found in the attached paper (Trojan Source.pdf). This vulnerability is tracked under CVE-2021-42694.
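
A crude heuristic for spotting such identifiers is to flag names whose letters come from more than one script. The sketch below is an assumption, not a mitigation taken from the paper: it guesses the script from the first word of each character's Unicode name, the helper names scripts_of and is_mixed_script are hypothetical, and a proper check would rely on the Unicode confusables data, which is not in the Python standard library.

```python
import unicodedata


def scripts_of(identifier):
    """Return the set of script labels used by the letters of an identifier.

    The label is the first word of the Unicode character name, for example
    "LATIN" or "CYRILLIC". This is only an approximation of the real
    Unicode Script property.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the identifier's letters come from more than one script."""
    return len(scripts_of(identifier)) > 1


# Two names that render alike: the second uses CYRILLIC CAPITAL LETTER EN.
GENUINE = "sayHello"
LOOKALIKE = "say\u041dello"
```

Here is_mixed_script(LOOKALIKE) is true while is_mixed_script(GENUINE) is false, even though the two strings compare unequal and look the same in many fonts. An identifier written entirely in one non-Latin script passes this check, so it would miss whole-name substitutions.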

Proofs-of-concept can be found at https://github.com/nickboucher/trojan-source.

We hope that this information proves useful in building and applying defenses where applicable.

Nicholas Boucher
University of Cambridge