Bug 1213892 (CVE-2023-20583) - VUL-0: CVE-2023-20583: kernel-source-azure,kernel-source-rt,kernel-source: AMD-SB-7006: potential power side-channel vulnerability in AMD processor
Status: REOPENED
Alias: CVE-2023-20583
Product: SUSE Security Incidents
Classification: Novell Products
Component: Incidents
Version: unspecified
Hardware: Other Other
Importance: P3 - Medium : Normal
Target Milestone: ---
Assignee: Security Team bot
QA Contact: Security Team bot
URL: https://smash.suse.de/issue/373919/
Whiteboard: CVSSv3.1:SUSE:CVE-2023-20583:4.7:(AV:...
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-02 07:52 UTC by Carlos López
Modified: 2023-11-21 11:40 UTC

See Also:
Found By: Security Response Team
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Carlos López 2023-08-02 07:52:29 UTC
CVE-2023-20583

A potential power side-channel vulnerability in
AMD processors may allow an authenticated attacker to monitor the CPU power
consumption as the data in a cache line changes over time potentially resulting
in a leak of sensitive information.

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2023-20583
https://bugzilla.redhat.com/show_bug.cgi?id=2228322
https://www.cve.org/CVERecord?id=CVE-2023-20583
https://www.amd.com/en/corporate/product-security/bulletin/AMD-SB-7006
Comment 1 Joey Lee 2023-08-04 10:12:14 UTC
I could not find any useful details about this CVE by searching. I am adding our cpufreq expert, Giovanni Gherdovich, to the Cc list.

Actually I am not sure that this CVE relates to cpufreq.
Comment 2 Karasulli 2023-08-11 16:42:16 UTC
Reassigning to a concrete person to ensure progress [1] (feel free to pass to next one), see also the process at [2].
 
Giovanni, could you please take a look?
 
[1] https://confluence.suse.com/display/KSS/Kernel+Security+Sentinel
[2] https://wiki.suse.net/index.php/SUSE-Labs/Kernel/Security
Comment 3 Marcus Meissner 2023-08-28 08:56:49 UTC
The Intel CPU folks had a similar issue and solved it by making the power metering less fine-grained, and also by restricting access to the CPU variable.

see our TID:
https://www.suse.com/support/kb/doc/?id=000019778
Comment 4 Giovanni Gherdovich 2023-08-28 09:03:39 UTC
I'm going through all the readings you people have collected in this bug report.

From a bird's eye view it seems to me these documents describe a potential vulnerability, but I don't see links to patches.

What's the expected course of action here?
Produce a knowledge base article such as the one pointed to by Marcus in comment 3?

Again, I may have a clearer view of the situation once I'm done studying all the docs.
Comment 7 Giovanni Gherdovich 2023-11-21 09:38:56 UTC
The relevant information for this vulnerability is at https://collidepower.com (contains summary, paper and code for proof of concept).

LINKS AND CREDENTIALS: IT'S DANIEL GRUSS' TEAM IN GRAZ
------------------------------------------------------
Direct links to paper and proof of concept are:

- paper: https://collidepower.com/paper/Collide+Power.pdf
- proof of concept: https://github.com/iaik/collidepower

It was found by Daniel Gruss' team at Graz University of Technology (the group that co-discovered Meltdown and Spectre); I'm mentioning the authors as they're well known for the quality of their research.


SUMMARY (HALFWAY THROUGH READING PAPER): TOTALLY IMPRACTICAL (BUT COOL)
-----------------------------------------------------------------------
I'm halfway through the paper and will post my remarks later today; this is what I have at the moment:

* high level idea of the attack:
  - attacker and victim threads run on hyper-threading siblings (share an L1 cache)
  - attacker primes the cache with known values, which represent guesses of the victim's secret data (think: start with e.g. all zeros, then refine the guess iteratively)
  - victim accesses data bringing bytes into L1, evicts attacker's bytes
  - attacker measures CPU power consumption, which depends on how many bit flips take place when the secret replaces the guess in L1
  - start over with a refined guess, repeat until the power consumption to replace the guess with the secret is zero: at that point guess and secret are the same
* the exfiltration rate is very low. Two attacks are presented; one can discover ~5 bits per hour (described above), the other some 14 bits every 100 hours.
* this exploit requires attacker's and victim's data to be co-located in L1 for an arbitrarily long time, making it rather impractical
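
To make the guess-refinement loop above concrete, here is a toy simulation. It is only a sketch under loud assumptions: the "power" reading is modelled as the Hamming distance between guess and secret plus Gaussian noise, which is my simplification and not the model from the paper, and all names (`measure_power`, `recover_byte`) are made up for illustration. It touches no hardware.

```python
import random

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")

def measure_power(guess: int, secret: int, rng: random.Random,
                  samples: int = 2000, noise: float = 2.0) -> float:
    """Averaged toy 'power' reading: bit flips plus Gaussian noise (assumed model)."""
    total = 0.0
    for _ in range(samples):
        total += hamming(guess, secret) + rng.gauss(0.0, noise)
    return total / samples

def recover_byte(secret: int, seed: int = 0) -> int:
    """Refine an all-zero guess one bit at a time, keeping flips that lower power."""
    rng = random.Random(seed)
    guess = 0
    best = measure_power(guess, secret, rng)
    for bit in range(8):
        candidate = guess ^ (1 << bit)
        power = measure_power(candidate, secret, rng)
        if power < best:
            # Fewer bit flips were observed, so this bit of the guess is now right.
            guess, best = candidate, power
    return guess

if __name__ == "__main__":
    print(hex(recover_byte(0xA7)))
```

Averaging many noisy samples per guess is what makes the one-bit difference visible, and it is also why the real attack is so slow.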

The bulk of the discovery is the derivation of an accurate model for the power consumption of bit flips in CPU caches. Anyone can say "yeah the more bits you flip the higher the power", but they got a formula, found its coefficients, and devised a noise reduction strategy good enough to read memory of a different process.

I haven't yet got to the part where they explain how to measure power; on x86 the RAPL interface has been made privileged since recent research has shown it can be used in power analysis side channel attacks. The way around is to read clock frequency since throttling is a proxy for power consumption, so I guess that's what they're doing.
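
For reference, this is roughly how the two signals mentioned above can be read on Linux. It is a minimal sketch: the two sysfs paths are the usual powercap and cpufreq locations, but whether they exist depends on the loaded drivers, and I am assuming package 0 and CPU 0. On recent kernels the RAPL energy counter is expected to be readable by root only, while the current frequency is typically world-readable.

```python
from pathlib import Path
from typing import Optional

# Usual Linux sysfs locations (assumed: package 0 and CPU 0). They exist
# only if the powercap (RAPL) and cpufreq drivers are loaded.
RAPL_ENERGY = Path("/sys/class/powercap/intel-rapl:0/energy_uj")
CPU0_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")

def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs file, or None if it is unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission denied, or unexpected content.
        return None

if __name__ == "__main__":
    energy = read_int(RAPL_ENERGY)  # microjoules
    freq = read_int(CPU0_FREQ)      # kHz
    print("RAPL energy_uj:", energy if energy is not None else "not readable")
    print("cpu0 frequency (kHz):", freq if freq is not None else "not readable")
```

Running it as an unprivileged user shows quickly which of the two interfaces is still exposed on a given system.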


FURTHER REMARKS: IT'S 100% INTEL TOO, NOT AMD ONLY
--------------------------------------------------
One very odd thing is the CVE says "vulnerability in AMD processors" but this is 100% not limited to AMD; the paper itself uses an Intel Coffee Lake for the analysis.
The AMD bulletin recommends disabling Core Performance Boost (known as Turbo Boost on Intel) and using something called "Performance Determinism Mode", an operating mode I didn't know existed. For documentation about this mode, the AMD bulletin refers to a 2017 brief by Moor Insights & Strategy, a consulting firm. Them referencing a 3rd party paper instead of their own documentation seems odd.
My current understanding is that disabling hyperthreading, or at least avoiding scheduling mutually untrusting threads together, prevents the 5 bits/hour exploit described above. The second type of attack is still theoretically possible w/o thread co-location, but is so slow as to be totally impractical.
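
To put the two rates quoted earlier in perspective, here is a back-of-the-envelope calculation. The 128-bit secret is my own illustrative choice (the size of an AES-128 key), not a figure from the paper, and the calculation assumes the rate is sustained without interruption.

```python
def hours_to_leak(bits: int, bits_per_hour: float) -> float:
    """Time needed to leak `bits` at a constant exfiltration rate."""
    return bits / bits_per_hour

KEY_BITS = 128  # illustrative secret size (assumption)

fast = hours_to_leak(KEY_BITS, 5.0)       # ~5 bits per hour, co-located threads
slow = hours_to_leak(KEY_BITS, 14 / 100)  # 14 bits per 100 hours, no co-location

print(f"co-located attack: {fast:.1f} hours")
print(f"second attack:     {slow:.0f} hours (~{slow / 24:.0f} days)")
```

That comes to about 25.6 hours of uninterrupted co-location for the first attack, and roughly 914 hours (about 38 days) for the second.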

ARM released a bulletin too (see website above). Bottom line is: don't expose a precise power measuring interface, and restrict co-scheduling of untrusted threads on the same core if you have to.
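
For anyone who wants to check the co-scheduling side on a given machine, a small sketch that reads the kernel's SMT state from sysfs. The directory is the standard Linux location; the function name and the returned dictionary are just my illustration. It only reads: actually turning SMT off requires root and writing to the `control` file.

```python
from pathlib import Path

SMT_DIR = Path("/sys/devices/system/cpu/smt")

def smt_state(base: Path = SMT_DIR) -> dict:
    """Read SMT 'control' and 'active' from sysfs; unreadable files map to None."""
    state = {}
    for name in ("control", "active"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            state[name] = None
    return state

if __name__ == "__main__":
    state = smt_state()
    print(state)
    if state["active"] == "1":
        print("SMT is active: sibling threads share a core and its L1 cache.")
    elif state["active"] == "0":
        print("SMT is not active.")
    else:
        print("SMT state not available on this system.")
```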

Around the day the embargo was lifted (which is when this very bugzilla was opened, Aug 2nd 2023) Collide+Power made the rounds in the press, see https://www.google.com/search?q=collide+power+side+channel+attack , but if you browse security-related forums you'll see that absolutely nobody cared. This is far from practical to exploit, and a solid WONTFIX over here like everywhere else.

As a distant observer of the space of power analysis attacks on CPUs, I see they've come a long way in the last 2-3 years. Not yet usable, but improving.

History class: RAPL, or Running Average Power Limit, is the embedded power meter in Intel/AMD CPUs. It was added to Sandy Bridge in 2011 so that turbo boost could work a little better; the CPU needed to know how much nearby cores were consuming, so as to decide if the headroom was enough to sustain turbo for longer. Then it was exposed to userspace because why not, since it's already there, maybe sysadmins will like it. Then in 2020: whoopsie, RAPL leaks information (PLATYPUS attack), so RAPL was made privileged. In 2022 (Hertzbleed) it was shown that clock frequency alone is a good enough proxy for power. This Collide+Power is a further refinement of that idea. Maybe if these exploits get good enough, frequency scaling will be dropped altogether? :)

Anyways, closing as WONTFIX.