Bug 1222973 (CVE-2024-26837)

Summary: VUL-0: CVE-2024-26837: kernel: net: bridge: switchdev: race between creation of new group memberships and generation of the list of MDB events to replay
Product: [Novell Products] SUSE Security Incidents
Reporter: SMASH SMASH <smash_bz>
Component: Incidents
Assignee: Denis Kirjanov <denis.kirjanov>
Status: NEW ---
QA Contact: Security Team bot <security-team>
Severity: Minor
Priority: P3 - Medium
CC: osalvador, thomas.leroy
Version: unspecified
Target Milestone: ---
Hardware: Other
OS: Other
URL: https://smash.suse.de/issue/402325/
Whiteboard: CVSSv3.1:SUSE:CVE-2024-26837:3.3:(AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:L/A:N)
Found By: Security Response Team
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description SMASH SMASH 2024-04-17 14:02:05 UTC
In the Linux kernel, the following vulnerability has been resolved:

net: bridge: switchdev: Skip MDB replays of deferred events on offload

Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.

While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.
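The window described above can be illustrated with a small, single-threaded model in C. This is a hypothetical sketch, not kernel code: `struct add_model`, `join_group`, `offload_port` and `flush_deferred` are invented names. The model assumes that the driver only acts on events for a port once that port is offloaded, and that offloading triggers a replay of everything currently in the membership list (the pre-fix behaviour).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the race; none of these
 * names exist in the kernel. */
#define MAX_GROUPS 8
#define MAX_EVENTS 16

struct add_model {
    int mdb_list[MAX_GROUPS];   /* memberships visible to list walkers */
    int n_mdb;
    int deferred[MAX_GROUPS];   /* "add" notifications queued, not yet delivered */
    int n_deferred;
    bool offloaded;             /* has the port been offloaded (replay done)? */
    int delivered[MAX_EVENTS];  /* "add" events the driver acted on */
    int n_delivered;
};

static void deliver(struct add_model *m, int group)
{
    assert(m->n_delivered < MAX_EVENTS);
    m->delivered[m->n_delivered++] = group;
}

/* A new membership is visible in the list at once, but its
 * notification is only queued for later delivery. */
static void join_group(struct add_model *m, int group)
{
    assert(m->n_mdb < MAX_GROUPS && m->n_deferred < MAX_GROUPS);
    m->mdb_list[m->n_mdb++] = group;
    m->deferred[m->n_deferred++] = group;
}

/* Offloading the port replays every membership currently in the list
 * (pre-fix behaviour: no check against the deferred queue). */
static void offload_port(struct add_model *m)
{
    m->offloaded = true;
    for (int i = 0; i < m->n_mdb; i++)
        deliver(m, m->mdb_list[i]);
}

/* The deferred queue is flushed later; the driver only acts on events
 * for a port it has already offloaded. */
static void flush_deferred(struct add_model *m)
{
    for (int i = 0; i < m->n_deferred; i++)
        if (m->offloaded)
            deliver(m, m->deferred[i]);
    m->n_deferred = 0;
}

/* How many "add" events for this group reached the driver. */
static int times_delivered(const struct add_model *m, int group)
{
    int n = 0;
    for (int i = 0; i < m->n_delivered; i++)
        if (m->delivered[i] == group)
            n++;
    return n;
}
```

In this model, calling `join_group`, then `flush_deferred`, then `offload_port` delivers the group once. Calling `offload_port` while the notification is still queued (`join_group`, `offload_port`, `flush_deferred`) delivers it twice, which is the duplicate the description refers to.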

The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.
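The effect on a reference-counting driver can be sketched as follows. This is a hypothetical C model only loosely inspired by the reference counting the description attributes to DSA. `struct hw_entry`, `driver_add` and `driver_del` are invented names and do not reflect the actual DSA data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver that reference counts a host
 * membership before programming or removing it in hardware. */
struct hw_entry {
    int refcount;   /* "add" events not yet matched by a "del" */
    bool in_hw;     /* is the entry programmed in the switch? */
};

static void driver_add(struct hw_entry *e)
{
    if (e->refcount++ == 0)
        e->in_hw = true;    /* first reference: program the hardware */
}

static void driver_del(struct hw_entry *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0)
        e->in_hw = false;   /* last reference dropped: remove from hardware */
}
```

With one add and one delete, the entry is removed from hardware. With a duplicated add (original event plus replay) and the single delete sent on bridge destruction, `refcount` stays at 1 and the entry remains programmed. That is the orphan group the description refers to.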

This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.
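The reason deletions are safe can be shown with another hypothetical C model (invented names, not kernel code). Leaving a group unlinks the entry from the list immediately and only defers the notification. Any replay that walks the list therefore cannot produce a second event for that entry.

```c
#include <assert.h>

/* Hypothetical model: membership list plus pending deferred deletions. */
#define MAX_GROUPS 8

struct del_model {
    int mdb_list[MAX_GROUPS];
    int n_mdb;
    int pending_del[MAX_GROUPS];
    int n_pending_del;
};

/* Unlink the entry from the list now; queue the "del" notification. */
static void leave_group(struct del_model *m, int group)
{
    for (int i = 0; i < m->n_mdb; i++) {
        if (m->mdb_list[i] == group) {
            m->mdb_list[i] = m->mdb_list[--m->n_mdb];
            assert(m->n_pending_del < MAX_GROUPS);
            m->pending_del[m->n_pending_del++] = group;
            return;
        }
    }
}

/* "del" events the driver would see for a group if a replay walked the
 * list while the deferred deletion was still pending. */
static int deletions_seen(const struct del_model *m, int group)
{
    int n = 0;
    for (int i = 0; i < m->n_mdb; i++)          /* replayed from the list */
        if (m->mdb_list[i] == group)
            n++;
    for (int i = 0; i < m->n_pending_del; i++)  /* still on the deferred queue */
        if (m->pending_del[i] == group)
            n++;
    return n;
}
```

After `leave_group`, the departed group is only counted once, via the deferred queue. This matches the description's point that only additions need the fix.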

To a user this meant that old group memberships, from a bridge to
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.

For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:

    root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
    > ip link set dev x3 up master br0

And then destroy the bridge:

    root@infix-06-0b-00:~$ ip link del dev br0
    root@infix-06-0b-00:~$ mvls atu
    ADDRESS             FID  STATE      Q  F  0  1  2  3  4  5  6  7  8  9  a
    DEV:0 Marvell 88E6393X
    33:33:00:00:00:6a     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    33:33:ff:87:e4:3f     1  static     -  -  0  .  .  .  .  .  .  .  .  .  .
    ff:ff:ff:ff:ff:ff     1  static     -  -  0  1  2  3  4  5  6  7  8  9  a
    root@infix-06-0b-00:~$

The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.

Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:

    root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
    > ip link set dev x3 up master br1

All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. Instead, the old memberships
are still active in the hardware database, causing the switch to
forward traffic for those groups only towards the CPU (port 0).
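Why the orphaned entries override br1's flooding policy can be sketched with a simplified lookup. This is a hypothetical C model, not the mv88e6xxx ATU logic. It assumes that a matching static entry restricts egress to the ports in its port vector, and that only unknown groups fall back to flooding. The 11-port mask in the usage note is an illustrative assumption based on the ports 0 to a shown in the `mvls atu` output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified multicast forwarding decision. */
struct atu_entry {
    uint8_t mac[6];
    uint16_t port_vector;   /* bit n set: forward to port n */
};

/* Egress port set for a multicast frame addressed to dst. */
static uint16_t egress_ports(const struct atu_entry *db, int n_entries,
                             const uint8_t dst[6], uint16_t flood_mask)
{
    for (int i = 0; i < n_entries; i++)
        if (memcmp(db[i].mac, dst, 6) == 0)
            return db[i].port_vector;   /* static entry wins: only these ports */
    return flood_mask;                  /* unknown group: flood */
}
```

With a leftover entry for `33:33:00:00:00:6a` whose port vector only has bit 0 set, a lookup with `flood_mask = 0x07ff` returns `0x0001` (CPU only). Any group without an entry is still flooded.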

Eliminate the race in two steps:

1. Grab the write-side lock of the MDB while generating the replay
   list.

This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:

2. Make sure that no deferred version of a replay event is already
   enqueued to the switchdev deferred queue, before adding it to the
   replay list, when replaying additions.
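The two steps can be sketched together in a hypothetical C model. This is not the upstream patch, and every name here is invented. A pthread mutex stands in for the bridge's MDB write-side lock. A linear scan of a "deferred additions" array stands in for checking the switchdev deferred queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_GROUPS 8

/* Hypothetical model; names do not correspond to kernel functions. */
struct fixed_model {
    pthread_mutex_t lock;           /* stands in for the MDB write-side lock */
    int mdb_list[MAX_GROUPS];
    int n_mdb;
    int deferred_add[MAX_GROUPS];   /* additions queued but not yet delivered */
    int n_deferred_add;
};

static bool add_is_deferred(const struct fixed_model *m, int group)
{
    for (int i = 0; i < m->n_deferred_add; i++)
        if (m->deferred_add[i] == group)
            return true;
    return false;
}

/* Build the list of additions to replay into out[]; returns its length. */
static int build_replay_list(struct fixed_model *m, int *out, int max)
{
    int n = 0;

    /* Step 1: hold the lock so no new membership appears mid-walk. */
    pthread_mutex_lock(&m->lock);
    for (int i = 0; i < m->n_mdb; i++) {
        /* Step 2: skip entries whose "add" is still queued; the deferred
         * delivery will inform the driver exactly once. */
        if (add_is_deferred(m, m->mdb_list[i]))
            continue;
        assert(n < max);
        out[n++] = m->mdb_list[i];
    }
    pthread_mutex_unlock(&m->lock);
    return n;
}
```

If groups 1 and 2 are in the list and only group 2's addition is still queued, the replay list contains just group 1. The sketch assumes that joins add to the list and enqueue their notification under the same lock. Under that assumption, every listed entry has either already been delivered or is still queued, which is why the two steps together close the window.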

References:
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2024-26837
https://www.cve.org/CVERecord?id=CVE-2024-26837
https://git.kernel.org/stable/c/2d5b4b3376fa146a23917b8577064906d643925f
https://git.kernel.org/stable/c/603be95437e7fd85ba694e75918067fb9e7754db
https://git.kernel.org/stable/c/dc489f86257cab5056e747344f17a164f63bff4b
https://git.kernel.org/stable/c/e0b4c5b1d760008f1dd18c07c35af0442e54f9c8
https://git.kernel.org/pub/scm/linux/security/vulns.git/plain/cve/published/2024/CVE-2024-26837.mbox

Comment 1 Oscar Salvador 2024-04-19 13:26:27 UTC
@Denis: Can you please check

./scripts/check-kernel-fix CVE-2024-26837
dc489f86257c ("net: bridge: switchdev: Skip MDB replays of deferred events on offload") merged v6.8-rc6~32^2~35^2~1
Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") merged v5.13-rc1~94^2~431^2~8
Security fix for CVE-2024-26837 bsc#1222973 with CVSS 3.3
..............................
ACTION NEEDED!
SLE15-SP6: MANUAL: backport dc489f86257cab5056e747344f17a164f63bff4b (Fixes 4f2673b3a2b6)
SLE15-SP5: MANUAL: backport dc489f86257cab5056e747344f17a164f63bff4b (Fixes 4f2673b3a2b6)