Bug 1212975 - Failure to start firewalld correctly during boot is (more or less) silently ignored
Status: NEW
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Security
Version: Leap 15.5
Hardware: Other Other
Importance: P5 - None : Major
Target Milestone: ---
Assignee: Security Team bot
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-04 10:43 UTC by Frank Kühndel
Modified: 2023-07-16 18:45 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Frank Kühndel 2023-07-04 10:43:25 UTC
Short description:

If `firewalld` no longer understands its configuration files, `systemd` still starts it during boot and reports it as running. In contrast, `firewall-cmd --state` reports `failed`, and the machine accepts incoming IP connections from addresses it should not.

I would expect: if a security function like the firewall is not working correctly, the administrator is informed about the problem "somehow" and "loudly enough". The bare minimum I would expect is a red "ERROR" line in the journal.

To be fair, I should note that a Python traceback is dumped in the journal and in the file `/var/log/firewalld`. Yet I overlooked these for two years.

Bug 1212974 is the technical bug behind this security hole. I am opening two bugs because I believe these are two independent issues.

Long description:

openSUSE Leap 15.0 (2018) replaced `SuSEfirewall2` with `firewalld`. Hence I configured `firewalld` using, among others, this command:

  # firewall-cmd --permanent --zone=home --add-source=192.168.193.0/255.255.255.0

The firewall worked fine afterwards. I upgraded openSUSE over the years to 15.5, so the old configuration files are still in use today. At some point, `firewalld` stopped being able to parse the `255.255.255.0` netmask and consequently no longer protected my machine.

I even checked the configuration at each upgrade with `firewall-cmd --info-zone home`, but stupidly I never typed `firewall-cmd --state`. By chance, I recently looked into the file `/var/log/firewalld` and discovered the following message:

2023-06-30 16:07:59 Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/firewall/server/decorators.py", line 53, in handle_exceptions
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/firewall/server/firewalld.py", line 93, in start
    return self.fw.start()
  File "/usr/lib/python3.6/site-packages/firewall/core/fw.py", line 541, in start
    self._start()
  File "/usr/lib/python3.6/site-packages/firewall/core/fw.py", line 502, in _start
    self.zone.apply_zones(use_transaction=transaction)
  File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 178, in apply_zones
    self.apply_zone_settings(zone, use_transaction=use_transaction)
  File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 297, in apply_zone_settings
    self._zone_settings(True, _zone, transaction)
  File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 267, in _zone_settings
    self._source(enable, zone, args[0], args[1], transaction)
  File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 752, in _source
    policy, source, table, chain)
  File "/usr/lib/python3.6/site-packages/firewall/core/nftables.py", line 935, in build_zone_source_address_rules
    "expr": [self._rule_addr_fragment(opt, address),
  File "/usr/lib/python3.6/site-packages/firewall/core/nftables.py", line 1217, in _rule_addr_fragment
    address = {"prefix": {"addr": addr_len[0], "len": int(addr_len[1])}}
ValueError: invalid literal for int() with base 10: '255.255.255.0'

According to the log files, this issue was introduced when I upgraded to openSUSE Leap 15.3 in July 2021. Consequently, my firewall has not been working properly for two years.
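
The `ValueError` at the end of the traceback comes from passing the dotted netmask `255.255.255.0` to `int()`. As a minimal sketch only (this is not the fix applied for Bug 1212974, and the helper name `prefix_fragment` is made up for illustration), Python's standard `ipaddress` module accepts both notations and yields the prefix length:

```python
import ipaddress


def prefix_fragment(source):
    """Build a dict shaped like the one in the traceback from 'addr/mask'.

    Hypothetical helper: accepts CIDR ('/24') as well as dotted netmask
    ('/255.255.255.0') notation.
    """
    # Keep the address part exactly as written in the configuration.
    addr, _, _mask = source.partition("/")
    # strict=False tolerates host bits being set (e.g. 1.2.3.4/24).
    network = ipaddress.ip_network(source, strict=False)
    return {"prefix": {"addr": addr, "len": network.prefixlen}}
```

With this sketch, `192.168.193.0/255.255.255.0` and `192.168.193.0/24` both produce a prefix length of 24, whereas `int("255.255.255.0")` raises the `ValueError` shown above.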

How to reproduce:

Note: I assume the error can only be provoked on an "active zone". I use the `public` zone in the example below. You can list your active zones with `firewall-cmd --get-active-zones`.

Warning: The instructions below tamper with your firewall settings. Your machine may end up having no internet access or being reachable from "dangerous" IP addresses. Make sure to restore your firewall settings after your test.

Make a backup first ...

# cp /etc/firewalld/zones/public.xml /etc/firewalld/zones/public.xml.ori

Add a rule and change its netmask in the configuration file into a form that `firewalld` currently does not understand (due to Bug 1212974).

# firewall-cmd --permanent --zone=public --add-source=1.2.3.4/24
success
# sed -i 's%1.2.3.4/24%1.2.3.4/255.255.255.0%' /etc/firewalld/zones/public.xml

Provoke the error by having `firewalld` read the configuration file:

# systemctl restart firewalld

Check whether `firewalld` is OK:

# firewall-cmd --state
failed

Notice that `systemctl` reports "active (running)" and thus has not detected the failure of the firewall:

# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
     Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: disabled)
     Active: active (running) since Mon 2023-07-03 20:19:21 CEST; 1min 30s ago
       Docs: man:firewalld(1)
   Main PID: 8569 (firewalld)
      Tasks: 2 (limit: 4915)
     CGroup: /system.slice/firewalld.service
             └─ 8569 /usr/bin/python3 /usr/sbin/firewalld --nofork --nopid

Jul 03 20:19:21 goofy systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 03 20:19:21 goofy systemd[1]: Started firewalld - dynamic firewall daemon.
Jul 03 20:19:21 goofy firewalld[8569]: Traceback (most recent call last):
                                         File "/usr/lib/python3.6/site-packages/firewall/server/decorators.py", line 53, in handle_exceptions
                                           return func(*args, **kwargs)
                                         File "/usr/lib/python3.6/site-packages/firewall/server/firewalld.py", line 93, in start
                                           return self.fw.start()
                                         File "/usr/lib/python3.6/site-packages/firewall/core/fw.py", line 541, in start
                                           self._start()
                                         File "/usr/lib/python3.6/site-packages/firewall/core/fw.py", line 502, in _start
                                           self.zone.apply_zones(use_transaction=transaction)
                                         File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 178, in apply_zones
                                           self.apply_zone_settings(zone, use_transaction=use_transaction)
                                         File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 297, in apply_zone_settings
                                           self._zone_settings(True, _zone, transaction)
                                         File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 267, in _zone_settings
                                           self._source(enable, zone, args[0], args[1], transaction)
                                         File "/usr/lib/python3.6/site-packages/firewall/core/fw_zone.py", line 752, in _source
                                           policy, source, table, chain)
                                         File "/usr/lib/python3.6/site-packages/firewall/core/nftables.py", line 935, in build_zone_source_address_rules
                                           "expr": [self._rule_addr_fragment(opt, address),
                                         File "/usr/lib/python3.6/site-packages/firewall/core/nftables.py", line 1217, in _rule_addr_fragment
                                           address = {"prefix": {"addr": addr_len[0], "len": int(addr_len[1])}}
                                       ValueError: invalid literal for int() with base 10: '255.255.255.0'


`firewall-cmd` lists the configuration it does not understand without any indication of an error:

# firewall-cmd --info-zone public
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 1.2.3.4/255.255.255.0
  services: 
  ports: 
  protocols: 
  forward: no
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

If you want, you can reboot your machine and check afterwards that `firewalld` is in "failed" state and that there is no red error shown in `journalctl -r` about this security hole.

Finally make sure your firewall is configured correctly by restoring the original configuration file:

# mv /etc/firewalld/zones/public.xml.ori /etc/firewalld/zones/public.xml
# systemctl restart firewalld
# firewall-cmd --state
running
Comment 1 Matthias Gerstner 2023-07-10 09:25:20 UTC
Yes this situation can be problematic and I experienced it myself. It is not a
SUSE Linux specific problem though. I'm adding the firewalld maintainers to CC
for chiming in.

Systemd should at a minimum report the firewalld service as failed if it
couldn't load correctly.

Maybe there is something on systemd level that can be done to avoid silent
firewall failure. Ideally some emergency fallback would be used (minimal safe
firewall setup) or networking would be disabled/blocked.

I am not aware of any feature that systemd or firewalld offer in this
direction.

As system administrator you can add custom scripts that check for the
situation and act to your liking. I'm not sure about a good approach for the
generic case. Having a system booting with network but silently without
firewall is probably the worst of the outcomes when security is a priority.
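
A minimal sketch of such a custom check, assuming only that `firewall-cmd --state` prints `running` when the firewall is healthy (as shown in the description). The function names and the injectable `run` parameter are illustrative and not part of firewalld or systemd:

```python
import subprocess


def firewalld_state(run=subprocess.run):
    """Return the text printed by `firewall-cmd --state`, or 'unavailable'."""
    try:
        result = run(
            ["firewall-cmd", "--state"],
            stdout=subprocess.PIPE,
            # Merge stderr into stdout so the state is captured either way.
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # firewall-cmd is missing or could not be executed.
        return "unavailable"
    return result.stdout.strip()


def firewall_is_healthy(run=subprocess.run):
    """True only if firewalld itself reports the state 'running'."""
    return firewalld_state(run) == "running"
```

What to do when `firewall_is_healthy()` returns `False` (send mail, take the network down, load a fallback ruleset) is left to the administrator; the sketch only detects the condition that `systemctl status` misses.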
Comment 2 Mohd Saquib 2023-07-12 07:37:58 UTC
Hi Frank,
I've applied the fix for the firewalld crash and it will be available as an update sooner or later.

Reading through your security issue, it seems to be about systemd status reporting rather than firewalld. firewalld already provides the `--state` flag, which gives the correct output on failure.

Let me add systemd maintainers for their feedback on this.

Thanks,
Saquib
Comment 3 Frank Kühndel 2023-07-12 20:03:38 UTC
I think the problem is somewhere in the interaction between `systemd` and `firewalld`. The log (`journalctl`) contains entries in this order:


Jun 30 16:07:58 goofy systemd[1]: Starting firewalld - dynamic firewall daemon...
[...]
Jun 30 16:07:58 goofy systemd[1178]: Reached target Sockets.
Jun 30 16:07:58 goofy systemd[1178]: Reached target Basic System.
Jun 30 16:07:58 goofy systemd[1178]: Reached target Main User Target.
Jun 30 16:07:58 goofy systemd[1178]: Startup finished in 202ms.
[...]
Jun 30 16:07:59 goofy systemd[1]: Started firewalld - dynamic firewall daemon.
Jun 30 16:07:59 goofy systemd[1]: Reached target Preparation for Network.
[...]
Jun 30 16:07:59 goofy firewalld[1148]: Traceback (most recent call last):
[...]
Jun 30 16:08:08 goofy systemd[1]: Finished wicked managed network interfaces.
Jun 30 16:08:08 goofy systemd[1]: Reached target Network.
Jun 30 16:08:08 goofy systemd[1]: Started Backup of /etc/sysconfig.
Jun 30 16:08:08 goofy systemd[1]: Reached target Network is Online.


So first `systemd` prints "Starting firewalld", then it prints "Started firewalld", and only after that do I see "firewalld[1148]: Traceback ...". The traceback is the result of `firewalld` reading its configuration files, detecting that it does not understand something in them, and throwing this error.

I am unsure whether I am interpreting this correctly, but it looks like `systemd` starts `firewalld` and "believes" the firewall is up and running while `firewalld` has not actually set up the firewall yet. At least in the journal it looks as if setting up the firewall happens later.

I doubt this is the behaviour `systemd` expects from `firewalld`.

I am not experienced with `systemd`, especially not with `Type=dbus`. Maybe `firewalld` takes its name on the D-Bus too early (see `Type=`, item "Behavior of dbus", in https://www.freedesktop.org/software/systemd/man/systemd.service.html#Options). Moreover, this page also states in section "Example 6. DBus services": "systemd will consider the service to be initialized once the name has been acquired on the system bus."

Furthermore, how does `systemd` expect a service of type `dbus` to communicate that it failed to start correctly? I guess either it releases its name on the bus or the main process terminates (see the option `Restart=` on the page above). But `firewalld` keeps running even though `firewall-cmd --state` reports `failed`.
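
To illustrate the second guess (the main process terminates), here is a minimal sketch of a daemon entry point that turns a startup exception into a non-zero exit status. It is not firewalld's code; `run_service` and its parameters are hypothetical, and whether `systemd` then marks a `Type=dbus` unit as failed once the bus name has already been acquired is an assumption:

```python
import sys


def run_service(start, log=sys.stderr):
    """Call start() and return an exit status for the service manager.

    Hypothetical wrapper: returns 0 on success and 1 if start() raises,
    instead of swallowing the exception and continuing to run.
    """
    try:
        start()
    except Exception as exc:
        # Make the failure visible and report it through the exit status.
        print("ERROR: startup failed: %s" % exc, file=log)
        return 1
    return 0
```

A caller would pass the result to `sys.exit()`. Note that the unit file shown in this comment sets `StandardError=null`, so in that setup only the exit status, not the message, would reach `systemd`.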

Finally, what should happen when `firewalld` fails? The red line in the journal of course. But Matthias Gerstner (in Comment 1) correctly notes: "Ideally some emergency fallback would be used (minimal safe firewall setup) or networking would be disabled/blocked."

------

Just for info:

# cat /usr/lib/systemd/system/firewalld.service
[Unit]
Description=firewalld - dynamic firewall daemon
Before=network-pre.target
Wants=network-pre.target
After=dbus.service
After=polkit.service
Conflicts=iptables.service ip6tables.service ebtables.service ipset.service nftables.service
Documentation=man:firewalld(1)

[Service]
EnvironmentFile=-/etc/sysconfig/firewalld
ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS
ExecReload=/bin/kill -HUP $MAINPID
# supress to log debug and error output also to /var/log/messages
StandardOutput=null
StandardError=null
Type=dbus
BusName=org.fedoraproject.FirewallD1
KillMode=mixed

[Install]
WantedBy=multi-user.target
Alias=dbus-org.fedoraproject.FirewallD1.service
Comment 4 Frank Kühndel 2023-07-16 18:32:56 UTC
To figure out what would happen if `systemd` detected that `firewalld` failed, I ran the following experiment:

# cp -a /usr/lib/systemd/system/firewalld.service /root/firewalld.service.bck
# sed -i 's/^ExecStart=.*$/ExecStart=false/' /usr/lib/systemd/system/firewalld.service
# systemctl daemon-reload
# reboot

After the reboot:

# systemctl status firewalld.service
    Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Sun 2023-07-16 19:14:38 CEST; 58s ago
       Docs: man:firewalld(1)
    Process: 1141 ExecStart=false (code=exited, status=1/FAILURE)
   Main PID: 1141 (code=exited, status=1/FAILURE)

# journalctl
Jul 16 19:14:38 goofy systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 16 19:14:38 goofy systemd[1]: firewalld.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 19:14:38 goofy systemd[1]: firewalld.service: Failed with result 'exit-code'.
Jul 16 19:14:38 goofy systemd[1]: Failed to start firewalld - dynamic firewall daemon.
Jul 16 19:14:38 goofy systemd[1]: Reached target Preparation for Network.

Apart from one red line in the journal there is no other effect. The machine boots normally into the graphical target, the network is configured and has an IPv4 address via DHCP, and network services like `sshd` are started and running.

To end this experiment, I used these commands:

# cp /root/firewalld.service.bck /usr/lib/systemd/system/firewalld.service
# systemctl daemon-reload
# systemctl start firewalld.service
# firewall-cmd --state
running
Comment 5 Frank Kühndel 2023-07-16 18:45:13 UTC
I suggest simplifying this bug to:

  * "`systemd` shall detect when `firewalld` fails during boot"

The second question:

  * "What has to happen when `systemd` detects that `firewalld` failed?"

should become a separate bug (once this one is solved and the actual reaction is considered insufficient).