Bug 1092352

Summary: Upgrade Leap 42.3 to 15.0 breaks DNS resolution
Product: [openSUSE] openSUSE Distribution
Reporter: S. B. <sb56637>
Component: Network
Assignee: Jonathan Kang <songchuan.kang>
Status: RESOLVED WORKSFORME
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P2 - High
CC: arvidjaar, bjoernv, edera, fabian, fcrozat, fvogt, hcderaad, heger.tomas, lnussel, mt, peter.krutel, sb56637, sknorr, sseebergelverfeldt, tonysu, wbauer, will69, yfjiang
Version: Leap 15.0
Flags: lnussel: SHIP_STOPPER-
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 42.3
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---

Description S. B. 2018-05-08 12:36:28 UTC
Hi there, after upgrading from 42.3 to 15.0 there is no DNS resolution. This is the same bug that appeared in Tumbleweed a while back:

- https://forums.opensuse.org/showthread.php/520090-DNS-does-not-work-after-NetworkManager-update

- https://forums.opensuse.org/showthread.php/525732-No-DNS-resolving-after-update?p=2829131#post2829131

- https://bugzilla.opensuse.org/show_bug.cgi?id=1046969

- https://bugzilla.suse.com/show_bug.cgi?id=1047004

I have reproduced this bug on multiple system upgrades. The workaround that I used when this bug hit me in Tumbleweed was to delete /etc/resolv.conf, which also works in this case.
Comment 1 Stefan Knorr 2018-05-11 17:28:25 UTC
This just bit me too after upgrading 42.3 -> 15.0. Thank you for the workaround, that indeed helped.

Sven (CC) hypothesized that this might be a permissions issue with the /etc/resolv.conf.

However, that appears not to be the case: both "ls -l" and "lsattr" give the same output before and after the deletion/re-creation of the file.

> # lsattr /etc/resolv.conf
> --------------e---- /etc/resolv.conf
> # ls -l /etc/resolv.conf
> -rw-r--r-- root root [...] /etc/resolv.conf
Comment 2 Fabian Vogt 2018-05-12 16:09:21 UTC
Requesting re-prioritization.

AFAICT this won't get caught by openQA as none of the upgrade cases use NM.

If this affects all users with NM when upgrading from 42.3 to 15.0, I suggest marking this as a blocker, as those systems would not be able to receive a fix.
Comment 3 Wolfgang Bauer 2018-05-12 18:02:00 UTC
I just upgraded a 42.3 system using NetworkManager to 15.0 and experienced the same problem.
DNS resolution was broken when rebooting after the upgrade; /etc/resolv.conf contained only this:
# Generated by NetworkManager
search site

Adding a proper nameserver entry fixed my Internet access.
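
A quick way to spot the broken state described above (a resolv.conf with a "search" line but no "nameserver" line) might look like the sketch below. The function name count_nameservers is made up for illustration; it only reads the file and changes nothing.

```shell
# Count the "nameserver" entries in a resolv.conf-style file.
# The path is an argument so the check can be tried on a copy first;
# it defaults to /etc/resolv.conf.
count_nameservers() {
    rc_file=${1:-/etc/resolv.conf}
    # grep -c prints the number of matching lines but exits with status 1
    # when that number is 0, so mask the exit status for "set -e" scripts.
    grep -c -E '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]' "$rc_file" || true
}
```

A result of 0 on an NM-managed system would match the symptom reported here.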
Comment 4 S. B. 2018-05-12 18:40:51 UTC
(In reply to Fabian Vogt from comment #2)
> Requesting re-priorization.
> 
> AFAICT this won't get caught by openQA as none of the upgrade cases use NM.
> 
> If this affects all users with NM when upgrading from 42.3 to 15.0, I
> suggest to mark this as a blocker, as those systems would not be able to
> receive a fix.

I agree this should be a blocker. I thought I set the blocker flag for this bug report but maybe I didn't do it right.
Comment 5 Ludwig Nussel 2018-05-13 07:29:50 UTC
NetworkManager touching /etc/resolv.conf was meant to be fixed several times already, even in 42. Why does it keep coming back? Also, I'd assume that the same issue hits SLED.
Comment 6 Ludwig Nussel 2018-05-13 17:21:58 UTC
My guess is that netconfig fails for whatever reason and the code that runs it returns SR_NOTFOUND: https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/dns/nm-dns-manager.c#n538

In that case NetworkManager falls back to using direct file write:
https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/src/dns/nm-dns-manager.c#n1260

That just makes things worse. So a potential fix would be to return SR_ERROR to prevent NM from using the fallback file method, which breaks any further netconfig calls.

Unfortunately NetworkManager does not log the actual error even in debug mode.
Comment 7 S. B. 2018-05-13 17:45:01 UTC
Thanks for setting this as SHIP_STOPPER+
Comment 8 Ludwig Nussel 2018-05-14 08:46:45 UTC
In a quick local test I could not reproduce this. Maybe your resolv.conf was already broken on 42.3, so netconfig never updated it again.
Comment 9 Ludwig Nussel 2018-05-14 09:08:30 UTC
Has anyone who had the problem been using btrfs with snapshots? The 42.3 snapshot taken before "zypper dup" should confirm whether resolv.conf had already been written by NetworkManager before the upgrade.
Comment 10 Wolfgang Bauer 2018-05-14 09:10:16 UTC
(In reply to Ludwig Nussel from comment #8)
> on a quick test locally I could not reproduce. Maybe your resolv.conf was
> broken on 42.3 already so it never got updated by netconfig again.

I'm pretty sure my /etc/resolv.conf was fine before the upgrade.

I upgraded using "zypper dup" in the running system btw if it makes a difference.

No snapshots here though, sorry.
Comment 11 Ludwig Nussel 2018-05-14 09:18:18 UTC
While we are at it, would anyone mind updating https://en.opensuse.org/SDB:System_upgrade ? Feel free to also add a call to "netconfig update -f" there.
Comment 12 Jonathan Kang 2018-05-14 09:53:15 UTC
I installed Leap 42.3 in a virtual machine and upgraded it to Leap 15.0 using
the ISO. Before upgrading, /etc/resolv.conf had not been modified by NM. I did
not have this issue after upgrading.
Comment 13 Ludwig Nussel 2018-05-14 11:20:05 UTC
Fabian told me that he can reproduce the issue even without upgrading. Fabian, can you check the journal and also if https://build.opensuse.org/request/show/606917 helps?
Comment 14 Fabian Vogt 2018-05-14 11:36:37 UTC
(In reply to Ludwig Nussel from comment #13)
> Fabian told me that he can reproduce the issue even without upgrading.
> Fabian, can you check the journal and also if
> https://build.opensuse.org/request/show/606917 helps?

I'm not sure whether it's the exact same issue - just that starting from a certain point in time netconfig thinks I modified the file, although I already tried mv /etc/resolv.conf.{netconfig,} and rm /etc/resolv.conf. The newly generated file does not contain the correct DNS entries (none, actually), but resolv.conf.netconfig always does.

I installed NetworkManager from GNOME:Factory now and will report back.

Journal:

Mai 14 07:56:03 linux-e202.suse.de dns-resolver[31222]: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
Mai 14 07:56:03 linux-e202.suse.de dns-resolver[31224]: You can find my version in /etc/resolv.conf.netconfig
Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: <13>May 14 07:56:03 dns-resolver: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: <13>May 14 07:56:03 dns-resolver: You can find my version in /etc/resolv.conf.netconfig
Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: ATTENTION: You have modified /etc/resolv.conf.  Leaving it untouched...
Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: You can find my version in /etc/resolv.conf.netconfig ...
Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: nisdomainname: you must be root to change the domain name
Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: <warn>  [1526277363.5741] dns-mgr: could not commit DNS changes: Error calling netconfig: exited with status 20
Comment 15 Stefan Knorr 2018-05-14 11:52:37 UTC
Not sure this is going to be helpful but here goes:

So, I tried upgrading a second laptop to Leap 15.0 and did not see this issue, so here is my original PC compared to the new one:

Laptop #1:
 * fully upgraded Leap 42.3 -> Leap 15.0, using NM on GNOME
 * zypper dup
 * before upgrade: (did not check what generated /etc/resolv.conf)
 * after update: empty resolv.conf from NM
 => ISSUE

Laptop #2:
 * not fully upgraded Leap 42.3 -> Leap 15.0, using NM on GNOME
 * zypper dup
 * before upgrade: /etc/resolv.conf from netconfig
 * after upgrade: /etc/resolv.conf from netconfig, all correct
 => NO ISSUE

I think both laptops had their last fresh installation with 42.2 and were zypper dup'd from there.
Comment 16 Ludwig Nussel 2018-05-14 12:12:47 UTC
(In reply to Fabian Vogt from comment #14)
> (In reply to Ludwig Nussel from comment #13)
> > Fabian told me that he can reproduce the issue even without upgrading.
> > Fabian, can you check the journal and also if
> > https://build.opensuse.org/request/show/606917 helps?
> 
> I'm not sure whether it's the exact same issue - just that starting from a
> certain point in time netconfig thinks I modified the file although I
> already tried mv /etc/resolv.conf.{netconfig,} and rm /etc/resolv.conf. The
> new generated file does not contain the correct DNS entries (none,
> actually), but resolv.conf.netconfig always does.

calling "netconfig update -f" should restore /etc/resolv.conf in a way that netconfig keeps updating it again. If from that point on NM again writes to /etc/resolv.conf itself then I'd be interested in the journal output.
 
> I installed NetworkManager from GNOME:Factory now and will report back.
> 
> Journal:
> 
> Mai 14 07:56:03 linux-e202.suse.de dns-resolver[31222]: ATTENTION: You have
> modified /etc/resolv.conf. Leaving it untouched...

That's what netconfig says if resolv.conf already was modified. We are currently looking for the point in time where NM decides to not use netconfig.
Comment 17 Jonathan Kang 2018-05-15 09:49:27 UTC
(In reply to Fabian Vogt from comment #14)
> Journal:
> 
> Mai 14 07:56:03 linux-e202.suse.de NetworkManager[22647]: <warn> 
> [1526277363.5741] dns-mgr: could not commit DNS changes: Error calling
> netconfig: exited with status 20

Looking at the source code of netconfig, it says:

> Exit codes

> 0  success
> 1  error
> 20 a module was not able to change the configuration
>    because it was changed since the last run

Does this mean anything to you guys?
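
For scripting around this, the exit codes quoted above could be turned into a small lookup. This is only a sketch: the function name describe_netconfig_status is invented here, and only the three codes listed in the quoted text are covered.

```shell
# Map a netconfig exit status to the meaning quoted from its source.
# Codes other than 0, 1 and 20 are reported as undocumented rather
# than guessed at.
describe_netconfig_status() {
    case "$1" in
        0)  echo "success" ;;
        1)  echo "error" ;;
        20) echo "a module was not able to change the configuration because it was changed since the last run" ;;
        *)  echo "undocumented status $1" ;;
    esac
}
```

It could be used right after a netconfig call, e.g. describe_netconfig_status "$?".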
Comment 18 Ludwig Nussel 2018-05-16 11:51:47 UTC
I guess that's what it says when resolv.conf was modified
Comment 19 Andrei Borzenkov 2018-05-19 05:44:49 UTC
I switched Leap 42.3 to NetworkManager instead of wicked and after reboot I got /etc/resolv.conf.netconfig and /etc/resolv.conf created directly by NetworkManager. In the logs I have:

May 19 08:28:40 leap423 NetworkManager[1265]: <info>  Networking is enabled by state file
May 19 08:28:41 leap423 dns-resolver[1851]: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
May 19 08:28:41 leap423 dns-resolver[1862]: You can find my version in /etc/resolv.conf.netconfig
May 19 08:28:42 leap423 NetworkManager[1265]: <13>May 19 08:28:41 dns-resolver: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...

And resolv.conf says

# Generated by NetworkManager
search testdom
nameserver 10.0.2.3

Still, I would expect NM to continue to generate resolv.conf after the update as well, just like it does with 42.3?
Comment 20 Andrei Borzenkov 2018-05-19 06:04:26 UTC
OK, I tested the switch in Leap 42.3 once more.

1. Reverted back to wicked, netconfig update -f
2. Switched to NM again - resolv.conf unchanged
3. Reboot - still have the same unchanged (netconfig) resolv.conf
4. Power off/Power on - still good resolv.conf

So it looks like a sporadic error, probably a race condition. No idea how to trigger it on purpose.
Comment 21 Fabian Vogt 2018-05-19 09:26:52 UTC
(In reply to Andrei Borzenkov from comment #19)
> I switched Leap 42.3 to NetworkManger instead of wicked and after reboot I
> got /etc/resolv.conf.netconfig and /etc/resolv.conf created directly by
> NetworkManager. In logs I have
[...]

Can you upload the zypp history logfile of the upgrade and the journal during the upgrade? It might contain some hints.
Comment 22 Andrei Borzenkov 2018-05-19 11:23:27 UTC
(In reply to Fabian Vogt from comment #21)
> Can you upload the zypp history logfile of the upgrade and the journal
> during the upgrade? It might contain some hints.

There was no upgrade. It was all under 42.3. Sorry, I do not have this VM anymore and I am not able to reproduce it again.
Comment 23 Fabian Vogt 2018-05-19 11:39:56 UTC
(In reply to Andrei Borzenkov from comment #22)
> (In reply to Fabian Vogt from comment #21)
> > Can you upload the zypp history logfile of the upgrade and the journal
> > during the upgrade? It might contain some hints.
> 
> There was no upgrade. It was all under 42.3. Sorry, I do not have this VM
> anymore and I am not able to reproduce it again.

Ok, then I misunderstood.

I don't think that this is the same bug as the 42.3 -> 15.0 upgrade, where NM is used before and after.
Comment 24 Ludwig Nussel 2018-05-23 09:17:08 UTC
From what we know the bug is more or less unrelated to upgrades to 15.0, it may happen on 42.3 also. That's why I rated it down.
Nevertheless, IMO the code that sends SIGTERM to netconfig is just incorrect; it invites races and needs to be fixed. Jonathan, could you please modify the code so that it does not send any signals to netconfig until some timeout has expired? netconfig is expected to terminate itself when done.

We need an online update for 42 and 15 then.
Comment 25 Jonathan Kang 2018-05-28 09:47:30 UTC
(In reply to Ludwig Nussel from comment #24)
> From what we know the bug is more or less unrelated to upgrades to 15.0, it
> may happen on 42.3 also. That's why I rated it down.
> Nevertheless IMO the code that sends SIGTERM to netconfig is just incorrect,
> that asks for races and needs to be fixed. Jonathan, could you please modify
> the code in a way to not send any signals to netconfig until some timeout?

Hmm, I think the issue is that netconfig itself exits with status code 20, rather
than NetworkManager terminating it.

We might need to CC the netconfig maintainer to explain what exit code 20 means.
Comment 26 Jonathan Kang 2018-05-28 10:15:53 UTC
Can someone who is having this issue provide the output of
> netconfig update -v

It might contain some useful information.
Comment 27 Tony Su 2018-05-28 18:54:04 UTC
Decided to take a look at this in a test virtual machine, upgrading 42.3 > 15.

Machine was working well using Wicked.
Changed to NM, rebooted and verified /etc/resolv.conf now generated by NM.

Did online upgrade following SDB:System Upgrade.
About 2/3 of the way through, the upgrade threw an error complaining about failed name resolution for https://download.opensuse.org.
But when I opened a terminal window, I found that more than just name resolution was failing. The machine still had an IPv4 address, but pinging Internet IPv4 addresses and running nslookup also failed (Network unreachable).

The upgrade was then aborted. On reboot the system superficially looks as if the upgrade completed successfully, but name resolution is still broken.
Tried switching from NM back to Wicked, but that still fails.
Tried running a DVD Upgrade on top of the existing installation, but that fails too.

If anyone is interested in this scenario as I've described it, here is a link to download the zypper.log for the failed upgrade (15MB so can't be posted as an attachment or to any pastebin I know of)

https://www.dropbox.com/s/iklaces4yoy6omq/zypper.log?dl=0

I'll keep this VM around for a little bit if anyone wants me to run some kind of analysis but I'm guessing it's toast.
Comment 28 Tony Su 2018-05-28 19:19:02 UTC
FWIW -
I just fired up the VM I posted about, and now name resolution is working (currently switched back to wicked).

Running "zypper up" it looks like the system understood that it's missing a number of packages and patterns, and is now offering to install/upgrade 113 packages. Running "zypper dup" returns the same.

So,
It seems this test machine is righting itself.
Still, that logfile for the failed upgrade might contain something useful...

And that old saying about "When your system isn't working, then just keep rebooting" seems to be working for this machine... :)

Because this machine's name resolution is working, I don't think running the netconfig command is going to be helpful here. Will do so if the machine acts up again.

If it's helpful, perhaps to locate the point at which the upgrade failed, I'm posting the current output of "zypper dup" to show what didn't get upgraded:

 zypper dup
Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Loading repository data...
Reading installed packages...
Computing distribution upgrade...

The following application is going to be installed:
  Falkon

The following 75 NEW packages are going to be installed:
  falkon falkon-gnome-keyring falkon-lang gcc gcc7 glibc-devel gnome-keyring-32bit
  gnome-keyring-pam-32bit hplip-sane kernel-default-4.12.14-lp150.12.4.1
  kernel-default-devel-4.12.14-lp150.12.4.1 kernel-devel-4.12.14-lp150.12.4.1 libasan4 libcilkrts5
  libcrypto43 libcups2-32bit libdcerpc0-32bit libdcerpc-binding0-32bit libdns_sd libelf-devel
  libffi7-32bit libgmp10-32bit libgnutls30-32bit libhogweed4-32bit libidn2-0-32bit liblsan0 liblxqt0
  libmpx2 libmpxwrappers2 libndr0-32bit libndr-krb5pac0-32bit libndr-nbt0-32bit libndr-standard0-32bit
  libnetapi0-32bit libnettle6-32bit libnss_nis2 libnss_nis2-32bit libopenmpt0 libp11-kit0-32bit
  libQt5Xdg3 libQt5XdgIconLoader3 libsamba-credentials0-32bit libsamba-hostconfig0-32bit
  libsamba-passdb0-32bit libsamba-util0-32bit libsamdb0-32bit libsmbconf0-32bit libsmbldap2-32bit
  libtasn1-6-32bit libubsan0 libunistring2-32bit libutf8proc2 libwbclient0-32bit lksctp-tools
  lxqt-session pavucontrol-qt-lang python2-certifi python2-chardet python2-libvirt-python
  python2-ndg-httpsclient python2-py python2-pycurl python2-pyOpenSSL python2-PySocks python2-requests
  python2-urllib3 samba-client-32bit samba-kdc-32bit samba-libs-32bit samba-winbind-32bit star-rmt
  terminfo-iterm terminfo-screen zlib-devel zypper-migration-plugin

The following 2 applications are going to be REMOVED:
  "Firefox Web Browser" QupZilla

The following 14 packages are going to be REMOVED:
  cups-libs-32bit libffi4 libgif6 libruby2_1-2_1 libvirt-python libvpx1 lxqt-common python-pycurl
  python-pyOpenSSL python-requests qupzilla qupzilla-gnome-keyring ruby2.1 ruby2.1-stdlib

The following 113 packages are going to be upgraded:
  bind-utils bridge-utils compton-conf-lang fipscheck gd icewm-lite kernel-macros libadns1 libatomic1
  libbind9-160 libdbus-1-3-32bit libdns169 libdvdnav4 libfipscheck1 libfm libfm4 libfm-extra4 libfm-gtk4
  libfm-lang libfm-qt-lang libgmime-2_6-0 libImlib2-1 libirs160 libisc166 libisccc160 libisccfg160
  libitm1 libldb1-32bit liblwres160 liblxqt-globalkeys0 liblxqt-globalkeys-ui0 liblxqt-lang
  libmenu-cache3 libncurses5 libopenssl1_0_0-32bit libpcre16-0 libpython2_7-1_0-32bit libqtermwidget5-0
  libsamba-errors0-32bit libsnapper4 libsysstat-qt5-0 libsystemd0-32bit libtdb1-32bit
  libtevent-util0-32bit libtsan0 libxcb-xf86dri0 libXfont1 linuxconsoletools linux-glibc-devel
  lximage-qt-lang lxmenu-data lxqt-about-lang lxqt-admin-lang lxqt-config-lang lxqt-globalkeys-lang
  lxqt-l10n lxqt-notificationd-lang lxqt-openssh-askpass-lang lxqt-panel-lang lxqt-policykit-lang
  lxqt-powermanagement-lang lxqt-runner-lang lxqt-session-lang lxqt-sudo-lang man-pages menu-cache
  Mesa-libGLESv2-2 MozillaFirefox ntp obconf-qt-lang patterns-base-apparmor patterns-base-apparmor_opt
  patterns-base-base patterns-base-basesystem patterns-base-enhanced_base
  patterns-base-enhanced_base_opt patterns-base-minimal_base patterns-base-sw_management
  patterns-base-x11 patterns-base-x11_enhanced patterns-base-x11_opt pavucontrol-qt pcmanfm pcmanfm-lang
  pcmanfm-qt-lang psmisc psmisc-lang python3-bind qterminal qterminal-lang qtermwidget-qt5-data
  rollback-helper sharutils sharutils-lang snapper snapper-zypp-plugin star SUSEConnect systemd-logger
  tack terminfo timezone timezone-java virtualbox-guest-kmp-default virtualbox-guest-tools
  virtualbox-guest-x11 yast2-country yast2-country-data yast2-installation yast2-packager
  yast2-pkg-bindings yast2-storage-ng zsh

The following 11 patterns are going to be upgraded:
  apparmor apparmor_opt base basesystem enhanced_base enhanced_base_opt minimal_base sw_management x11
  x11_enhanced x11_opt

The following 60 packages are going to be downgraded:
  avahi-autoipd color-filesystem cvs cvsps finger gnome-icon-theme gnome-icon-theme-extras
  gnome-icon-theme-symbolic google-opensans-fonts icc-profiles icc-profiles-basiccolor-lstarrgb
  icc-profiles-basiccolor-printing2009-coat2 icc-profiles-lcms-lab icc-profiles-mini
  icc-profiles-openicc-rgb intlfonts java-1_8_0-openjdk java-1_8_0-openjdk-headless ksh libaspell15
  libaudiofile1 libavahi-client3-32bit libavahi-common3-32bit libfam0-gamin-32bit libgltf-0_1-1
  libgssglue1 libmodplug1 libmozjs-17_0 libmuparser2_2_5 libobrender32 libobt2 libpisock9 libreadline6
  libsmi libsmi2 libtalloc2-32bit libtevent0-32bit libtidyp-1_04-0 man-pages-posix master-boot-code
  metamail mlocate mlocate-lang mpt-status nss-mdns-32bit obconf openbox OpenPrintingPPDs procinfo
  procmail python-talloc-32bit python-urlgrabber setserial tcpdump telnet udhcp unoconv vlan
  xdm-xsession ypbind

The following package is going to change architecture:
  sharutils-lang  x86_64 -> noarch

113 packages to upgrade, 60 to downgrade, 75 new, 14 to remove, 1 to change arch.
Overall download size: 254.9 MiB. Already cached: 0 B. After the operation, additional 459.5 MiB will be
used.
Continue? [y/n/...? shows all options] (y):
Comment 29 Jonathan Kang 2018-05-29 08:21:12 UTC
cc sysconfig maintainer.

@Marius
Could you please help look at this issue? There are some logs in comment#14
indicating that netconfig exited with status code 20, when NetworkManager
called netconfig to update dns settings.

Thanks
Comment 30 Marius Tomaschewski 2018-05-30 08:55:29 UTC
(In reply to Jonathan Kang from comment #29)
> cc sysconfig maintainer.
> 
> @Marius
> Could you please help look at this issue? There are some logs in comment#14
> indicating that netconfig exited with status code 20, when NetworkManager
> called netconfig to update dns settings.

Code 20 means that there were external modifications to the files (e.g. NM
wrote them directly instead of using netconfig) and netconfig refuses to touch
them, as shown in the message that it prints and also writes to the log, e.g.:

xanthos:~ # echo "nameserver 8.8.8.8" >> /etc/resolv.conf
xanthos:~ # netconfig update 
<13>May 30 10:32:53 dns-resolver: ATTENTION: You have modified /etc/resolv.conf. Leaving it untouched...
<13>May 30 10:32:53 dns-resolver: You can find my version in /etc/resolv.conf.netconfig
ATTENTION: You have modified /etc/resolv.conf.  Leaving it untouched...
You can find my version in /etc/resolv.conf.netconfig ...
xanthos:~ # echo $?
20

You can call "netconfig update -f" to force an update of the files:

# netconfig update -f
<13>May 30 10:35:59 dns-resolver: force replace set: backup created as /etc/resolv.conf.20180530-103559

(BTW: Setting NETCONFIG_FORCE_REPLACE=yes in /etc/sysconfig/network/config
 makes netconfig always overwrite the files instead of honoring e.g. manual
 changes that the admin made.)


This issue is nothing new and will persist as long as NetworkManager
continues to sometimes break the policies and write the files directly
instead of using netconfig.
Comment 31 Marius Tomaschewski 2018-05-30 09:02:38 UTC
(In reply to Ludwig Nussel from comment #16)
[...]
> > Mai 14 07:56:03 linux-e202.suse.de dns-resolver[31222]: ATTENTION: You have
> > modified /etc/resolv.conf. Leaving it untouched...
> 
> That's what netconfig says if resolv.conf already was modified. We are
> currently looking for the point in time where NM decides to not use
> netconfig.

Exactly. Instead of writing it directly (on code 20 or on any other occasion),
call "netconfig update -f" when a netconfig call returns with code 20.
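
The behavior suggested here (retry with -f when the first call returns 20) can be sketched as a shell wrapper. This is an illustration of the idea, not NetworkManager's actual code. The function name and the NETCONFIG override variable are hypothetical; the override exists only so the logic can be tried with a stub instead of the real binary. Note that a forced update overwrites manual edits to the managed files.

```shell
# Run "netconfig update"; if it reports status 20 (a managed file was
# modified externally), retry once with -f instead of writing the file
# directly. NETCONFIG is a hypothetical override for testing with a stub.
netconfig_update_with_retry() {
    nc_cmd=${NETCONFIG:-netconfig}
    nc_status=0
    "$nc_cmd" update || nc_status=$?
    if [ "$nc_status" -eq 20 ]; then
        # Force the update, replacing the externally modified files.
        nc_status=0
        "$nc_cmd" update -f || nc_status=$?
    fi
    return "$nc_status"
}
```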
Comment 32 Ludwig Nussel 2018-05-30 12:57:48 UTC
which leads us back to comment #6
Comment 33 Jonathan Kang 2018-06-01 09:18:36 UTC
(In reply to Marius Tomaschewski from comment #30)
> This issue is nothing new and will persist as long as NetworkManager
> continues to sometimes break the policies and write the files directly
> instead to use netconfig.

I see. I'll propose a patch to NM that fixes this by making NetworkManager force
an update of /etc/resolv.conf through netconfig when netconfig exits with code 20.

One question about this. NetworkManager spawns
> /sbin/netconfig modify --service NetworkManager
and writes
> INTERFACE='NetworkManager'
> DNSSEARCH='abc'
> DNSSERVERS='xyz'
to netconfig.
To force update /etc/resolv.conf using netconfig, should NetworkManager spawn
> /sbin/netconfig modify -f --service NetworkManager
and write related information to netconfig?

> 
> You can call "netconfig update -f" to enforce to update the files:
> 
And for anyone who is having this issue, run "netconfig update -f" to update
/etc/resolv.conf. You won't encounter this issue any more.
Comment 34 Jonathan Kang 2018-06-01 09:24:26 UTC
(In reply to Marius Tomaschewski from comment #31)
> 
> Exactly. Instead to write it directly (on code 20 or any another occasion)
> call "netconfig update -f" when a netconfig call returns with code 20.

But what if users want to modify /etc/resolv.conf a bit? If NM force-updates
/etc/resolv.conf using netconfig whenever netconfig exits with code 20, users
won't be able to modify resolv.conf the way they want.
Comment 35 Frank Krüger 2018-06-01 17:51:30 UTC
(In reply to Jonathan Kang from comment #33)
> (In reply to Marius Tomaschewski from comment #30)
> > This issue is nothing new and will persist as long as NetworkManager
> > continues to sometimes break the policies and write the files directly
> > instead to use netconfig.
> 
> I see. I'll propose a patch to NM fixing this. Make NetworkManager force
> update
> /etc/resolv.conf using netconfig when netconfig exits with code 20.
> 
> One question about this. NetworkManager spawns
> > /sbin/netconfig modify --service NetworkManager
> and writes
> > INTERFACE='NetworkManager'
> > DNSSEARCH='abc'
> > DNSSERVERS='xyz'
> to netconfig.
> To force update /etc/resolv.conf using netconfig, should NetworkManager spawn
> > /sbin/netconfig modify -f --service NetworkManager
> and write related information to netconfig?
> 
> > 
> > You can call "netconfig update -f" to enforce to update the files:
> > 
> And for anyone who is having this issue, run "netconfig update -f" to update
> /etc/resolv.conf. You won't encounter this issue any more.

Is this guaranteed? I vaguely remember that when the issue appeared in Tumbleweed, some users reported on the factory mailing list that "netconfig update -f" did not solve the problem.
Comment 36 Ludwig Nussel 2018-06-04 09:06:56 UTC
Just make NM never write resolv.conf manually. Users who fell into the trap now have to run "netconfig update -f" manually once, but after that everything should be fine.
Comment 37 Jonathan Kang 2018-06-05 02:54:55 UTC
Current status of NetworkManager is described below:

1. when netconfig is not found in system, NM updates resolv.conf.
2. when NM spawns netconfig and netconfig exits with status blabla, NM won't
   try to update resolv.conf any more.
3. when NM fails to spawn netconfig, NM won't try to update resolv.conf any more
   either.

There are minor things (error handling, etc.) to be fixed on the NM side, and
I'll submit a patch for that. Overall, NM won't write /etc/resolv.conf when
rc-manager is set to netconfig, which is the default for openSUSE.

Users who are having this issue should run "netconfig update -f" manually.
Comment 38 Hans de Raad 2018-06-13 07:27:52 UTC
FWIW, Tumbleweed snapshot 20180606 also has this issue. I didn't see that mentioned in this thread yet, so just to be sure you are aware: as far as I can tell this is not just an issue from a while back (as mentioned in the original description). And running "netconfig update -f" does not permanently fix the issue for me.
When I switch networks often (e.g. at a conference, switching between the conference Wi-Fi and mobile tethering) and also enable/disable VPN connections, I experience this issue (in)frequently.
Running "netconfig update -f" does solve the issue every time, yet it reappears after switching networks a couple of times. I can't put my finger on the exact sequence yet; if I find some time I can try to dig a bit further.
Comment 39 Jonathan Kang 2018-06-15 02:28:58 UTC
The issue seems to be that NetworkManager writes *nothing* to /etc/resolv.conf
at some point.

Can someone who has hit this issue add
> [logging]
> level=debug
to /etc/NetworkManager/NetworkManager.conf to enable debug logging? When the
issue happens again, check the related logs to see if there is something useful.
Comment 40 Peter Krútel 2018-07-13 06:37:37 UTC
Hello,
Following the bug title, I have one reproducible scenario in which /etc/resolv.conf is not updated during boot and "netconfig update -f" removes the nameservers from /etc/resolv.conf (only the search part is preserved). I upgraded 42.3 -> 15.0.

HW setup:
Notebook in docking station
Two network interfaces:
eth0 - standard notebook network interface with a cable plugged in and connected to the main network
eth1 - USB2Ethernet adapter. It does not matter whether an Ethernet cable is plugged into the adapter or how the other end of the cable is connected (disconnected or connected somewhere). It is enough that the adapter is plugged into the docking station.

Connected to the docking station are a headset (analog), keyboard (USB), mouse (USB), cooler (USB), the USB2Ethernet adapter and two external monitors (DP).

Problem:
The system starts up and shows the login screen on the graphical console. When you log in via the graphical console, the system freezes and all screens are black.

Symptoms:
After the system starts, log in as root on the system console and then:
1. 'ip a' - does not finish, no output
2. /etc/resolv.conf is not refreshed
3. "netconfig update -f" removes the nameservers from /etc/resolv.conf (only "search .." is preserved)

Graphical console login as a standard user:
1. rtkit-daemon times out; rtkit-daemon cannot be started

Workaround:
Disconnect USB2Ethernet adapter from USB before boot. System starts normally, no problems with login via graphical console and system runs as expected. If USB2Ethernet adapter is connected during runtime, everything works as expected.

Other tests:
USB2Ethernet removed, cable plugged in eth0 but other end of cable not connected - system starts normally and runs as expected

My USB2Ethernet adapter is D-Link DUB-E100 HW version C1

With the HW setup described above there was no problem in 42.3.

Please let me know if I should open a separate bug for this problem.
Comment 42 ede rag 2018-07-29 20:11:54 UTC
Same here, after a fresh install (_not_ an update) of
Leap-15 over Leap-42.2/networkmanager.
After the install, hosts were unreachable.
In YaST, changed from wicked to networkmanager, not better.

/etc/resolv.conf had only commented out lines.
After
sudo netconfig update -f
<13>Jul 29 21:45:31 dns-resolver: force replace set: backup created as /etc/resolv.conf.20180729-214531
the connection was fine.
(and 2 nameserver lines are now in /etc/resolv.conf)
Comment 43 Stefan Willer 2018-09-10 10:47:07 UTC
Just ran into this on Tumbleweed sometime after installing >1000 updates via "zypper dup". Renaming "/etc/resolv.conf" worked. I am not so sure I should be forced to do this as an end user, though. Maybe there should be a section in the manual: "How to repair your system after we bricked it?"

Very much looking forward to a fix!
Comment 44 Jonathan Kang 2018-09-20 07:14:45 UTC
(In reply to Stefan Willer from comment #43)
> Just ran into this on Tumbleweed sometime after installing >1000 updates via
> "zypper dup". Renaming "/etc/resolv.conf" worked. I am not so sure I should
> be forced to do this as an end user, though. Maybe there should be a section
> in the manual: "How to repair your system after we bricked it?"
> 
> Very much looking forward to a fix!

Running "netconfig update -f" should fix this issue.

I wonder if the logs from the boot in which you upgraded Tumbleweed and from the
boot after the upgrade are still available. Those can be very helpful. Please
provide the whole output of "journalctl -b" for those two boots.

Thanks.
Comment 45 S. B. 2019-01-13 04:20:00 UTC
(In reply to Stefan Willer from comment #43)
> Just ran into this on Tumbleweed sometime after installing >1000 updates via
> "zypper dup". Renaming "/etc/resolv.conf" worked. I am not so sure I should
> be forced to do this as an end user, though. Maybe there should be a section
> in the manual: "How to repair your system after we bricked it?"
> 
> Very much looking forward to a fix!

This is still happening. I'm currently trying to walk a clueless user through the upgrade process, and this hit him. This is something like the 10th system I've upgraded to 15.0, and 100% of them have hit this bug. Unfortunately this makes openSUSE look really bad...
Comment 46 Marius Tomaschewski 2019-01-16 17:01:35 UTC
Workaround: setting NETCONFIG_FORCE_REPLACE=yes in /etc/sysconfig/network/config
changes the behavior to always force an overwrite, like "netconfig update -f" does.
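
To see whether that workaround is already in place, something like the following sketch could be used. The function name force_replace_setting is invented for illustration, and the parsing is deliberately simple: it assumes a plain VAR="value" assignment with nothing after the value.

```shell
# Print the value of NETCONFIG_FORCE_REPLACE from a sysconfig-style file,
# or nothing if there is no uncommented assignment. The path defaults to
# /etc/sysconfig/network/config.
force_replace_setting() {
    cfg_file=${1:-/etc/sysconfig/network/config}
    # Take the last uncommented assignment, keep everything after the
    # first "=", and drop any surrounding quote characters.
    grep -E '^[[:space:]]*NETCONFIG_FORCE_REPLACE=' "$cfg_file" \
        | tail -n 1 | cut -d= -f2- | tr -d "\"'"
}
```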
Comment 47 Jonathan Kang 2019-04-18 06:59:52 UTC
Whoever hit this issue must have had an /etc/resolv.conf generated by
NetworkManager before upgrading. This can be caused by bug#960153, which was
fixed later on. This is why you cannot reproduce this issue if you upgrade from
a clean-installed and fully updated 42.3 to 15.0.

The real issue is that in 42.3, NetworkManager overwrites /etc/resolv.conf due to
bug#960153 or because the user modified resolv.conf. This has been fixed in a
later NetworkManager version, which is included in Leap 15.0 and Tumbleweed.

So before upgrading to Leap 15.0, double-check whether /etc/resolv.conf was
generated by NetworkManager. If so, run "netconfig update -f" to update it.

But the cause of the originally reported "empty /etc/resolv.conf" issue cannot
be identified without NetworkManager logs.

As for now, NetworkManager won't overwrite /etc/resolv.conf anymore. As long as
you have /etc/resolv.conf generated by netconfig, everything should be okay.
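
The pre-upgrade check described above could be sketched as follows. It relies on the "# Generated by NetworkManager" header line shown in earlier comments; the function names are made up for illustration, and the script only prints advice rather than running netconfig itself.

```shell
# Succeed if the given resolv.conf-style file carries the header that
# NetworkManager writes ("# Generated by NetworkManager").
written_by_networkmanager() {
    grep -q '^# Generated by NetworkManager' "${1:-/etc/resolv.conf}"
}

# Print advice for the pre-upgrade check; nothing is modified.
pre_upgrade_check() {
    if written_by_networkmanager "${1:-/etc/resolv.conf}"; then
        echo "generated by NetworkManager: run 'netconfig update -f' before upgrading"
    else
        echo "no NetworkManager header found"
    fi
}
```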
Comment 48 Jonathan Kang 2019-08-13 06:19:34 UTC
Closing this. If you're still experiencing this on later Leap versions, feel free
to report it with detailed NetworkManager logs attached. Thanks.