Bugzilla – Bug 657627
svn crash after syncing
Last modified: 2012-03-30 12:28:39 UTC
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.2.12) Gecko/20101026 SUSE/3.6.12-1.2 Firefox/3.6.12
When I run svn checkout, it crashes because of a missing symbol.
Reproducible: Always
Steps to Reproduce:
1. svn checkout http://svn.codehaus.org/modello/tags/modello-1.1
Actual Results:
svn: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi
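The missing symbol is a mangled C++ name. The standard tool for decoding it is `c++filt` from binutils, which turns `_ZN9QListData6removeEi` into `QListData::remove(int)`. For illustration only, here is a minimal Python sketch that decodes the tiny subset of the Itanium C++ ABI mangling used by this symbol (a nested name followed by builtin parameter types). The function `demangle_nested` and its small type table are hypothetical helpers for this example, not part of any library.

```python
# Hypothetical helper: decodes only "_ZN <len><name>... E <builtin types>" symbols,
# a tiny subset of the Itanium C++ ABI mangling. Use c++filt for real work.
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long", "d": "double",
}


def demangle_nested(symbol: str) -> str:
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...E) are handled")
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError(f"expected a length prefix at offset {pos}")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])  # identifier of the given length
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' that closes the nested name")
    # Unknown type codes raise KeyError; only builtin types are supported here.
    params = [BUILTIN_TYPES[code] for code in symbol[pos + 1:]]
    if params == ["void"]:  # a lone "v" means an empty parameter list
        params = []
    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle_nested("_ZN9QListData6removeEi"))  # QListData::remove(int)
```

So the loader is looking for the Qt-internal function `QListData::remove(int)`, which libkdecore.so.5 expects to find in libQtCore.so.4.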
Problem of coolo's magic Factory rebuilding scripts.
This is plain wrong. And _if_ kdelibs had undefined symbols after recompiling Qt, there would still be a bug.
Bug 660116 might be related.
(In reply to comment #2) > this is plain wrong. Pretty much messed up. > And _if_ kdelibs had undefined symbols after recompiling Qt, there would still be a bug. Does it work if you run "export LD_BIND_NOW=true" before "svn co ..."?
(In reply to comment #4) > #export LD_BIND_NOW=true This fixes the problem for me.
Bug in the linker...here we go...
KDE devs, a hint: work around this by linking /usr/lib64/libkdecore.so.5 with "-Wl,-z,relro,-z,now".
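Why does LD_BIND_NOW (or linking with "-z now") hide the error? With lazy binding, a function import is resolved only the first time it is called, possibly long after the object was loaded, when the scope it searches may no longer be intact. With immediate binding, every import is resolved while the object is being loaded. The Python sketch below is a toy model of that difference, not of ld.so's real data structures; the class `ToyObject` and the exception `UndefinedSymbol` are hypothetical names.

```python
class UndefinedSymbol(Exception):
    """Raised when no table in the search scope defines the symbol."""


class ToyObject:
    """Toy shared object whose function imports go through a resolver."""

    def __init__(self, name, imports, scope, bind_now=False):
        self.name = name
        self.scope = scope      # list of {symbol: function} tables, searched in order
        self.resolved = {}      # stands in for already-patched PLT/GOT slots
        if bind_now:            # like LD_BIND_NOW / -z now: resolve at load time
            for sym in imports:
                self.resolved[sym] = self._lookup(sym)

    def _lookup(self, sym):
        for table in self.scope:
            if sym in table:
                return table[sym]
        raise UndefinedSymbol(f"{self.name}: undefined symbol: {sym}")

    def call(self, sym, *args):
        if sym not in self.resolved:  # lazy binding: resolve on first call only
            self.resolved[sym] = self._lookup(sym)
        return self.resolved[sym](*args)


SYM = "_ZN9QListData6removeEi"
qtcore = {SYM: lambda i: f"removed item {i}"}
scope = [qtcore]

lazy = ToyObject("libkdecore.so.5", [SYM], scope)
eager = ToyObject("libkdecore.so.5", [SYM], scope, bind_now=True)

scope.clear()  # the scope that made QtCore visible goes away before the first call

print(eager.call(SYM, 0))  # removed item 0 (address was fixed at load time)
try:
    lazy.call(SYM, 0)
except UndefinedSymbol as err:
    print(err)  # libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi
```

The model also shows the cost discussed later in the thread: with `bind_now=True` every import is looked up up front, even imports that are never called.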
*** Bug 661397 has been marked as a duplicate of this bug. ***
Right now, I do not have any personal machine running Factory, so I can't check for myself - but probably won't be able to until the end of the year anyway. In the meantime, for someone who can do this easily - can you please attach the output of LD_DEBUG=all svn ... ?
*** Bug 660116 has been marked as a duplicate of this bug. ***
Created attachment 406285
Output of "LD_DEBUG=all svn up"
And the subversion package is:
# rpm -q subversion
subversion-1.6.13-3.1.x86_64
*** Bug 639071 has been marked as a duplicate of this bug. ***
BTW I have noticed that the linker error is not printed for some subcommands, e.g. 'svn up' prints the error, while 'svn diff' doesn't. So it might be triggered by some svn plugin...
I think it has to do with the kwallet integration, see https://bugzilla.novell.com/show_bug.cgi?id=660116#c0
I get the same symbol lookup error when closing firefox, so this is not necessarily only bound to svn.
Forgot to add the line in case that matters: /usr/lib/firefox/firefox: symbol lookup error: /usr/lib/libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi This is with KDE from KDF, i.e. KDE 4.6 RC1.
(In reply to comment #15) > I get the same symbol lookup error when closing firefox, so this is not > necessarily only bound to svn. Correct; as I said before, this is a bug in the linker, unrelated to KDE, Firefox or whatever; they just happen to trigger the bug.
Installing libsvn_auth_kwallet-1-0 masks this bug. Anyone with linking expertise care to tell me why?
Workaround for Firefox cases (reported in bug #639071): Uninstall libproxy1-config-kde4
Hello, I get this message when using 'zypper up' in Factory:
zypper: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: _ZN9QListData6removeE
An example run:
# zypper up
Loading repository data...
Warning: Repository '11.4updates' appears to outdated. Consider using a different mirror or server.
Reading installed packages...
The following package update will NOT be installed:
  libxine1
The following NEW packages are going to be installed:
  exo exo-lang libgarcon-branding-upstream
The following package is going to be REMOVED:
  libexo-1-0
The following packages are going to be upgraded:
  digikam-lang libgarcon-1-0 xfce4-notifyd
3 packages to upgrade, 3 new, 1 to remove.
Overall download size: 2.3 MiB. After the operation, additional 1.7 MiB will be used.
Continue? [y/n/?] (y): n
zypper: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi
# rpm -qa | grep -i zypper
zypper-1.5.3-2.1.x86_64
Ping. This breaks zypper and svn!
Glenn: workarounds:
* export LD_BIND_NOW=true, or
* rpm -e libproxy1-config-kde4
Petr: any hope of fixing this at the linker level, or should we apply some workaround? This is a butt-ugly bug; it doesn't go unnoticed.
(In reply to comment #20) > Continue? [y/n/?] (y): n > zypper: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: > _ZN9QListData6removeEi > > # rpm -qa | grep -i zypper > zypper-1.5.3-2.1.x86_64 Can anyone explain to me WHY on earth it fails with zypper? Does this have something to do with Konsole or KWallet? Otherwise it does not make sense. OK, I'm going to try to provide a workaround for KDE; hope I won't lose my hair in the attempt ;-)
@Cristian: I guess zypper uses libproxy, and libproxy's plugin for reading KDE's proxy settings, which I mentioned above, runs into this bug. Yes, it freaked me out too.
I have this on top of my TODO list. It would really help if I could access some actual system where this happens so I could reproduce and debug this properly. So far, I could not reproduce it in a simple chroot.
OK folks, until this bug is fixed in the linker, I want you to try these packages: http://download.opensuse.org/repositories/home:/elvigia:/branches:/KDE:/Distro:/Factory/ (if you get a 404, they are not yet published) and report back whether this "solves" the issue.
Can you provide an x86_64 version of this repository?
(In reply to comment #27) > Can you provide x86_64 version of this repository ? It is provided; unfortunately, the packages are still being built by the OBS. Check back later.
(In reply to comment #27) > Can you provide x86_64 version of this repository ? They are now online.
Using these packages with: svn checkout http://svn.codehaus.org/modello/tags/modello-1.1 does not trigger the symbol lookup error on my system. However, using KDE packages from Milestone 6 still does trigger the bug...
(In reply to comment #30) > Using these packages with : > svn checkout http://svn.codehaus.org/modello/tags/modello-1.1 > does not trigger the symbol lookup error on my system. Right; however, I need more information. This workaround MAY cause some performance degradation: do your apps or your desktop "feel" any less snappy? Do they take longer to load? Any side effects?
No performance hit so far, at least none that I noticed (I don't know which benchmark to run...). Applications take the same time to load.
(In reply to comment #32) > Applications takes the same time to load. Yeah, that's likely because libkdecore* is probably loaded with the desktop environment itself, so other programs do not incur the hit afterwards. While there is a performance impact, since all symbols have to be resolved immediately instead of lazily, this workaround is effective and potentially makes things more secure. Submitted the workaround to the KDE repos as SR#59399.
(In reply to comment #32) > No performance hit so far, at least I did not notice any (I don't know which > benchmark to run...) # LD_DEBUG=statistics yourapp No hair has been lost in the process ;-)
(In reply to comment #20) I can see it on 11.4 x86 with any zypper operation.
Another example:
# zypper ref
Repository '11.4non-oss' is up to date.
Repository '11.4oss' is up to date.
Repository '11.4updates' is up to date.
Repository 'tumbleweed' is up to date.
All repositories have been refreshed.
zypper: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi
Please, could anyone provide Petr with a system showing the symptoms? Adding workarounds like that to kdelibs is not going to help fix the (presumed) bug in the loader.
@Michael, Petr: this is how you can quickly reproduce it:
1. Start a Factory KDE installation (I think the latest live ISO should do it too).
2. Try zypper; it works.
3. Install kdesvn with "zypper in kdesvn" (which pulls in dependencies).
4. In Konsole, try "svn co whatever-repo-you-have".
The undefined symbol error then just shows up.
The provided LD_DEBUG trace gives some hints. The first occurrence of libkdecore.so.5/libQtCore.so.4 is when /usr/lib64/libproxy-0.4.6/modules/config_kde4.so is dlopen()ed. From then on, they are searched for symbols, but not for long: in fact, they are searched only when resolving symbols from config_kde4.so. As soon as no more symbols have to be resolved from that plugin, libQtCore.so.4 is not in any search list anymore. libproxy uses libmodman for loading plugins, which in turn uses dlopen(..., RTLD_LAZY | RTLD_LOCAL) to load the shared objects.
This all hints at a problem in ld.so: DSOs loaded as dependencies of an RTLD_LOCAL dlopen() do not search their local scope tree (the one starting with the dlopen()ed DSO) when resolving symbols, but rather the global scope, where of course they don't find anything if they themselves depend on objects that were loaded only into their local tree.
Michael: But all the modules should have been unloaded long ago... It is rather mysterious where that resolution request is actually coming from. It might be a stray destructor that ought to have been called long ago, or some lingering hook. Can someone who can reproduce this retrieve a backtrace? The top of it (ld.so) is not as important as what is below, i.e. what triggers resolution of this symbol. If you could attach a core dump as well, that would be ideal. Also, it would be great if someone could test whether this still happens with the glibc packages available at http://suse.de/~pbaudis/bug-657627/; they contain an ld.so fix that probably does not cover this, but it might be related.
Created attachment 412250
"svn up" in gdb
Here's a backtrace from "svn up"; I hope my usage of gdb was correct... Unfortunately I'm missing some debuginfo packages; I'm not sure if they are still available or if they were replaced by newer ones in the meantime.
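The scope rule described above can be illustrated with a toy Python model (all names here are hypothetical, and real ld.so bookkeeping is far more involved): an object opened with dlopen(..., RTLD_LOCAL) gets a local scope, the breadth-first list of the opened object and its dependencies, and each object in that list searches the global scope first and then this local scope. The sketch shows that libkdecore.so.5 finds QListData::remove(int) in libQtCore.so.4 only through the local scope; searching the global scope alone produces exactly the reported error.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity, like pointers to loaded objects
class SharedObject:
    name: str
    exports: set
    deps: list = field(default_factory=list)
    scopes: list = field(default_factory=list)  # lookup scopes, in search order


def breadth_first(root):
    """Local search list: the opened object, then its dependencies, breadth-first."""
    order, queue = [], deque([root])
    while queue:
        obj = queue.popleft()
        if obj not in order:
            order.append(obj)
            queue.extend(obj.deps)
    return order


def dlopen_local(root, global_scope):
    """Toy RTLD_LOCAL: nothing is added to the global scope."""
    local_scope = breadth_first(root)
    for obj in local_scope:
        obj.scopes = [global_scope, local_scope]
    return local_scope


def lookup(symbol, requester):
    for scope in requester.scopes:
        for obj in scope:
            if symbol in obj.exports:
                return obj.name
    raise LookupError(f"{requester.name}: undefined symbol: {symbol}")


SYM = "_ZN9QListData6removeEi"
svn = SharedObject("svn", {"main"})
global_scope = [svn]
svn.scopes = [global_scope]

qtcore = SharedObject("libQtCore.so.4", {SYM})
kdecore = SharedObject("libkdecore.so.5", set(), deps=[qtcore])
plugin = SharedObject("config_kde4.so", set(), deps=[kdecore])
dlopen_local(plugin, global_scope)

print(lookup(SYM, kdecore))      # libQtCore.so.4, found via the local scope
kdecore.scopes = [global_scope]  # what the trace suggests: global scope only
try:
    lookup(SYM, kdecore)
except LookupError as err:
    print(err)  # libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi
```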
Oh, so it *is* called from a destructor. Interesting, thanks! In that case, there is somewhat higher chance that the glibc packages I posted above might work. Can someone test them, please? In the meantime, I will continue working on trying to reproduce this.
(In reply to comment #42) > Oh, so it *is* called from a destructor. Interesting, thanks! In that case, > there is somewhat higher chance that the glibc packages I posted above might > work. Can someone test them, please? In the meantime, I will continue working > on trying to reproduce this. No luck with these glibc packages either.
Hmm, I still cannot reproduce this; libproxy is not loading config-kde4 at all. Do you have /etc/proxy.conf or ~/.proxy.conf, and what are its contents?
(In reply to comment #44) > Do you have /etc/proxy.conf or ~/.proxy.conf, and what is its contents? Neither of those files exists on my system, and I don't use a proxy. (However, I used a proxy for a short time, but that was months ago and still with 11.3. Maybe there is something left over in my ~/.kde* config files...) On the other hand, I can also reproduce the bug with my "beta" user, which I created with 11.4 M5, so I'd say this isn't caused by a leftover in ~/.whatever. I can still reproduce the bug with your test packages from comment #40 and current (well, from yesterday) Factory. This means I can install some debuginfo packages (if you tell me which ones you need), and can also create a core dump of "svn up" with my "beta" user if you tell me how to do that.
*** Bug 668192 has been marked as a duplicate of this bug. ***
Petr: note the number of 'calling ' lines in the LD_DEBUG dump. In particular, /usr/lib64/libproxy-0.4.6/modules/config_kde4.so is inited and finited multiple times, while libkdecore.so.5 is inited only once. So ld.so definitely created a dependency from somewhere to libkdecore.so.5, so that it isn't unloaded together with config_kde4.so. I'm talking about this sequence:
12012: calling init: /usr/lib64/libQtCore.so.4
12012: calling init: /usr/lib64/libQtXml.so.4
12012: calling init: /usr/lib64/libQtDBus.so.4
12012: calling init: /usr/lib64/libQtNetwork.so.4
12012: calling init: /usr/lib64/libkdecore.so.5
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling init: /usr/lib64/gconv/UTF-16.so
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/network_networ
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_gnome.s
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_gnome.s
12012: calling init: /lib64/libnss_files.so.2
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/network_networ
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_gnome.s
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_gnome.s
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/network_networ
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_gnome.s
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_gnome.s
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/network_networ
[last 'calling ' line before error]
My guess would be that either UTF-16.so or libnss_files.so.2 creates the dependency that keeps libkdecore.so.5 alive.
In any case, even when libkdecore.so.5 is destroyed later than the module that initially required it, it must still look up its symbols in the scope in which it was initially loaded (as you can see, QtCore is not yet finited either, so it is still loaded; the symbols therein just aren't found). So irrespective of whether there is an error in the module unload order, there definitely is a problem with symbol lookup.
Sigh. I also can't reproduce it, no matter how hard I try. Meanwhile I installed kdesvn and its dependencies, still no luck. zypper and svn now do use /usr/lib64/libproxy-0.4.6/modules/config_kde4.so, and that module is loaded and unloaded multiple times, but there is no lookup error :-/ The relevant 'calling ' lines for me look like:
18902: calling init: /usr/lib64/libkdecore.so.5
18902: calling init: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
18902: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
18902: calling init: /usr/lib64/libproxy-0.4.6/modules/network_network
18902: calling init: /usr/lib64/libproxy-0.4.6/modules/config_gnome.so
18902: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_gnome.so
18902: calling init: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
18902: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
18902: calling init: /lib64/libnss_files.so.2
One difference from the failing trace is that UTF-16.so isn't loaded. Of course it's doubtful that this is the reason; I don't know why it's loaded for Ladislav, and I don't know how to make it load.
I can reproduce it here; if you need any other info, just ask. My setup: VirtualBox with x86_64 Factory; KDE must be running; the installation is an out-of-the-box LiveCD; and you must run the "svn" test as the user who is logged in to KDE.
I've spent another night trying to build testcases to reproduce this, with no luck; and I cannot get the KDE LiveCD to boot in my VirtualBox, it aborts halfway through the progress bar screen... SSH access to a broken system would really go a long way.
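Counting the 'calling init'/'calling fini' lines by hand is error-prone. A small Python sketch like the one below (the function name `still_initialized` is hypothetical) tallies them per object in LD_DEBUG output and reports the objects that were initialized more often than finalized, i.e. the ones still alive when the error hits. It assumes each relevant line has the "PID: calling init|fini: PATH" shape shown in the trace above.

```python
import re
from collections import Counter

# Matches lines such as "12012: calling init: /usr/lib64/libkdecore.so.5"
CALL_RE = re.compile(r"^\s*\d+:\s+calling (init|fini):\s+(\S+)")


def still_initialized(trace: str) -> dict:
    """Map each object to (#init - #fini), for objects with more inits than finis."""
    inits, finis = Counter(), Counter()
    for line in trace.splitlines():
        match = CALL_RE.match(line)
        if match:
            kind, path = match.groups()
            (inits if kind == "init" else finis)[path] += 1
    return {path: inits[path] - finis[path]
            for path in inits if inits[path] > finis[path]}


sample = """\
12012: calling init: /usr/lib64/libkdecore.so.5
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling init: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
12012: calling fini: /usr/lib64/libproxy-0.4.6/modules/config_kde4.so
"""
print(still_initialized(sample))  # {'/usr/lib64/libkdecore.so.5': 1}
```

Running it on a failing and a working trace makes differences such as the presence of UTF-16.so easy to spot.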
Share your SSH public key; I can arrange such a thing.
Problem is, I cannot reproduce it using SSH... and remote KDE to here will be painfully slow...
My public key: ssh-dss AAAAB3NzaC1kc3MAAACBAIjEU1pp/Vh1U0ksLoFjIAG99bMH/4xn7gDvWyZEhSzKhROJ0dJHdFCHq0uhHAEwDnDcrAu5gltnpQLnNI3WyTN8YxvBHXjSqkMqe8giDlANo8RYVPyrDP9MG6Ucdjmo+CsVjN6PlLnXdjgjEJmVdi7GuspcxT/yyV30n9KDt0SZAAAAFQCn8ixqql6/RjFwLuHCfnPPRkI1VQAAAIEAiCXlZo4AXxjZQsu5ucnz6Swosh8bNTs2fyifbVPnV4DQ3z91WA4mrzF9LHi5cbl265eTCrJr3Lef0H2KWGnPtrnkzFyDXoFfaW1o/qCMnb72Xun8ohFzewUILOPIzQ9OTcVT0e4vJj45a4c+bX6XwpdJoBjmqo1GZvH70TzX1S4AAACAFyCapNJC1c76Klb8RJdfog0uyt9XwLbz1xKHp4E58CTM3O3vr1olz8u64zITKzfoJ+ZcTgQM3U5p+PHRh2YkF6xZ9sagS9k9M6WsBDAKQ0LsxkAYH0l8i0LlAStWHy7f7zvbZ5PN72HRoH671nG3qof6Q3S/+4s41+k32Rm7H/M= pasky@machine Isn't it possible to test it over ssh as if within a started KDE session by exporting some magic environment variables as in the session? $DISPLAY, $XDG_*, $DBUS_*... It would be great if this access could last at least until Monday; I hope to get back to this on Thursday/Friday, but I'm not 100% sure.
I've come somewhat further in reproducing it. I needed to install the live ISO to find this out, but that's not required. The key is the KDE_FULL_SESSION environment variable: it must be set to 'true' for the bug to reproduce (and the right packages must be installed, of course). It's probably easiest to debug neither zypper nor svn, but the very small libproxy client itself (libproxy-tools.rpm); with that I can reproduce it inside a normal build environment:
# export BUILD_ROOT=/abuild/proxy-fuck
# build -X libproxy-tools -X libproxy1-config-kde4 -X libproxy1-config-gnome rsh.spec
...
# chroot $BUILD_ROOT
% proxy --help
direct://
% export KDE_FULL_SESSION=true
% proxy --help
direct://
proxy: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi
Note that both libproxy1-config-kde4 _and_ libproxy1-config-gnome are required (apart from their dependencies, no other GNOME or KDE packages are necessary). My current hypothesis is that loading UTF-16.so is required, which doesn't happen when the config-gnome module isn't there. So, happy debugging :)
Awesome, I can reproduce this too now, thanks! I'll try to debug this further soon. BTW, UTF-16 is not necessary; when I remove it, it still crashes. However, config-gnome is necessary.
ping ?
*** Bug 671651 has been marked as a duplicate of this bug. ***
When I run the test command as a normal user, the checkout works, but there is an error at the end "svn: symbol lookup error: /usr/lib64/libkdecore.so.5: undefined symbol: _ZN9QListData6removeEi" When run as root, no error. Permissions on /usr/lib64/libkdecore.so.5.6.0 and the link at /usr/lib64/libkdecore.so.5 are OK.
Just to put some pressure on you: right now it's the _only_ ship stopper :-) Even so, I consider it a "ship stopper light": the problem itself is not so severe that it couldn't be fixed as an update, if we better understood what the problem really is.
OK, I think I have a fix; building now. I will write a longer write-up together with a guide to ld.so debugging and its internal scope structures soon, but to sum it up: the trouble is that libkdecore contained a reference to the local scope of an object being unloaded. Normally this would not be a problem, since at the time of the unload either libkdecore would already be unused, or it would have an alternative scope to defer to. However, libkdecore was marked as NODELETE, and therefore survived the removal of config_kde4, which also removed its crucial local scope. The logic to prevent local scope providers from going away has been missing, but its absence hurts only in this precise case. (I had the NODELETE hunch at the very beginning, but eu-readelf did not confirm any STB_GNU_UNIQ symbols lingering around; it turns out that rather than UNIQ, it prints their binding as LOOS+0! (bug 673872) One extra debug print in ld.so to confirm it would have put me on the right track much sooner...)
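The failure mode and the idea behind the fix can be sketched as a toy Python model. This is not glibc's actual patch: the class `DSO`, the `fixed` flag and the rule "keep the scope provider while a NODELETE survivor still searches its scope" are simplifying assumptions based on the summary above, and only the unloading of the dlopen()ed root is modelled. In the buggy variant, unloading config_kde4.so frees its local search list even though the NODELETE libkdecore.so.5 still searches it, so the lookup fails although libQtCore.so.4 is still loaded.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class DSO:
    name: str
    exports: set
    nodelete: bool = False
    loaded: bool = True
    scopes: list = field(default_factory=list)  # lookup scopes, in search order


def load_plugin():
    """config_kde4.so dlopen()ed locally, pulling in libkdecore and libQtCore."""
    qtcore = DSO("libQtCore.so.4", {"_ZN9QListData6removeEi"})
    kdecore = DSO("libkdecore.so.5", set(), nodelete=True)
    plugin = DSO("config_kde4.so", set())
    global_scope = []              # symbols of the main program omitted
    local_scope = [plugin, kdecore, qtcore]
    for obj in local_scope:
        obj.scopes = [global_scope, local_scope]
    return plugin, kdecore, local_scope


def dlclose(root, local_scope, fixed):
    """Return True if root (and with it its local search list) was unloaded."""
    survivors = [obj for obj in local_scope if obj.nodelete]
    still_used = any(scope is local_scope for obj in survivors for scope in obj.scopes)
    if fixed and still_used:
        return False               # keep the scope provider alive
    root.loaded = False
    local_scope.clear()            # the search list goes away with its owner
    return True


def lookup(symbol, requester):
    for scope in requester.scopes:
        for obj in scope:
            if obj.loaded and symbol in obj.exports:
                return obj.name
    raise LookupError(f"{requester.name}: undefined symbol: {symbol}")


for fixed in (False, True):
    plugin, kdecore, local_scope = load_plugin()
    dlclose(plugin, local_scope, fixed)
    try:
        print(f"fixed={fixed}: found in {lookup('_ZN9QListData6removeEi', kdecore)}")
    except LookupError as err:
        print(f"fixed={fixed}: {err}")
```

In this model the extra check changes nothing unless a NODELETE object actually searches the scope being torn down, which matches the remark that the missing logic hurts only in this precise case.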
I have committed the fix.
Works, thank you Petr. :-)
*** Bug 673642 has been marked as a duplicate of this bug. ***
*** Bug 444800 has been marked as a duplicate of this bug. ***
This is an autogenerated message for OBS integration: This bug (657627) was mentioned in https://build.opensuse.org/request/show/73729 Factory / glibc
Just for completeness, Petr's write-up is at http://swik.net/opensuse/Planet+SuSE/Petr+Baudis:+ld.so+Scopes/e860l Upstream bug report at http://sourceware.org/bugzilla/show_bug.cgi?id=12561