Bug 383390 - OOo: valgrind shows errors in stripMatchingPrefix
Status: RESOLVED WORKSFORME
Alias: None
Product: openSUSE 11.0
Classification: openSUSE
Component: OpenOffice.org
Version: Beta 1
Hardware: x86-64 Other
Priority: P4 - Low, Severity: Normal
Target Milestone: ---
Assignee: Jan Holesovsky
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on: 384481
Blocks:
Reported: 2008-04-24 14:52 UTC by Vladimir Nadvornik
Modified: 2009-09-28 03:11 UTC (History)

See Also:
Found By: Development
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
the backtrace (7.45 KB, text/plain)
2008-04-24 14:56 UTC, Vladimir Nadvornik
pmladek's backtrace (46.56 KB, text/x-log)
2008-04-24 17:27 UTC, Petr Mladek
output from valgrind (555.37 KB, text/plain)
2008-04-25 09:53 UTC, Vladimir Nadvornik

Description Vladimir Nadvornik 2008-04-24 14:52:16 UTC
When I run "ooffice some_file.ods" from xterm in Windowmaker, it sometimes freezes. It does not seem to be dependent on the file.
Comment 1 Vladimir Nadvornik 2008-04-24 14:56:03 UTC
Created attachment 210242
the backtrace

The backtrace is not exactly from the beta1 package because I could not find the corresponding debuginfo. It is from 2.4.0.6-5 from STABLE which differs only in rpm release number.
Comment 2 Petr Mladek 2008-04-24 15:22:32 UTC
It looks quite serious => increasing severity.

It looks like a deadlock between threads. Thorsten, I think that you were interested in this area.

The configmgr stuff is mentioned in the backtrace. Michael, I think that you are familiar with this area.
Comment 3 Petr Mladek 2008-04-24 17:24:42 UTC
I did some more testing:


1. It was reproducible only when I used the home directory from our NFS server

2. It was reproducible only via the soffice wrapper but not directly with 
   soffice.bin => it is somewhat related to the oosplash.bin

3. It opened all files from that NFS via the soffice wrapper in the read-only 
   mode. It opened them for writing when I used soffice.bin or when I moved
   them to the local disk.

I'll attach my backtrace.

I'll reassign it to Kendy because it is somewhat related to the oosplash.

If you see anything interesting in the backtraces, please help.
Comment 4 Petr Mladek 2008-04-24 17:27:22 UTC
Created attachment 210287
pmladek's backtrace
Comment 5 Petr Mladek 2008-04-24 17:29:30 UTC
It is a very annoying bug because it froze for Vlada and he was not able to start another instance. It crashed for me and then complained that another instance was running when I tried to start it again => increasing priority.
Comment 6 Michael Meeks 2008-04-25 01:54:08 UTC
I guess I'm paranoid about memory corruption always - but - this looks strange:

#7  <signal handler called>
#8  0x00007fd0483935a5 in raise () from /lib64/libc.so.6
#9  0x00007fd048394b93 in abort () from /lib64/libc.so.6
#10 0x00007fd0483d6a90 in ?? () from /lib64/libc.so.6
#11 0x00007fd0483d6f2d in ?? () from /lib64/libc.so.6
#12 0x00007fd0483d9e40 in ?? () from /lib64/libc.so.6
#13 0x00007fd048bec5ff in __cxa_allocate_exception () from /usr/lib64/libstdc++.so.6
#14 0x00007fd03f13e48c in configmgr::configuration::Path::stripMatchingPrefix (_aPath=<value optimized out>, _aPrefix=@0x431a0d20)
    at /usr/src/debug/ooo-build-2.4.0.6/build/ooh680-m12/configmgr/source/treemgr/configpath.cxx:485

should throw an exception successfully back to:

#20 0x00007fd03f0a318d in configmgr::backend::CacheController::refreshComponent (this=0x72ad60, _aRequest=<value optimized out>)
    at /usr/src/debug/ooo-build-2.4.0.6/build/ooh680-m12/configmgr/source/treecache/invalidatetree.cxx:241

which does the relevant try.

OTOH - it -seems- to crash in the allocator: which may explain why Thread 1 stops and cannot make progress.

Vladimir - this is a great trace & bug: thank you !

Any chance we can get a better trace with glibc-debuginfo installed ? :-)

Failing that, if we can repeat this with:

 valgrind --tool=memcheck --num-callers=128 --trace-children=yes

on the wrapper [ will take forever I know ], to see if this is indeed memory corruption: that would be wonderful :-)
Comment 7 Vladimir Nadvornik 2008-04-25 09:53:00 UTC
Created attachment 210452
output from valgrind

This is what I got from valgrind. It indeed looks like a memory corruption.
Comment 8 Michael Meeks 2008-04-28 10:00:57 UTC
chasing it.
Comment 9 Michael Meeks 2008-04-28 13:43:30 UTC
Botheration - I can't reproduce this; with OOHm10 or with DEV300 - it may be file specific - any chance of getting your test file ?

The valgrind trace looks interesting & leads me to suspect some things but since I can't reproduce it, life is bad.
Comment 10 Petr Mladek 2008-04-28 14:11:03 UTC
Michael, I have been able to reproduce it only when I had ~/.ooo-2.0 on an NFS-mounted home.
Comment 11 Petr Mladek 2008-04-28 14:38:29 UTC
Heh, somehow I am not able to reproduce it at all with the latest build (based on ooo-build-2.4.0.8).
Comment 12 Petr Mladek 2008-04-28 14:54:17 UTC
Hmm, I have reproduced it again after rebooting into openSUSE-11.0.

It worked for me when I was running SLED10-SP1 and was just chrooted into the openSUSE-11.0 system => it should be related to something at the system level.

I can reproduce it only with an NFS-mounted home => might be NFS related.
I can reproduce it only with oosplash => might be related to pipes or so.

=> it would look like a kernel problem
Comment 13 Petr Mladek 2008-04-28 14:59:48 UTC
I am going to double check that the symptoms are really correct.
I will also check strace, ...

I wonder if it might be somewhat related to the missing users, see http://en.opensuse.org/Bugs:Most_Annoying_Bugs_11.0_dev#Missing_System_Users_in_11.0_Beta1
I have done the workaround but there might be one more missing user.

Also note that part of the openSUSE-11.0 system is installed from some pre-installed images to speed up the installation phase. It might have some strange side effects.

If you have any idea what to check, please let us know.
Comment 14 Petr Mladek 2008-04-28 16:10:11 UTC
The problem is related to file locking.

The locking does not work via our NFS (at least on openSUSE-11.0). The locking is enabled only when it is started by the soffice wrapper, see:

--- cut ---
# file locking now enabled by default
SAL_ENABLE_FILE_LOCKING=1
export SAL_ENABLE_FILE_LOCKING
--- cut ---

Hmm, I think that there was a hack for this problem in the kernel or in OOo.
Michael, does this ring a bell? ;-)
Comment 15 Petr Mladek 2008-04-28 17:15:41 UTC
The file is successfully locked on SLED10. I am going to create a small test case based on the code in osl/unx/file.cxx.
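
A small test case of the kind mentioned above might look like the following sketch. It assumes the check boils down to a POSIX fcntl() advisory write lock; tryWriteLock is a hypothetical helper name, and this is not the code from osl/unx/file.cxx.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: tries to take a non-blocking exclusive (write) lock
// on the whole file. Returns 0 on success, otherwise the errno value
// reported by open() or fcntl().
int tryWriteLock(const std::string& path) {
    int fd = open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;

    struct flock lock;
    std::memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;    // exclusive lock
    lock.l_whence = SEEK_SET; // offsets are relative to the start of the file
    lock.l_start = 0;
    lock.l_len = 0;           // 0 means "up to the end of the file"

    int result = 0;
    if (fcntl(fd, F_SETLK, &lock) == -1)
        result = errno;       // saved before close() can overwrite it

    close(fd);                // closing the descriptor also releases the lock
    return result;
}
```

Calling tryWriteLock on a file under the NFS-mounted home and on a file on the local disk, then comparing the returned errno values, would show whether locking fails only over NFS.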
Comment 16 Petr Mladek 2008-04-29 09:56:35 UTC
The problem with NFS locking is being handled as bug #384481.

As a workaround, you might start rpc.statd by hand (as root):

     rpc.statd --no-notify
Comment 17 Michael Meeks 2008-04-29 10:56:57 UTC
Well - the problems running under valgrind are real and rather concerning - if they are only provoked with NFS home - then I guess I need to test under NFS home: IMHO NFS can only be a contributing factor, not the underlying cause.
Comment 18 Vladimir Nadvornik 2008-04-29 12:08:10 UTC
I can confirm that whether rpc.statd is running or not is the difference that
triggers this bug.
It is not file specific; it crashes on all files for me.
Comment 19 Jan Holesovsky 2008-05-12 15:58:55 UTC
The freeze is being solved in bug 384481; let's keep this bug just to track the valgrind errors - changing the severity and description accordingly.

My wild guess, without actually having tried to reproduce the issue, is that the culprit is the following:

482         if (aResult.isEmpty() || !matches(*it,aResult.getFirstName()))
483             throw InvalidName(aResult.getFirstName().toPathString(), "does not match the expected location.");

In case aResult is empty, the sanity check in getFirstName() throws - doing a throw while throwing - and if I'm not mistaken, this is not the right thing to do ;-)  Will try to reproduce, fix, ...
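
The suspected pattern can be sketched in isolation. The classes below (SketchPath, SketchInvalidName) are hypothetical stand-ins, not the configmgr types, and the sketch only shows standard C++ behaviour: an exception thrown while the operand of a throw expression is still being evaluated propagates in place of the intended one. Whether this explains the valgrind errors is not established here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the configmgr path type; not the real class.
class SketchPath {
public:
    explicit SketchPath(std::vector<std::string> segments)
        : segments_(std::move(segments)) {}

    bool isEmpty() const { return segments_.empty(); }

    // Assumed sanity check: refuses to return a name from an empty path.
    const std::string& getFirstName() const {
        if (segments_.empty())
            throw std::logic_error("getFirstName() called on an empty path");
        return segments_.front();
    }

private:
    std::vector<std::string> segments_;
};

// Hypothetical counterpart of the InvalidName exception in the quoted lines.
struct SketchInvalidName : std::runtime_error {
    SketchInvalidName(const std::string& name, const std::string& why)
        : std::runtime_error(name + " " + why) {}
};

// Same shape as the quoted check: the throw operand calls getFirstName()
// even when the path is empty.
void checkPrefix(const SketchPath& result, const std::string& expected) {
    if (result.isEmpty() || result.getFirstName() != expected)
        throw SketchInvalidName(result.getFirstName(),
                                "does not match the expected location.");
}

// Reports which exception type actually reaches the caller.
std::string whichException(const SketchPath& result, const std::string& expected) {
    try {
        checkPrefix(result, expected);
    } catch (const SketchInvalidName&) {
        return "InvalidName";
    } catch (const std::logic_error&) {
        return "logic_error";
    }
    return "none";
}
```

With an empty path, whichException returns "logic_error" rather than "InvalidName", so the caller never sees the intended exception. Checking isEmpty() separately, with its own message, would avoid calling getFirstName() on an empty path.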
Comment 20 Ke Yu 2009-09-28 03:11:06 UTC
I ran a regression test on SLED11GM with OOo-3.1.1.1.
I created some test files in an NFS-mounted folder.
I did not see any freeze or crash when opening these files with "ooffice testfile.odt" -> WORKSFORME