Bug 151226

Summary: inefficiency: x86_64 ypxfr can't decode i586 ypxfrd replies
Product: [openSUSE] SUSE LINUX 10.0
Reporter: Matthias Andree <matthias.andree>
Component: Network
Assignee: Thorsten Kukuk <kukuk>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Enhancement
Priority: P5 - None
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: SuSE Linux 10.0
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Matthias Andree 2006-02-15 17:32:05 UTC
Setting up a SUSE Linux 10.0 i586 machine as a NIS slave works with ypxfr.
Setting up a SUSE Linux 10.0 x86_64 machine as a NIS slave fails with ypxfr and falls back to enumerating the maps, which is inefficient.

ypxfr comes from ypserv-2.18-3 on both architectures.

To reproduce:
1. Set up a NIS master server on an i586 machine and launch ypxfrd.
2. Log into the machine that is to become the NIS slave.
3. Run: /usr/lib/yp/ypinit -s nismaster.example.org

On i586 the transfer works fine; on x86_64 I get these messages:

We will need a few minutes to copy the data from nismaster.example.org.
Transferring auto.master...
Trying ypxfrd ...rpc.ypxfrd doesn't support the needed database type
call to rpc.ypxfrd failed: RPC: Can't decode result

 (failed, fallback to enumeration)

The ypxfrd host is running SuSE Linux 9.2 i586 with ypserv-2.14-2.
Comment 1 Thorsten Kukuk 2006-02-15 18:27:55 UTC
Yes, and where is the bug supposed to be?
The database format is incompatible between these architectures.
Comment 2 Matthias Andree 2006-02-15 21:28:52 UTC
Nothing in the documentation (man pages, README) says the databases are incompatible beyond endianness -- and x86_64 and i586 are both little-endian machines. Perhaps the database backend drivers need to be replaced anyway; gdbm is neither stable nor fast.

Please make sure the 64-bit machines can read the 32-bit databases.
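The word-width point can be made concrete with a small sketch. This is NOT gdbm's actual header layout -- it is purely an illustration of the underlying mechanism: a format that writes header fields at the C compiler's native `long` width produces different on-disk layouts on i586 (4-byte `long`) and x86_64 (8-byte `long`), even though both are little-endian, while a format with fixed field widths is readable on both.

```python
# Illustrative only -- this is NOT gdbm's real on-disk format.
# It shows why native-width fields break cross-architecture reads
# while fixed-width fields stay portable.
import struct

KEY, VALUE = b"alice", b"x101"

# Native packing: "l" is the platform's C long (4 bytes on i586,
# 8 bytes on x86_64), so the byte layout depends on the build host.
native_header = struct.pack("@ll", len(KEY), len(VALUE))

# Portable packing: "<II" pins both fields to little-endian 32 bits,
# so every architecture writes and parses the same bytes.
portable_header = struct.pack("<II", len(KEY), len(VALUE))

print(struct.calcsize("@ll"))  # varies by architecture
print(struct.calcsize("<II"))  # always 8
```

A 64-bit reader handed a file written with the native layout by a 32-bit writer would misinterpret every field, which is exactly the situation the fixed-width variant avoids.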
Comment 3 Thorsten Kukuk 2006-02-15 21:37:29 UTC
Sorry, gdbm is the most stable and fastest interface available for this.
And I am one of the people who designed the protocol and wrote that code for Linux, so I know what I am talking about.
Comment 4 Matthias Andree 2006-02-15 23:43:09 UTC
Thorsten, the problem doesn't go away by declaring the bug invalid.
It is a real-world problem.

At the very least, the inability to reuse the database from machines with the same endianness but a different word width should be documented, but it had better be fixed.

Inventing a protocol that makes network data transmission dependent on the machine architecture is broken by design -- if you say you designed it, you might as well have picked a portable database.

Please leave this bug report open until a revised ypxfr protocol or database format in ypserv resolves this problem.
Comment 5 Thorsten Kukuk 2006-02-16 04:56:08 UTC
(In reply to comment #4)
> Thorsten, the problem doesn't go away by declaring the bug invalid.

There is no problem => thus the bug is invalid.

> It is a real-world problem.

There is no problem at all.

> At the very least, the inability to reuse the database from machines with the
> same endianness but a different word width should be documented, but it had
> better be fixed.

Why? Sorry, saying things that show you have no clue what you are talking about is no reason to design and implement a new protocol instead of using an existing, working one used by many operating systems.


> Inventing a protocol that makes network data transmission dependent on the
> machine architecture is broken by design -- if you say you designed it, you
> might as well have picked a portable database.

Again: the first part is not true, and as for the second part: it only shows you never looked at it. gdbm operates in O(1) with a low memory footprint and a stable database format; libdb, the only partly portable database, operates in O(N) with a very high memory footprint and a constantly changing database format, which makes it even worse.

> Please leave this bug report open until a revised ypxfr protocol or database
> format in ypserv resolves this problem.

No, you should trust people who have worked on this for years before you insist on changes that make everything much worse, without thinking about what these changes really mean.

This is an invalid request.
Comment 6 Matthias Andree 2006-02-16 09:38:30 UTC
This deserves a long comment because of Thorsten's personal attacks and provably wrong statements.

0. Thorsten should be very careful with slander like the above, particularly when not backing his claims with traceable facts.

1. My request to document restrictions in a given protocol is substantiated and reasonable. The documentation refers only to endianness, not word width, and is therefore incomplete unless the code is going to be fixed.

2. Fact: transferring NIS tables from i586 to x86_64 requires the slower yp_all() enumeration, since ypxfr refuses the transfer, as demonstrated in my initial report.

3. The so-called protocol is documented as copying the whole database file. If its format is machine-dependent, and the network protocol was specifically invented to copy that file, then a fair-weather protocol was invented. That is what I call "broken by design", but it isn't exactly surprising after the issues we had between 1999 and 2001.

4. gdbm is not the only database with O(1) complexity for individual access. Your claim libels libdb, a.k.a. Berkeley DB.

5. Oracle Berkeley DB (formerly Sleepycat Berkeley DB, BDB for short from now on) and QDBM operate in O(1) or O(log N) for hash or btree access, respectively. QDBM is faster than GDBM by a factor of 4 (read) to 10 (write) in Hirabayashi's benchmark. QDBM is available from http://qdbm.sourceforge.net/ and BDB from http://www.sleepycat.com/

6. CDB has been available since the year 2000. It, its FreeCDB sibling, and the TinyCDB reimplementation are O(1), write machine-independent on-disk formats, and are a factor of 5 (read) to 17 (write) faster than GDBM in the same benchmark; see http://qdbm.sourceforge.net/benchmark.pdf
URLs: http://cr.yp.to/cdb.html and http://www.corpit.ru/mjt/tinycdb.html

7. Hirabayashi's benchmark also shows that BDB, QDBM, and CDB files are half the size of GDBM files.

8. It is true that BDB is a huge library compared to the others and that BDB on-disk formats change even in minor releases, which indeed makes BDB unsuitable for use in NIS. This leaves QDBM and CDB as alternatives, and given the scheme used by NIS, CDB appears to be the most suitable.

Conclusion: 

A. Thorsten Kukuk creates a false dilemma here by claiming that the only alternatives were GDBM and BDB, which was shown to be untrue by offering QDBM and CDB as further alternatives.

B. Thorsten attacks my person rather than my arguments. He claims I have no clue rather than stating why the protocol must be machine-dependent.

C. Thorsten, rather than giving traceable arguments, appeals to trust in his experience. This comment shows that his opinion is disputed, and trusting a person proves nothing.

I'm not proposing a particular solution; I'm just kindly asking that this bug report remain open until a solution for the ypxfr issue is found or it is proven that the enumeration fallback is no less efficient than the copy. It is the ypxfr documentation that claims enumeration is less efficient than the copy, by the way, not my idea.

For future implementations, it appears reasonable to consider CDB as a database backend. It could be integrated in a new protocol version and would then work across different architectures. Since the yp_all() enumeration fallback doesn't go away by adding a CDB backend, existing systems can migrate smoothly.

Final warning: any further ad hominem attacks will be reported for criminal prosecution.
Comment 7 Thorsten Kukuk 2006-02-16 09:50:07 UTC
(In reply to comment #6)

> 2. Fact: transferring NIS tables from i586 to x86_64 requires the slower
> yp_all() enumeration, since ypxfr refuses the transfer, as demonstrated in my
> initial report.

Yes, but this is not a bug.
 
> 3. The so-called protocol is documented as copying the whole database file. If
> its format is machine-dependent, and the network protocol was specifically
> invented to copy that file, then a fair-weather protocol was invented. That is
> what I call "broken by design", but it isn't exactly surprising after the
> issues we had between 1999 and 2001.

This comment doesn't make sense.

> 4. gdbm is not the only database with O(1) complexity for individual access.
> Your claim libels libdb, a.k.a. Berkeley DB.

Read what I really wrote.

> 5. Oracle Berkeley DB (formerly known as Sleepycat Berkeley DB, BDB for short
> from now on) and QDBM work with O(1) or O(log N) for hash or btree access,
> respectively. QDBM is faster than GDBM by a factor of 4 (read)...10 (write) in
> Hirabayashi's benchmark. QDBM is available from http://qdbm.sourceforge.net/
> and BDB from http://www.sleepycat.com/

QDBM has, according to its author, the same problem as gdbm: the on-disk format is not architecture-independent.
For the YP use case, the Berkeley DB interface allows only O(N), or you need a very large amount of memory.

You can be sure that all alternative database backends have already been evaluated.

> 6. CDB has been available since the year 2000. This, its FreeCDB sibling and
> the TinyCDB reimplementation are O(1), write machine-independent on-disk
> formats and are a factor of 5 (read)...17 (write) faster than GDBM in said
> benchmark, see http://qdbm.sourceforge.net/benchmark.pdf
> URLS: http://cr.yp.to/cdb.html and http://www.corpit.ru/mjt/tinycdb.html

CDB is not usable as a backend for ypserv. It failed when we evaluated it for this case.
 
> 7. Hirabayashi's benchmark also shows that BDB, QDBM and CDB files are half the
> GDBM size.

But that does not help, since they have the same or other problems, which makes things even worse.

> 8. It is true that BDB is a huge library compared to the others and that BDB
> on-disk formats are changing even in minor releases, which indeed makes BDB
> unsuitable for use in NIS. This leaves QDBM and CDB as alternatives, and given
> the scheme used by NIS, CDB appears as the most suitable.
> 
> Conclusion: 
> 
> A. Thorsten Kukuk creates a false dilemma here by 

Wrong. The reporter never looked deeply enough into the details and does not accept that his problem is no problem at all, and that others spent a huge amount of time on it only to find out that there is nothing to change.


> (i) claiming the only
> alternatives had been GDBM and BDB, which was shown untrue by offering QDBM and
> CDB as further alternative.

As written: they are not alternatives; they suffer from the same problems.
The reporter did not take the time to check all the facts.

> I'm not proposing a particular solution, I'm just kindly asking that this bug
> report remains open until a solution for the ypxfr issue is found or it was
> proven that the enumeration fallback is not more inefficient than the copy

You still did not mention why the enumeration fallback should be a back.
You don't like it, but that is no reason.
Comment 8 Matthias Andree 2006-02-16 10:16:39 UTC
(In reply to comment #7)

> QDBM has, according to the author, the same problems as gdbm: the on disk
> format is not architecture independent.

Granted, note though that I did not claim otherwise.

> For the YP use case, the Berkeley DB interface allows only O(N), or you need a
> very large amount of memory.

Untrue. Check the BDB documentation on sizing the cache, and perhaps secondary literature such as Postfix's documentation; quoting Wietse Venema's DB_README (postfix-2.3-20060126):

  * berkeley_db_read_buffer_size (default: 128 kBytes per table). This setting
    is used by all other Postfix programs. The buffer size is adequate for
    reading. If the cache is smaller than the table, random read performance is
    hardly cache size dependent, except with btree tables, where the cache size
    must be large enough to contain the entire path from the root node.
    Empirical evidence shows that 64 kBytes may be sufficient. We double the
    size to play safe, and to anticipate changes in implementation and bloat.

Besides that, for a single-threaded daemon using a few static buffers that are mmap()ed, this is hardly relevant on servers.

> You can be sure that all alternative database backends have already been
> evaluated.

Then it should be possible to publish those evaluations if you make such severe claims as calling CDB unsuitable or failed, as you do here:

> CDB is not usable as a backend for ypserv. It failed when we evaluated it for
> this case.

> But that does not help, since they have the same or other problems, which
> makes things even worse.

The claim that CDB has the same problem is not yet backed by facts.

> You still did not mention why the enumeration fallback should be a back.

...should be "a back"? Please rephrase.

> You don't like it, but that is no reason.

It is documented in your own texts as inefficient; that is reason enough to either address it or, if it is not inefficient, revise the texts.

And I'll repeat: closing the bug before either the code is changed or the restrictions are documented in the ypserv package is only a waste of time.
Comment 9 Matthias Andree 2006-02-16 10:22:06 UTC
Clarification:
In case the code is deemed as good as possible (even though this is disputed),
at least the FULL set of restrictions on the database should be documented. Adding a line stating that, for efficient transfers, endianness AND WORD SIZE have to match is sufficient.
Comment 10 Thorsten Kukuk 2006-02-16 10:46:48 UTC
(In reply to comment #8)


> And I'll repeat: closing the bug before either the code has changed or the
> restrictions have been documented in the ypserv package is only a waste of
> time.

And I will repeat: closing it and having you reopen it is a waste of your time.

Comment 11 Matthias Andree 2006-02-16 11:46:52 UTC
You still haven't answered the question of why CDB is unsuitable or how it failed, and I'd add the question of why the CDB maintainers apparently weren't contacted at the time to make CDB suitable.
Comment 12 Matthias Andree 2006-02-16 12:14:19 UTC
Demoting Severity to minor and adding "inefficiency" to Summary.
Comment 13 Thorsten Kukuk 2006-02-16 12:18:55 UTC
(In reply to comment #11)
> You still haven't answered the question why CDB is unsuitable or how it failed
> either, and I'd add the question why the CDB maintainers have apparently not
> been contacted at that time to make CDB suitable.

I don't remember anymore why it was unsuitable; this was long ago.
And why should I contact maintainers and convience them to change their software if I have something which is working perfectly fine for everybody except you?

By the way, the protocol definition of ypxfrd can be found in
/usr/include/rpcsvc/ypxfrd.x, and there you can see that 32-bit/64-bit makes a difference.
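To make the decode failure class concrete (this is a generic, assumption-labelled sketch, not the actual ypxfrd.x contents or code): if the sender serializes a length field at its native word size while the receiver expects a fixed 32-bit field, the byte stream no longer lines up, which is the same class of problem behind the "RPC: Can't decode result" error in the initial report.

```python
# Hypothetical mismatch demo -- NOT real ypxfrd code. A 64-bit sender
# writes an 8-byte native-width length; a reader expecting a fixed
# 4-byte length prefix misparses everything that follows.
import struct

def pack_native64(length: int, payload: bytes) -> bytes:
    # hypothetical sender on a 64-bit host: native 8-byte length prefix
    return struct.pack("<Q", length) + payload

def unpack_fixed32(blob: bytes) -> bytes:
    # hypothetical reader expecting a fixed 4-byte length prefix
    (length,) = struct.unpack_from("<I", blob, 0)
    return blob[4:4 + length]

wire = pack_native64(5, b"hello")
print(unpack_fixed32(wire))  # misparsed payload, not b'hello'
```

Proper XDR avoids this by fixing every integer on the wire to 4 bytes in big-endian order, independent of the host architecture; the portability problem here lies in the database payload being shipped, not in XDR itself.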


Comment 14 Matthias Andree 2006-02-16 12:30:57 UTC
> I don't remember anymore why it was unsuitable; this was long ago.

So you're using ASSUMPTIONS and non-traceable claims to defame CDB in a misguided attempt to defend the GDBM format that causes these problems. A pretty weak argument, don't you think?

> And why should I contact maintainers and convience them to change their
> software if I have something which is working perfectly fine for everybody
> except you?

Your phrase is illogical; you appear to mean "inconvenience".
Your assumption that the bug hits only me is invalidated by your next paragraph:

> By the way, the protocol definition of ypxfrd can be found in
> /usr/include/rpcsvc/ypxfrd.x, and there you can see that 32-bit/64-bit makes a
> difference.

Thanks for proving it's a design flaw, by the way.

And mind comment #9: I'm not even requesting a fix; documentation is sufficient.
Comment 15 Matthias Andree 2006-02-16 12:32:20 UTC
Answering the open question: since GDBM isn't working "perfectly fine" for anyone in heterogeneous networks, contacting the CDB maintainer might have resolved whatever issues you were having with CDB, if any.
Comment 16 Thorsten Kukuk 2006-02-16 12:36:34 UTC
GDBM is working perfectly fine in heterogeneous networks for everybody except you.
And it is documented in the RPC protocol file.
Comment 17 Matthias Andree 2006-02-16 12:42:47 UTC
Your statement is false, as shown, and RPC protocol files are not proper documentation for applications on end-user systems.
Comment 18 Thorsten Kukuk 2006-02-16 12:45:58 UTC
(In reply to comment #17)
> Your statement is false, as proven, 

No, you proved that you don't like it and assume that this is the case for everybody else, too.

> and RPC protocol files are not proper
> documentation for application on end user systems.

Normal end users don't care and don't need to know how the file is transferred, or why, as long as it works. And that is the case.
Comment 19 Matthias Andree 2006-02-16 12:58:37 UTC
(In reply to comment #18)

Thorsten, this has nothing to do with personal emotions, but with the technical reasons I stated in comment #6. Everybody can see that for themselves.

> > and RPC protocol files are not proper
> > documentation for application on end user systems.
> 
> Normal end users don't care and don't need to know how the file is transferred,
> or why, as long as it works. And that is the case.

Does efficiency in heterogeneous networks not count for Thorsten Kukuk?
That is, as stated above, my complaint: inefficiency.

I don't know the motives for disputing this inefficiency or playing it down. Who gives you the right to claim it is unimportant?

If you want to drive customers away from SUSE, you're on exactly the right track. Your behavior is harmful to Novell's business.
Comment 20 Thorsten Kukuk 2006-02-16 13:10:17 UTC
(In reply to comment #19)
> (In reply to comment #18)
> 
> Thorsten, this has nothing to with personal emotions, but with technical
> reasons that I stated in comment #6. Everybody can see that for yourself.

But nobody agrees with your reasons.

> I don't know the motives for disputing this inefficiency or playing it down.
> Who gives you the right to claim it is unimportant.

That would raise the same question: Who gives you the right to claim that
only your opinion is the right one?
 
It is all open source. If you disagree with something, you can change it. But you cannot expect everybody to agree with you and change everything just to make you happy.
Comment 21 Matthias Andree 2006-02-16 13:47:01 UTC
(In reply to comment #20)
> > Thorsten, this has nothing to with personal emotions, but with technical
> > reasons that I stated in comment #6. Everybody can see that for yourself.
> 
> But nobody agrees with your reasons.

Nobody except you disagrees either.

> That would raise the same question: Who gives you the right to claim that
> only your opinion is the right one?

The facts have been stated above. If it's the new Novell/SUSE policy that heterogeneous networks need not function efficiently, I may as well advise against using Novell software and migrate to a sane system next week.

I'm just not so sure if the management is very happy about your behavior.
I shall see to that.
Comment 22 Thorsten Kukuk 2006-02-16 13:50:08 UTC
You should really stop threatening people in Bugzilla; it doesn't make your report more valid.
Comment 23 Matthias Andree 2006-02-16 14:04:44 UTC
Name a single reason why a system with known flaws and documented vendor disinterest/resource shortage would be more adequate than another.

Having said that, I'll let it rest. In the meantime, adrian@ responded that it is policy to close feature requests when no implementation is planned, so let's just assume the request is valid and mark the bug RESOLVED WONTFIX.
Comment 24 Matthias Andree 2006-02-16 14:06:15 UTC
Reopening as an intermediate step toward WONTFIX.
Comment 25 Matthias Andree 2006-02-16 14:07:39 UTC
Marking the bug WONTFIX, as the current author has no interest in improving the situation,
in accordance with a private message from adrian@ explaining the feature-request policy.
Demoting to Enhancement at the same time.