Bug 515018 - monsoon 0.2.0: possible low-hanging fruit for speeding up checksums
Status: NEW
Alias: None
Product: Mono: Tools
Classification: Mono
Component: MonoTorrent
Version: unspecified
Hardware: x86-64 Other
: P5 - None : Normal
Target Milestone: ---
Assignee: Alan McGovern
QA Contact: Alan McGovern
 
Reported: 2009-06-21 06:54 UTC by Charles Kerr
Modified: 2009-06-30 21:22 UTC


Attachments
screenshot of monsoon 0.21 at the end of verifying a 19 GiB torrent (603.45 KB, image/png)
2009-06-30 20:52 UTC, Charles Kerr

Description Charles Kerr 2009-06-21 06:54:37 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2a1pre) Gecko/20090620 Minefield/3.6a1pre

I was testing out various BitTorrent clients recently.  One thing that I tested was how long each of them took to verify a 19.5 GiB torrent that had already been downloaded.

Deluge took 5:45, Vuze took 5:46, Transmission took 5:47, and KTorrent took 5:49... all of them were extremely close to each other wrt time, so all were probably hitting the same bottlenecks.

By contrast, Monsoon 0.2.0 took almost 5x as long... 23:44 in total.  I haven't looked at the code, but my guess is there's some low-hanging fruit here wrt optimizations. :)

Also, I'm just guessing that "MonoTorrent" is the correct component, instead of "Monsoon".  Sorry if I'm wrong about that.

Reproducible: Always

Comment 1 Charles Kerr 2009-06-21 07:02:22 UTC
Oh, I should probably include this.  I don't know if it's important, but it couldn't hurt:

mono-core.x86_64                   2.4-19.fc11                @fedora           
mono-data.x86_64                   2.4-19.fc11                @fedora           
mono-data-sqlite.x86_64            2.4-19.fc11                @fedora           
mono-extras.x86_64                 2.4-19.fc11                @fedora           
mono-web.x86_64                    2.4-19.fc11                @fedora           
mono-winforms.x86_64               2.4-19.fc11                @fedora           
monodoc.x86_64                     2.4-19.fc11                @fedora
Comment 2 Alan McGovern 2009-06-23 00:20:48 UTC
Hrmm, this is actually a bit strange. On my system Monsoon takes ~50 seconds to hash a 1.5GB file. Extrapolating upwards, this means my system should take 650 seconds (10:50) to hash 19.5GB in Monsoon.

Using the command-line openssl client to perform the same hash function, it takes 31 seconds for a 1.5GB file. Extrapolating again, that'd be 403 seconds (6:43) for 19.5 GB.

That's a performance delta of about 40% between the SHA1 algorithm in mono and all related overhead in Monsoon versus the fastest SHA1 algorithm I know of. That's far less than the 500% you're saying that you see.
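As a quick check of the arithmetic above, here is a minimal Python sketch of the linear extrapolation. The function names are made up for illustration, and it assumes hashing time scales linearly with data size, which ignores per-file overhead and disk cache effects.

```python
def extrapolate_seconds(measured_seconds, measured_gb, target_gb):
    """Scale a measured hashing time linearly to a larger data size."""
    return measured_seconds * target_gb / measured_gb


def as_min_sec(seconds):
    """Format a duration in seconds as M:SS."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Figures quoted in this comment: 50 s (Monsoon) and 31 s (openssl) per 1.5 GB.
monsoon = extrapolate_seconds(50, 1.5, 19.5)  # 650.0 seconds
openssl = extrapolate_seconds(31, 1.5, 19.5)  # 403.0 seconds

print("Monsoon:", as_min_sec(monsoon))  # 10:50
print("openssl:", as_min_sec(openssl))  # 6:43
print(f"openssl needs {1 - openssl / monsoon:.0%} less time")  # 38%
```

Measured this way, openssl needs roughly 38% less time than Monsoon, which is where the "about 40%" figure comes from.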

I also hashed the exact same file under Transmission 1.52 (8229); it took ~50 seconds to hash. Under Deluge 1.1.0 it took roughly the same again. That makes the delta between all the clients less than ~5%. So, what kind of system are you running on to get such a huge delta between Monsoon and everything else? Are you 100% sure that those numbers are correct?
Comment 3 Charles Kerr 2009-06-23 01:59:19 UTC
> What kind of system are you running on to get such a huge delta between Monsoon and everything else?

A patched Fedora 11 running on an x86_64 processor.

> Are you 100% sure that those numbers are correct?

Yes, the numbers are correct. 

Could some kind of configuration error explain this?  Or the differences between Monsoon 0.2.0 and the newer version you're using?

I'm happy to run tests or provide more information as needed.
Comment 4 Alan McGovern 2009-06-23 08:23:30 UTC
Sorry, when I said "What kind of system" I actually meant "What spec is your system". I just need the basic info, CPU/RAM/HD (raid/ssd/etc).

I'm definitely interested in finding out what's causing this difference. I'll do up a little test application later on to rule out a mono issue as well.

I really haven't a clue what might cause this, nothing major has changed in monotorrent since 0.7.2 which would cause such a huge performance gain. It's also worth noting that for me hashing a 1.5 GB file is actually limited by disk IO performance. 'time cat bigfile > /dev/null' takes 63 seconds to run the very first time. If I then hash the file immediately with openssl, it takes ~6 seconds and with Monsoon ~24 seconds. This matches up with the performance difference you describe, though optimising that is going to be very hard if I can't see the difference locally. It's also a hard one to optimise because it is rare for the entire file to already be held in memory.
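A small stand-alone timer makes the cold-cache versus warm-cache comparison easy to repeat outside any client. The following is only a sketch using Python's standard `hashlib`, not the test application mentioned above; `sha1_file` and its `chunk_size` default are hypothetical choices.

```python
import hashlib
import time


def sha1_file(path, chunk_size=1024 * 1024):
    """Return (hex digest, elapsed seconds) for one SHA-1 pass over path."""
    digest = hashlib.sha1()
    start = time.perf_counter()
    with open(path, "rb") as f:
        # Read fixed-size chunks so memory use stays flat for large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), time.perf_counter() - start
```

Calling `sha1_file("bigfile")` twice in a row should show the effect described here: the first pass is bounded by disk reads, while the second is served mostly from the page cache and is closer to pure hashing cost.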

Thanks.
Comment 5 Charles Kerr 2009-06-26 05:59:38 UTC
> "What spec is your system". I just need the basic info CPU/RAM/HD (raid/ssd/etc).

AMD Athlon 64 X2 6000+ Windsor 3.0GHz Socket AM2 125W Dual-Core
Kingston 4GB DDR2 800 (PC2 6400)
Seagate Barracuda 7200.11 ST31000340AS 1TB 7200 RPM SATA 3.0Gb/s 3.5"
ext3
No raid.

> I really haven't a clue what might cause this, nothing major has changed in monotorrent since 0.7.2 which would cause such a huge performance gain. It's also worth noting that for me hashing a 1.5 GB file is actually limited by disk IO performance.

If you like, I'll try doing a 1.5 GB torrent for comparison.

Also, I see the Fedora 11 repo has an updated version of Monsoon.  I'll try to give that a spin this weekend.
Comment 6 Charles Kerr 2009-06-30 17:18:12 UTC
Monsoon 0.21 took about 20 minutes for the same torrent.

Attached screenshot shows the Monsoon session and my hardware configuration.

One thing that may be an issue here is that the .torrent contains *many* files.  Would this affect Monsoon?
Comment 7 Alan McGovern 2009-06-30 20:38:44 UTC
Ah hah, that was actually going to be my next question :)

Roughly how many files are in the torrent and what would the average size be? If possible, attaching the actual .torrent would be great. I did my testing with a large torrent which has a single file, though I wouldn't have expected a torrent with many small files to cause such a slowdown. It's likely to be something I can fix though.
Comment 8 Charles Kerr 2009-06-30 20:52:04 UTC
Created attachment 301740
screenshot of monsoon 0.21 at the end of verifying a 19 GiB torrent
Comment 9 Charles Kerr 2009-06-30 20:55:35 UTC
Alan: it's a private torrent.  I can wipe the passkey and attach it, but even then it would only be useful for examining the file list.  Do you still want it?
Comment 10 Alan McGovern 2009-06-30 21:01:19 UTC
Yup, that's what I want it for. If I have a known 'bad' filelist I'll be able to see what's making it bad and then fix it. The fix could be as simple as just increasing the size of the pre-buffer for hashing.
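To illustrate why the file list matters: BitTorrent pieces are hashed over the concatenation of all files, so a torrent with many small files means many file opens and short reads per piece. The sketch below is not MonoTorrent's code; `iter_pieces`, `find_bad_pieces` and `buffer_size` are hypothetical names showing one way a larger read buffer could feed the piece hasher.

```python
import hashlib


def iter_pieces(paths, piece_length, buffer_size=1024 * 1024):
    """Yield piece_length-sized byte strings from the concatenation of paths."""
    pending = bytearray()
    for path in paths:
        with open(path, "rb", buffering=buffer_size) as f:
            while True:
                chunk = f.read(buffer_size)
                if not chunk:
                    break
                pending += chunk
                # A piece may span several files, so cut only on piece size.
                while len(pending) >= piece_length:
                    yield bytes(pending[:piece_length])
                    del pending[:piece_length]
    if pending:
        yield bytes(pending)  # final piece, usually shorter


def find_bad_pieces(paths, piece_length, expected_hashes):
    """Return indices of pieces whose SHA-1 digest differs from expected_hashes."""
    bad = []
    for index, piece in enumerate(iter_pieces(paths, piece_length)):
        if hashlib.sha1(piece).digest() != expected_hashes[index]:
            bad.append(index)
    return bad
```

Here `paths` must be in the order the .torrent lists them, and `expected_hashes` is the list of 20-byte digests from its `pieces` field.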

When you attach the torrent you can mark the post as private so only Novell employees (me) will be able to read it. Alternatively you could send it directly to my email address.

Finally, when editing the file to remove the passcode, it'd be better if you just replaced it with a fake string of equal length, so the torrent doesn't get reported as corrupt.
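The reason equal length matters is that bencoded strings carry a length prefix (for example `8:announce`), so shortening the passkey without updating the prefix makes the file unparseable. A minimal sketch of a same-length replacement, assuming the passkey appears verbatim in the file; `mask_passkey` and the filler byte are hypothetical:

```python
def mask_passkey(torrent_bytes, passkey, filler=b"0"):
    """Replace every occurrence of passkey with filler bytes of equal length."""
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    masked = torrent_bytes.replace(passkey, filler * len(passkey))
    # Bencode length prefixes stay valid only if the total size is unchanged.
    assert len(masked) == len(torrent_bytes)
    return masked
```

Read the .torrent in binary mode, pass the result through `mask_passkey`, and write it back out in binary mode.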

Thanks.
Comment 11 Charles Kerr 2009-06-30 21:13:29 UTC
> Alternatively you could send it directly to my email address.

Will do.

> Finally, when editing the file to remove the passcode, it'd be better if you just replaced it with a fake string of equal length, so the torrent doesn't get reported as corrupt.

Did you see my email address? :)
Comment 12 Charles Kerr 2009-06-30 21:22:13 UTC
> Alternatively you could send it directly to my email address.

Sent to "Alan McGovern <alan.mcgovern@gmail.com>"