Bug 315999 (MONO65734) - System.Threading.Timer 20x slower than MSFT's
Summary: System.Threading.Timer 20x slower than MSFT's
Status: RESOLVED FIXED
Alias: MONO65734
Product: Mono: Class Libraries
Classification: Mono
Component: CORLIB (show other bugs)
Version: unspecified
Hardware: Other Other
: P3 - Medium : Normal
Target Milestone: ---
Assignee: Dick Porter
QA Contact: Mono Bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-11 07:39 UTC by Ben Maurer
Modified: 2007-10-16 10:29 UTC (History)
2 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
Single thread timer scheduler (17.85 KB, patch)
2006-06-25 21:40 UTC, Thomas Wiest
Details | Diff
new patch (18.16 KB, patch)
2006-08-10 18:51 UTC, Thomas Wiest
Details | Diff
updated patch fixing Dispose deadlock issue. (21.33 KB, patch)
2006-11-05 21:01 UTC, Thomas Wiest
Details | Diff
rewritten patch (7.77 KB, patch)
2007-10-15 17:56 UTC, Paolo Molaro
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Wiest 2007-09-15 18:53:19 UTC


---- Reported by bmaurer@users.sf.net 2004-09-11 00:39:30 MST ----

This test takes 20x longer on Mono than on MSFT's runtime:

using System;
using System.Threading;

class X {
	static void Main ()
	{
		for (int i = 0; i < 1000; i ++) {
			Timer t = new Timer (new TimerCallback (Foo), i, 1000, 0);
			// do stuff
			t.Dispose ();
		}
	}
	static void Foo (object o) {}
}

The problem is that we create a thread for each timer, while MSFT appears
to only create one thread for the lifetime of the program.



---- Additional Comments From raf@ophion.org 2006-06-25 14:40:47 MST ----

Created an attachment (id=166759)
Single thread timer scheduler




---- Additional Comments From gonzalo@ximian.com 2006-07-30 20:12:31 MST ----

If the callback throws, the scheduler thread is killed.



---- Additional Comments From bmaurer@users.sf.net 2006-07-31 06:33:58 MST ----

Style wise, this patch needs some work:

+			static object sync_obj = new object();
+			static object sync_obj2 = new object();

How are those used?

Why is TimerScheduler a class? it seems like it could be static
methods in Timer? Also, why a TimerJob class. This is a 1<->1 relation
with a Timer.

Variables and methods are named with mixed conventions. Make things
consistent, at the very least. In general, no underscore_names for
methods. Also, enum values shouldn't be C_CONSTANT_NAMED.

+			readonly int TIME_SLICE = 10 ; // 10 msec

Should be const

+					log("could not properly signal timer-scheduler, waiting...");
+					Thread.Sleep(5); 

THis is *really* *really* dubious. You need to do something like
"calcuate the next time you will wake up, and wait on an object with
that timeout. Pulse the object if a new timer is added". This will
wake up the CPU far too often, and (for example) waste battery power.
It is a blocker for this patch.

+			void log(string str) {
+				if (Environment.GetEnvironmentVariable("MONO_TIMER_DEBUG") != null)

Should probably be cached.





---- Additional Comments From bmaurer@users.sf.net 2006-07-31 06:36:47 MST ----

Never mind this comment:

THis is *really* *really* dubious. You need to do something like
"calcuate the next time you will wake up, and wait on an object with
that timeout. Pulse the object if a new timer is added". This will
wake up the CPU far too often, and (for example) waste battery power.
It is a blocker for this patch.

The sleep is dubious, but it's not polling the cpu like i thought it
was. It's 3 am here ;-)



---- Additional Comments From raf@ophion.org 2006-08-01 12:32:07 MST ----

Awesome feedback guys... my responses are prefixed by ">>"

* If the callback throws, the scheduler thread is killed.
>> perfect catch, I"ll add a catch all "catch Exception" in order to
correct this.

Style wise, this patch needs some work:

+			static object sync_obj = new object();
+			static object sync_obj2 = new object();

How are those used?

>> sync_obj is used to prevent race conditions on the "GetInstance"
call of the singleton TimerScheduler class. sync_obj2 is used to make
signaling the scheduler class sane so only one signal is sent at a
time and a decision can be made on the signal so signals are "batched"
or discarded .With that said, I'm not sure why I have 2 separate sync
objects, I should be able to combine them without issues.

Why is TimerScheduler a class? it seems like it could be static
methods in Timer? Also, why a TimerJob class. This is a 1<->1 relation
with a Timer.

>> Timer scheduler is a singleton class because it has a 1-1 mapping
with a timer-scheduler thread which is the thread in charge of firing
all the timer events, so it is a 1 to n mapping between
timer-scheduler threads and timers. Now with regards to why it needs
to be a class of its own instead of  static methods in Timer i guess
it's merely a matter of choice... I believe I chose that route just to
keep all the scheduler logic in one place, which made it easier for me
to merge it with the Timer class. Of course, if you guys want, I can
made the conversion. I believe TimerJob follows under the same logic,
I just wanted to keep the scheduler specific meta data in one place.

Variables and methods are named with mixed conventions. Make things
consistent, at the very least. In general, no underscore_names for
methods. Also, enum values shouldn't be C_CONSTANT_NAMED.

>> I'll correct all that

+			readonly int TIME_SLICE = 10 ; // 10 msec

Should be const

>> same here

+					log("could not properly signal timer-scheduler, waiting...");
+					Thread.Sleep(5); 

THis is *really* *really* dubious. You need to do something like
"calcuate the next time you will wake up, and wait on an object with
that timeout. Pulse the object if a new timer is added". This will
wake up the CPU far too often, and (for example) waste battery power.
It is a blocker for this patch.

>> Yeah the logic is slightly counter intuitive but it actually
performs really well under most conditions. The timer logic is
actually very very loosely base on Quartz for java (I guess the main
loop logic is the same). From my testing you start seeing issues once
the number of number of timers is very large and the period of the
events is small which puts pressure on the scheduler to run more often
and doesn't give it a chance to sleep. Also, if the "period" of the
Timer is shorter than 10 msec (the TIMER_SLICE base), we start having
issues as well, and that is not as simple of a problem to solve as it
may seem. I did some performance observations that you might find
interesting here:

http://ophion.org/index.php?gadget=Blog&action=SingleView&id=9


+			void log(string str) {
+				if (Environment.GetEnvironmentVariable("MONO_TIMER_DEBUG") != null)

Should probably be cached.

>> agreed

>> I'm going to post a new patch tonight reflecting those changes, I
just need to test removing the "sync_obj" to make sure it doesn't
create any problems. Also Miguel had suggested to change the patch to
not use thread pool threads, so I might make that change as well but
I'm not sure how that is going to affect performance. 

Once again thanks for the very insightful feedback!

- raf






---- Additional Comments From bmaurer@users.sf.net 2006-08-01 15:10:00 MST ----

Please put the "how locks are used" comments in the file itself.

I'd like to see the refactoring done to remove the two extra classes.
I think the mapping just causes more code. 



---- Additional Comments From raf@ophion.org 2006-08-10 11:51:58 MST ----

Created an attachment (id=166760)
new patch




---- Additional Comments From raf@ophion.org 2006-08-10 11:52:43 MST ----

changelog for the attached patch:
* fixed all cosmetic issues (let me know if I missied something)
* got rid of inner classes (there's only 1 left: Runner)
* Added logic to fire events using non-threadpool threads (that's the
default bevahior but you can set an env variable to makes the
scheduler use the thread pool again)
* Added caching logic to the log method
* Removed sync_obj2 

Let me know how committable this one is!



---- Additional Comments From miguel@ximian.com 2006-08-12 13:24:21 MST ----

ccing gonzalo



---- Additional Comments From gonzalo@ximian.com 2006-08-12 14:06:45 MST ----

On Monday I'll check this out.



---- Additional Comments From gonzalo@ximian.com 2006-10-03 15:01:59 MST ----

When running the tests, mono hangs. One of the threads is here:

"Timer-Scheduler" tid=0x0xb693bba0 this=0x0x33c9a0:
  at (wrapper managed-to-native)
System.Object.__icall_wrapper_mono_object_new_specific (intptr) <0x00004>
  at (wrapper managed-to-native)
System.Object.__icall_wrapper_mono_object_new_specific (intptr)
<0xffffffff>
  at Enumerator.get_Current () [0x0003d] in
/mono-HEAD/src/trunk/mcs/class/corlib/System.Collections/Hashtable.cs:956
  at System.Threading.Timer.SchedulerThread () [0x0006e] in
/mono-HEAD/src/trunk/mcs/class/corlib/System.Threading/Timer.cs:213
  at (wrapper delegate-invoke) System.MulticastDelegate.invoke_void ()
<0xffffffff>
  at (wrapper runtime-invoke) System.Object.runtime_invoke_void
(object,intptr,intptr,intptr) <0xffffffff>




---- Additional Comments From raf@ophion.org 2006-10-04 00:46:56 MST ----

Gonzalo, 

can you rerun the test with MONO_TIMER_DEBUG=1? 

I'm going to look into this shortly. 

- raf



---- Additional Comments From raf@ophion.org 2006-10-05 01:46:36 MST ----

Gonzallo, 

I'm looking into this further and on my 1 cpu machine the only Timer
test that fails is
MonoTests.System.Threading.TimerTest.TestDisposeOnCallback which was
marked as [NotWorking] all the other tests are passing. Can you post
the name of the failing test? Also, can you try running it with the
timer debug flag on? 





---- Additional Comments From raf@ophion.org 2006-11-05 14:01:46 MST ----

Created an attachment (id=166761)
updated patch fixing Dispose deadlock issue.




---- Additional Comments From raf@ophion.org 2006-11-05 14:04:36 MST ----

I believe I found and fixed the deadlock issue you encountered, please
give the attached patch a shot. Also, I've emailed the mono-dev
mailing list to start a discussion about the patch. 



---- Additional Comments From gonzalo@ximian.com 2006-11-28 20:04:50 MST ----

Adding this patch makes the JIT say:
** (process:23595): ERROR (recursed) **: file domain.c: line 177
(mono_jit_info_table_find): assertion failed: (ret == 0)
aborting...

when running make run-test in corlib



---- Additional Comments From raf@ophion.org 2006-12-11 19:12:16 MST ----

thanks gonzallo,  I'll look into this shortly and I'll try to ping you
in IRC after I have time to research the JIT error. 

Imported an attachment (id=166759)
Imported an attachment (id=166760)
Imported an attachment (id=166761)

Unknown operating system unknown. Setting to default OS "Other".

Comment 1 Paolo Molaro 2007-10-15 17:56:38 UTC
Created attachment 178610 [details]
rewritten patch

The patch doesn't easily apply anymore, so I rewrote it, with a few simplifications and a few bug fixes (mostly locking on Jobs and avoiding the Abort calls). I'll apply in a day or two if there are no objections.
Comment 2 Paolo Molaro 2007-10-16 10:29:03 UTC
Fixed in svn.
For reference, the new code creates 10K timers in 1.5 seconds on my box.
The old code dies with a "can't create thread" exception after 6 seconds.
There are more optimizations possible, but this is already a big improvement
(in the stability of the code, too).