A day with disruptor-net

by joel 15. August 2011 09:22
My idle hands jumped on the chance to play around with disruptor-net, a .NET port of disruptor, from the guys over at LMAX. Having just come off an interesting contract with an investment bank to build a high frequency market making system in .NET, I was curious to see if the disruptor framework had the potential to further push the boundaries of scale and latency. 

We had leveraged CCR (Concurrency and Coordination Runtime) from the Microsoft Robotics guys as the concurrency programming model. If you peek under the hood, it contains a lot of the data structures that the disruptor guys claim have little mechanical sympathy: locks and queues (and these mechanics did show up during a performance milestone analysis).

I spent a day toying around, building a very simple stock price to option price simulator, where a price comes in off the wire, we run it through an option pricing model (Black-Scholes), and then hand the option price back to the wire. It's embarrassingly parallel if all you're doing is single instrument pricing. For five million price messages, the results are in:

Number of Processors: 1
- Name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- Description: Intel64 Family 6 Model 42 Stepping 7
- ClockSpeed: 3401 Mhz
- Number of cores: 4
- Number of logical processors: 4
- Hyperthreading: OFF

Memory: 16364 MBytes
L1Cache: 256 KBytes
L2Cache: 1024 KBytes
L3Cache: 8192 KBytes

Detailed test results

Scenario    Implementation  Run  Operations per second  Duration (ms)  # GC (0-1-2)  Comments
Market1P1C  Disruptor       0    787 277                6371           691-1-0
Market1P1C  CCR             0    537 634                9304           757-69-6
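
For reference, the pricing step in the simulator (a stock price comes in, a Black-Scholes option price goes out) can be sketched roughly as below. This is an illustrative Python sketch, not the .NET code that produced the numbers above; the function names and the strike, rate, volatility and expiry parameters are assumptions made for the example, and it prices a European call only.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       vol: float, years: float) -> float:
    """Black-Scholes price of a European call (no dividends).

    All parameter names are hypothetical; the original post does not
    say which inputs its simulator used.
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def on_price(spot: float) -> float:
    """One 'message': take a stock price off the wire, return an option price."""
    return black_scholes_call(spot, strike=100.0, rate=0.05, vol=0.2, years=1.0)


if __name__ == "__main__":
    # At-the-money example; the textbook value is roughly 10.45.
    print(round(on_price(100.0), 4))
```

Each call depends only on its own input price, which is why single instrument pricing parallelises so easily.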

Now, there are probably hundreds of different ways to optimize both scenarios, so take the results with a grain of salt. Still, poking around with the Visual Studio Concurrency perf tool and a trial of Intel's VTune Amplifier XE, I did see a lot more synchronisation overhead with CCR than with disruptor-net, so the mechanical sympathy features on the box seem to work (and, in theory, the L2 cache aligned RingBuffer approach should definitely yield us more execution time). I suspect batching the messages in the CCR case would immediately reduce that overhead, though.
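
To make the ring buffer and batching ideas a little more concrete, here is a minimal single-producer, single-consumer sketch in Python. It is not the disruptor-net API: the class and method names are made up for illustration, it runs on one thread, and it ignores the memory barriers and cache-line padding that the real thing relies on. It only shows the core bookkeeping: a fixed, preallocated array, monotonically increasing sequence numbers mapped to slots with a bit mask, and a consumer that takes everything published so far in one batch.

```python
class RingBuffer:
    """Toy single-producer/single-consumer ring buffer (illustrative only)."""

    def __init__(self, size: int) -> None:
        # A power-of-two size lets us map a sequence to a slot with a mask.
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._mask = size - 1
        self._slots = [None] * size   # preallocated once, reused forever
        self._published = -1          # last sequence the producer wrote
        self._consumed = -1           # last sequence the consumer read

    def try_publish(self, value) -> bool:
        """Write one entry; return False if that would overwrite unread data."""
        next_seq = self._published + 1
        if next_seq - self._consumed > len(self._slots):
            return False
        self._slots[next_seq & self._mask] = value
        self._published = next_seq
        return True

    def drain(self) -> list:
        """Return every published-but-unread entry as a single batch."""
        batch = []
        while self._consumed < self._published:
            self._consumed += 1
            batch.append(self._slots[self._consumed & self._mask])
        return batch


if __name__ == "__main__":
    rb = RingBuffer(4)
    for price in (100.0, 100.5, 101.0):
        rb.try_publish(price)
    # The consumer catches up on all three prices in one pass.
    print(rb.drain())
```

The batching in `drain` is the kind of thing I had in mind for the CCR case: the consumer pays the coordination cost once per batch rather than once per message.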

However (and with the caveat that my entire disruptor experience is a day plus change), while coding this up, I couldn't help but want the CCR programming model and API back. You get elegant composability, join calculus, and error handling, which means you spend less time thinking about concurrency and synchronisation constructs, and more time on the problem domain. The disruptor-net API, while small in surface area, seems less subtle: pack the data tightly, grok the semantics of how things get executed and when, consume, rinse, repeat. And really, that's no great surprise given what it's trying to achieve.

It's a classic case of "paying for abstractions". I just wonder if we can't have both? Regardless, disruptor-net shows promise; I just hope the contributors keep it up.




About the author

Joel Pobar works on languages and runtimes at Facebook