Apr 16 2011

A few principles for writing blazing fast code in .NET

When authoring high-performance applications, the following general rules are worth keeping in mind:

Share nothing across threads, even at the expense of memory

  • Sharing across thread boundaries leads to increased preemptions and costly thread contention, and can introduce less obvious costs such as extra L2 cache misses
  • When working with shared state that is seldom or never updated, give each thread its own copy, even at the expense of memory (see the sketch after this list)
  • Create thread affinity if the workload represents sagas of object state, but keep in mind this may limit scalability within a single process instance
  • Where possible, isolating threads is ideal
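
For illustration, here is a minimal sketch (type and member names are hypothetical) of giving each thread its own copy of read-mostly state with ThreadLocal<T>, trading a little memory for contention-free reads:

```csharp
using System.Collections.Generic;
using System.Threading;

static class PerThreadLookup
{
    // Each thread lazily builds and keeps its own copy of the read-mostly
    // lookup table, so reads never contend with other threads.
    private static readonly ThreadLocal<Dictionary<string, int>> _lookup =
        new ThreadLocal<Dictionary<string, int>>(BuildLookup);

    private static Dictionary<string, int> BuildLookup()
    {
        // Hypothetical reference data, loaded once per thread.
        return new Dictionary<string, int> { { "foo", 1 }, { "bar", 2 } };
    }

    public static int Resolve(string key)
    {
        return _lookup.Value[key]; // no lock required
    }
}
```

If the data ever does change, each thread can rebuild its own instance rather than synchronizing on a shared one.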

Embrace lock-free architectures

  • The fewer locks the better, which is obvious to most people
  • Achieving thread safety with lock-free patterns can be nuanced, so dig into how the primitive/native locking semantics work and into the concepts behind memory fencing; that understanding helps keep execution paths lean (see the sketch after this list)
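
As a small example of the idea (not a replacement for every lock), here is a counter and a running maximum maintained with Interlocked operations and a compare-and-swap retry loop:

```csharp
using System.Threading;

class LockFreeStats
{
    private long _count;
    private long _max;

    public void Record(long value)
    {
        Interlocked.Increment(ref _count);

        // Compare-and-swap retry loop: only publish the new maximum
        // if no other thread has raced ahead of us in the meantime.
        long snapshot;
        do
        {
            snapshot = Interlocked.Read(ref _max);
            if (value <= snapshot) return;
        }
        while (Interlocked.CompareExchange(ref _max, value, snapshot) != snapshot);
    }

    public long Count { get { return Interlocked.Read(ref _count); } }
}
```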

Number of dedicated long-running threads == Number of processing cores

  • It’s easy to just spin up another thread, but the more threads you create, the more contention you are likely to create among them. Eventually, you may find your application spends so much time jumping between threads that there is no time left to do any real work. This is known as a ‘Live Lock’ scenario and is challenging to debug
  • Test the performance of your application using different threading patterns on hardware that is representative of the production environment to ensure the number of threads you’ve chosen is actually optimal
  • For background tasks that are flexible about when they run, how often, and how much work they do at any given time, consider Continuation or Task Scheduler patterns and collapse them onto fewer threads (or a single thread)
  • Consider using patterns that utilize the ThreadPool instead of dedicated long-running threads; a sketch of sizing dedicated workers to core count follows this list
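
A sketch of the “one dedicated worker per core” shape, assuming a simple producer/consumer queue (the queue and work item types here are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class WorkerHost
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public void Start()
    {
        // One dedicated long-running worker per core; background chores
        // go to the ThreadPool or a TaskScheduler instead of more threads.
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            var worker = new Thread(Consume) { IsBackground = true };
            worker.Start();
        }
    }

    public void Enqueue(Action work)
    {
        _queue.Add(work);
    }

    private void Consume()
    {
        foreach (var work in _queue.GetConsumingEnumerable())
            work();
    }
}
```

ProcessorCount is only a starting point; verify the thread count empirically on production-like hardware as described above.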

Stay in-memory and avoid or batch I/O where possible

  • File, Database, and Network I/O can be costly
  • Consider batching updates when I/O is required: buffering file writes, batching message transmissions, etc. (see the sketch after this list)
  • For database interactions, try using bulk inserts even if it’s only to temp tables. You can then use Stored Procedures to signal submission of the data and perform ETL-like functions in the database
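
One way to batch, as a sketch: buffer small writes in memory and flush them as a single I/O call. The file name and batch size are arbitrary, and this version assumes a single writer thread:

```csharp
using System.Collections.Generic;
using System.IO;

static class BatchedLog
{
    // Buffer small writes and flush them as one larger I/O operation.
    private static readonly List<string> _pending = new List<string>();
    private const int BatchSize = 500; // arbitrary; tune against real I/O costs

    public static void Write(string line)
    {
        _pending.Add(line);
        if (_pending.Count >= BatchSize)
            Flush();
    }

    public static void Flush()
    {
        if (_pending.Count == 0) return;
        File.AppendAllLines("app.log", _pending); // one call instead of hundreds
        _pending.Clear();
    }
}
```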

Avoid the Heap

  • Objects placed on the Heap carry with them the burden of being garbage collected. If your application produces a large number of objects with very short lives, the cost of collecting them can weigh heavily on overall performance
  • Consider switching to Server GC (a.k.a. multi-core GC)
  • Consider switching to Value Types maintained on the call stack and don’t box them
  • Consider reusing object instances. Samples of each of these can be found in the Coding Guidelines below; a brief sketch of a value type and a reusable buffer pool follows this list
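
A brief sketch of two of these techniques: a value type that avoids the heap entirely, and a simple pool that reuses buffers instead of allocating short-lived arrays (the buffer size is arbitrary). Server GC itself is a configuration switch (the gcServer element under the runtime section) rather than a code change.

```csharp
using System.Collections.Concurrent;

// Value type: lives on the stack (or inline in its container) and is never
// garbage collected, as long as it isn't boxed.
struct Sample
{
    public readonly long Timestamp;
    public readonly double Value;

    public Sample(long timestamp, double value)
    {
        Timestamp = timestamp;
        Value = value;
    }
}

// Reuse: rent and return buffers rather than allocating a new array per use.
static class BufferPool
{
    private static readonly ConcurrentBag<byte[]> _pool = new ConcurrentBag<byte[]>();

    public static byte[] Rent()
    {
        byte[] buffer;
        return _pool.TryTake(out buffer) ? buffer : new byte[4096];
    }

    public static void Return(byte[] buffer)
    {
        _pool.Add(buffer);
    }
}
```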

Use method-level variables during method execution and merge results with class-level variables after processing

  • Using shared variables that are frequently updated can create inefficiencies in how the call stack is managed and how L2 caches behave
  • When working with relatively small variables, follow a pattern of copy-local, do work, merge changes (sketched after this list)
  • For shared state that is updated frequently by multiple threads, be aware of ‘False Sharing’ concerns and code accordingly
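
A sketch of the copy-local/merge pattern (the aggregation itself is hypothetical): each call accumulates into a method-level variable and touches the shared class-level field exactly once:

```csharp
using System.Threading;

class Aggregator
{
    private long _grandTotal; // class-level, shared across threads

    public void Process(int[] values)
    {
        // Copy-local: accumulate in a method-level variable...
        long localTotal = 0;
        for (int i = 0; i < values.Length; i++)
            localTotal += values[i];

        // ...then merge once, rather than contending on the shared field per item.
        Interlocked.Add(ref _grandTotal, localTotal);
    }
}
```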

Avoid polling patterns

  • Blind polling can lead to inefficient use of resources, reduce a system’s ability to scale, and hurt overall performance. Where possible, apply publish-subscribe patterns (see the sketch below)
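
A minimal publish-subscribe sketch using an ordinary .NET event, so consumers are notified when something happens instead of polling on a timer (the event and payload names are illustrative):

```csharp
using System;

class OrderFeed
{
    // Subscribers are pushed new orders; nobody polls a queue on a timer.
    public event EventHandler<OrderReceivedEventArgs> OrderReceived;

    public void Publish(string orderId)
    {
        var handler = OrderReceived;
        if (handler != null)
            handler(this, new OrderReceivedEventArgs(orderId));
    }
}

class OrderReceivedEventArgs : EventArgs
{
    public string OrderId { get; private set; }

    public OrderReceivedEventArgs(string orderId)
    {
        OrderId = orderId;
    }
}
```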

Know what things cost

  • If you dig a little deeper into the Framework, you may find some surprises about what things cost. Take a look at Know What Things Cost, and measure on your own hardware (a sketch follows)
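
When in doubt, measure; a rough Stopwatch harness such as the following (the iteration count and example are arbitrary) makes relative costs visible on your own hardware:

```csharp
using System;
using System.Diagnostics;

static class Micro
{
    public static void Measure(string label, int iterations, Action action)
    {
        action(); // warm up so JIT compilation isn't counted

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        Console.WriteLine("{0}: {1:N0} iterations in {2} ms",
            label, iterations, sw.ElapsedMilliseconds);
    }
}

// Example: get a feel for the cost of boxing an int.
// Micro.Measure("boxing", 10000000, () => { object o = 42; GC.KeepAlive(o); });
```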

Develop layers with at least two consumers in mind… Production & Tests

  • Developing highly performant systems requires a fair amount of testing. As such, each new layer/component needs to be testable in isolation so that performance bottlenecks can be pinpointed, thresholds and capacity can be measured, and behavior can be modeled and validated under various load scenarios
  • Consider using Dependency Injection patterns to allow injection of alternate dependencies, mocks, etc…
  • Consider using Provider patterns to make selection of separate implementations easier. It’s not uncommon for automated test systems to configure alternate implementations to suit various test cases (a sketch of both patterns follows this list)
    • Ex. Layers that replicate network instability, layers that accommodate Bot-users that drive change in the system, layers that replicate external resources with predictable behaviours, etc…
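
A sketch of both ideas (the interface and class names are hypothetical): the consumer takes its dependency through the constructor, and a provider centralizes which implementation is selected so a test harness can substitute, say, a transport that simulates network instability:

```csharp
using System;

public interface ITransport
{
    void Send(byte[] payload);
}

public class OrderPublisher
{
    private readonly ITransport _transport;

    // Dependency Injection: production wires in the real transport,
    // tests inject a mock or a deliberately unreliable implementation.
    public OrderPublisher(ITransport transport)
    {
        if (transport == null) throw new ArgumentNullException("transport");
        _transport = transport;
    }

    public void Publish(byte[] order)
    {
        _transport.Send(order);
    }
}

// Provider pattern: selection of the concrete implementation lives in one
// place, so automated test systems can reconfigure it per test suite.
public static class TransportProvider
{
    public static Func<ITransport> Create = () =>
    {
        throw new InvalidOperationException("No transport configured.");
    };
}
```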
