Mar 19 2010

Putting your DataTable on a diet

Category: Tips and Tricks | JoeGeeky @ 01:08

As the name suggests, the System.Data.DataTable is a fat object type. It's true that there's a lot you can do with this class, but that flexibility can come at a pretty high price. To demonstrate this, let's start by creating two in-memory structures and comparing their sizes using WinDbg. Each structure will hold a conservative 500,000 records, each with a puny 7 columns of 32-bit integers. If you've developed high-performance applications you know this is a pretty modest amount of data, but it should still make the point.

const int MaxItemCount = 500000;

Our first structure is built around the supposedly fat DataTable.

var table = new DataTable("MyFatTable");
table.Columns.AddRange(new[] {new DataColumn("A", typeof(int)),
                                new DataColumn("B", typeof(int)),
                                new DataColumn("C", typeof(int)),
                                new DataColumn("D", typeof(int)),
                                new DataColumn("E", typeof(int)),
                                new DataColumn("F", typeof(int)),
                                new DataColumn("G", typeof(int))});
for (int i = 0; i < MaxItemCount; i++)
{
    DataRow row = table.NewRow();
    row["A"] = i;
    row["B"] = i;
    row["C"] = i;
    row["D"] = i;
    row["E"] = i;
    row["F"] = i;
    row["G"] = i;
    table.Rows.Add(row);
}

Next we'll create a more traditional object collection so we can see what the DataTable "might" be costing us.

public class DataItem
{
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
    public int D { get; set; }
    public int E { get; set; }
    public int F { get; set; }
    public int G { get; set; }
}

IList<DataItem> data = new List<DataItem>();
for (int i = 0; i < MaxItemCount; i++)
{
    var dataItem = new DataItem();
    dataItem.A = i;
    dataItem.B = i;
    dataItem.C = i;
    dataItem.D = i;
    dataItem.E = i;
    dataItem.F = i;
    dataItem.G = i;
    data.Add(dataItem);
}

If we run the sample and create a dump file, we can see how much memory is being consumed by each type of structure. The DataRows in the DataTable consume almost twice as much memory as the more traditional object collection (the DataItems). Keep in mind this was a really simplistic sample; real-world structures will likely be much larger in both row count and columns. Aside from memory pressure, the DataTable will also cause your application to spend more time in garbage collection, which translates into a slower application.
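If you don't have WinDbg handy, a rough comparison can be made in code. This is only a sketch: GC.GetTotalMemory approximates the live managed heap, not the precise per-object sizes a dump will show.

```csharp
// Rough sketch: measure managed-heap growth while building the collection.
// GC.GetTotalMemory(true) forces a collection first, so the delta
// approximates the live size of what was allocated in between.
long before = GC.GetTotalMemory(true);

var data = new List<DataItem>();
for (int i = 0; i < MaxItemCount; i++)
{
    data.Add(new DataItem { A = i, B = i, C = i, D = i, E = i, F = i, G = i });
}

long after = GC.GetTotalMemory(true);
Console.WriteLine("Approximate bytes used: {0:N0}", after - before);
GC.KeepAlive(data); // keep the list alive past the second measurement
```

Wrap the same measurement around the DataTable loop and compare the two deltas.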

Spend a little time profiling your application and you'll likely find other large concrete implementations with hidden costs. In this case, the solution is easier than you might think. In many cases, all we really need from a DataTable is the underlying DataReader. Assuming that is true in your case, all you really need to do is implement a custom System.Data.IDataReader. This is a fairly large interface but it is really easy to work with and many solutions don't actually need every method implemented. This allows you to start simple and add additional features when and if you need them.

using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public sealed class GenericListDataReader<T> : IDataReader
{
    private readonly IEnumerator<T> _enumerator;
    private readonly List<PropertyInfo> _properties = new List<PropertyInfo>();

    public GenericListDataReader(IEnumerable<T> list)
    {
        _enumerator = list.GetEnumerator();

        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.GetProperty;

        /* Map properties to columns based on the order in which they occur */
        foreach (PropertyInfo property in typeof(T).GetProperties(flags))
        {
            /* Only add properties that can be mapped to database types */
            if (property.PropertyType.IsPrimitive ||
                property.PropertyType == typeof(string) ||
                property.PropertyType == typeof(DateTime))
            {
                _properties.Add(property);
            }
        }
    }

    public bool Read()
    {
        return _enumerator.MoveNext();
    }

    public int FieldCount
    {
        get { return _properties.Count; }
    }

    public string GetName(int i)
    {
        return _properties[i].Name;
    }

    public Type GetFieldType(int i)
    {
        return _properties[i].PropertyType;
    }

    public object GetValue(int i)
    {
        return _properties[i].GetValue(_enumerator.Current, null);
    }

    public void Close()
    {
        _enumerator.Dispose();
    }

    public void Dispose()
    {
        Close();
    }

    #region IDataRecord Members
        /* throw NotSupportedException for all remaining members */
    #endregion

    #region IDataReader Members
        /* throw NotSupportedException for all remaining members */
    #endregion

}

Get the full class GenericListDataReader.cs (4.30 kb).
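Before wiring the reader into a persistence layer, it is easy to sanity-check on its own. A quick sketch, assuming the DataItem collection built earlier:

```csharp
using (var reader = new GenericListDataReader<DataItem>(data))
{
    // Column names are discovered from the public properties via reflection
    for (int i = 0; i < reader.FieldCount; i++)
        Console.Write("{0}\t", reader.GetName(i));
    Console.WriteLine();

    // Stream a few rows, exactly as a consumer such as SqlBulkCopy would
    int shown = 0;
    while (reader.Read() && shown++ < 3)
    {
        for (int i = 0; i < reader.FieldCount; i++)
            Console.Write("{0}\t", reader.GetValue(i));
        Console.WriteLine();
    }
}
```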

This is a really basic wrapper around enumerable types. As a side benefit, your data reader is inherently mockable, which makes testing easy as well. As an IDataReader, this structure can be used to bridge the gap between traditional object structures and data-persistence layers. Here is a simple example using SqlBulkCopy to persist our previous collection into a database.

using (var dataReader = new GenericListDataReader<DataItem>(data))
{
    using (var bulkCopy = new SqlBulkCopy("A Connection String"))
    {
        bulkCopy.DestinationTableName = "Target Table Name";

        bulkCopy.ColumnMappings.Add(0, "ColumnA");
        bulkCopy.ColumnMappings.Add(1, "ColumnB");

        bulkCopy.WriteToServer(dataReader);
    }
}

So what's the lesson here? It has two parts... As I said in the beginning, the DataTable is fat. And if you're writing high-performance applications, look closely at those "free" concrete implementations; they may be costing you more than you think.


Feb 11 2010

Hiding from the debugger

Category: Tips and Tricks | JoeGeeky @ 18:56

There is no doubt the modern debugger has made developers' lives so much easier, and without it... well... let's just not think about that. More specifically, the Microsoft Visual Studio debugger is a beautiful piece of engineering kit. Having praised it, let's now talk about the pain it can cause us. Tools like this have become so advanced, providing so much detail, that they can sometimes actually hamper our efficiency or, in some cases, give away more information than we are comfortable with. Consider the following examples:

  • When you have deep inheritance chains, auto-generated code (designers, for example), or methods with high cyclomatic complexity, stepping through code can take considerably longer than you might like. If segments of the code are already trustworthy, you may wish to have the debugger skip them when stepping through
  • Classes with large numbers of private variables and properties that reflect the same values can clutter up the variables windows, making your debugging experience less than optimal
  • When developing applications to be consumed by third parties, you may wish to hide select members from the debugger. This could be a desired part of your obfuscation process, or simply make the debugging experience less confusing for consumers of your API

Lucky for us, these and other scenarios can be dealt with easily by using facilities provided within the System.Diagnostics namespace.

Let's start with hiding code from the debugger:

[DebuggerHidden]
public void MyHiddenMethod()
{ /* Do something */ }

Simply apply the DebuggerHidden attribute and, when stepping through code, the debugger will skip over this method every time it is encountered. This will occur even if the method or a descendant has a breakpoint set.

Hiding select members can be just as easy. Let's say we have a simple variable that we want to hide. Before hiding the variable, we can see that it is clearly visible in the debugger.

private string _internalMember;

To hide this member, all we need to do is apply the DebuggerBrowsable attribute:

[DebuggerBrowsable(DebuggerBrowsableState.Never)]
private string _internalMember;

When debugging you will see that the member is now missing from variable windows such as Locals.

While writing this, I tried to think of some basic rules for when to apply this. The truth is, there aren't any "rules". This is something you will have to apply when it feels right for your situation. Keep in mind that techniques like this can hide important information; on the other hand, they can make debugging more productive. With that said, never apply them until you are absolutely sure the code being hidden is trusted. Take a look at the System.Diagnostics namespace for more tricks you can use to make your debugging experience better.
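As a taste of what else is in the namespace, here is a sketch of two related attributes I find handy. The Customer type is purely illustrative: DebuggerDisplay controls the one-line summary shown in the variable windows, and DebuggerStepThrough tells the debugger to step over a method without hiding it from the call stack.

```csharp
using System.Diagnostics;

[DebuggerDisplay("Customer {Name} ({Orders} orders)")]
public class Customer
{
    public string Name { get; set; }
    public int Orders { get; set; }

    [DebuggerStepThrough]
    public void Touch()
    {
        /* The debugger steps over this method, but unlike
           DebuggerHidden it still appears in the call stack */
    }
}
```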


Jan 16 2010

Exceptional Threading

Category: Rabbit Trails | Tips and Tricks | JoeGeeky @ 21:27

When multi-threading applications, it can be easy to lose the plot from time to time. Sometimes it takes all your energy just to remember what is running when and how to sync, lock, join, etc. Often, exception handling takes a back seat, with little consideration given to where exceptions should, or will, be communicated and how they may be handled. Even if you assume you are the greatest developer who ever lived, exceptions are inevitable, and when they occur in a multi-threaded application the root cause can be very hard to isolate. In fact, depending on the type of feature being executed on a thread, you may have silent failures leading to no end of rabbit trails as dependent behaviors and/or components exhibit who knows what.

With that in mind, there are a number of patterns that can keep you out of trouble, or at least help you isolate problems when trouble strikes. Let's tackle one of the most commonly used threading patterns first: QueueUserWorkItem.

ThreadPool.QueueUserWorkItem(DoSomethingFeature, null);

This is something I see a lot of, and unfortunately it can lead to disappointment. Any unhandled exception that occurs in the aforementioned DoSomethingFeature() method will reach the AppDomain and crash your application. There are at least two patterns we can employ to deal with this kind of problem. The first focuses on catching exceptions. Thanks to lambda support, we can easily wrap our feature methods with basic try/catch blocks.

ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            DoSomethingFeature(state);
        }
        catch (Exception ex)
        {
            //Handle the exception
        }
    });

The above approach will provide you an opportunity to catch unhandled exceptions but does not provide an elegant means of communicating to other threads so they can take action if needed. To achieve that, you could employ the Observer Pattern using static Events... Here is a simplified example:   

Define a delegate and EventArgs implementation to communicate whatever is needed to facilitate your exception handling needs...  For this sample, all we need is the Exception itself.

public delegate void CustomExceptionHandler(object sender, ExceptionArgs e);

public sealed class ExceptionArgs : EventArgs
{
    public Exception Exception { get; set; }
}

Next, define a static Event in a location that is accessible to all required areas of concern.

public static event CustomExceptionHandler OnCustomException;

With that in place, we can now queue our threads as we did before, but this time we will wire up the new event/delegate created previously to communicate exception details.

ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            DoSomethingFeature(state);
        }
        catch (Exception ex)
        {
            if (OnCustomException != null)
                OnCustomException(null, new ExceptionArgs { Exception = ex });
        }
    });

Layers charged with handling or responding to these exceptions just need to subscribe to the event.

OnCustomException += ((sender, e) => Console.WriteLine(e.Exception.Message));

Now let's address a second commonly misunderstood pattern for catching unhandled exceptions. You may have seen code such as the following:

AppDomain.CurrentDomain.UnhandledException += (sender, e) => { /* catch and continue */ };

This approach is often misunderstood... On the surface, it may appear to catch an unhandled exception and prevent your application from crashing, but testing will show this has not been true since .NET 2.0. This delegate is provided to allow the application to save state, log exception details, etc., but it will not prevent a terminal exception from bringing down the AppDomain. Using it for those purposes is still a good idea, but you will need to employ other methods, such as the ones above, to prevent total failure.
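Here is a minimal sketch of using the hook for what it is actually good for: last-chance logging. Note that ExceptionObject is typed as object (non-CLS exceptions are possible), and IsTerminating tells you whether the runtime is about to tear the process down.

```csharp
AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
{
    var ex = e.ExceptionObject as Exception;

    // Last-chance logging only; if e.IsTerminating is true,
    // the AppDomain is coming down regardless of what we do here.
    Console.WriteLine("Unhandled: {0} (terminating: {1})",
        ex != null ? ex.Message : e.ExceptionObject,
        e.IsTerminating);
};
```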


Nov 27 2008

Be careful with that StackTrace

Category: Rabbit Trails | Tips and Tricks | JoeGeeky @ 12:02

A colleague contacted me the other day with an interesting production problem. Apparently he was walking a StackTrace to derive a method name for logging, and in production it led to NullReferenceExceptions. After he described what he saw, I knew exactly what was wrong; unfortunately, I have seen this risky pattern used before.

I cannot reproduce this anywhere other than Production and when I step through it, it does not reveal anything either. Strangely enough, after I added a bunch of logging & tracing routines to a few of the call paths and redeployed to production it no longer occurred in those paths.
- The Colleague

For some strange reason, people just love doing this without realizing what might happen. Let's look at a small sample I wrote to demonstrate the scenario he employed:

using System;
using System.Diagnostics;
using System.Reflection;

public sealed class Program
{
    static void Main()
    {
        try
        {
            DoSomething();
        }
        catch
        {
        }
    }

    static void DoSomething()
    {
        DoSomethingElse();
    }

    static void DoSomethingElse()
    {
        string className;
        string methodName;

        SampleHelper.RetrieveCallersClassAndMethodName(out className, out methodName, 1);

        Console.WriteLine(string.Format("I was called by {0}/{1}.",
            className, methodName));
    }
}

public static class SampleHelper
{
    public static void RetrieveCallersClassAndMethodName(out string className, out string methodName, int stackLevelOffset)
    {
        StackFrame frame = new StackFrame(stackLevelOffset);

        MethodBase methodBase = frame.GetMethod();

        className = methodBase.DeclaringType.Name;
        methodName = methodBase.Name;
    }
}

Notice the stack offset is set to 1:

SampleHelper.RetrieveCallersClassAndMethodName(out className, out methodName, 1);

If you run this, it will lead to the following result:


I was called by Program/DoSomethingElse.

Similarly, if you set the offset to 2 it will lead to the following result:


I was called by Program/DoSomething.

So far this looks pretty straightforward. If you run this from Visual Studio (VS) you'll never see any issues, but once you've compiled and deployed to production your results may vary, and you'll likely get a NullReferenceException. The reason is simple, but if you're not aware of how the JIT compiler optimizes IL, you'll lose your mind trying to figure it out.

The quickest way to reproduce the null reference condition is to set the offset to something like 99999. Run it again and you'll reproduce the error. What this illustrates is an attempt to read a stack offset that doesn't exist. But wait, shouldn't offsets 1, 2, and 3 always exist? In short... no. To make matters worse, even when they do exist they may no longer be the location you expect. This is why this pattern is so troublesome, not to mention prone to exceptions.

The issue lies in JIT compiler optimizations. The truth is, we don't always write code in the most performant manner, and method calls can be expensive (relatively speaking). Consequently, the JIT compiler optimizes IL to be more performant for a given processor. These optimizations may lead to methods being combined or broken apart, which is why the above pattern produces unpredictable results. The good news is, we're not completely powerless. The rules governing this process are well documented, so knowing how it works will help you assess and/or mitigate your risk of being a victim of optimizations. Here is a list of conditions under which methods will NOT be inlined; if your code does not meet one of these conditions, it is subject to optimization:

  • Methods that are greater than 32 bytes of IL will not be inlined; this explains why the addition of logging functions in my colleague's report stopped the problem from occurring. Essentially, the affected methods crossed the 32-byte threshold
  • Virtual functions are not inlined
  • Methods that have complex flow control will not be inlined. Complex flow control is any flow control other than if/then/else; in this case, switch or while
  • Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining
  • If any of the method's formal arguments are structs, the method will not be inlined

Although not on the official Microsoft list of conditions, there are three other implicit rules, the first two of which explain why the problem is not reproducible in VS or outside the production environment.

  • Inlining is disabled when a debugger is attached
  • By default, inlining is disabled when compiling in Debug mode
  • Inlining will be disabled when a method is decorated with MethodImplAttribute and MethodImplOptions.NoInlining is set:

[System.Runtime.CompilerServices.MethodImpl(MethodImplOptions.NoInlining)]

If after all this you still want to use this pattern, let's refactor the original code to reduce the risk of optimization issues:

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

public sealed class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Main()
    {
        try
        {
            DoSomething();
        }
        catch
        {
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void DoSomething()
    {
        DoSomethingElse();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void DoSomethingElse()
    {
        string className;
        string methodName;

        SampleHelper.RetrieveCallersClassAndMethodName(out className, out methodName, 2);

        Console.WriteLine(string.Format("I was called by {0}/{1}.",
            className, methodName));
    }
}

public static class SampleHelper
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void RetrieveCallersClassAndMethodName(out string className, out string methodName, int stackLevelOffset)
    {
        className = string.Empty;
        methodName = string.Empty;

        StackFrame frame = new StackFrame(stackLevelOffset);

        MethodBase methodBase = frame.GetMethod();

        if (methodBase == null)
            return;

        if (methodBase.DeclaringType == null)
            return;

        className = methodBase.DeclaringType.Name;
        methodName = methodBase.Name;
    }
}

One final point. Every time I have this discussion, someone inevitably asks if they can write code to tell them whether or not a method has been inlined. Practically speaking, the answer is no. In fact, you won't even see this in Reflector. Remember, these optimizations occur during JIT compilation, not when you're compiling to IL. Additionally, the same code may be inlined differently on two different machines, because the process is partly driven by the type of processor you are running on.

Well, that's it... Again, I caution you against using this pattern, but if you do need it, pay very close attention and test it well.


Jul 26 2007

Debugging a Web Garden

Category: Rabbit Trails | JoeGeeky @ 13:24

Today I experienced one of those errors that can just drive you crazy. Ostensibly, the error was something you would normally expect to see in code-behind, or at least represented in the classic white-and-yellow ASP.NET error page. In this case, our testers reported they were receiving fully-fledged “Windows Internet Explorer” dialog messages with every .NET developer’s favorite message: “Object reference not set to an instance of an object.”. This was happening everywhere, and appeared to increase as the system load increased.

No matter how hard I tried, I could not recreate the problem on the development systems, nor was I able to catch it with any debugger… After some head-scratching, I noticed the only difference between the two environments (Development and Testing) was the number of processors, and that reminded me that Web Gardening had recently been enabled. While this is a great way to enhance performance, memory management, and scalability for your web applications, it requires an out-of-process state manager to ensure session state is not lost between the different worker-process instances in the web garden. In this case, there was no state manager, and once one was added all the problems went away.

<?xml version="1.0"?>
<configuration>
    <system.web>
        <!--This added to support web gardening and 
        out-of-process session management-->
        <sessionState mode="StateServer"
            stateConnectionString="tcpip=localhost:42424"
            cookieless="false"
            timeout="20"/>
    </system.web>
</configuration>

Conceptually, one instance started handling a given set of user requests, and as the load increased, some requests started being handled by a new instance which did not have any of the state information being requested. With that solved, I was left with one lingering question… Why did they get a dialog box? In every case, the error messages were thrown from within AJAX async processes. I am just glad it’s over…
