Mar 19 2010

Putting your DataTable on a diet

Category: Tips and TricksJoeGeeky @ 01:08

As the name suggests, the System.Data.DataTable is a fat object type. It's true that there's a lot you can do with this class, but this may come at a pretty high price. To demonstrate this, lets start by creating two in-memory structures and compare their sizes using WinDbg. The size of each structure will be a conservative 500,000 records, each with a puny 7 columns of 32 Bit Integers. If you've developed high-performance applications you know this is a pretty conservative amount of data, but this should still make the point.

const int MaxItemCount = 500000;

Our first structure is built around the suppossed fat DataTable. 

var table = new DataTable("MyFatTable");
table.Columns.AddRange(new[] {new DataColumn("A", typeof(int)),
                                new DataColumn("B", typeof(int)),
                                new DataColumn("C", typeof(int)),
                                new DataColumn("D", typeof(int)),
                                new DataColumn("E", typeof(int)),
                                new DataColumn("F", typeof(int)),
                                new DataColumn("G", typeof(int))});
for (int i = 0; i < MaxItemCount; i++)
{
    DataRow row = table.NewRow();
    row["A"] = i;
    row["B"] = i;
    row["C"] = i;
    row["D"] = i;
    row["E"] = i;
    row["F"] = i;
    row["G"] = i;
    table.Rows.Add(row);
}

Next we'll create a more traditional object collection so we can see what the DataTable "might" be costing us.

public class DataItem
{
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
    public int D { get; set; }
    public int E { get; set; }
    public int F { get; set; }
    public int G { get; set; }
}

IList<DataItem> data = new List<DataItem>();
for (int i = 0; i < MaxItemCount; i++)
{
    var dataItem = new DataItem();
    dataItem.A = i;
    dataItem.B = i;
    dataItem.C = i;
    dataItem.D = i;
    dataItem.E = i;
    dataItem.F = i;
    dataItem.G = i;
    data.Add(dataItem);
}

If we run the sample and create a Dump file, we can see how much memory is being consumed by each type of structure. The DataRows in the DataTable make up almost twice the amount of memory as that of the more traditional object collection (e.g. DataItems). Keep in mind this was a really simplistic sample, and real-world structures will likely be much larger (in row count and columns). Aside from memory pressure, the DataTable will cause your application to spend more time in Garbage Collection, which will translate to your application being slower.

Spend a little time profiling your application and you'll likely find other large concrete implementations with hidden costs. In this case, the solution is easier than you might think. In many cases, all we really need from a DataTable is the underlying DataReader. Assuming that is true in your case, all you really need to do is implement a custom System.Data.IDataReader. This is a fairly large interface but it is really easy to work with and many solutions don't actually need every method implemented. This allows you to start simple and add additional features when and if you need them.

using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

public sealed class GenericListDataReader<T> : IDataReader
{
    private readonly IEnumerator<T> _enumerator;
    private readonly List<PropertyInfo> _properties = new List<PropertyInfo>();

    public GenericListDataReader(IEnumerable<T> list)
    {
        _enumerator = list.GetEnumerator();

        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.GetProperty;

        /* Map properties to columns based on the order in which they occur */
        foreach (PropertyInfo property in typeof(T).GetProperties(flags))
        {
            /* Only support adds properties that can be mapped to database types */
            if (property.PropertyType.IsPrimitive ||
                property.PropertyType == typeof(string) ||
                property.PropertyType == typeof(DateTime))
            {
                _properties.Add(property);
            }
        }
    }

    public bool Read()
    {
        return _enumerator.MoveNext();
    }

    public int FieldCount
    {
        get { return _properties.Count; }
    }

    public string GetName(int i)
    {
        return _properties[i].Name;
    }

    public Type GetFieldType(int i)
    {
        return _properties[i].PropertyType;
    }

    public object GetValue(int i)
    {
        return _properties[i].GetValue(_enumerator.Current, null);
    }

    public void Close()
    {
        _enumerator.Dispose();
    }

    public void Dispose()
    {
        Close();
    }

    #region IDataRecord Members
        /* throw NotSupportedException for all remaining members */
    #endregion

    #region IDataReader Members
        /* throw NotSupportedException for all remaining members */
    #endregion

}

Get the full class GenericListDataReader.cs (4.30 kb).

This is a really basic wrapper around enumerable types. As a side benefit your Data Reader is inherently Mockable which makes testing easy as well. As an IDataReader, this structure can be used to bridge the gap between traditional object structures and data persistence layers. Here is a simple example using the SqlBulkCopy structure to persist our previous collection into a database.  

using (var dataReader = new GenericListDataReader<DataItem>(data))
{
    using (var bulkCopy = new SqlBulkCopy("A Connection String"))
    {
        bulkCopy.DestinationTableName = "Target Table Name";

        bulkCopy.ColumnMappings.Add(0, "ColumnA");
        bulkCopy.ColumnMappings.Add(1, "ColumnB");

        bulkCopy.WriteToServer(dataReader);
    }
}

So whats the lesson here? It has two parts... As I said in the beginning, the DataTable is fat. Also, if you're writing high performance applications, look closely at those "free" concrete implementations, they may be costing you more than you think.

Tags: ,

Comments

1.
trackback DotNetKicks.com says:

Putting your DataTable on a diet

You've been kicked (a good thing) - Trackback from DotNetKicks.com

2.
Riester Riester France says:

Hi, I  came accross your blog and have been reading along your posts. I thought I would leave my first comment. I dont know what to say except that I have enjoyed reading your blog. Nice blog. I will keep visiting this blog very often...

3.
pingback topsy.com says:

Pingback from topsy.com

Twitter Trackbacks for
        
        Smelser.NET | Putting your DataTable on a diet
        [smelser.net]
        on Topsy.com

4.
audio self help audio self help United States says:

Just popping in to give kudos on how this reads and flows.

Comments are closed