Jan 21 2010

What kind of Response is that?

Category: Intellectual Pursuits | Rabbit Trails
JoeGeeky @ 22:14

During my time developing Business-to-Business (B2B) technologies, I saw, and continue to see, what I feel is a ridiculous pattern of behavior. In this community, many systems communicate by POSTing content to one another. As you might expect, a number of things can go wrong, so a system responding to a request inevitably has to send an HTTP Status Code to indicate success, failure, etc. All too often these systems recognize or send only two HTTP Status Codes, namely 200 (success) and 500 (failure). Unfortunately, they are usually used incorrectly, which leads to no end of useless, duplicated, and ultimately failed communications. Let's walk through a simple scenario to demonstrate exactly what happens.

  • Requesting system sends a request with incorrect content
  • Receiving system recognizes the invalid content and sends a 500 (i.e. Internal Server Error) to indicate failure

If you define the status codes simply as 200 (success) and 500 (failure), this may seem reasonable, but as usual, the devil's in the details. 200 meaning success is generally not in dispute, but let's take a closer look at the definition of 500.

Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request.
- Hypertext Transfer Protocol -- HTTP/1.1 [RFC2616] (http://tools.ietf.org/html/rfc2616#section-10.5)

Note the statement "it has erred". A client that follows the HTTP standard would reasonably read this as a problem on the target system and may retry the same request, either immediately or later. The problem is that the sender is at fault in this scenario... They can resubmit all they like, and the result should always be the same. This is where you start becoming your own worst enemy. Many organizations will open cases for failures like these and follow up, triage, or otherwise investigate them, because in B2B you never like saying no to any request.

The sad part is that, at least in my experience, they end up contacting the requester and telling them to fix their content, rather than fixing their own system. In a manner of speaking, generating responses that lead to an unnecessary increase in workload is like defecating in your own mouth. Sure... it's survivable, but no matter how you look at it, it will never taste right. Increased traffic, wasted bandwidth, redundant case load, customer (i.e. requester) frustration; the list goes on. In this case, a better response would be a 4xx-series status code.

The 4xx class of status code is intended for cases in which the client seems to have erred.
- Hypertext Transfer Protocol -- HTTP/1.1 [RFC2616] (http://tools.ietf.org/html/rfc2616#section-10.4)

Again... note the statement "client seems to have erred". This is a more appropriate response for our scenario: a 400 (i.e. Bad Request) is enough to tell a standards-compliant system that the request was not valid and NOT to repeat it until the request is corrected. Oddly enough, many B2B standards provide for codes beyond 200 and 500, but for some reason they don't seem to follow their own advice. Another common misuse of the 500 code is to send it when a user fails to authenticate or lacks the authorization level (e.g. permission, right, or role) to perform a given action. In these cases, a 401 (i.e. Unauthorized) is more appropriate for a failed authentication, and a 403 (i.e. Forbidden) for an authenticated user who lacks permission.
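A minimal sketch of that mapping follows, assuming a hypothetical `RequestOutcome` enum produced by your own validation and security checks; `StatusCodeMapper` is also an illustrative name, and only `System.Net.HttpStatusCode` is a real framework type here. The point is simply that each failure is reported with the status class that tells the sender who is at fault.

```csharp
using System.Net;

// Hypothetical outcome of validating and processing an inbound B2B request.
public enum RequestOutcome
{
    Success,
    InvalidContent,   // the sender's payload failed validation
    NotAuthenticated, // credentials missing or rejected
    NotPermitted,     // authenticated, but lacks the required role/right
    ServerFault       // something genuinely broke on our side
}

public static class StatusCodeMapper
{
    // 4xx says "the client erred", 5xx says "the server erred".
    public static HttpStatusCode ToStatusCode(RequestOutcome outcome)
    {
        switch (outcome)
        {
            case RequestOutcome.Success:
                return HttpStatusCode.OK;                  // 200
            case RequestOutcome.InvalidContent:
                return HttpStatusCode.BadRequest;          // 400: fix the request before resending
            case RequestOutcome.NotAuthenticated:
                return HttpStatusCode.Unauthorized;        // 401
            case RequestOutcome.NotPermitted:
                return HttpStatusCode.Forbidden;           // 403
            case RequestOutcome.ServerFault:
            default:
                return HttpStatusCode.InternalServerError; // 500: our problem
        }
    }

    // Rough heuristic: only server-side (5xx) faults are worth resending unchanged.
    public static bool SenderShouldRetryUnchanged(HttpStatusCode code)
    {
        return (int)code >= 500;
    }
}
```

With this in place, a payload that fails validation comes back as 400, and a well-behaved sender stops resubmitting it unchanged.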

For those of you who are thinking this is completely obvious and you would "never" do something so silly, chances are you have already done it and didn't even realize it. Consider the following code:

Response.Redirect("http://www.smelser.net/blog/default.aspx");

On the surface this may look pretty straightforward. What you may not realize is that under the covers this sends the browser an HTTP Status Code of 302 (i.e. Found), which HTTP/1.1 treats as a temporary redirect. If you have done your research on how web crawlers (Internet or Intranet) work, you will know this kind of response can lead to pages not being indexed. Assuming you actually want to be found, this could be a problem. In my experience, people who use this call rarely mean it to represent a "temporary" redirect, so a more appropriate response would be 301 (i.e. Moved Permanently). As a side note, Microsoft will make this more obvious as part of ASP.NET 4.0 and MVC 2.0 by providing Response.RedirectPermanent and Response.RedirectToRoutePermanent. In the meantime, you can make your own by writing an Extension Method for the HttpResponse class.
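Such an extension method might look roughly like the sketch below. It assumes classic ASP.NET (a reference to System.Web.dll) and uses only the real `HttpResponse` members `StatusCode`, `StatusDescription`, `RedirectLocation`, and `End()`. The name `RedirectPermanently` is hypothetical and deliberately differs from the `RedirectPermanent` method that ASP.NET 4.0 adds, so the two never collide.

```csharp
using System.Web;

public static class HttpResponseExtensions
{
    // Hypothetical helper: issues a 301 instead of the 302 sent by Response.Redirect.
    public static void RedirectPermanently(this HttpResponse response, string url, bool endResponse)
    {
        response.StatusCode = 301;
        response.StatusDescription = "Moved Permanently";
        response.RedirectLocation = url; // sets the Location header

        if (endResponse)
            response.End(); // stop processing the page, as Response.Redirect(url) does
    }
}
```

Usage from a page would be `Response.RedirectPermanently("http://www.smelser.net/blog/default.aspx", true);`. This sketch performs no URL resolution of its own, so pass an absolute URL.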

Here is another missed opportunity for those of you who use durable (or reliable) services such as message queues. If you are sure you will never lose a request, and you experience a "real" 500 condition (e.g. your processing service fails), you can respond with a status code of 202 (i.e. Accepted). This tells the requester that you got the request, although you cannot yet say whether it will succeed. If your system (or protocol) has a mechanism for checking the status of requests, this is ideal, because you can process the requests once your system comes back online and the requester can check for final acceptance later. This can also be useful when you need to take a system offline for maintenance or upgrades.
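Here is a minimal sketch of the "accept now, process later" idea. `DeferredIntake`, `BackEndOnline`, and the ticket scheme are hypothetical. The in-memory `ConcurrentQueue` is only a stand-in: the pattern is safe only when the real store is durable (for example a transactional message queue), so accepted requests survive a crash.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;

public sealed class DeferredIntake
{
    // Stand-in for a durable queue; a real system must persist these.
    private readonly ConcurrentQueue<Tuple<Guid, string>> _pending =
        new ConcurrentQueue<Tuple<Guid, string>>();
    private readonly ConcurrentDictionary<Guid, bool> _completed =
        new ConcurrentDictionary<Guid, bool>();

    // Hypothetical flag: false while the processing back end is down or in maintenance.
    public bool BackEndOnline { get; set; }

    public HttpStatusCode Submit(string payload, out Guid ticket)
    {
        ticket = Guid.NewGuid();
        if (BackEndOnline)
        {
            Process(ticket, payload);
            return HttpStatusCode.OK;       // 200: fully handled now
        }

        _pending.Enqueue(Tuple.Create(ticket, payload));
        return HttpStatusCode.Accepted;     // 202: received, final outcome not yet known
    }

    // Run once the back end is available again.
    public void DrainPending()
    {
        Tuple<Guid, string> item;
        while (_pending.TryDequeue(out item))
            Process(item.Item1, item.Item2);
    }

    // Lets the requester check for final acceptance later using its ticket.
    public bool IsComplete(Guid ticket)
    {
        return _completed.ContainsKey(ticket);
    }

    private void Process(Guid ticket, string payload)
    {
        // Real business processing of the payload would go here.
        _completed[ticket] = true;
    }
}
```

The ticket returned alongside the 202 is what makes the later status check possible; without some such mechanism, the requester cannot distinguish "still queued" from "lost".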

Consider services that are load balanced. If a slow response is worse than a failed response (due to SLAs, customer experience, or whatever), you may want your server/application to send a signal telling your load balancer to send the request to another available server. For many load-balancing products this can be as simple as sending a response code of 503 (i.e. Service Unavailable). Again, this could be useful when performing upgrades or maintenance.
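A minimal sketch of the server side of that signal: a hypothetical `AdmissionGate` that answers 503 when the node is in maintenance or already has too many requests in flight. The threshold and names are assumptions; whether (and how quickly) your load balancer reroutes on 503 depends on the product and its configuration.

```csharp
using System.Net;
using System.Threading;

public sealed class AdmissionGate
{
    private readonly int _maxInFlight;
    private int _inFlight;

    public AdmissionGate(int maxInFlight)
    {
        _maxInFlight = maxInFlight;
    }

    // Set while draining the node for upgrades or maintenance.
    public bool InMaintenance { get; set; }

    // Returns 200 if the caller may proceed (and must call Release afterwards), otherwise 503.
    public HttpStatusCode TryEnter()
    {
        if (InMaintenance)
            return HttpStatusCode.ServiceUnavailable;

        if (Interlocked.Increment(ref _inFlight) > _maxInFlight)
        {
            Interlocked.Decrement(ref _inFlight); // undo our reservation
            return HttpStatusCode.ServiceUnavailable;
        }

        return HttpStatusCode.OK;
    }

    public void Release()
    {
        Interlocked.Decrement(ref _inFlight);
    }
}
```

Failing fast here is the design choice: a quick 503 lets the balancer try another node, rather than leaving the request queued behind slow work on this one.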

So what does this all mean? Take a closer look at the standard codes and do your best to align with them. It may take a little additional development, but in the end it will save you a lot of time and money. For a summary of the basic status codes, take a look at http://en.wikipedia.org/wiki/List_of_HTTP_status_codes.


Jan 16 2010

Exceptional Threading

Category: Rabbit Trails | Tips and Tricks
JoeGeeky @ 21:27

When multi-threading applications, it can be easy to lose the plot from time to time. Sometimes it takes all your energy just to remember what is running when, and how to sync, lock, join, etc. Exception handling often takes a back seat, and little thought goes into where exceptions should (or will) be communicated and how they may be handled. Even if you assume you are the greatest developer who ever lived, exceptions are inevitable, and when they occur in a multi-threaded application the root cause can be very hard to isolate. In fact, depending on the type of feature being executed on a thread, you may have silent failures leading to no end of rabbit trails as dependent behaviors and/or components exhibit who knows what.

With that in mind, there are a number of patterns that can keep you out of trouble, or at least help you isolate problems when trouble strikes. Let's tackle one of the most commonly used threading patterns first, ThreadPool.QueueUserWorkItem.

ThreadPool.QueueUserWorkItem(DoSomethingFeature, null);

This is something I see a lot of, and unfortunately it can lead to disappointment. Any unhandled exception that occurs in the DoSomethingFeature() method above will reach the AppDomain and crash your application. There are at least two patterns we can employ to deal with this kind of problem. The first pattern focuses on catching exceptions. Thanks to lambda support, we can easily wrap our feature methods in basic try {} catch {} blocks.

ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            DoSomethingFeature(state);
        }
        catch (Exception ex)
        {
            //Handle the exception
        }
    });

The above approach gives you an opportunity to catch unhandled exceptions, but it does not provide an elegant means of communicating them to other threads so they can take action if needed. To achieve that, you could employ the Observer Pattern using static events... Here is a simplified example:

Define a delegate and an EventArgs implementation to communicate whatever your exception handling needs... For this sample, all we need is the Exception itself.

public delegate void CustomExceptionHandler(object sender, ExceptionArgs e);

public sealed class ExceptionArgs : EventArgs
{
    public Exception Exception { get; set; }
}

Next, define a static Event in a location that is accessible to all required areas of concern.

public static event CustomExceptionHandler OnCustomException;

With that in place, we can queue our work items as we did before, but this time we will raise the new event to communicate exception details.

ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            DoSomethingFeature(state);
        }
        catch (Exception ex)
        {
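            // Copy to a local so another thread cannot unsubscribe between the check and the call
            CustomExceptionHandler handler = OnCustomException;
            if (handler != null)
                handler(null, new ExceptionArgs { Exception = ex });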
        }
    });

Layers charged with handling or responding to unhandled exceptions just need to subscribe to the event.

OnCustomException += ((sender, e) => Console.WriteLine(e.Exception.Message));
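Putting the pieces together, here is a minimal self-contained sketch of the whole pattern. `Worker` and `QueueSafely` are hypothetical names. The delegate and `ExceptionArgs` types are repeated from above so the block compiles on its own.

```csharp
using System;
using System.Threading;

public sealed class ExceptionArgs : EventArgs
{
    public Exception Exception { get; set; }
}

public delegate void CustomExceptionHandler(object sender, ExceptionArgs e);

public static class Worker
{
    // Raised on the pool thread whenever a queued feature throws.
    public static event CustomExceptionHandler OnCustomException;

    // Queues the feature; any exception it throws is caught on the pool thread
    // and published through the static event instead of crashing the process.
    public static void QueueSafely(WaitCallback feature, object state)
    {
        ThreadPool.QueueUserWorkItem(s =>
        {
            try
            {
                feature(s);
            }
            catch (Exception ex)
            {
                CustomExceptionHandler handler = OnCustomException; // race-free snapshot
                if (handler != null)
                    handler(null, new ExceptionArgs { Exception = ex });
            }
        }, state);
    }
}
```

Note that subscribers run on the pool thread and outside the try block, so a handler that itself throws becomes an unhandled exception again; keep handlers simple and exception-free.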

Now let's address a second commonly used pattern for catching unhandled exceptions. You may have seen code such as the following:

AppDomain.CurrentDomain.UnhandledException += (sender, e) => { /* catch and continue */ };

This approach is often misunderstood... On the surface, it may appear to be a way of catching an unhandled exception and preventing your application from crashing, but testing will show that, starting with .NET 2.0, this is not the case. The event is provided to let the application save state, log exception details, etc., but it will not prevent a terminal exception from bringing down the AppDomain. Using it for those purposes is still a good idea, but you will need to employ other methods, such as the ones above, to prevent total failure.
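For the legitimate use (recording what happened), a handler might look like this minimal sketch. `CrashReporter` is a hypothetical name. `UnhandledExceptionEventArgs.ExceptionObject` and `IsTerminating` are the real event-argument properties: the former is typed as `object` because the CLR can surface exceptions that do not derive from `System.Exception`.

```csharp
using System;

public static class CrashReporter
{
    // Subscribe once at startup; this records the failure but cannot cancel termination.
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Console.Error.WriteLine(Format(e)); // replace with real logging / state saving
    }

    // Builds a one-line description of the unhandled exception.
    public static string Format(UnhandledExceptionEventArgs e)
    {
        Exception ex = e.ExceptionObject as Exception;
        string detail = ex != null
            ? ex.GetType().Name + ": " + ex.Message
            : "non-CLS exception: " + e.ExceptionObject;

        return (e.IsTerminating ? "[terminating] " : "") + detail;
    }
}
```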


Nov 27 2008

Be careful with that StackTrace

Category: Rabbit Trails | Tips and Tricks
JoeGeeky @ 12:02

A colleague contacted me the other day with an interesting Production problem. He was walking a StackTrace to derive a method name for logging, and in Production it led to NullReferenceExceptions. Unfortunately, I have seen this risky pattern used before, so as soon as he described what he saw, I knew exactly what was wrong.

I cannot reproduce this anywhere other than Production and when I step through it, it does not reveal anything either. Strangely enough, after I added a bunch of logging & tracing routines to a few of the call paths and redeployed to production it no longer occurred in those paths.
- The Colleague

For some strange reason, people just love doing this without realizing what might happen. Let's look at a small sample I wrote to demonstrate the scenario he employed:

using System;
using System.Diagnostics;
using System.Reflection;

public sealed class Program
{
    static void Main()
    {
        try
        {
            DoSomething();
        }
        catch
        {
        }
    }

    static void DoSomething()
    {
        DoSomethingElse();
    }

    static void DoSomethingElse()
    {
        string className;
        string methodName;

        SampleHelper.RetrieveCallersClassAndMethodName(out className, out methodName, 1);

        Console.WriteLine(string.Format("I was called by {0}/{1}.",
            className, methodName));
    }
}

public static class SampleHelper
{
    public static void RetrieveCallersClassAndMethodName(out string className, out string methodName, int stackLevelOffset)
    {
        StackFrame frame = new StackFrame(stackLevelOffset);

        MethodBase methodBase = frame.GetMethod();

        className = methodBase.DeclaringType.Name;
        methodName = methodBase.Name;
    }
}

Notice that the stack offset is set to 1:

SampleHelper.RetrieveCallersClassAndMethodName(out className, out methodName, 1);

If you run this, it will lead to the following result:


I was called by Program/DoSomethingElse.

Similarly, if you set the offset to 2 it will lead to the following result:


I was called by Program/DoSomething.

So far this looks pretty straightforward. If you run this from Visual Studio (VS) you'll never see any issues, but once you've compiled and deployed it to Production your results may vary, and you may well get a NullReferenceException. The reason for this is simple, but if you're not aware of how the JIT Compiler optimizes IL you'll lose your mind trying to figure it out.

The quickest way to reproduce the null reference condition is to set the offset to something like 99999 and run it again. What this illustrates is an attempt to read a stack frame at an offset that doesn't exist: GetMethod() returns null, and the subsequent access to DeclaringType throws. But wait, shouldn't offsets 1, 2, and 3 always exist? In short... no, and to make matters worse, even if they do exist they may no longer point to the method you expect. This is why this pattern is so troublesome, not to mention prone to exceptions.
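To see the null case in isolation, here is a small probe. `FrameProbe` and `Describe` are hypothetical names. `Describe` itself is marked `MethodImplOptions.NoInlining` so that frame 0 reliably is `Describe`, and it checks for null instead of dereferencing blindly.

```csharp
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class FrameProbe
{
    // Returns "Type/Method" for the frame at the given offset,
    // or null when no such frame exists (GetMethod() returns null).
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string Describe(int offset)
    {
        MethodBase method = new StackFrame(offset).GetMethod();
        if (method == null || method.DeclaringType == null)
            return null;

        return method.DeclaringType.Name + "/" + method.Name;
    }
}
```

`FrameProbe.Describe(99999)` returns null rather than throwing, which is exactly the case the original helper failed to handle.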

The issue lies in JIT Compiler optimizations. The truth is, we don't always write code in the most performant manner, and method calls can be expensive (relatively speaking). Consequently, the JIT Compiler optimizes IL to be more performant for a given processor. One of these optimizations is inlining: a small method's body is copied into its caller, so that method no longer has a frame of its own on the stack, which is why the above pattern leads to unpredictable results. The good news is, we're not completely powerless. The rules governing this process have been published (at least for the JIT of the day), so knowing how they work will help you assess and/or mitigate your risk of being a victim of optimizations. Here is a list of conditions under which methods will NOT be inlined; if your code meets none of these conditions, it is subject to inlining:

  • Methods that are greater than 32 bytes of IL will not be inlined. This explains why adding logging functions in my colleague's case stopped the problem from occurring: the impacted methods crossed the 32-byte threshold
  • Virtual functions are not inlined
  • Methods that have complex flow control will not be inlined. Complex flow control is any flow control other than if/then/else, for example switch or while
  • Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining
  • If any of the method's formal arguments are structs, the method will not be inlined

Although not on the official Microsoft list of conditions, there are three other implicit rules, the first two of which explain why this could not be reproduced in VS or outside the Production environment:

  • Inlining is disabled when a debugger is attached
  • By default, inlining is disabled when compiling in Debug mode
  • A method decorated with the MethodImplAttribute with MethodImplOptions.NoInlining set will not be inlined, for example:
[System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]

If, after all this, you still want to use this pattern, let's refactor the original code to reduce the risk of optimization issues:

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;

public sealed class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Main()
    {
        try
        {
            DoSomething();
        }
        catch
        {
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void DoSomething()
    {
        DoSomethingElse();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void DoSomethingElse()
    {
        string className;
        string methodName;

        SampleHelper.RetrieveCallersClassAndMethodName(out className, out methodName, 2);

        Console.WriteLine(string.Format("I was called by {0}/{1}.",
            className, methodName));
    }
}

public static class SampleHelper
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void RetrieveCallersClassAndMethodName(out string className, out string methodName, int stackLevelOffset)
    {
        className = string.Empty;
        methodName = string.Empty;

        StackFrame frame = new StackFrame(stackLevelOffset);

        MethodBase methodBase = frame.GetMethod();

        if (methodBase == null)
            return;

        if (methodBase.DeclaringType == null)
            return;

        className = methodBase.DeclaringType.Name;
        methodName = methodBase.Name;
    }
}

One final point. Every time I have this discussion, someone inevitably asks whether they can write code to tell them if their method has been inlined. Practically speaking, the answer is no. In fact, you won't even see this in Reflector. Remember, optimizations occur during JIT compilation, not when you're compiling to IL. Additionally, the same code may be inlined differently on two different machines, because this process partly depends on the type of processor you are running on.

Well, that's it... Again, I caution you against using this pattern, but if you do need it, pay very close attention and test it well.


Jul 26 2007

Debugging a Web Garden

Category: Rabbit Trails
JoeGeeky @ 13:24

Today I experienced one of those errors that can just drive you crazy. Ostensibly, the error was something you would normally expect to see in code-behind, or at least rendered in the classic white and yellow ASP.NET error page. In this case, our testers reported they were receiving fully-fledged “Windows Internet Explorer” dialog messages with every .NET developer’s favorite message, “Object reference not set to an instance of an object.” This was happening everywhere, and appeared to increase as the system load increased.

No matter how hard I tried, I could not recreate the problem on the development systems, nor was I able to catch it with any debugger… After some head-scratching, I noticed the only difference between the two environments (i.e. Development and Testing) was the number of processors, and that reminded me that Web Gardening had recently been enabled. While this is a great way to enhance performance, memory management, and scalability for your web applications, it requires an out-of-process state manager to ensure session state is not lost between the different worker processes in the web garden. In this case, there was no state manager, and once one was added all the problems went away.

<?xml version="1.0"?>
<configuration>
    <system.web>
        <!-- Added to support web gardening and
             out-of-process session management -->
        <sessionState mode="StateServer"
            stateConnectionString="tcpip=localhost:42424"
            cookieless="false"
            timeout="20"/>
    </system.web>
</configuration>
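To see why in-process session state breaks in a web garden, here is a deliberately tiny toy model (not ASP.NET code). Each hypothetical `WorkerProcess` either owns a private session dictionary, as each process does with in-process session state, or shares one store, which here stands in for the external state service selected by `mode="StateServer"`. The model is single-threaded; a real shared store would need to be safe for concurrent access.

```csharp
using System.Collections.Generic;

public sealed class WorkerProcess
{
    private readonly IDictionary<string, object> _sessions;

    // Pass a private dictionary to mimic in-process state, or a shared one to mimic a state server.
    public WorkerProcess(IDictionary<string, object> sessionStore)
    {
        _sessions = sessionStore;
    }

    public void Save(string sessionId, object value)
    {
        _sessions[sessionId] = value;
    }

    // Returns null when this process has never seen the session; code that then
    // dereferences the value is where "Object reference not set..." comes from.
    public object Load(string sessionId)
    {
        object value;
        return _sessions.TryGetValue(sessionId, out value) ? value : null;
    }
}
```

When the first request lands on one process and the next on another, the private-store version returns null, while the shared-store version returns the saved value.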

Conceptually, one instance started handling a given set of user requests, and as the load increased, some of them started being handled by a new instance that did not have any of the state information being requested. With that solved, I was left with one lingering question… Why did they get a dialog box? In every case, the errors were thrown from within asynchronous AJAX requests. I am just glad it’s over…
