A few years ago I implemented a BizTalk solution that used a WCF-NetMsmq receive location to receive XML messages into BizTalk.  The solution worked really well for a couple of years, until a few months ago the environment suffered a major unrelated outage that exposed a pretty nasty side effect: poisoned messages.  The outcome of this experience is that I will now only use the WCF-Custom adapter with the netMsmqBinding binding rather than the WCF-NetMsmq adapter, which does not let you configure how poisoned messages are handled.

The outage was caused by the file server on which the BizTalk database transaction logs were stored running out of disk space.  As a result, every transaction that BizTalk attempted failed and was rolled back, including the transactions that commit received messages from the transactional MSMQ queue to the BizTalk message box.  The problem is that once retries are exhausted (the default MSMQ retry settings dictate that there will be 5 retries with 30-minute intervals between them) the message is considered a poisoned message, and the receive location stays stuck in a faulted state until that message has been removed from the queue.  The WCF-NetMsmq adapter doesn’t allow the poison message handling settings to be overridden.

Thankfully, if you use the WCF-Custom adapter with the netMsmqBinding binding instead, you have full control over the poison message handling settings (these settings are detailed in this MSDN article).  You can of course override the number of retries and the retry interval, but the setting that matters most to us is the one called ReceiveErrorHandling.  With the WCF-NetMsmq adapter this setting is fixed at “Fault”, which means that a poisoned message remains in the queue and no further messages can be consumed until it has been removed.  We can instead set it to “Drop” if we want to get rid of the message automatically, “Reject” if we want to drop the message and send a negative acknowledgement back to the sending queue, or “Move” if we want to move the message to a queue (actually a sub-queue of the main queue) called poison.  Note that the aforementioned MSDN article states that the “Reject” and “Move” options are only available on Windows Vista.  However, I have successfully tested them on Windows Server 2008 R2 and Windows Server 2012, and would be extremely surprised if they didn’t also work on Windows 7 and Windows 8/8.1.
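For reference, here is a rough sketch of how these poison message settings look when written out as standard WCF binding configuration.  The binding name is hypothetical, and the retry values are only illustrative (they mirror the defaults as I understand them), so treat this as a starting point under those assumptions rather than the exact configuration we deployed.

```xml
<!-- Sketch only: the binding name is made up and the retry values are illustrative. -->
<bindings>
  <netMsmqBinding>
    <binding name="poisonAwareMsmqBinding"
             exactlyOnce="true"
             receiveRetryCount="5"
             maxRetryCycles="2"
             retryCycleDelay="00:30:00"
             receiveErrorHandling="Move" />
  </netMsmqBinding>
</bindings>
```

The attribute that changes the behaviour described above is receiveErrorHandling; the other attributes only control how many delivery attempts are made, and how far apart, before a message is treated as poisoned.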

[Screenshot: PoisonMessageSettings]

We chose to go down the “Move” path ourselves because the queue continues to be processed and we can deal with the poisoned message in our own time.  You still need a smart way to deal with the poisoned messages, such as a notification process that ensures messages don’t get left in the poison queue indefinitely.  One option is to kick off that notification process with a second receive location that reads messages off the poison queue (the URL for the poison queue is in the following format: net.msmq://<machine-name>/applicationQueue;poison).  I would use a vanilla MSMQ receive location for this rather than a WCF-based one, in case the message was poisoned because of a problem with the SOAP envelope or malformed XML.

One takeaway from this post is to avoid the WCF-NetMsmq adapter and to use its more flexible cousin, the WCF-Custom adapter with the netMsmqBinding binding.  I would extend this advice to the majority of the WCF adapters, since the WCF-Custom adapter generally affords you a lot more flexibility by letting you use WCF behaviors and binding settings that the other WCF adapters don’t always expose.  The one definite exception (at least that I can think of) to this advice is the webHttpBinding (i.e. REST), in which case it is best to use the WCF-WebHttp adapter, since it gives you access to URL variable mapping, which you will not find on the WCF-Custom adapter.  Luckily Microsoft allow you to use WCF behaviors on the WCF-WebHttp adapter, so you don’t lose anything by avoiding the WCF-Custom adapter in this case.

Another takeaway is that you should always consider how you will deal with poisoned messages when consuming messages off an MSMQ queue with BizTalk.  Even if you have full control over the WCF clients that send messages to the queue, you might still encounter poisoned messages through no fault of the message sender, as in the outage scenario I described above.  It is best to familiarize yourself with the concept of poisoned messages and plan how you will handle them, rather than find yourself having to figure it out during a production outage (as was the case for me).