Category: Orchestration


The book is now available, well done co-authors, reviewers and Packt Publishing team.

Connected Pawns

I am pleased to announce that “SOA Patterns with BizTalk Server 2013 and Microsoft Azure – Second Edition” by Mark Brimble, Johann Cooper, Coen Dijgraaf and Mahindra Morar can now be purchased on Amazon or from Packt. This is based on the first edition written by Richard Seroter. Johann has given a nice preview of the book here.

This is the first book I have co-authored and I can now tick that off on my bucket list. It was an experience to work with such talented co-authors and I was continually amazed how they rose to the occasion. I agree with Johann’s statement about being privileged to write the second edition of what was the first BizTalk book I ever owned.

As I updated chapters I was continually struck by how little had changed. We added new chapters, Chapter 4, Chapter 5, and Chapter 6…


On one of my current projects I had quite a few requirements that required me to make some smart use of BizTalk orchestration capabilities.  After some hard work I managed to stitch together an orchestration that functioned exactly as I wanted and performed fantastically given its level of complexity. However, there was one problem: my orchestration was so humongous that it epitomized the definition of super-monolithic.

A zoomed out view of the Godzilla sized orchestration


I decided that this was a problem just waiting to happen, as the next developer who came along to modify the solution would look at this orchestration and rethink his career focus altogether.  I decided to break this orchestration up into smaller components that weave together to perform the same functionality as the monolith and that I would make a few performance tweaks while I was at it.

A bit of background on my project requirements first. This project requires me to expose one-way WCF services (keep in mind that BizTalk one-way WCF services still send back a void response and implement a request-response interface, but they relate to a one-way receive location). Authentication and authorization happen in the WCF stack, while the receive pipeline performs schema resolution with an XML disassembler pipeline component, executes many business rules through the BRE Pipeline Framework, and validates the message against its schema. If the message passes all the validation then it is published to the BizTalk message box, at which stage a response is sent back to the WCF client and the message is routed to the monolith of an orchestration for further processing. The solution demanded a high level of throughput, with both the ingestion of messages by the WCF service and the orchestrations processing 100 messages per second.

Before breaking things up I decided I would run a baseline performance test using SOAPUI, passing in a static WCF request message to one of the services over 30 threads with a 50 ms delay between service calls over a period of 5 minutes.  I found that the CPU on my BizTalk server was running hot at 100% throughout the test with not very heavy memory usage, while my SQL Server wasn’t running too hot.  I also determined that each orchestration instance was creating about 2-3 persistence points on average and that the average time to complete running an orchestration was 570 ms.  The first orchestration instance started up within the same second my load test was started, and the last orchestration instance to complete did so in the same second my load test ended.

My SOAPUI load test results are as below and I found that my orchestration completion rate seems to be relatively similar to my WCF ingestion rate (which implies to me that more threads are being used by orchestration than ingestion given that orchestrations take longer to complete than WCF service calls), and I also found that a lot of the server’s CPU was dedicated to my processing host.  I was pretty confident that this was the maximum sustainable load for my BizTalk application given the current state of the application source code and configuration and environment configuration.

Baseline test results
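As a rough sanity check on the thread observation above, Little's law relates the average number of in-flight orchestration instances to the completion rate and the average completion time. The sketch below is only illustrative: the rate of 100 messages per second is the stated target from the requirements, not a figure read off the test results, and `in_flight` is a hypothetical helper name.

```python
def in_flight(rate_per_second, avg_completion_ms):
    """Little's law: average items in the system = arrival rate x average time in system."""
    return rate_per_second * avg_completion_ms / 1000.0


ASSUMED_RATE = 100   # messages per second (assumption: the stated target, not a measurement)
BASELINE_MS = 570    # average orchestration completion time from the baseline test

# Average number of orchestration instances running concurrently at that rate.
print(in_flight(ASSUMED_RATE, BASELINE_MS))  # 57.0
```

At the same assumed rate, any change in average completion time changes the number of concurrently running instances proportionally, which is one reason completion time is worth tracking alongside throughput.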

 

I then got to work breaking my orchestration up into smaller pieces, and found that it could logically be broken down into 6 different orchestrations, one of which contained my exception handling logic and was reused about 10 times throughout the other 5 orchestrations which is a big plus of breaking up the orchestrations.  I decided to use the call orchestration shape rather than using direct binding through the message box or the start orchestration shape as performance was paramount in this solution and I wanted to avoid message box hops and persistence points in my orchestration.  My orchestration is also quite procedural in nature and I wanted my parent orchestration to be blocked while waiting for child orchestrations to complete (with the call orchestration shape the child orchestration is executed on the same thread as the parent orchestration and the parent is blocked till the child completes).  I got to work creating orchestration parameters in my child orchestrations with parameters being marked as having a direction of In/Out/Ref as my orchestration logic required.

Re-running my load test left me at a loss seeing a drastic drop in performance especially given that while I broke my orchestration up I made what I thought would be performance tweaks as well.  My average orchestration completion time was now 797 ms.  CPU on my BizTalk server was still running hot at 100% but it looked like more CPU was being dedicated to the orchestration host this time around.  Nothing else (memory usage, persistence points, resource usage on SQL server etc…) seemed different from my previous run.

Test results on first attempt to break up monolithic orchestration

It then struck me that passing the parameters as In parameters (the majority of my parameters had a direction of In) must be causing a fair bit of overhead within the .Net CLR.  Typically in .Net, passing a reference type as an In parameter has relatively low overhead because only the reference is copied, while passing a value type as an In parameter has a higher overhead because a copy of the value itself has to be made.  Repeating the test with all parameters set to be Ref or Out parameters gave me a much better result, with the maximum sustainable throughput now rising (most likely due to efficiencies and streamlining of the called orchestrations) and the average orchestration completion time dropping to 405 ms.

Results with all parameters having a direction of Ref/Out

Further testing highlighted that switching orchestration message parameters from In to Out or Ref resulted in the biggest change in performance. Changes to value type parameters resulted in a lower yet still substantial change, and changes to reference type parameters appeared to have no performance implications (this appears to be in line with the .Net principle that reference types behave the same way when passed as In or Ref parameters).  My colleague Mark Brimble pointed me towards this BizTalk 2004 article, which implies that all In parameters (with no exception explicitly stated for reference type parameters) are copied when calling an orchestration. That sounds quite different from typical .Net behavior and didn’t match my observed results, as I expected that reference type In parameters would not result in a copy being made.  I decided that I had to prove my theory.

I threw together an orchestration that receives a message with a distinguished element. The orchestration also spins up a variable of a custom class I created, calling the default constructor to set a default value on a contained string property, and a variable of type int32 with a default value.  These values are all traced to a file, and based on the value of the distinguished element in the received message one of two orchestrations is called via a call orchestration shape.  Both of these orchestrations update the message (messages are immutable so it can’t really be updated, however the reference to the message variable can be updated to point to a new message), the custom object, and the int32 variable, and these updated values are traced to the file as well.  The catch is that one of these called orchestrations accepts these parameters as Ref parameters and the other accepts them as In parameters.  The calling orchestration then traces the values of its message and variables to the file to see whether they have been updated or not.

Test orchestrations

The results of this test are below and they matched up with my understanding of .Net behavior.

Test results

Based on the above one can surmise that a message behaves similarly to a value type in .Net (maybe not under the hood but in behavior), in that if you pass it as an In parameter and its reference is updated in a called orchestration then its value won’t be updated in the calling orchestration.  The same applies to value types as expected.  Reference types will be updated in the calling orchestration regardless of whether they were passed in as In or Ref parameters.  If passed as a Ref parameter then regardless of the type of parameter it will have its value updated in the calling orchestration if updated in a called orchestration.
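The observed behavior can be mimicked with a small model. This is a Python sketch rather than XLANG/s or C#, and every name in it (`Settings`, `Ref`, `called_with_in`, `called_with_ref`) is hypothetical. It models only what the caller sees after the call; Python does not copy arguments, so it says nothing about the CPU cost of copying.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Stands in for the custom class (a reference type) with a string property."""
    label: str = "default"


class Ref:
    """A one-slot box used to model a Ref parameter: the callee writes back through it."""

    def __init__(self, value):
        self.value = value


def called_with_in(message, settings, counter):
    # Rebinding the local names models assigning a new message or a new int
    # inside the called orchestration: the caller's variables are untouched.
    message = "<Order status='updated'/>"
    counter = counter + 1
    # Mutating the shared object models updating a property on a reference type.
    settings.label = "updated"
    return message, counter  # returned only so the callee's view could be traced


def called_with_ref(message, settings, counter):
    # Writing through the boxes models Ref parameters: the caller sees every change.
    message.value = "<Order status='updated'/>"
    counter.value += 1
    settings.value.label = "updated"


def run_in_case():
    message, settings, counter = "<Order status='new'/>", Settings(), 1
    called_with_in(message, settings, counter)
    return message, settings.label, counter


def run_ref_case():
    message, settings, counter = Ref("<Order status='new'/>"), Ref(Settings()), Ref(1)
    called_with_ref(message, settings, counter)
    return message.value, settings.value.label, counter.value


print(run_in_case())   # ("<Order status='new'/>", 'updated', 1)
print(run_ref_case())  # ("<Order status='updated'/>", 'updated', 2)
```

In the In case only the custom object's property changes in the caller; in the Ref case the message, the object, and the integer all change, which matches the traced results described above.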

Now that we have proven the behavior of In/Ref parameters (an Out parameter is effectively like a Ref parameter except for the fact that there is no guarantee that the parameter value was initialized in the caller prior to calling the called orchestration) the big question is why the slowdown with In parameters?  The obvious answer for messages and value types is that an actual copy of the value is made for the called orchestration which would definitely have an impact on the CPU, the larger and more complex the footprint of the parameter in question the larger the performance implication.

In summary, if performance is very important to you and you intend to use the call orchestration shape then beware parameters with a direction of In.  If the reason you are using call orchestration shapes is purely to break up a large orchestration and you are calling on other orchestrations within your control then it should be quite safe to have a parameter direction of Ref even if you don’t actually require it to be Ref, but obviously this needs to be evaluated on a case by case basis.

My current project requires an orchestration to be built that will call out to one or potentially many WCF Services through a sequential loop, the specific services to be called on within the loop being resolved from the Business Rules Engine based on message context. The orchestration needs to provide for guaranteed delivery and has retries built around the WCF Service calls in case of routing failures (since the logical send port is of the direct binding type), SOAP faults encountered when calling the WCF Service in question, exhaustion of retries on the send port, or the orchestration not hearing back from the send port after a specified timeout period. The orchestration needs to perform as quickly as possible and use the minimum amount of machine resources since it is a high throughput orchestration which needs to process many millions of messages per day.

I managed to achieve the above using the below patterns.

My orchestration looked a bit like the below after implementing this logic (note that I have cut out a lot of my internal logic in this screenshot, there was a lot more to it but this gives you the gist of the flow).

Orchestration before

While running load tests it occurred to me that by catching timeouts the regular way with a long running transactional scope I was effectively forcing a persistence point every time the scope completed. To make things worse, my scope was nested within two parent scopes for error handling and variable scoping purposes and I was forced to mark both of these as long running too since you can’t nest a transactional scope within a non-transactional scope. If there were no orchestration shapes in between the ends of my three scopes then the persistence points would be collapsed into a single persistence point but my error handling really didn’t allow for this so I now had a minimum of three persistence points to deal with per orchestration instance, which my gut told me would definitely be causing performance issues and draining my server resources under heavy load thus constraining my throughput.

An alternative I decided to explore was using the listen shape with a receive branch to handle response messages and a delay branch to catch timeouts instead and to change all my scopes to be non-transactional (I wasn’t taking advantage of compensation so didn’t think I would lose any benefits of using a long running transactional scope).  However it looks like because I was using a logical request-response send shape in my orchestration this was not possible and I was encountering the error “incomplete requestresponse; missing receive” as described in this forum post whose poster was trying to achieve exactly the same ends.  It appears that the listen/delay timeout catching pattern does not work with request-response logical send ports in orchestrations.

When sending out a request-response message from my orchestration on a direct bound send port I noticed that my orchestration had an instance subscription to receive back the response messages based on the BTS.CorrelationToken context property.  This led me to believe that using a request-response logical send port in an orchestration automatically generates a GUID value in the BTS.CorrelationToken context property and promotes it, using that value to receive the response message back to the orchestration. Physical send ports appear to automatically copy over the BTS.CorrelationToken promoted property from the request to the response or fault messages.  I decided to do the same thing except with a one-way logical port for the send and another one-way logical port for the receive and by manually promoting the BTS.CorrelationToken context property, thus enabling me to use the listen/delay timeout catching pattern.

I created a one-way logical send port to replace my request-response send port, and also created a new one-way receive port with an operation whose message matches the response message from the send port.  When constructing the request message I created a new GUID and assigned it to the BTS.CorrelationToken context property on the message.  I created a correlation type containing the BTS.CorrelationToken context property, and created a correlation set of that type in my innermost scope, which I initialized on my send shape and followed on my receive shape to force the property to be promoted.  I then created a listen shape and moved my receive into the first branch, and created a delay shape in the second branch to catch timeouts and put my timeout exception handling logic in that branch.  I could now safely mark all my scopes as non-transactional and my orchestration looked like the below.

Async Timeouts1
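For readers who think better in code than in orchestration shapes, the listen/delay pattern keyed on a correlation token can be sketched with Python's asyncio. This is a conceptual model only, not BizTalk code: `MessageBox`, `call_service`, `fast_service` and `slow_service` are hypothetical names, and the 0.2 second timeout is an arbitrary value chosen for the demo.

```python
import asyncio
import uuid


class MessageBox:
    """Hypothetical stand-in for the message box: routes responses by correlation token."""

    def __init__(self):
        self._waiting = {}

    def subscribe(self, token):
        # Models the instance subscription created by following the correlation set.
        future = asyncio.get_running_loop().create_future()
        self._waiting[token] = future
        return future

    def publish_response(self, token, body):
        future = self._waiting.pop(token, None)
        if future is not None and not future.done():
            future.set_result(body)


async def call_service(box, service, timeout_seconds):
    token = str(uuid.uuid4())                        # new GUID set on the request
    response = box.subscribe(token)                  # subscription keyed on the token
    task = asyncio.create_task(service(box, token))  # one-way send, no blocking receive
    try:
        # Receive branch of the listen: the response arrives first.
        return "response", await asyncio.wait_for(response, timeout_seconds)
    except asyncio.TimeoutError:
        # Delay branch of the listen: the timer fires first.
        return "timeout", None


async def fast_service(box, token):
    await asyncio.sleep(0.01)
    box.publish_response(token, "<Response/>")


async def slow_service(box, token):
    await asyncio.sleep(0.5)
    box.publish_response(token, "<Response/>")


async def main():
    box = MessageBox()
    print(await call_service(box, fast_service, timeout_seconds=0.2))
    print(await call_service(box, slow_service, timeout_seconds=0.2))


asyncio.run(main())
```

The first call takes the receive branch and the second takes the delay branch. The point of the model is that the response is matched purely on the token, so the send and the receive no longer need to be a single request-response operation.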

Load testing immediately garnered better results, with my transactions per second on a previously base-lined application rising from 78 to 90 (a 15% increase in throughput) and the CPU utilization on my message box SQL Server dropping massively.  I instantly felt vindicated that the extra effort had paid off, but then realized that I had lost out on my ability to catch SOAP faults and messages indicating retries on my send port had exhausted which was not acceptable.

To catch SOAP faults I had to add a new operation to my one-way receive port with a message type of BTS.soap_envelope_1__2.Fault, add a new branch to my listen shape containing a receive shape for a message of the same SOAP fault type linked to the new operation on the receive port, and have this receive shape follow the same correlation set that I initialized on the send.  I could then run XPath statements against the SOAP fault message to extract the exception details and handle it accordingly. Since send ports copy over the BTS.CorrelationToken context property to all response messages including fault messages this wasn’t too hard to do.

The failed messages on send ports that had exhausted retries were a bit trickier to deal with.  The failed messages were of the same message type as the original request message, except that they now had some error context properties, and I could not find a clean way to correlate them back to my orchestration. One method that could work is described in this blog post, but I really wanted to avoid having to receive the original request message back into my orchestration instance to support this pattern, as that would add inefficiencies and affect throughput.

I decided to take advantage of NACK (negative acknowledgment) messages instead (see this blog series if you want more information about generating NACKs on send ports).  NACK messages are simply messages of type BTS.soap_envelope_1__1.Fault, but they do not have a BTS.MessageType context property set against them.  They are also only generated by send ports upon retry exhaustion if there is an existing subscription for the NACK (or if you use the orchestration delivery notification functionality or the BTS.ACKRequired context property, but those weren’t suitable for my purposes).

I decided to use a loopback send port to subscribe to NACKs from the WCF service send ports, with an XMLReceive receive pipeline to resolve the message type of the NACK since NACKs don’t have a BTS.MessageType context property by default.  (The loopback adapter in question was developed by my friend and colleague Mark Brimble and is proprietary so can’t be shared, though you can find other implementations on the internet.)  An orchestration could have been used instead of a loopback send port, however that would mean I could not adjust the filter properties at runtime, which wasn’t flexible enough for my purposes. See below for an example of filter properties on the loopback send port that cause my WCF service send port to generate NACKs.

Loopback

I then added another operation to my logical one-way receive port in my orchestration with a message type of BTS.soap_envelope_1__1.Fault, added a new branch to my listen shape with a receive shape for a message of type BTS.soap_envelope_1__1.Fault linked to the aforementioned operation which also followed the correlation set initialized on the send of the WCF Service request message.  I could then run XPath statements against the received NACK message to extract fault details and handle the exception appropriately.  The orchestration now looks like the below.

FullyImplemented

The one catch now is that I need to discard the failed messages that get generated as a result of the “routing for failed messages” flag being enabled on the send ports (I still need this enabled because my solution calls for error handling to be done from within the orchestration for this specific message flow rather than from send ports for guaranteed delivery purposes, and I want to ensure that send port instances do not remain suspended after retries get exhausted) as they will be created in addition to NACKs.  This does result in extra unnecessary messaging when retry exhaustion occurs, but this is expected to be the exception rather than the norm and was deemed acceptable for this solution. The same justification applies for the addition of the loopback send port.

Something else to keep in mind is that if the Web Service you’re calling is not a WSHttp binding (or equivalent WS* based) Web Service then the fault you will need to catch will most likely be of type BTS.soap_envelope_1__1.Fault, which is the same type as NACK messages.  In this case you would have to consolidate the listen branches for the SOAP fault and the NACK into a single branch and inspect the message to find out whether it is a SOAP fault or a NACK before dealing with it appropriately.
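A minimal sketch of that inspection step follows, written in Python rather than as XPath inside an orchestration. It assumes the NACK carries a `NACK` element in the `http://schema.microsoft.com/BizTalk/2003/NACKMessage.xsd` namespace inside the fault's `detail` element; treat that namespace, the child element names, and both sample documents as assumptions to be checked against a NACK captured from your own environment. `classify_fault` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed namespace of the NACK detail element; verify against a real NACK.
NACK_NS = "http://schema.microsoft.com/BizTalk/2003/NACKMessage.xsd"


def classify_fault(xml_text):
    """Return ('nack' or 'soap_fault', description) for a SOAP 1.1 fault envelope."""
    root = ET.fromstring(xml_text)
    fault = root.find(f".//{{{SOAP11_NS}}}Fault")
    if fault is None:
        raise ValueError("not a SOAP 1.1 fault")
    nack = fault.find(f"detail/{{{NACK_NS}}}NACK")
    if nack is not None:
        return "nack", nack.findtext("ErrorDescription", default="")
    return "soap_fault", fault.findtext("faultstring", default="")


# Illustrative samples only; real messages will carry more detail.
NACK_SAMPLE = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body><SOAP:Fault>
    <faultcode>Microsoft BizTalk Server Negative Acknowledgment</faultcode>
    <faultstring>An error occurred while processing the message</faultstring>
    <detail>
      <ns0:NACK Type="NACK" xmlns:ns0="http://schema.microsoft.com/BizTalk/2003/NACKMessage.xsd">
        <ErrorDescription>Retries exhausted</ErrorDescription>
      </ns0:NACK>
    </detail>
  </SOAP:Fault></SOAP:Body>
</SOAP:Envelope>"""

SERVICE_FAULT_SAMPLE = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP:Body><SOAP:Fault>
    <faultcode>SOAP:Server</faultcode>
    <faultstring>Order rejected</faultstring>
  </SOAP:Fault></SOAP:Body>
</SOAP:Envelope>"""

print(classify_fault(NACK_SAMPLE))           # ('nack', 'Retries exhausted')
print(classify_fault(SERVICE_FAULT_SAMPLE))  # ('soap_fault', 'Order rejected')
```

Inside the orchestration the same decision would be made with an XPath test for the NACK element in an expression or decide shape after the consolidated receive.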

There is no question that this solution is more complicated than using a request-response logical send port in your orchestration in combination with a long running transaction scope to catch timeouts, and adds more components for future developers to wrap their heads around as well as making life more complicated for support people.  However if throughput is of the utmost importance to you and every message per second processed by your BizTalk application makes a world of a difference, then this might just be the solution you need.

One of my pet peeves about the usage of dynamic send ports in BizTalk Server is the loss of flexibility that comes along with the programming model that is generally accepted when using them (I’m normally quite pragmatic rather than puritanical but for some reason this really grinds my gears). The loss of flexibility typically presents itself in the below ways.

  • In most scenarios that require dynamic send ports developers will introduce orchestrations into their solutions to setup the properties of the dynamic send port, even when the requirements would be better suited to a messaging only solution.
  • When orchestrations are introduced to cater for dynamic send ports they usually contain knowledge of the transport mechanism to be used as well as the send pipeline rather than making use of an external resolver. In my mind this is a loss in terms of separation of concerns and loose coupling.
  • When specifying a logical send port in an orchestration to be of a dynamic binding type, you are forced to manually bind the orchestration logical port to a physical dynamic send port rather than making use of property filters, thus introducing a further level of rigidity.

This technet article describes how dynamic send ports could be supported in a messaging only solution which in my view opens the door for a lot of goodness in terms of loose coupling and flexibility. This is done by setting the BTS.OutboundTransportType and BTS.OutboundTransportLocation context properties on the message prior to it being handled by the dynamic send port.  It highlights that one isn’t forced down the path of using orchestration to support the usage of dynamic send ports, and even if orchestration is required there is no requirement to make use of the dynamic logical port binding (which is equivalent to a specify-later binding) and one can instead use direct binding and make use of property filters for routing messages to the appropriate dynamic send port.

The aforementioned article only contains a sample which requires the hardcoding of the transport type and URL as parameters of a pipeline component. This blog post will explore taking this concept one step further through the use of the BizTalk BRE Pipeline Framework to conditionally (and thus dynamically) set the transport type and the URL (and potentially other supporting properties for the given transport type) that will be used by a dynamic send port in a messaging scenario.

To start with let’s assume we have a receive location that receives XML messages which contain either a FileLocation element or an EmailAddress element, these values being promoted to custom context properties (see the schema designer representation of this schema and the property schema below). The existence of these context properties should be used to determine how to handle the message.

Schema

If the message contains a FileLocation element then we want to write the message to the specified location.  If the message contains an EmailAddress element then it is to be sent out over SMTP to the specified address.  To mix things up we will also map the email messages to a flat file format, convert it to a flat file and set the subject of the email to “AITM sample”.

This can be achieved by creating two dynamic send ports (note that if not for the mapping/flat file requirement for the email messages then this could have been achieved with one send port, but I have introduced the extra complication to help you understand what is possible), each of these send ports having filters based on the existence of the aforementioned context properties.

Snd_OutboundMessages

Snd_OutboundFlatMessages

An outbound map can be applied on the dynamic send port that is used to send email messages while no outbound map is specified for the dynamic send port that is used to write to the file system.  The dynamic send port that writes the message to the file system will employ a PassThroughTransmit pipeline while the SMTP dynamic send port will employ a pipeline that makes use of the flat file assembler pipeline component.

On our receive location we will make use of a pipeline that contains an XML disassembler pipeline component which is used to promote the relevant context properties as well as the BREPipelineFrameworkComponent pipeline component which we will use to execute our business rules.

Pipeline

The DynamicSendPortResolver policy contains three rules, one for each of the two transport types we are supporting and an additional rule to throw an exception if no transport type can be resolved (just to be on the safe side).

The “Set File Properties” rule is fired when the custom FileLocation context property contains a non-blank value.  It sets the BTS.OutboundTransportType context property to FILE, and sets the BTS.OutboundTransportLocation context property to the value of the FileLocation context property.

SetFileLocation

The “Set SMTP Properties” rule is fired when the custom EmailAddress context property contains a non-blank value.  It sets the BTS.OutboundTransportType context property to SMTP, the BTS.OutboundTransportLocation context property to the value of the EmailAddress context property, and sets the SMTP.Subject context property to “AITM sample” as below.

SetSMTPProperties

The “Unknown transport” rule is fired when neither the EmailAddress nor the FileLocation context property exists, or when their values are blank.  It throws an exception stating “A transport type could not be resolved” as below.

Unknown Transport
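The three rules boil down to a small decision function. The following Python sketch models the policy's logic over a plain dictionary of context properties; it is not how the BRE Pipeline Framework is invoked. `resolve_transport` is a hypothetical name, and the if/elif ordering assumes a message carries only one of the two properties, as in this sample.

```python
def resolve_transport(context):
    """Apply the three rules to a dict of context properties and return an updated copy."""
    resolved = dict(context)
    file_location = (context.get("FileLocation") or "").strip()
    email_address = (context.get("EmailAddress") or "").strip()

    if file_location:
        # "Set File Properties" rule
        resolved["BTS.OutboundTransportType"] = "FILE"
        resolved["BTS.OutboundTransportLocation"] = file_location
    elif email_address:
        # "Set SMTP Properties" rule
        resolved["BTS.OutboundTransportType"] = "SMTP"
        resolved["BTS.OutboundTransportLocation"] = email_address
        resolved["SMTP.Subject"] = "AITM sample"
    else:
        # "Unknown transport" rule
        raise ValueError("A transport type could not be resolved")
    return resolved


# Illustrative values only.
print(resolve_transport({"FileLocation": r"C:\Out\order.xml"})["BTS.OutboundTransportType"])  # FILE
print(resolve_transport({"EmailAddress": "someone@example.com"})["SMTP.Subject"])             # AITM sample
```

Keeping this decision in an externally managed policy rather than in a pipeline component parameter or an orchestration is what lets the routing change without redeploying code.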

Dropping an XML file into the receive location with a FileLocation element results in the file being written to the file system as below.

FileWritten

Dropping an XML file into the receive location with an EmailAddress element results in the message being mapped and converted to a flat file format and sent out as an email with a subject of “AITM sample” as below.

EmailSent

Dropping a file with neither a FileLocation nor an EmailAddress element will result in the message being suspended as below.

Error

In the above example all the rules to resolve the BTS.OutboundTransportType and BTS.OutboundTransportLocation context properties were placed on the receive pipeline in the receive location.  This was purposely done because it is very important that the BTS.OutboundTransportType context property is set to the appropriate transport before the message reaches the dynamic send port; once the send port has received the message it is too late to override the adapter to be employed.  It is of course still possible to override the BTS.OutboundTransportLocation at that point, so you do have some options there.  If you want a message to be routed to multiple dynamic send ports with different transports then you will most likely need to introduce an orchestration.

I hope this blog post portrays how dynamic send ports can be used in a more loosely coupled manner.

Most developers who have worked with BizTalk for a while will realize the benefits of using direct binding on orchestration ports to improve flexibility and loose coupling. What is not often realized is that using direct rather than specify later port binding comes with a bit of a trade-off in that the increased flexibility results in less hand holding for administrators, and it is altogether possible for them to stop/unenlist an orchestration or send port resulting in routing failures and unprocessed messages which might be difficult to replay. Seeing as guaranteed delivery is one of the biggest selling points on most projects involving BizTalk Server this is too big a hole to overlook. In this blog post I will detail how hand-holding is relaxed for direct binding and introduce error handling patterns that can be used in orchestrations to overcome routing failures.

One of the things most BizTalk administrators might notice when orchestrations use direct binding is that starting an orchestration no longer requires starting all send ports that the orchestration publishes messages to. At run time this could mean that the loose coupling offered by direct binding has removed the guarantee that subscribers to your published messages will be active. If an orchestration publishes a message for which there are no subscribers then a non-resumable routing failure instance will be raised and a PersistenceException will be raised in the orchestration.

I have had some pretty concerned colleagues ask me about the dreaded PersistenceException. This is nothing more than a curiously named exception representing a routing failure. The way I tend to deal with such exceptions in guaranteed delivery scenarios is to create a scope around my send shape (possibly around other close by associated shapes as well) and to catch the PersistenceException (this is in the Microsoft.XLANGs.BaseTypes namespace and its assembly is referenced by default in BizTalk Server projects). If the exception is encountered then I raise an alert to administrators (via the ESB portal or relevant exception handling framework) advising of the routing failure, and suspend the orchestration instance. Once the administrators have fixed the problem they can resume the orchestration which will loop back to before the send shape and resend the message.

Single Message
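The control flow of this pattern can be sketched outside BizTalk as a retry loop. The Python below is a conceptual model under stated assumptions: `RoutingFailure` stands in for the PersistenceException, `MessageBox` is a toy publisher, `wait_for_resume` models the suspend shape, and `max_attempts` is an addition of this sketch to keep the demo finite (the orchestration itself simply loops until the send succeeds).

```python
class RoutingFailure(Exception):
    """Stands in for the PersistenceException raised when nothing subscribes."""


class MessageBox:
    """Toy publisher: publishing fails when there are no active subscribers."""

    def __init__(self):
        self.subscribers = set()
        self.delivered = []

    def publish(self, message):
        if not self.subscribers:
            raise RoutingFailure("no subscribers for message")
        self.delivered.append(message)


def send_with_guaranteed_delivery(box, message, alert, wait_for_resume, max_attempts=5):
    """Loop around the send: alert and suspend on a routing failure, then resend."""
    for attempt in range(1, max_attempts + 1):
        try:
            box.publish(message)       # the send shape inside the scope
            return attempt
        except RoutingFailure as failure:
            alert(f"attempt {attempt}: {failure}")  # notify administrators
            wait_for_resume()          # suspend; returns once an administrator resumes
    raise RuntimeError("still failing after max_attempts")


alerts = []
box = MessageBox()


def admin_fixes_and_resumes():
    # The administrator starts the missing send port, then resumes the instance.
    box.subscribers.add("Snd_LOB")


attempts = send_with_guaranteed_delivery(box, "<Order/>", alerts.append, admin_fixes_and_resumes)
print(attempts, alerts)  # 2 ['attempt 1: no subscribers for message']
```

The message is held by the suspended instance rather than lost, so recovery is a resume operation rather than a manual replay.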

Now of course this pattern comes with a bit more effort in terms of development but one has to ask whether their system can afford to lose a message or at the very least have to go through painful and manual message replay processes.

Another fun scenario I’ve encountered that can go very wrong is taking advantage of direct binding in orchestrations in tandem with a publish subscribe pattern with multiple subscribers to the same message. Say you have an orchestration that sends out a message which is subscribed to by a logging/auditing port and another port which actually performs an important action such as updating a LOB system. If the LOB send port was in an unenlisted state then the message would be directed to your auditing send port, no PersistenceException would be raised in your orchestration and the message would never update the LOB system. So much for guaranteed delivery…

What I would do in this case is create two copies of the same message in the orchestration, each with some sort of instructional context property that is used to direct the message to the relevant send port. The values of this context property are set to abstract values such as “Audit” or “UpdateLOB” rather than the name of the send port, so that the design doesn’t stray too far from the concept of loose coupling. I wrap the send shapes for the two messages in an atomic scope so that only one persistence point is encountered when the messages are sent out (this is a whole other subject but it is important to keep your persistence points, and thus your I/Os against your SQL Server, to a minimum), and wrap the atomic scope with a long running scope which catches a PersistenceException. I then implement the same exception handling pattern I mentioned earlier in this post.

Multiple messages
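To see why the shared message hides the failure while the instructed copies expose it, here is a toy publish/subscribe model in Python. All names (`MessageBox`, `RoutingFailure`, the `Instruction` property, the port names) are hypothetical, and the sketch does not model the atomic scope or persistence points; it only shows which publishes raise a routing failure.

```python
class RoutingFailure(Exception):
    """Stands in for the PersistenceException raised on a routing failure."""


class MessageBox:
    """Toy model: each enlisted port has a filter over the message properties."""

    def __init__(self):
        self.subscriptions = {}

    def enlist(self, port, predicate):
        self.subscriptions[port] = predicate

    def unenlist(self, port):
        self.subscriptions.pop(port, None)

    def publish(self, properties):
        matched = sorted(p for p, matches in self.subscriptions.items() if matches(properties))
        if not matched:
            raise RoutingFailure(f"no subscribers for {properties}")
        return matched


def shared_message_scenario():
    box = MessageBox()
    box.enlist("Audit", lambda p: p["MessageType"] == "Order")
    box.enlist("LOB", lambda p: p["MessageType"] == "Order")
    box.unenlist("LOB")  # an administrator unenlists the LOB send port
    # The audit port still matches, so no failure is raised and the LOB update is silently lost.
    return box.publish({"MessageType": "Order"})


def instructed_copies_scenario():
    box = MessageBox()
    box.enlist("Audit", lambda p: p.get("Instruction") == "Audit")
    box.enlist("LOB", lambda p: p.get("Instruction") == "UpdateLOB")
    box.unenlist("LOB")
    routed = []
    try:
        for instruction in ("Audit", "UpdateLOB"):
            routed.append(box.publish({"MessageType": "Order", "Instruction": instruction}))
    except RoutingFailure:
        # The copy meant for the LOB port has no subscriber, so the failure surfaces.
        return "routing failure caught"
    return routed


print(shared_message_scenario())     # ['Audit']
print(instructed_copies_scenario())  # routing failure caught
```

Giving each copy its own abstract instruction value means every intended subscriber has to be present for the send to succeed, which is what restores the routing failure as a signal.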

Once again this comes at a greater effort in development and introduces more overhead on the BizTalk runtime, muddies up your orchestration with logic which is arguably plumbing rather than business logic, and somewhat takes away from how loosely coupled your orchestration is.  That said it does ensure guaranteed delivery.

The exception handling patterns that I’ve discussed here stem from my own cautious nature but I have seen them pay off in multiple instances and have also seen the cost when such thought isn’t applied when required. I wouldn’t say they are required in every orchestration but I would at least encourage developers to think of these scenarios when they decide to what extent they are going to handle exceptions.  At the very least if such patterns aren’t implemented do think about putting a section or at least a blurb in your support documentation to discuss how to recover from such failures.
