Tag Archive: Service Bus Relays


One of the key areas to focus on when using Service Bus Relays to expose on-premises BizTalk-hosted WCF services externally (without making any firewall changes) is availability. Service Bus Relay endpoints in Azure are only enlisted when your on-premises service initializes: NOT when your application pool starts, NOT when you create your Service Bus namespace, but only when a request is made to the local endpoint of the service. Browsing to the local .svc file on the BizTalk VM (effectively an HTTP GET) serves just as well to enlist the Azure Service Bus Relay endpoint.

A common solution to this problem is to use the IIS 8.0 Application Initialization module, which has been really well documented here. What this effectively results in is that your .svc file is activated any time IIS resets, your application recycles, or your app domain reloads, which in turn results in your relay endpoint being enlisted in Azure. The Application Initialization module is also available for IIS 7.5 and can be downloaded here.
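As a rough illustration of what this looks like in configuration, here is a minimal sketch of the relevant applicationHost.config settings. The application pool name, site name, application path, physical path, and .svc file name are all placeholders I've made up for the example, and I'm assuming the initialization page is given relative to the application root, so check this against your own IIS setup rather than copying it verbatim.

```xml
<!-- Sketch only: RelayServicesAppPool, /RelayService and RelayService.svc are hypothetical names. -->
<configuration>
  <system.applicationHost>
    <applicationPools>
      <!-- Start the worker process up front instead of waiting for the first request. -->
      <add name="RelayServicesAppPool" startMode="AlwaysRunning" />
    </applicationPools>
    <sites>
      <site name="Default Web Site" id="1">
        <!-- preloadEnabled asks IIS to warm the application up whenever it starts. -->
        <application path="/RelayService" applicationPool="RelayServicesAppPool" preloadEnabled="true">
          <virtualDirectory path="/" physicalPath="C:\inetpub\wwwroot\RelayService" />
        </application>
      </site>
    </sites>
  </system.applicationHost>
  <location path="Default Web Site/RelayService">
    <system.webServer>
      <!-- Request the .svc file during warm-up so the relay endpoint gets enlisted in Azure. -->
      <applicationInitialization doAppInitAfterRestart="true">
        <add initializationPage="/RelayService.svc" />
      </applicationInitialization>
    </system.webServer>
  </location>
</configuration>
```

The same settings can be applied through IIS Manager; the XML is just a compact way to see which three pieces (pool start mode, application preload, initialization page) need to line up.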

Application Initialization is an absolutely core part of any Service Bus Relay service, regardless of whether the backend service is a vanilla WCF service or based on a BizTalk receive location. No WCF Service Bus Relay solution should be built without this in mind.

However, I found that when dealing with BizTalk there is an additional consideration to keep in mind. What I observed is that if your BizTalk environment encounters outages due to an inability to connect to the BizTalk Management database (due to network or database issues) the services (not the application pool, not IIS, the services themselves!) will get shut down by BizTalk. When BizTalk recovers from the outage the services will not get spun up again until someone calls on the local service endpoint or browses to the local .svc file. Because the IIS application pool has not restarted, Application Initialization will not kick in and thus your endpoint will not be enlisted in Azure.

The solution I have put in place is to generate keepalive requests to the local endpoint every minute. This makes me feel a bit dirty, but I haven’t found a better solution yet, so I will detail it for you. If you can think of a better option, please do share it and I will update this post.

What I have done is set up a receive location based on the Scheduled Task adapter that generates a keepalive message every minute. (I’ve chosen quite a frequent interval because availability is really key to my services; choose your interval appropriately.) That message gets routed to a solicit-response WCF-WebHttp send port (actually, in my case, four send ports, one for each service exposed via Service Bus Relays that I want to keep alive). The endpoint address on this send port points to the .svc file for the service I want to keep alive, and the “HTTP Method and URL Mapping” section of the send port’s configuration is set to “GET”, since we want to perform the equivalent of browsing to the .svc file.

webhttp

One more thing of note is that since this is an HTTP GET, the target endpoint is not expecting a request message body, so we must use the WCF-WebHttp adapter’s functionality to suppress it, as per the following screenshot.

Supress

Because I’m not really interested in the response messages being returned to the send port, I route these messages to a file send port which uses the BRE Pipeline Framework to discard the message (utilizing the NullifyMessage vocabulary definition in the BREPipelineInstructions.SampleInstructions.HelperInstructions vocabulary).  The address on the send port can be set to any folder since no file will actually get written out by the adapter.
Discard

Using the WCF-WebHttp adapter and pointing it directly to the .svc file helps minimize the requirement for additional development since you aren’t forced down the path of exposing additional operations in your service to cater for the keepalives.

While working on a project using the WCF webHttpRelayBinding binding with SAS based authentication over transport security, I found that my services were taking a very long time to spin up (30-60 seconds) and that my runtime performance was a bit less than optimal in terms of latency (and I had proven that the latency was not a result of my backend service).  To give you an idea what I was working with, my web.config file had contents similar to the below in the system.serviceModel element.

    <behaviors>
      <endpointBehaviors>
        <behavior name="sharedSecretClientCredentials">
          <transportClientEndpointBehavior>
            <tokenProvider type="SharedAccessSignature">
              <sharedAccessSignature keyName="keyname" key="key" />
            </tokenProvider>
          </transportClientEndpointBehavior>
        </behavior>
      </endpointBehaviors>
    </behaviors>
    <services>
      <service name="Microsoft.BizTalk.Adapter.Wcf.Runtime.BizTalkServiceInstance" behaviorConfiguration="ServiceBehaviorConfiguration">
        <endpoint name="RelayEndpoint" address="https://sbnamespace.servicebus.windows.net/service" binding="webHttpRelayBinding" bindingNamespace="http://tempuri.org/" bindingConfiguration="RelayEndpointConfig" behaviorConfiguration="sharedSecretClientCredentials" contract="Microsoft.BizTalk.Adapter.Wcf.Runtime.ITwoWayAsync" />
      </service>
    </services>
    <bindings>
      <webHttpRelayBinding>
        <binding name="RelayEndpointConfig">
          <security relayClientAuthenticationType="RelayAccessToken" mode="Transport" />
        </binding>
      </webHttpRelayBinding>     
    </bindings>

I didn’t observe such problems on my development VM (which I was running with pretty much no firewalls behind it), but did observe this on my client’s UAT environment. This was in spite of following Microsoft’s guidelines that suggest that you should have outbound TCP ports 9350-9354 open on your firewall to enable Service Bus connectivity.  I went through an exercise using the PortQuiz website to prove that these ports were indeed accessible from the UAT server so the performance issues were puzzling.

Next up, I spun up a Wireshark capture.  To start with, I applied the below display filter in Wireshark to get rid of the extra noise.

tcp.port eq 9350 || tcp.port eq 9351 || tcp.port eq 9352 || tcp.port eq 9353 || tcp.port eq 9354

I then initialized some of my services (I shut them down forcibly and then spun them up again) to observe which ports were in use.  I saw that the conversation with Service Bus was being initialized on port 9350 as expected; however, that appeared to be the end of the story.  I wasn’t seeing any comms on ports 9351-9354.  I then right-clicked one of the displayed records in Wireshark and chose “Conversation Filter -> IP”, which updates the filter so that it displays anything with a source or destination IP address matching those on the selected record.

This suddenly resulted in a whole lot more records being displayed and helped me get to the root of the issue.  What I was observing was that after Service Bus made the initial connection on port 9350, it attempted to continue the conversation on port 5671 (AMQPS or AMQP over SSL) which hadn’t been opened on the firewall.  This connection attempt was obviously failing, and the observed behavior was that some retries were attempted with fairly large gaps in between until Service Bus finally decided to fall back to port 443 (HTTPS) instead.  Pay particular attention to the lines in the following screenshot with the numbers 1681, 2105, and 2905 in the first column.

2.6 Capture

This explained why my service was taking a long time to start up (because Service Bus was failing to connect via AMQPS and was going through retry cycles before falling back to HTTPS) and also explained why my runtime performance was lower than my expectation (because HTTPS is known to be slower than TCP).  However this didn’t explain why Service Bus was attempting to use port 5671 rather than 9351-9354 as per documentation.

Repeating the same test on my own VM showed that Service Bus was continuing the connection on ports 9351-9354 as expected… So why the difference? On the suggestion of my colleague Mahindra, I compared the assembly versions of the Microsoft.ServiceBus assembly across the two machines. You can do this by running “gacutil -l Microsoft.ServiceBus” in a Visual Studio command prompt, or by manually checking the GAC directory which is typically “C:\Windows\Microsoft.NET\assembly\GAC_MSIL” for .NET 4+ assemblies.

Voila. I found that I was running version 2.1.0.0 on the machine that was behaving correctly, and version 2.6.0.0 on the machine that was misbehaving. It appears that the protocol choosing behavior for Service Bus relays has changed sometime in between these two assembly versions. I have not pinpointed exactly which version this change occurred in, and I don’t yet know whether this change was by design or accidental. Either way, Microsoft have not yet updated their documentation, which means that others will be as confused as I was.

So what are your choices?

  1. You can downgrade to an older version of the assembly.  2.1.0.0 will definitely work for you. You might be able to get away with a slightly higher version that is still lower than 2.6.0.0, but it will be up to you to establish which versions are acceptable since I haven’t managed to do this.  You’ll also need to update the webHttpRelayBinding binding registration in your machine.config files (or wherever you’ve chosen to register it if you’ve gone for a more granular approach) to point to the older assembly.
  2. You can choose to stick with the latest version of the assembly and open up outbound TCP connections on port 5671 on your firewall.
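For option #1, the registration that needs repointing might look roughly like the sketch below. I'm assuming the standard WCF extension registration for the Service Bus relay bindings here; the type names and public key token are what I'd expect for the Microsoft.ServiceBus assembly, but verify them against the output of “gacutil -l Microsoft.ServiceBus” on your own machine. The transportClientEndpointBehavior entry is included because the web.config shown earlier relies on it too.

```xml
<!-- Sketch only: verify the Version and PublicKeyToken values against your GAC before using. -->
<system.serviceModel>
  <extensions>
    <bindingExtensions>
      <!-- Point the relay binding at the older 2.1.0.0 assembly. -->
      <add name="webHttpRelayBinding"
           type="Microsoft.ServiceBus.Configuration.WebHttpRelayBindingCollectionElement, Microsoft.ServiceBus, Version=2.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
    </bindingExtensions>
    <behaviorExtensions>
      <!-- The SAS token provider behavior should come from the same assembly version. -->
      <add name="transportClientEndpointBehavior"
           type="Microsoft.ServiceBus.Configuration.TransportClientEndpointBehaviorElement, Microsoft.ServiceBus, Version=2.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
    </behaviorExtensions>
  </extensions>
</system.serviceModel>
```

Any other relay bindings or behaviors you have registered from Microsoft.ServiceBus would presumably need the same version change so that everything loads from one assembly.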

I chose to stick with option #1 because I’m not sure at this stage whether this change in behavior is intentional or incidental, and also because my impression is that raw TCP over ports 9351-9354 would be faster than the AMQPS protocol.  You will find that option #2 is also functional.

With the older version of the assembly in play I could now see traffic on ports 9351-9354 as expected, my services were spinning up in less than a second, and latency was much more in line with my expectations.
2.1 Capture
