Push Notification Fails with a 504 Server Time Out

While troubleshooting push notification failures with a client, I found an interesting problem.  The client had already configured the SRV record as required (http://blog.ucmadeeasy.com/) and disabled URL filtering as required (http://support.microsoft.com/kb/2664650), but push notification was still failing with a 504 error code.  To take it one step further, we completely disabled all IM filtering just in case.  However, we still received a 504 error (server time-out) from the Push service.

As background, the Push Notification Clearing House (PNCH) runs in the Office 365 cloud using Lync Edge servers and dynamic federation.  For more information on the three types of federation, refer to the article I wrote here: http://ocsguy.com/2011/04/20/a-few-words-on-federation/.

We were unable to troubleshoot the issue from the Office 365 side, so I decided to reconfigure my company’s Edge server with dynamic federation (it had been configured with direct federation) and see if I could find any errors related to the customer’s configuration.

I began by removing the customer’s domain information from the Federated Domains tab within the Lync Server Control Panel in my Lync environment.  Next, I signed into a test account in the customer’s Lync environment (jsmith@contoso.com) and attempted to IM an account in my environment (kevinp@tailspintoys.com).  The IM failed immediately, and I began reviewing the UCCAPI log from my client and the SIP Stack logs from my Edge servers.

It didn’t take long to find the 504 error in the logs, including some useful diagnostic information:

In the “ms-diagnostics” line we see “No match for domain in DNS SRV results” followed by the domain name (contoso.com) and the A record usim.us.contoso.com.
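When reading a lot of traces, it can help to split an ms-diagnostics line into its numeric code and its name="value" fields. The following is only a rough sketch: the function name parse_ms_diagnostics is made up for illustration, and the sample line is the one quoted in a reader comment below this post, not the line from my own trace.

```python
import re


def parse_ms_diagnostics(line: str) -> dict:
    """Split an ms-diagnostics header into its code and key/value fields.

    Hypothetical helper for illustration; not part of any Lync tooling.
    """
    name, _, value = line.partition(":")
    if name.strip().lower() != "ms-diagnostics":
        raise ValueError("not an ms-diagnostics header")

    # Logs pasted into web pages often pick up curly quotes; normalize them.
    value = value.strip().replace("\u201c", '"').replace("\u201d", '"')

    # The header value starts with a numeric diagnostic code.
    code_match = re.match(r"(\d+)", value)
    code = int(code_match.group(1)) if code_match else None

    # The remaining fields look like name="value", separated by semicolons.
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', value))
    return {"code": code, "fields": fields}


sample = (
    'ms-diagnostics: 1034;reason="Previous hop federated peer did not report '
    'diagnostic information";Domain="push.lync.com";'
    'PeerServer="sipfed.online.lync.com";source="sip.ourdomain.com"'
)
parsed = parse_ms_diagnostics(sample)
print(parsed["code"], parsed["fields"]["PeerServer"])
```

The reason and Domain fields are usually the quickest way to tell which side of the federation rejected the request.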

The problem lies within the A record the Federation SRV record is using.  It doesn’t match the SIP domain.

Now you may be thinking “they are both contoso.com”, and if you are, you are not alone!  The catch, however, is that the A record contains a subdomain (usim.US.contoso.com) that does not exist in the SIP domain.  This causes a failure to match the SIP domain with the SRV record.  Since they don’t match, you would have to use Direct Federation instead of Enhanced or Dynamic Federation to federate with this organization.  That would seem to be an easy fix, but since Office 365 only supports Dynamic Federation, the fix has to be a configuration change on the customer side.
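The matching rule described above can be sketched in a few lines of Python. This is only an approximation of the behavior seen in this case, not the Edge server's actual logic: it assumes the SRV target must be exactly one host label in front of the SIP domain, and the function name is hypothetical.

```python
def srv_target_matches_sip_domain(srv_target: str, sip_domain: str) -> bool:
    """Return True if srv_target is a single host label directly under sip_domain.

    Assumption for illustration: "sip.contoso.com" matches "contoso.com",
    but "usim.us.contoso.com" does not, because of the extra "us" subdomain.
    """
    target = srv_target.rstrip(".").lower()
    domain = sip_domain.rstrip(".").lower()
    suffix = "." + domain

    if not target.endswith(suffix):
        return False

    # Whatever precedes the SIP domain must be one non-empty label.
    host = target[: -len(suffix)]
    return host != "" and "." not in host


print(srv_target_matches_sip_domain("usim.us.contoso.com", "contoso.com"))
print(srv_target_matches_sip_domain("sip.contoso.com", "contoso.com"))
```

Running the same check against the host name your own Federation SRV record points to is a quick sanity test before involving the other side.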

To resolve the issue we created a new DNS A record to use for Federation (sip.contoso.com).  We also updated the Access Edge certificate to include this name in the SAN field. Once these steps were completed, Push Notification began working on the mobile clients.

Lesson learned: As a best practice, make sure your DNS A records for Federation don’t have a subdomain unless your SIP domain does as well.

About Kevin Peters

My name is Kevin Peters.

4 Responses to Push Notification Fails with a 504 Server Time Out

  1. Steve says:

    We’re seeing the SIP/2.0 504 Server time-out issue. Lync mobility was all working fine, but now only the push notifications to the iPhones are not working.

    In tracing I find this – ms-diagnostics: 1034;reason=”Previous hop federated peer did not report diagnostic information”;Domain=”push.lync.com”;PeerServer=”sipfed.online.lync.com”;source=”sip.ourdomain.com”

    Is this an issue on our side or theirs? Would redoing the entire set of federation commands for push fix this issue?

    I’ve verified it’s not a connectivity issue to sip.online.lync.com, our certs and SRV record are fine, and the URL filtering policy is correct as well.

    Other federations are fine….

    • Kevin Peters says:

      Hi Steve,

      This is probably an issue on your side. Do you have any partner companies you can test open federation with and get them to send you their client and SIP stack logs?

      -kp

  2. Luke says:

    Hi Kevin,
    We have the same issue as Steve. We can successfully run the ‘Get-FederatedPartner’ command but the ‘Test-CsMcxPushNotification’ command fails with ‘A 504 (server time-out)’. In tracing I find: ms-diagnostics: 1034;reason=”Previous hop federated peer did not report diagnostic information”;Domain=”push.lync.com”;PeerServer=”sipfed.online.lync.com”;source=”sip.ourdomain.com”
    Even after a reboot of the front end server, push notifications still don’t work.

  3. oh!!! you saved my life, again!!! thx!!! subdomain, dah!!!
