My company is migrating our client-facing web server. The project has been running for several months (mainly because we are making a very big leap in technology), and our migration window closes in less than a week. We have hit a wall with one client-side process that must be fixed before we can go live on the new server. Everything else appears to be working correctly.
In short, one of our stored procedures keeps timing out. It works fine on our live server (it completes in under 1 second), but when we run the same process on the new server, it times out after 30 seconds. Through testing and tracing we have pinned down the exact procedure that is timing out, but we cannot explain why it works in our production environment and not on our replacement server.
Some possible explanations we have considered:
Network Latency - Unlikely at this point, but possible, because the production server and its replacement are in two different physical locations with low bandwidth between them.
Something to do with MSDTC - We encountered some MSDTC-related errors with this process. We have corrected those errors and used DTCPing to verify that the connection between the new server and the SQL server is working correctly.
Something to do with AD DS - Our new server is joined to our domain; the old one is not, and neither is the SQL server. However, we have not seen any authentication or networking errors while troubleshooting this particular issue.
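To put rough numbers on the network-latency explanation: a call that makes many round trips pays the link's round-trip time (RTT) on every one, which is negligible inside one site but adds up quickly across a slow inter-site link (per the setup details, the new web server is at Site 1 while the SQL server is at Site 2). The following is a minimal Python sketch under that assumption; the function name and every figure in it (round-trip count, RTTs, payload size, bandwidth) are hypothetical placeholders, not measurements from our servers.

```python
# Back-of-the-envelope estimate of how long a "chatty" database call takes
# over a given link. Every number used below is a hypothetical placeholder,
# not a measurement from our servers.

def estimated_call_seconds(round_trips: int, rtt_ms: float,
                           payload_bytes: int, bandwidth_kbps: float) -> float:
    """Time spent waiting on round trips plus time to push the payload through the link."""
    latency_s = round_trips * rtt_ms / 1000.0    # each round trip costs one RTT
    bytes_per_s = bandwidth_kbps * 1000.0 / 8.0  # kilobits/s -> bytes/s
    transfer_s = payload_bytes / bytes_per_s
    return latency_s + transfer_s


COMMAND_TIMEOUT_S = 30  # the timeout observed on the new server


if __name__ == "__main__":
    # Hypothetical: the same call made inside one site vs. across the inter-site link.
    same_site = estimated_call_seconds(400, 0.5, 1_500_000, 1_000_000)
    cross_site = estimated_call_seconds(400, 50, 1_500_000, 1_000)
    for label, secs in (("same site", same_site), ("cross site", cross_site)):
        status = "TIMEOUT" if secs > COMMAND_TIMEOUT_S else "ok"
        print(f"{label}: {secs:.3f} s ({status})")
```

With these made-up numbers the same-site call takes about 0.2 s, while the cross-site call takes 32 s and exceeds the 30-second timeout, even though the procedure itself does the same work in both cases.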
What I have tried:
After googling SQL Server timeouts and finding several similar cases, I tried recompiling the affected stored procedure with sp_recompile and resetting its statistics with sp_resetstats. Neither made any difference.
I have traced the transaction (via SQL Server Profiler) on both servers, and both make exactly the same calls up to this one point. What is even more baffling is that the same procedure that times out is called successfully earlier in the trace on both servers.
Unfortunately, I cannot post the trace files here because they contain sensitive information.
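Even if the traces cannot be shared, they can be compared locally. A minimal Python sketch, assuming each server's Profiler trace has been exported in call order as (TextData, Duration in milliseconds) pairs: it reports the first event where the statement text differs, and flags matching events that are much slower on the new server. That distinguishes "a different call is made" from "the same call is made but stalls". The export format, function names, thresholds and procedure names are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical export format: one (TextData, Duration in ms) pair per trace event,
# taken from each server's Profiler trace in the same order.
Event = Tuple[str, int]


def normalize(text: str) -> str:
    """Collapse whitespace so formatting differences are not treated as divergence."""
    return " ".join(text.split())


def first_divergence(prod: Sequence[Event], new: Sequence[Event]) -> Optional[int]:
    """Index of the first event whose statement text differs, or None if the traces match."""
    for i, ((prod_text, _), (new_text, _)) in enumerate(zip(prod, new)):
        if normalize(prod_text) != normalize(new_text):
            return i
    if len(prod) != len(new):
        return min(len(prod), len(new))  # one trace simply stops earlier
    return None


def slow_events(prod: Sequence[Event], new: Sequence[Event],
                factor: float = 10.0, min_ms: int = 100) -> List[int]:
    """Indices of matching events that took at least `min_ms` and more than
    `factor` times longer on the new server than in production."""
    slow = []
    for i, ((prod_text, prod_ms), (new_text, new_ms)) in enumerate(zip(prod, new)):
        if normalize(prod_text) != normalize(new_text):
            continue
        if new_ms >= min_ms and new_ms > factor * max(prod_ms, 1):
            slow.append(i)
    return slow


if __name__ == "__main__":
    # Made-up events; procedure names are placeholders, not our real ones.
    prod = [("exec dbo.usp_GetClient @id = 1", 12),
            ("exec dbo.usp_Lookup @x = 1", 40),
            ("exec dbo.usp_GetClient @id = 1", 15)]
    new = [("exec dbo.usp_GetClient  @id = 1", 20),
           ("exec dbo.usp_Lookup @x = 1", 45),
           ("exec dbo.usp_GetClient @id = 1", 30000)]
    print(first_divergence(prod, new))
    print(slow_events(prod, new))
```

On this made-up data the traces do not diverge in text, but the third event (the repeated call to the same procedure) is flagged as slow, which mirrors the pattern described above.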
Relevant details of our servers and setup:
Note: None of these servers runs in a virtualized environment; each one has dedicated hardware.
Note: Both the production and new web servers connect to the same SQL Server database.
Production Web Server:
- NAME: SVR23
- SYSTEM: Dell PowerEdge 2950
- OS: Windows Server 2003
- IIS: 6.0
- CPU ARCH: x86 (32-bit)
- LOCATION: Site 2

SQL Server:
- NAME: CAP1
- SYSTEM: Dell PowerEdge R710
- OS: Windows Server 2008 R2 SP1
- SQL SERVER: 2008
- CPU ARCH: x64 (64-bit)
- LOCATION: Site 2

New Web Server:
- NAME: SVR23A
- SYSTEM: Dell PowerEdge R440
- OS: Windows Server 2016
- IIS: 10.0
- CPU ARCH: x64 (64-bit)
- LOCATION: Site 1
Any help or insight would be greatly appreciated.