Hi All,

We are attempting to set up an on-prem Conditional Forwarder (Windows Domain) to Azure across our s2s VPN. It only works for a few seconds.

We have a DNS Private Resolver set up in Azure and a site-to-site VPN.

When I add the CF, I get the happy green checkmark after putting in the IP of the Azure DNS Private Resolver and clicking OK. nslookups work for about 5 seconds, then they start resolving to the public IP and never go back to resolving to the private one.

If I add the IP of the private resolver to the nslookup, I get the correct internal resolution every time. I’ve tried it both from my PC and the DC (DNS server).

If I delete the CF and re-add it, it works for 5 seconds again and then starts resolving publicly again.
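To pin down exactly when the answer flips, the lookup can be repeated once a second with a timestamp. Below is a minimal sketch in Python, assuming Python is available on the test machine. It sends plain UDP A queries straight to a chosen DNS server (much like `nslookup name server`), so the Windows client cache is not involved. The helper names (`build_query`, `parse_a_records`, `query`, `classify`, `poll`) are made up for illustration, and it does no retries, TCP fallback, or response ID checking.

```python
import ipaddress
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in hostname.rstrip(".").split(".")
    )
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def skip_name(packet, offset):
    """Return the offset just past the (possibly compressed) name at offset."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # a compression pointer is 2 bytes and ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Return the IPv4 addresses found in the answer section of a response."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, _, _, rdlength = struct.unpack("!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record; CNAMEs etc. are skipped
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return addresses


def query(server, hostname, timeout=3.0):
    """Send one A query over UDP to server:53 and return the addresses."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        packet, _ = sock.recvfrom(4096)
    return parse_a_records(packet)


def classify(ip):
    """Label an address 'private' (e.g. 10.x.x.x) or 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def poll(server, hostname, attempts=30, interval=1.0):
    """Query once per interval and print a timestamped label for each answer."""
    for _ in range(attempts):
        stamp = time.strftime("%H:%M:%S")
        try:
            for ip in query(server, hostname):
                print(stamp, ip, classify(ip))
        except OSError as exc:  # covers timeouts and unreachable servers
            print(stamp, "query failed:", exc)
        time.sleep(interval)
```

Something like `poll("192.0.2.10", "app.example.azurewebsites.net")` would then be run right after re-adding the CF. Both values are placeholders for the on-prem DNS server IP and the real FQDN. Running it a second time against the Private Resolver IP gives a side-by-side comparison.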

Any help, tips, or steered directions are appreciated!


If I am not wrong, you might be using the wrong tools for the job…

If you are talking about public clouds like Azure or AWS for example, there are 2 main routes in and/or out from the cloud…

  1. Via VPN
  2. Via Internet

Qn 1 : When data packets go from DC to Azure, do they return via the same path (eg VPN) ?
Qn 2 : When data packets go from Azure to DC, do they return via the same path (eg VPN) ?
Qn 3 : What happens to the above 2 routes if the VPN is down ?
Qn 4 : Is this a lab or corporate usage ? Because I would think MS or Azure would have someone to guide you through these slightly more complicated setups

What we normally do is to use a router appliance on prem and a router service on the cloud…that will direct data traffic to and from DC & Azure to use the VPN ?

"Any nslookups work for about 5 seconds"
Presumably you mean that nslookups for DNS names matching the CF domain, run against your internal DNS server with the CF configured, work for about 5 seconds.
Can you confirm that?

"then they start to resolve to the public IP"
Do you mean that your DNS server then stops using the CF and uses a forwarder or root hints (whichever is configured)?
The use of public/private IP does not help with the detail of the issue. If you are saying a CF stops working, then it is WHERE the DNS name is resolved that is important, not the end value.
Again making a wild guess: the domains you are trying to conditionally forward also exist externally on the internet, and that is what you are referring to as 'public IP'.
So you have a specific domain suffix that exists on the public internet, but you want to forward it to your Azure DNS server and not use the internet.

If you enable DNS server logging on your internal DNS server with the CF configured and then run tests - the log should as a minimum confirm that the server is not using the CF, hopefully it will detail why.

@adrianyong4136 Maybe I’m misunderstanding your post, but we do use our router to connect the VPN with the Azure vnet, and we have static routes and policies set up for traffic flow. In the logs, I do see that the DNS traffic is going across the VPN, and there is return traffic on that session. The subscription in Azure is for corporate Dev, but it is heavily used, so I do not want to take it down for a test.
@matt7863 Yes, when we run an nslookup from a PC on-prem that matches the CF, it only resolves using the CF for about 5 seconds. The resolution server every time is our on-prem DNS. The behavior I’m seeing is that the CF will get used for the first 5 or so seconds, and then it will be ignored. Yes, the domain is azurewebsites.net. In the CF, we are defining the FQDN of the resource in the subscription. I will try your suggestion to enable DNS server logging and see what I find.
I should also mention that when I look at the properties of the CF after it stops working, the green checkmark is gone and we have the red circle with the X. It seems like communication between the DNS server and the resolver is dropped, so the DNS server ignores the CF.

Are these parameters defined on the conditional forwarder zone?
If ForwarderTimeout is very low, the server may switch to resolving the query itself. You could try increasing the timeout, and setting UseRecursion to $False for this zone if you want the query resolved only via the conditional forwarder.
-ForwarderTimeout
Specifies a length of time, in seconds, that a DNS server waits for the forwarder to resolve a query. After the DNS server tries all forwarders, it can attempt to resolve the query itself. The minimum value is 0. The maximum value is 15.

-UseRecursion
Specifies whether a DNS server attempts to resolve a query after the forwarder fails to resolve it. A value of $False prevents a DNS server from attempting resolution using other DNS servers. This parameter overrides the Use Recursion setting for a DNS server.
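To make the interaction of those two settings concrete, here is a toy model in Python of the decision flow the quoted descriptions imply. It is only a sketch of the documented behaviour, not how the Windows DNS service is actually implemented. `resolve`, `forwarders` and `recurse` are hypothetical names used for illustration.

```python
def resolve(name, forwarders, forwarder_timeout, use_recursion, recurse):
    """Toy model of a conditional forwarder lookup.

    forwarders: list of callables taking (name, timeout) and returning an
        answer, or None if they fail or time out.
    recurse: callable taking name; stands in for the server resolving the
        query itself via other DNS servers.
    Returns a tuple (answer, source).
    """
    # Try each configured forwarder in order, waiting up to the timeout.
    for forwarder in forwarders:
        answer = forwarder(name, forwarder_timeout)
        if answer is not None:
            return answer, "forwarder"
    # All forwarders failed. Per the quoted text, the server may now try to
    # resolve the query itself, unless UseRecursion is $False for the zone.
    if use_recursion:
        return recurse(name), "recursion"
    return None, "failed"
```

In this model, with `use_recursion=False` a failing forwarder produces no answer at all instead of quietly falling back to the public IP, which at least makes the forwarder failure visible.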

Okay, now this is odd.

I was unable to get any details from the DNS Server Event Log. There are so many queries going through the server that I could not find mine in the log. By the time I disabled logging after running an nslookup from my on-prem PC, so many other queries had come through that mine had been pushed out of the viewable list. I exported the log, but it gave me the same set of results.

I installed Wireshark on the on-prem DNS server that is acting as the local resolver, filtered for the IP of the Azure DNS Resolver, and ran an nslookup from my PC. The resolving server was this on-prem server, and the resolution was the public IP (20.x.x.x). I see the request go out and then come back immediately in Wireshark, with the response being the private IP inside our Azure tenant. So, it seems the on-prem DNS server that is resolving the nslookup is querying the Private Resolver and getting the response that I’m expecting (10.x.x.x), but the on-prem DNS server is not using it for the nslookup response, but instead responding with the upstream DNS resolution from the internet DNS.

Odd. Maybe it is getting two responses.

Going back to your post before this: it is interesting that the CF properties show the red X rather than the green tick, indicating a comms issue, yet you see it work in Wireshark. And you previously stated that a direct nslookup to the Azure DNS server also works.

I like Erkin’s suggestions, especially "-UseRecursion", so that it will not attempt other methods to resolve.
I wonder if the DNS server in your Azure is not authoritative.

Does it work if you change the CF domain to the 'domain' and not a full 'fqdn'?

When you use DNS Private Resolver, you don’t need a DNS forwarder VM, and Azure DNS is able to resolve on-premises domain names.

If you use your router to route HQ data to Azure & Azure to HQ, then by right there should not be a 2nd route going via the Internet or the usage of CF ?

The only way to enter the backend of the Azure Servers or services should only be via VPN, RDP into jumphosts (via secondary VPN or management VPN) or via web applications (if the applications have a web server).

What you are facing might be an Ox-bow lake syndrome where data is somehow taking a secondary route thus isolating the “primary” route.

For security reasons, data should not be using the Internet route. Otherwise, does that mean people from the Internet can somehow access the servers (if they know how) ?

@erkind39 The timeout is set to the default of 5. I tried setting it to 10, but the nslookup results come back in less than one second, so I didn’t think that would help. When I use the norecursion option with nslookup, I get no resolution, so I didn’t think that would help either.

Also, we’re not using a DNS Forwarder VM in Azure, just the Private Resolver.

The fact that I get no resolution without recursion is curious to me. Do CFs not cache results?

@matt7863 An update to the Wireshark results: it now seems to be inconsistent whether or not I see it using the CF in Wireshark. And the message that I see when the red X appears is that the server is not authoritative for that zone, which does make sense since it’s private and not the public DNS server for azurewebsites.net. It cannot be authoritative, correct? I did try changing the CF to just the domain name but had the same results.

@adrianyong4136 I had to look up Ox-bow Lake Syndrome, but I see what you are saying. Yes, the FQDN is resolvable on the Internet, but the public access is disabled. I thought that the CF would stop the name queries from using the public route.

Is split-horizon DNS applicable to your scenario?

When a VM inside the Azure VNet does an nslookup, does it get internal or external IPs?

Since it is a site-to-site VPN, could the DNS resolver be treating your on-premises DNS server as external and giving it external IP addresses? Or your on-premises DNS server may be set up with DNS policies such that internet-based clients get external IPs, while office-based clients get the internal IPs.

Did you ever find the solution to this? We’re having the exact same problem!

So one thing I’ve found - I can get it to work if I don’t replicate the conditional forwarder through our domain.

If I just create it on one DC locally, it works. But the second I set it to replicate to other DCs, it stops working.