Hi all,
So I’m having a bit of a (large) issue here. I have two deployment servers on the deployment.local domain like so:
Server 1: DNS, DHCP, Domain Controller
HyperV: WDS with MDT
Everything worked fine up until today, when everything stopped UEFI network booting. The systems start PXE over IPv4 and after some time drop out saying ‘No offer received’ or similar.
I’ve tried:
Replacing Boot image
Restarting servers and switches
Restoring WDS to an earlier backup
Yet I can’t seem to get it going again. I’m trying to deploy 100+ systems from different suppliers, all of which have worked fine previously (for around 3-4 years).
Confused by what you wrote. Are WDS and MDT on the same server as DHCP? Same subnet? What about the clients?
Can you verify that you’re not out of IP addresses?
Are you using DHCP options, IP helpers, or both?
Is the WDS server authorized as a DHCP server?
Are you able to legacy boot? Do you PXE boot for anything other than deployment?
Hi Big Green Man,
Thanks for your speedy reply. Hopefully you can wake me from this nightmare.
- UEFI Boot does not work (Has worked for around 3-4 years)
- Some systems can boot to 1 of our 2 WDS Servers.
- All systems can boot to Legacy PXE
- All systems can boot from a WinPE PenDrive
- I am not using IP Helpers or DHCP Options
- DHCP and the DC are on one physical server; WDS is a Hyper-V guest on that same DHCP/DC server (if that makes sense)
Everything has worked until about 11am this morning and has done for years.
So the clients, WDS server, and DHCP server are all on the same subnet/vlan then?
Sounds like DHCP is on a physical server that has the Hyper-V role installed and WDS is on a guest machine on that server? Specifics aren’t really important in this case, just whether or not they’re running on the same machine, be it physical or virtual.
Does anyone besides you administer this?
Yes, WDS, DHCP and the clients are all part of the same network (255.255.252.0). Yes, DHCP is on the physical server and WDS is a guest on the Hyper-V role. I am the only one who administers this, but I did not originally set it up.
EDIT: DHCP appears to be working because the error on my clients shows the Station IP. It seems PXE isn’t responding?
So everything looks good so far. Can you verify that DHCP options 60, 66, and 67 didn’t get set somehow? Also for a couple things in WDS under the properties of your server:
PXE Response - what is this set to?
Boot - what is the boot policy set to? Be sure to include both known and unknown clients
DHCP - Are either of the boxes checked?
Advanced - What is DHCP Authorization set to?
So the clients are getting an IP address, but no bootfile?
Hi,
I’ve checked my DHCP. DHCP options 60, 66, and 67 are not set. Under WDS2 Properties, both DHCP options (“Do not listen” and “Configure DHCP Options”) are unchecked. Boot options are set to “Always continue” for both known and unknown clients. Under Advanced, WDS is allowed to dynamically discover valid domain servers, and the DHCP authorization setting is not “Do not authorize this WDS server in DHCP”.
And yes, Clients get an IP but no bootfile. Would you like to see captured packets at the time of the errors with Wireshark?
EDIT: Did I mention it’s only certain systems? Some units of the same chassis will work while others won’t. It’s split between desktops and laptops. Do you think maybe a boot driver is corrupt?
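For anyone who does end up capturing packets: DHCP options travel as code/length/value triplets, so an offer can be checked mechanically for options 60, 66 and 67. Below is a minimal Python sketch, assuming the raw options bytes (the part after the magic cookie) have already been pulled out of a capture. The function names and the sample bytes are made up for illustration and are not taken from this network.

```python
# Option codes per RFC 2132: 60 = vendor class identifier,
# 66 = TFTP server name, 67 = bootfile name.
PXE_OPTION_NAMES = {60: "vendor class", 66: "TFTP server", 67: "bootfile"}


def parse_dhcp_options(raw: bytes) -> dict:
    """Parse a DHCP options field made of (code, length, value) triplets."""
    options = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 0:        # pad option: a single byte with no length
            i += 1
            continue
        if code == 255:      # end option: stop parsing
            break
        if i + 1 >= len(raw):
            break            # truncated field: stop rather than guess
        length = raw[i + 1]
        options[code] = raw[i + 2:i + 2 + length]
        i += 2 + length
    return options


def pxe_options(raw: bytes) -> dict:
    """Return only the PXE-related options that are present, as text."""
    found = parse_dhcp_options(raw)
    return {
        PXE_OPTION_NAMES[code]: found[code].decode("ascii", "replace")
        for code in PXE_OPTION_NAMES
        if code in found
    }


def encode_option(code: int, value: bytes) -> bytes:
    """Hypothetical helper that builds sample data for the demo below."""
    return bytes([code, len(value)]) + value


# Made-up sample: message type 2 (offer), option 60, option 67, then end.
sample = (
    encode_option(53, b"\x02")
    + encode_option(60, b"PXEClient")
    + encode_option(67, rb"Boot\x64\wdsmgfw.efi")
    + b"\xff"
)

if __name__ == "__main__":
    print(pxe_options(sample))
```

An empty result for an offer would match the "IP but no bootfile" symptom, at least as far as that one packet goes.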
So literally everything that can be set is set correctly. No need for Wireshark yet. Let’s look at a couple other things:
- Were any Windows updates done before it broke? What OS is WDS running on by the way?
- What’s different about the few computers that can UEFI boot? Are they on a different switch maybe? Different model NIC? Different version of firmware?
If it were a driver, you’d still get the boot image, but it would error out when trying to connect to the deployment share. You’d already see the MDT splash screen at that point.
zuphzuph
The option 67 value for UEFI is: Boot\x64\wdsmgfw.efi
Secure boot enabled perhaps?
Hi guys.
Zuphzuph - Secure Boot hasn’t been tried yet. Not all our systems allow it to be enabled/disabled, and it is required for OA3.
Big Green Man - I’m glad everything so far is OK, however at the same time it deeply annoys me. Windows Updates were done, but I counteracted this by restoring a backup from months ago and the issue remained. The DC, DHCP and WDS are all running 2012 R2. However, I quickly built a 2016 WDS server and the issue still remained (so that rules out a server issue?)
Systems that work/don’t work are on the same switch. I’ve tried different cables on the ones that don’t work and still have the same issue.
The weirdest thing is that I can have two systems that are EXACTLY the same and one will work and one won’t. But they ALL work on a ‘WDS Key’ (a USB with the boot image on it).
I’ve restarted every switch on our deployment network and brought them up one by one. I’ve restarted our router and both deployment servers. It’s very odd…
At this point I’m thinking that the blame is with these individual systems rather than my network? Not sure what else to try/troubleshoot because I feel like I’ve tried literally everything.
The only other thing I have not touched is the NICs on the physical servers. They are in a NIC team. I know one server was complaining about a MAC address conflict between two NICs, but I don’t think this is anything to worry about? Considering that this is on one server, yet we have two WDS servers. Sometimes redundancy doesn’t work very well when both your WDS servers stop working…
So after 3 hours of overtime I gave up and went home with a solution in my head in case nothing was working in the morning… I came back to work this morning and everything is working again. Nothing happens during the night other than backups. I’ve raised an issue with Microsoft to see if they have any reports of a similar thing happening. Hopefully I can figure out what caused this…
Well that makes no sense at all. But it also makes no sense why it stopped working in the first place, since everything you told me looks like it’s set correctly. Does anything get rebooted or shut off at night?
Nope, all that happens at night is backups and DFS replication. I switched replication off as soon as I noticed a problem…
So exactly 2 weeks later the same problem has come around again (coincidence?). I honestly don’t know where to start with this again, knowing it’ll probably sort itself out later… Any suggestions?
Hi guys,
A different system on the same LAN cable as a failing one will work without issue. A failing system with a live installation of Linux will have an active internet connection and be able to ping domain.com.
So weird…
Skimming through the thread, there are two things I’d like to shine a light on for comment by people with deeper knowledge of the topics.
OP mentioned a MAC address conflict on a teamed NIC - significant or not … I don’t know.
Also, OP’s network is a /22 - Honestly I don’t know how WDS would behave in that case because I don’t have the experience … does PXE still treat the third octet as a subnet - or is it smart enough to know the difference?
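On the /22 question, the arithmetic at least is easy to check: the mask 255.255.252.0 mentioned earlier puts four consecutive third-octet values in one subnet, so two hosts whose third octets differ can still be on the same network. Here is a quick Python sketch using the standard `ipaddress` module. The 10.0.4.x addresses are hypothetical, since the thread only gives the mask. It says nothing about how WDS itself behaves, only what the mask implies.

```python
import ipaddress

# Hypothetical network; only the mask 255.255.252.0 comes from the thread.
NETWORK = ipaddress.ip_network("10.0.4.0/255.255.252.0")


def same_subnet(a: str, b: str, network=NETWORK) -> bool:
    """True if both addresses fall inside the given network."""
    return (ipaddress.ip_address(a) in network
            and ipaddress.ip_address(b) in network)


if __name__ == "__main__":
    print(NETWORK.prefixlen)                       # 22
    print(NETWORK.num_addresses - 2)               # 1022 usable host addresses
    print(same_subnet("10.0.4.10", "10.0.6.20"))   # True: third octets 4..7 share the /22
    print(same_subnet("10.0.4.10", "10.0.8.20"))   # False: 10.0.8.x is the next /22
```

With roughly 1022 usable addresses, the same numbers also bear on the earlier question about running out of leases, depending on how the DHCP scope is actually sized.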