I ran into a very weird situation today with our Windows Deployment Services (WDS) server while PXE booting a VM.

Our setup is 6 Windows Server 2019 Hyper-V servers connected to a pair of Supermicro 25G switches via Switch Embedded Teaming.

All the VMs involved are in a single VLAN, so the network config is flat as far as that goes.

We have a DHCP server VM, separate from our WDS server VM, and it is configured per Microsoft’s recommendations: no option 66 or 67 in DHCP, letting the WDS service handle setting up the PXE client.

If I put the DHCP VM, the WDS VM, and the new VM all on the same host, everything works just fine.

However, if I have any one of them on a separate host, it’s like they lose communication with each other.

I also noticed that the MAC address assigned to the PXE client doesn’t show up in the Supermicro switches’ MAC tables, which seems very weird. I need to investigate this, as it seems like a very likely cause of the problem.

Note: everything works just fine in Windows (DHCP, etc.); the problem is only with the PXE boot process.

Has anyone run into anything like this before?

Thanks!

Have you assigned a static MAC address to the VM? I know in VMware it will change unless you do this.

Yes, we use static MAC addresses for all of our VMs.

OK, first let me say I don’t know much about Hyper-V or WDS, but I do know quite a bit about DHCP and PXE booting.

When you PXE boot a target computer, it sends out a Discover broadcast, and any DHCP server that hears that Discover packet will respond with an Offer packet. If the Offer packet has the details the target computer needs, the target then sends a formal Request for the specific data it needs. The DHCP server then sends an Ack or Nak packet to confirm whether the settings have been accepted. This process is called DORA (Discover, Offer, Request, Ack), and it is all handled over broadcast communications.

Now enter the WDS server running its ProxyDHCP server (UDP port 4011). The ProxyDHCP server also listens for a target computer’s Discover packet. It then returns its own Offer packet with DHCP option 60 set to indicate it’s a ProxyDHCP response, not an official DHCP response. (Hint: double-check your DHCP server to make sure you are not sending DHCP option 60 out unless your DHCP server and WDS server are in the same VM. An errant DHCP option 60 is known to gum up the works.)

So the client receives two DHCP Offer packets: one from the real DHCP server and one from WDS’s ProxyDHCP server (with DHCP option 60 set). The DORA process completes with the main DHCP server, and at the end of it the target computer reaches out to the ProxyDHCP server to get the boot server and file name for booting. The next step is for the target computer to reach out to the named boot server (WDS in this case) over TFTP (UDP port 69) to request the boot file, then load and boot the bootstrap program.
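To make the two kinds of Offer easier to tell apart in a capture, here is a minimal Python sketch. It is only an illustration, not how WDS or any PXE ROM is implemented: `parse_options` and `classify_offer` are made-up names, the input is assumed to be the raw DHCP options field (the bytes after the magic cookie), and the rule "option 60 starts with PXEClient means ProxyDHCP" is just the rule of thumb described above.

```python
from typing import Dict

# DHCP message types carried in option 53
MESSAGE_TYPES = {1: "Discover", 2: "Offer", 3: "Request", 5: "Ack", 6: "Nak"}


def parse_options(raw: bytes) -> Dict[int, bytes]:
    """Parse a DHCP options field into {option_code: value_bytes}."""
    options: Dict[int, bytes] = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 255:          # End option: stop parsing
            break
        if code == 0:            # Pad option: single byte, no length
            i += 1
            continue
        if i + 1 >= len(raw):    # Truncated option header: give up
            break
        length = raw[i + 1]
        options[code] = raw[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def classify_offer(options: Dict[int, bytes]) -> str:
    """Label a packet as a normal DHCP Offer or a ProxyDHCP Offer.

    Assumption (from the description above): an Offer carrying
    option 60 = "PXEClient" is treated as a ProxyDHCP response.
    """
    type_data = options.get(53, b"")
    msg_type = type_data[0] if type_data else 0
    if MESSAGE_TYPES.get(msg_type) != "Offer":
        return "not an offer"
    if options.get(60, b"").startswith(b"PXEClient"):
        return "ProxyDHCP offer"
    return "DHCP offer"
```

Note that under this rule a regular DHCP server that sends an errant option 60 would also be labelled as ProxyDHCP, which is exactly the kind of confusion the hint above warns about.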

That is how it works. So now, where to look to find out where it’s going wrong… If I had to make a guess based on the truth table you provided, the process stops working if you move one of the VMs off host one. That tells me that, for some reason, Hyper-V is not letting the broadcast messages leave its internal vSwitch.

For normal debugging of PXE booting issues, we would use Wireshark on a witness computer (a third computer on the same subnet as the other actors) with the capture filter “port 67 or port 68”. Since the DHCP process depends on broadcast messages, this witness computer will see the entire DORA process. So… load Wireshark on an external computer and start the capture with the filter I provided, and also load Wireshark on a computer within host one. Both computers running Wireshark should see the PXE boot request from the target computer. If only the internal witness computer sees the DORA process, then you know the problem is somewhere between the internal vSwitch and the external network.
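The two-witness comparison can be written down as a small decision helper. This is only a sketch of the reasoning above: `missing_steps` and `compare_witnesses` are hypothetical names, and the inputs are assumed to be the DHCP message types you noted by hand from each Wireshark capture.

```python
DORA = ("Discover", "Offer", "Request", "Ack")


def missing_steps(seen):
    """Return the DORA steps a witness did not capture, in protocol order."""
    return [step for step in DORA if step not in seen]


def compare_witnesses(internal_seen, external_seen):
    """Suggest where to look, given what each witness captured.

    internal_seen: message types seen by the witness on the same host
    external_seen: message types seen by the witness outside that host
    """
    internal_missing = missing_steps(internal_seen)
    external_missing = missing_steps(external_seen)
    if not internal_missing and external_missing:
        # Only the internal witness saw the whole exchange
        return "suspect the path between the internal vSwitch and the external network"
    if internal_missing:
        return "internal witness is missing: " + ", ".join(internal_missing)
    return "both witnesses saw the full DORA exchange"
```

For example, `compare_witnesses({"Discover", "Offer", "Request", "Ack"}, {"Discover"})` points at the path between the vSwitch and the physical network.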

@george1421

Thank you for the very in-depth reply.

Here is what I’m seeing using wireshark.

Host 1:

Wireshark VM, PXE Client VM, and Windows client VM

Host 2:

DHCP Server

PXE boot client: Wireshark sees the DHCP Discover and DHCP Offer packets, but nothing else. The process seems to stall.

Windows Client: DHCP request proceeds and completes as normal.

That tells me that something is treating Windows DHCP requests differently than PXE DHCP Requests.

Not sure if it’s the Hyper-V switch, or the physical switch yet.

So, I plugged a physical machine into the switch.

Hooked it up to that Vlan, and PXE booted it, and everything works just fine.

That tells me it has to be something specific to the Hyper-V PXE client.

Ok, so here’s something to make it more fun.

I fired up a Gen1 VM, and PXE booted it.

Works fine.

Then fired up the same Gen2 VM I was using before to test it.

And now it works just fine too.

Haven’t changed anything since trying it last time when it failed.

That is really strange, now that everything that was failing is working. I expect it will stop working again at any time without any changes; that is typically how it goes.

I don’t know Hyper-V, but Gen1 is BIOS-based and Gen2 is UEFI-based, right?

Well, technically a PXE boot and Windows DHCP are the same thing. For a PXE boot, DHCP options 66 and 67 are set. Windows doesn’t care about them, so it just ignores those options. A PXE-booting client remembers the values that were set and, after the DHCP process, uses the values in those fields to download the boot loader.
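As a rough illustration of what "using the values in those fields" means, the sketch below pulls option 66 (boot server name) and option 67 (boot file name) out of an already-parsed options dictionary. The function name `boot_info` and the sample values are made up for the example; in a setup like the one in this thread, where the DHCP server sends neither option, the result would be `None` and the boot details would have to come from the ProxyDHCP server instead.

```python
from typing import Dict, Optional, Tuple


def boot_info(options: Dict[int, bytes]) -> Optional[Tuple[str, str]]:
    """Return (boot_server, boot_file) from options 66 and 67, or None.

    options maps DHCP option codes to their raw value bytes.
    """
    server = options.get(66)
    boot_file = options.get(67)
    if not server or not boot_file:
        return None  # a non-PXE client would simply ignore these anyway
    # Some servers null-terminate these strings; strip the terminator.
    return (
        server.rstrip(b"\x00").decode("ascii"),
        boot_file.rstrip(b"\x00").decode("ascii"),
    )


# Hypothetical values, for illustration only
print(boot_info({66: b"192.0.2.10", 67: b"boot.efi\x00"}))
print(boot_info({}))
```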

Now your Wireshark VM should have seen two Offers: one from the DHCP server’s IP address and the other from your WDS server. If the WDS ProxyDHCP Offer doesn’t get to the target computer, it will never PXE boot, but it will pick up a DHCP IP address.

@george1421

Yes, it already did stop working.

In wireshark, what I see is the discover packet, then the offer packet from the DHCP server. Then nothing.

I need to re-do my testing scenarios and compare them more closely, because I don’t believe I saw the WDS server sending out a response.

So if the PXE client sees the DHCP offer packet, but it’s missing info that it needs, such as DHCP option 60 info, then does it ever send a DHCP request packet?

Regarding WDS, there should be a BOOTP service or something named like that. That is the WDS ProxyDHCP server that will respond to Discover packets. That service needs both the BIOS and UEFI components installed in WDS. That’s about the extent of what I know about WDS.

The client will send a Request in response to any Offer that has the fields it’s looking for. If a client only receives a ProxyDHCP response, it will send out another Discover until it gets an authoritative DHCP response with the fields it needs.
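Putting the rules from this thread together, the client’s next move can be sketched as a tiny decision function. This is a simplified model, not real PXE firmware behaviour: `client_next_step` is a hypothetical name, each Offer is assumed to be pre-labelled as `"dhcp"` or `"proxy"`, and the DHCP server is assumed to send no option 66/67 (as in the setup described here), so boot details can only come from ProxyDHCP.

```python
def client_next_step(offers):
    """Predict the PXE client's next step from the Offers it received.

    offers: list of dicts like {"kind": "dhcp"} or {"kind": "proxy"}
    Assumption: the DHCP server itself carries no boot server/file info.
    """
    kinds = {offer["kind"] for offer in offers}
    if "dhcp" not in kinds:
        # No authoritative Offer (none at all, or ProxyDHCP only)
        return "send Discover again"
    if "proxy" in kinds:
        return "send Request, then ask ProxyDHCP for boot server and file"
    # Authoritative Offer only: an address, but nothing to boot from
    return "send Request, get an IP address, but no PXE boot info"
```

Real firmware may differ from this model (for example, by stalling instead of requesting an address when no boot information is offered), so treat it as a way to reason about captures rather than a specification.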

Now, it’s rare, but it’s possible that if there is a DHCP relay service between the DHCP server and the target computer, once the client receives an Offer the relay agent will switch over to unicast messaging with the target computer. I have seen it before, but it’s a rare situation. Normally everything is broadcast until the target computer requests the boot loader over TFTP.