Well, today started like any other day: servers humming along, desktops running around causing havoc and destruction, and printers being, well, printers.

Just before lunch I thought I would check up on my VMware server, as I need to ensure Spiceworks is running OK. Well, what do I find but trouble with the biggest capital T I can find. The root user can no longer start and stop virtual machines, has been demoted to read-only status, and the system is stuck trying to enter maintenance mode.

I have added servers, deleted servers, and added and removed images for creating servers and desktops, but this was a new one on me. I figured the server may have had one of those rare Windows moments where a restart will fix it. After a few minutes of waiting, the VM system had restarted and I was able to log in as root without any issue. But to my dismay I was still having the same problem: I could not do anything. So I went to an admin's best friend, the internet and Google. After reading about 15 posts I had figured out how to change a forgotten password and how to change the admin level when the VM system is part of a domain, but for a standalone system, not so much.

So I dug a little deeper and found an old friend online, who is retired and living on a beach somewhere, and he told me the quickest way to fix the issue (and also to have a heart attack if you are not ready for it). On the server you have the option to return to the default config. This will remove all VMs and reset root to admin status. You will also need to reset the password for root once you restart the server.

So with the VMs already shut down, I hit the reset button. The server restarted, I reset the root password and headed over to my desk to connect to the VM server: NO connection (PANIC). I headed back to the server and figured out that the network card settings had also been reset. Sixty seconds later the network card was set back to what it needs to be. I returned to my desk and the server was back.

It took me about 5 minutes to figure out how to get my 2 VMs back up and running, but I can say they are running smoothly right now.

Moral of the story:

  1. Always have a backup Administrator username and password stored somewhere that an admin can get to so that you don’t have to do the above.

  2. Always ensure you have regular snapshots of your VM systems so that a restore is possible if something like this happens. I had none (running out of disk space) and will be setting up a new one soon.

  3. Don’t let old friends give you heart attacks by forgetting to give you important information when you are doing something for the first time.

I hope this helps someone else resolve this type of issue in the future.

Paul

@VMware

15 Spice ups

No, regular snapshots are a BAD IDEA. The snapshots are just delta files, and you will run out of space; once that happens, you are hosed. Plus, you will kill disk I/O very quickly once you wind up with nested snapshots. Snapshots should only be used to create backups, quickly test updates, etc., but should be deleted as soon as you are done. They should never be around for long.

Snapshots are NOT BACKUPS, and NEVER SHOULD BE TREATED AS SUCH. Maintain backups consistently. Veeam, Unitrends, ghettoVCB… anything is better than maintaining snaps.

Otherwise, glad you had some luck getting it back up and running.

12 Spice ups

Thanks for the information about the snapshots.

Paul

1 Spice up

Agreed, snapshots are “one offs”, not things to be automated.

4 Spice ups

Friends don’t let friends consider snapshots backups.

3 Spice ups

What about replication? Is that a backup? I always get terrible looks here when I suggest it isn’t…

But yes, nothing will destroy your drive space faster than a bunch of forgotten snapshots hanging around.

Have you figured out why your Admin settings were changed in the first place???

That is one mystery I do not think I will be able to solve any time soon. I just hope it never happens again.

2 Spice ups

Agreed, snapshots should only be short term. I have VMware set to email me once a snapshot has reached 10 GB and to start emailing the on-call person after 20 GB. Even with that, I still check the storage views of the cluster to make sure I don’t have any active snapshots.
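For anyone who wants to reason about that two-tier alerting outside of vCenter, here is a minimal Python sketch of the threshold logic only. It does not talk to VMware at all: the function name `snapshot_alert_level`, the tier labels, and the choice to treat "reached" as greater-than-or-equal are all assumptions made for illustration, and the snapshot sizes would have to come from whatever reporting you already have.

```python
from typing import Optional

GB = 1024 ** 3  # bytes per gigabyte (binary), an assumption about how sizes are reported


def snapshot_alert_level(size_bytes: int,
                         email_gb: int = 10,
                         on_call_gb: int = 20) -> Optional[str]:
    """Return which alert tier a snapshot of the given size falls into.

    Hypothetical helper: the 10 GB and 20 GB defaults mirror the thresholds
    mentioned in the post above; the tier names are made up for this sketch.
    """
    if size_bytes >= on_call_gb * GB:
        return "on-call"   # large enough to bother the on-call person
    if size_bytes >= email_gb * GB:
        return "email"     # large enough for a routine email warning
    return None            # below both thresholds, no alert


# Example: check a few snapshot sizes against the thresholds
for size_gb in (2, 12, 25):
    print(size_gb, "GB ->", snapshot_alert_level(size_gb * GB))
```

Checking the higher threshold first keeps the tiers from overlapping, so a very large snapshot only ever maps to the on-call tier.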

Besides slowing things down, committing a snapshot takes an equal amount of space so you can run into a situation where you are unable to commit the snapshot.

Want a snapshot to ruin your day? First create a snapshot, perform some I/O-intensive activity like a defrag or sdelete, and then watch your disk space disappear.

Agreed, even with replicated snapshots you’re still not protected. Why? Let’s say you integrated VMware with your AD. An attacker gets into your network and is able to give himself the proper credentials to access the virtual centers. Said attacker then deletes all your VMs on both servers.

Who gives you dirty looks when you suggest replication isn’t backup? Give them dirty looks right back.

Replication ISN’T BACKUP. It’s replication: anything that gets hosed on one side will get replicated and hose the other side. The only backups are backups.

1 Spice up

Right, that’s why he should return their dirty looks.

More to the point, snapshots or not, if you couldn’t get into the system to power them on, then even if this were a backup system you would still be at the stage of no hope.

But as almost everyone has already pointed out, snapshots are not designed for, nor should they be used as, any form of backup, ever!

Yes, you should always have at least 1 other account with root privileges too, if only as a backup.

Did you check the event logs? You should be able to find some useful info there!

Quick comment: IMO snapshots should ONLY be used as a quick way to roll back from a change that you suspect may break something very badly. You do the snapshot, make the change (asking Father God for His blessing), and if it works, then delete the snapshot. Otherwise, roll back.

I did a snapshot on a production machine once and forgot about it for several months. Very bad. The delta was huge and when I was finally able to address my mistake it took about 8+ hours to combine. I did not do that again. (I think…)

1 Spice up

We had one once on an Exchange server before migrating 300GB of mailboxes to it. Yeah, we built a new Exchange server and migrated the mailboxes again rather than try to commit that.

Sometimes people use redundancy and backup interchangeably. Replication is a redundancy measure.

Redundancy measures are fail safes.

Backup is an archival process to ensure a last line of defense after all fail safes are gone.

2 Spice ups

And snapshots are easy to forget you have, so if you do leave them in place, for any reason, make sure you set a reminder or two in Outlook (or whatever) to bug you to delete them.
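If calendar reminders are not your thing, the same idea can be sketched as a small age check in Python. This is only an illustration under stated assumptions: the `Snapshot` record, the `stale_snapshots` function, and the 3-day default are all hypothetical, and the list of snapshots would need to be exported from your own inventory tooling, since nothing here queries vSphere.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical record for one snapshot taken from an inventory export."""
    vm: str
    name: str
    created: datetime


def stale_snapshots(snaps: List[Snapshot],
                    now: datetime,
                    max_age: timedelta = timedelta(days=3)) -> List[Snapshot]:
    """Return the snapshots older than max_age, oldest first."""
    old = [s for s in snaps if now - s.created > max_age]
    return sorted(old, key=lambda s: s.created)


# Example with made-up data: one forgotten snapshot, one fresh one
now = datetime(2024, 6, 10, 9, 0)
inventory = [
    Snapshot("exchange01", "pre-migration", datetime(2024, 2, 1, 8, 0)),
    Snapshot("web01", "pre-patch", datetime(2024, 6, 9, 18, 0)),
]
for snap in stale_snapshots(inventory, now):
    print(f"{snap.vm}: '{snap.name}' is {(now - snap.created).days} days old")
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, makes the check easy to test against a fixed date.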

Until I saw this comment, I had forgotten about two I had created (shame on me).

I use the method described above and have vSphere email me when a snapshot reaches 2GB.