Hi All

We recently migrated to a new Dell server to replace our outdated and underpowered trio of IBM servers. We now have two servers:

  • A Dell R720 with 2 x hex-core Xeon E5-2630 v2 CPUs, 16 x 10k 600GB 2.5" SAS drives in RAID 10, ~50GB RAM
  • An older IBM server with 2 x quad core processors, 20GB RAM, ~200GB disk space

Currently running

  • 4 VMs - SBS2011, SQL Server 2008 R2, RDS (2008 R2), Windows 7
  • VMware Data Protection, VMware vCenter

We used to have a SCSI tape backup for the old servers and migrated the SCSI card and tape drive to the new Dell server. This is connected through to the SBS2011 VM which runs Backup Exec 2014. I realise this isn’t officially supported but it had been working fine with the old servers in this configuration for over 12 months.

Now to the issue: recently, during the overnight backup window, the SBS server has been suffering an irrecoverable lockup right down to the host level (i.e. I can't even get the host to reboot the VM). The only way to recover has been to shut down the other 3 VMs and then cut power to the host server. Obviously this is not good. Our support people immediately pointed the blame at Backup Exec and turned it off, leaving us without an offsite backup. This has stopped the lockups but is obviously not a long-term solution.

There is a twist to the story though: prior to a power outage the day after the migration, all backups worked successfully without locking up. Additionally, one night the SBS server backup completed successfully just before the lockup occurred. This indicates to me that there might be another underlying issue causing the problem.

What do people think? If anyone would like further information just let me know what you need!

@Dell_Technologies

Why are you running Backup Exec 2014? That is brand new and likely to have major issues.

I know, I know, big no no but I’d been keeping tabs on its progress and pretty much all the feedback here on Spiceworks was quite positive. We’ve been plagued by very slow backups so I must confess that I gave in to the temptation of “100% faster backups” (not expecting it to be THAT good but anything would help).

We had 2014 running successfully for about 2 weeks prior to the server migration (and the day after) if that makes any difference.

Benjamin439,

Did you check system or application logs to see if there is any indication of the lockup? Have you looked at the Backup Exec logs to see how far into the backup it got? Of course I’d like to rule out that Backup Exec was the issue here - just need to take diagnosing this a step at a time.

You indicated the backup completed successfully prior to the lockup. Was that the same night, or on a different day?

For testing, back up to disk and see what happens. This would eliminate your unsupported tape backup as a variable. Is there a snapshot running at the same time?

I had a similar issue where I was backing up to the cloud (with a home-version backup product) while also using Windows Backup to an external device to capture the system state. I'll never do that again.

@BE_Elias: Unfortunately once the lockup occurs all logging ceases so when I investigated there was just a large chunk of time missing from the logs. The successful backup job was prior to a lockup (we have 5 jobs as part of our backup routine) and I ran a test job last night after a fresh install of Backup Exec without issue so I doubt it’s Backup Exec causing the problem.

@catch: There’s definitely no manual snapshot or other backup software running but I’m not sure how VMware Data Protection works (I didn’t install it). Either way it was running during the successful backup.
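
For anyone wanting to pin down the missing chunk of log time mentioned above, here is a minimal sketch that finds the widest gap between consecutive timestamps. It assumes the events have been exported to text with a `YYYY-MM-DD HH:MM:SS` timestamp at the start of each line; that format, the function names and the sample lines are all made up for illustration and are not actual Backup Exec or Windows output.

```python
from datetime import datetime, timedelta

def parse_times(lines, fmt="%Y-%m-%d %H:%M:%S", width=19):
    """Parse the leading timestamp of each line; skip lines that do not match."""
    times = []
    for line in lines:
        try:
            times.append(datetime.strptime(line[:width], fmt))
        except ValueError:
            continue  # not a timestamped line (header, wrapped text, ...)
    return times

def largest_gap(timestamps):
    """Return (gap, last_before, first_after) for the widest silent interval."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    before, after = max(zip(ordered, ordered[1:]), key=lambda p: p[1] - p[0])
    return after - before, before, after

# Invented sample data, standing in for an exported event log.
sample = [
    "2014-07-01 22:58:10 Backup job started",
    "2014-07-01 23:04:42 Snapshot created",
    "2014-07-02 06:31:05 The previous system shutdown was unexpected",
]
gap, last_seen, first_after = largest_gap(parse_times(sample))
print(f"No events between {last_seen} and {first_after} ({gap})")
```

The last event before the gap gives an approximate lockup time, which can then be compared against the Backup Exec job log to see which job was active.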

Thanks for the update Benjamin439, keep me posted if you encounter any other issues, happy to work with you. Glad to hear the test job ran successfully - let me know when you engage the “real” backup with results.

Back again! After a few successful test runs with one of our VMs I decided to go for broke and reinstate the full backup… which failed. Same lockup symptoms as before but I was able to establish a few more facts:

  • The first VM backed up is the same VM used for the tests and according to the logs this backup completes successfully.
  • Once the backup moves to the next VM the lockup occurs within 1-2 minutes.

To me this pretty much confirms that neither the tape drive nor Backup Exec itself is to blame; it seems to be an issue with that particular Remote Agent/VM. Just to be sure, I have followed Catch’s suggestion and changed from a tape backup to a disk backup, scheduled for tonight.

@BE_Elias: I note that the BEUtility program has a debug log capability, would it be worth activating this to gather more information on the exact point of the crash? If so could you advise which options I should select for the most benefit?

Thanks for the update. Sorry to hear about the lockup. I believe you are on the right track with trying the backup to disk to see what happens.

Let's wait for that result before enabling any debug logging. But yes, it would be good to capture the lockup/crash that may be happening when backing up to tape, using SGMON.EXE:

http://www.symantec.com/business/support/index?page=content&id=TECH4986

Keep me posted.

Some small success last night, I did a backup to disk of the three VMs that didn’t appear to be causing a problem and it worked!

I now plan to put that job on hold and set up a new disk backup of just the two problem machines to rule out the tape backup as the cause once and for all!

Thanks for the update. I’ll continue to monitor this thread and offer assistance when needed.

Hmmm… the plot thickens. I did the backup of the two ‘problem’ VMs as I said and they worked. So now I’m at a bit of a loss, a single VM works to tape, all the VMs work to disk… where’s the problem?

@BE_Elias: Our support vendor has requested that I enable Backup Exec logging as I’m going to switch the three VM backup back to tape and see if that causes the issue to reappear. Obviously since this takes down the server we’d like to have everything ready to catch the problem on the first attempt.

Any other suggestions would be appreciated.

Thanks for the update. You will want to enable debug logging to capture what is going on when you run the backup to tape. Here is a technote on how to enable the right debugging for a VM situation:

http://www.symantec.com/business/support/index?page=content&id=TECH63926

Had another lockup last night backing up two VMs and a file share to tape. From the looks of it the VMs completed successfully and the file share (a Synology NAS) failed.

@BE_Elias: I didn’t see your reply before I left work so I wasn’t able to follow the latest instructions. However, I did activate SGMON as per your earlier reply and have a log file from that. It's quite large (approx 1GB); how do I get it to you for further investigation?
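
A log of that size is plain text and should shrink a lot when compressed, which makes it easier to send however support asks for it. A minimal sketch, assuming Python is available on the machine holding the log; `compress_log` is a hypothetical helper name and is not part of Backup Exec or SGMON.

```python
import gzip
import shutil
from pathlib import Path

def compress_log(src, chunk_size=1024 * 1024):
    """Write a gzip copy next to `src` and return its path.

    The original file is left untouched. Data is streamed in chunks,
    so a 1GB log is never loaded into memory at once.
    """
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```

Calling `compress_log` with the path to the SGMON output file writes a `.gz` file alongside it.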

Hi Benjamin,

Not a problem, we can start with the SGMON log and go from there. Can you please email me your contact info (name, company name, phone, email)? Also let me know what time zone you are in and what the best time to contact you would be. Please email this information to elias@symantec.com.
I will get a courtesy support ticket opened for you and have someone from our support department give you a call. When you two connect, they will be able to give you the information you need to upload the log file so they can take a look and figure out what is going on.

Easy done, sent via email

Got it, in work. :)

Can you do a backup to disk, find the average finish time and schedule a tape backup of that an hour later? Yeah it’s not efficient but it’s something.
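
As a rough illustration of the timing, here is a small sketch that takes the disk job's start time, adds the average duration of past runs, and then adds the one-hour buffer. The start time and durations below are invented placeholders; real figures would come from the job history.

```python
from datetime import datetime, timedelta

def tape_start(disk_start, past_durations, buffer=timedelta(hours=1)):
    """Disk job start + mean duration of past runs + safety buffer."""
    if not past_durations:
        raise ValueError("need at least one past run to average")
    mean = sum(past_durations, timedelta()) / len(past_durations)
    return disk_start + mean + buffer

# Placeholder figures: disk job starts at 22:00 and took 2h10m, 2h30m, 2h20m.
runs = [
    timedelta(hours=2, minutes=10),
    timedelta(hours=2, minutes=30),
    timedelta(hours=2, minutes=20),
]
start = tape_start(datetime(2014, 7, 1, 22, 0), runs)
print("Schedule the tape job for", start)
```

If run times vary a lot from night to night, using the longest past run instead of the average would leave more headroom.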

Still no go I’m afraid, the configuration change suggested caused lockups on two other VMs and eventually a purple-screen-of-death on the ESXi host. The tech is calling me back later today.

@Dennis: I’ve already set up a disk-to-disk fallback for tonight. The biggest issue isn’t so much the failed backup, it's the fact that it causes a lockup of the DC so no one can do anything when they get to work.