I’ve run into a RAID issue that I haven’t been able to figure out and need some help debugging.

I’ve put together three Dell R630 servers, each with six Samsung 870 EVO 2TB SSDs in RAID 6 (four data disks) on a PERC H730 Mini.

The problem I’ve run into is that the write performance is horrible.

When loading a larger file onto the server, I at first get ~300MB/sec write performance (source bound) with <10% disk activity time and response times <5ms. After a while (one minute or less), disk activity jumps to 100%, write throughput drops to 30-50MB/sec and response time climbs to 200-300ms, and the whole system becomes extremely sluggish and barely usable. (There have been times when disk response time exceeded 1000ms.)

I have the same issue on all three servers, which share the same configuration, so I think that rules out any particular disk being problematic. I’ve also checked the SMART values on the drives and found nothing indicating any errors or issues.

The virtual disk is configured with Read Policy: Adaptive Read Ahead, Write Policy: Write Back and Disk Cache Policy: Enabled.

I’ve tried changing the Write Policy to Write Through, but it didn’t change anything (except writes being a bit slower in general).

When I also disabled the Disk Cache Policy, the issue kicked in instantly and write performance barely ever went above 30MB/sec, with the disk activity level constantly at 100%.

I’ve ensured that the PERC has the most recent firmware (25.5.9.0001).

Currently I have no good ideas on what could be causing this, and would be very happy to get some guidance on narrowing it down.

6 Spice ups

Using consumer SSDs can give unreliable results in enterprise systems.

I never understand why people choose consumer drives to save money on their systems; they lack enterprise features.

When using SSDs, the RAID controller should be configured for write-through and caching should be disabled.

The issue could be that your SSDs don’t have the same cache as enterprise drives, so can’t sustain the writes you expect, even in a RAID set.

What is the RAID stripe size, and what is the server’s intended use?

7 Spice ups

Perhaps I should have added a comment that I’m well aware those are consumer drives, but those servers are built on a very tight budget so I didn’t have much of a choice.

The stripe size is 64KB.

The servers will host virtual machines with databases that are read intensive but not write intensive. As such, the 30MB/sec write performance once the cache is filled won’t in itself be an issue during normal operations (the cache will cover most cases), but the whole system becoming extremely slow and sluggish as soon as a larger amount of data is written is indeed an issue.

2 Spice ups

My advice would be to reduce the CPU cores or the RAM before you replace drives with consumer ones, or reduce the overall server count, until budget allows. Buy used if costs are tight; servers don’t have to be new.

As you add more VMs, the IO will become more random, which only adds to the issue you face. And without the drives having any sort of protection, you risk data corruption in the event of an outage.

What are the CPUs in use?

I would consider RAID 10 over RAID 6, as RAID 6 has a high penalty on writes. You do lose capacity, but you’ll get better speeds and a much smaller write penalty.
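To put rough numbers on that trade-off, here is a minimal Python sketch using the conventional rule-of-thumb write penalties (2 back-end I/Os per small random write for RAID 10, 6 for RAID 6). The per-disk IOPS figure is a made-up placeholder, not a measurement of these drives, and real controllers with cache will behave differently.

```python
# Rule-of-thumb number of back-end I/Os caused by one small random host write.
# RAID 6: read data + 2 parity, write data + 2 parity = 6. RAID 10: 2 mirrored writes.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def effective_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Estimate the random-write IOPS an array can absorb (ignores controller cache)."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for the given RAID level, in the same unit as disk_tb."""
    if level == "raid10":
        return disks * disk_tb / 2
    if level == "raid5":
        return (disks - 1) * disk_tb
    if level == "raid6":
        return (disks - 2) * disk_tb
    raise ValueError(f"unknown RAID level: {level}")


# Six 2TB disks as in the original post; 10_000 IOPS per disk is hypothetical.
for level in ("raid6", "raid10"):
    print(level,
          usable_capacity_tb(level, 6, 2.0), "TB usable,",
          effective_write_iops(level, 6, 10_000), "est. write IOPS")
```

With six disks, this estimate gives RAID 10 three times the random-write headroom of RAID 6 at the cost of 2TB of usable space.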

For large files, or where there will be a lot of large writes, a 128k block size may be better; if large files are limited, 64k may be fine.
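For reference, a small sketch of how the stripe setting translates into a full-stripe size on this array. It assumes the 64KB figure is the per-disk stripe element and that the array is six disks with two parity disks, as described in the thread. On parity RAID, writes that do not cover whole stripes generally force the controller to read old data or parity before it can update parity.

```python
def full_stripe_kb(element_kb: int, total_disks: int, parity_disks: int = 2) -> int:
    """Size of one full stripe of user data, in KB."""
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return element_kb * data_disks


def covers_full_stripes(offset_kb: int, length_kb: int, stripe_kb: int) -> bool:
    """True if a write starts on a stripe boundary and spans whole stripes only."""
    return length_kb > 0 and offset_kb % stripe_kb == 0 and length_kb % stripe_kb == 0


for element in (64, 128):
    stripe = full_stripe_kb(element, total_disks=6)
    print(f"{element}KB element -> {stripe}KB full stripe")
```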

5 Spice ups

Agreed with Rod’s take on the situation - I believe you are running into performance limitations of the SSDs installed.
Here is a write-up comparing consumer and enterprise SSDs to help you understand the difference and make an informed decision on keeping the current SSDs or replacing them with the proper hardware for your workload.
Enterprise SSDs: Everything You Need to Know | Crucial.com

3 Spice ups

The key takeaway being what I mentioned at the start

Conversely, relying on consumer SSDs for data center and enterprise applications can lead to performance bottlenecks, reliability issues, and potential data loss.

4 Spice ups

No reply so far has approached the actual issue presented.

No SSD drive, consumer or enterprise grade, used alone or in RAID, should be limited to a write performance of 30MB/sec.

Still, that’s what I’m getting.

What speed would you be expecting?
Does the server have iDRAC with verbose logs enabled?

Why RAID 6?

Is the DB server running on bare metal or as a VM? How are you getting the reading of 30MBps?

1 Spice up

I would strongly agree…

From a business point of view, these “savings” sometimes make no sense at all.

The business wants to make a profit of something like $1,000,000 a year… then why spend $20k a year on server hardware and $100K a year on software (assuming the average IT cost of 15% annually, not including staff)? Worse, that $20K of hardware may be so unstable that it could go down at any moment. Would the servers’ downtime cost the business $$$?

Worse, we are talking about getting 3 physical servers (R630), which went EOL in 2019, and then throwing in 8TB of SSD storage?
I just configured a Dell R450 with a 16-core Xeon, 128GB RAM, 3x 4TB RI SSDs, dual PSUs and 84 months of NBD onsite ProSupport… $15,998… and that’s before any enterprise discounts if you go through the Dell Enterprise reps.
It’s literally about $2k a year per server.

2 Spice ups

I would think it is a limitation of the RAID adapter, and also of overall performance if these are hosts running a few VMs.

It is worse if you are copying data from one VM to another on the same host, because the SSDs are then reading from and writing to the same RAID array at once. Otherwise, the bottleneck could be the NICs, etc.

300MBps dropping to 30MBps can be normal in the above scenario, assuming there are no bottlenecks with the RAID adapter and consumer SSDs.

1 Spice up

I do believe you have been presented with the actual issue by @Rod-IT, @bob2213 and @adrian_ych: consumer-grade drives, which you are aware of.

1 Spice up

Spending more money will fix most issues quickly and easily, and he’s aware of that. Repeating it isn’t helpful.

Going back to the original question - have you tried a single drive to see if you can replicate the results? E.g. will one drive fall off to 30MBps after a sustained write, at roughly the same amount of data transferred? Does it behave the same when the OS is different (boot a live ISO of something)?

If so, especially in different hardware, then, well, the drive is the limit and you’re hosed, right?
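A minimal Python sketch of that kind of single-drive sustained-write test is below. It writes incompressible data in chunks, calls fsync after each chunk so the page cache does not hide the drive, and records throughput per time window so a fall-off is visible. The function name, chunk size and window length are arbitrary choices for illustration, not anything taken from the thread.

```python
import os
import time

MIB = 1024 * 1024


def sustained_write(path, total_mb, chunk_mb=4, window_s=1.0):
    """Write about total_mb MiB to path (rounded up to whole chunks).

    Returns a list of (seconds_since_start, MiB_per_second) samples,
    one per elapsed window plus one for any final partial window.
    """
    chunk = os.urandom(chunk_mb * MIB)  # random data, so it cannot be compressed away
    samples = []
    total_written = 0
    window_bytes = 0
    start = window_start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while total_written < total_mb * MIB:
            view = memoryview(chunk)
            while len(view) > 0:          # os.write may write fewer bytes than asked
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)                  # push this chunk past the OS page cache
            total_written += len(chunk)
            window_bytes += len(chunk)
            now = time.monotonic()
            if now - window_start >= window_s:
                samples.append((now - start, window_bytes / MIB / (now - window_start)))
                window_bytes = 0
                window_start = now
    finally:
        os.close(fd)
    if window_bytes:                      # report the last, partial window
        now = time.monotonic()
        samples.append((now - start, window_bytes / MIB / max(now - window_start, 1e-9)))
    return samples
```

Usage would be something like `for t, rate in sustained_write("/mnt/test/big.bin", total_mb=40_000): print(f"{t:7.1f}s {rate:8.1f} MiB/s")`, where the path is a placeholder on the drive under test. The total needs to be larger than whatever the drive can cache; at ~300MB/sec for up to a minute, as reported in the original post, that is somewhere under 18GB, so 40GB leaves some margin.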

2 Spice ups

Being dismissive of what has been given in terms of advice is hardly helping matters either.

2 Spice ups

Actually, write performance can degrade if the cache in the SSD can’t keep up; this is especially true the older and fuller the drive gets.

Since you don’t have enterprise drives, you can’t compare this.

As far as fixes go, did you do as I asked and disable the RAID cache and enable write-through, and did it help? You said you tried write-through, but you didn’t disable the RAID cache when you did this.

So if the answer is given, we should dismiss it because it doesn’t fit the answer the poster wants?

2 Spice ups

Thanks @furicle.

Solving a problem by throwing money at it is easy if you have that money available, but if you don’t you don’t.

The drives are entirely new and I would expect a sustained write performance of at least 300MB/sec.

That expectation being realistic is confirmed here and here.

After further debugging and testing of the drives, it seems I have received a batch of non-authentic Samsung drives. They seem to be some pretty well made replicas, and if it wasn’t for the poor sustained write performance I would not have been able to tell them apart.

The packaging and casing are deceptive. They do have the claimed storage space. They don’t seem to have any bad blocks. Read performance is good. As long as you don’t fill the cache, the write performance is also good.

They properly identify themselves as the correct Samsung drives. However, the firmware on the drives is not authentic! (Wanted to make that a link to the authentic identifiers, but the forum only allowed me to put two links in the post.)

I bought them from an online store and paid regular prices for them, so I had no reason to suspect anything. The question is whether the store itself even knows what it is selling. I have now reached out to them about it.

A regular user might never encounter any issues with those drives. They seem to be that well made.

Seems like a weird thing to encounter, but then again counterfeit goods aren’t terribly rare…

Also, as an aside here, those drives ONLY have high write speeds when you use Samsung drivers, not the stock Windows drivers…

(Guessing the Samsung drivers are properly handling block erase interleave geometry for the drives, and generic drivers do not because they don’t understand the geometry)

2 Spice ups

I’ve never seen this before, not with Samsung and reliable stores. Is the place you bought them a known good site, have you used them before?

1 Spice up

From a year ago, but you know all these drives went…somewhere

1 Spice up

Sorry… I meant: where did you get the readings of 300MBps or 30MBps?

Can you verify with the Samsung site using the S/N? Usually replicas would have the same S/N for the whole batch of SSDs.
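If the serial numbers and firmware versions have been collected from each drive, a quick way to spot repeats is sketched below. The record layout (`slot`, `serial`, `firmware`) and the example values are hypothetical, and the idea that replicas share a serial number across a batch is only the suggestion made above, not something confirmed in this thread. The set of known-good firmware versions has to come from the vendor; none is assumed here.

```python
from collections import defaultdict


def find_duplicate_serials(drives):
    """Return {serial: [slots]} for every serial number seen on more than one drive."""
    by_serial = defaultdict(list)
    for drive in drives:
        # Normalise case and whitespace so cosmetic differences don't hide a repeat.
        by_serial[drive["serial"].strip().upper()].append(drive["slot"])
    return {serial: slots for serial, slots in by_serial.items() if len(slots) > 1}


def unexpected_firmware(drives, known_good):
    """Return the slots whose firmware string is not in the known_good set."""
    return [drive["slot"] for drive in drives if drive["firmware"] not in known_good]


# Hypothetical inventory, e.g. typed in by hand from the drive labels or controller UI.
EXAMPLE = [
    {"slot": 0, "serial": "EXAMPLE0001", "firmware": "FW-A"},
    {"slot": 1, "serial": "EXAMPLE0002", "firmware": "FW-A"},
    {"slot": 2, "serial": "example0001", "firmware": "FW-B"},
]

print(find_duplicate_serials(EXAMPLE))
print(unexpected_firmware(EXAMPLE, known_good={"FW-A"}))
```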

$$$ not being available is not the same as not being willing to spend it. Or is it even penny wise, pound foolish? Because you are talking about 3 servers with 12TB of SSDs each.

1 Spice up

RAID:
“redundant array of independent disks”
Not
“redundant array of inexpensive disks”

Just throwing that in…

2 Spice ups