Hey folks, I’m at my wits’ end. I’ve been screwing with Proxmox for years now, but I’m at a tipping point. I’ve only ever used consumer SSDs in it to run my VMs off of, but after a dozen or so crashes over the last week I think the SSDs are the culprit. (Really, really terrible write speeds leading to kernel crashes, I believe.)
I’ve never gotten an enterprise SSD, if that’s even what I need. Any recommendations? New? Used? Brands?
Appreciate it
Really, anything branded from Samsung or Crucial (Micron) is going to be fine. They are the top producers of NAND, produce high quality products, and stand behind warranties. But you are gonna pay out the nose for the privilege of enterprise grade hardware.
You might just be buying lower quality consumer SSDs though, since even they should be able to handle a surprising amount of abuse.
How do you know you’re getting higher quality? When you’re looking at them they all seem the same
I recently upgraded three of my Proxmox hosts with SSDs to make use of Ceph. While researching I faced the same question: everyone said you need an enterprise SSD, or Ceph would eat it alive. The feature that apparently matters the most in my case is Power Loss Protection (PLP). It’s not even primarily needed to protect against a possible outage. As far as I understand it, Ceph issues sync writes, and a drive with PLP can safely acknowledge them from its cache, while a drive without it has to flush to flash every time and performance tanks.
There are some SSDs marketed for usage in data centers, these are generally enterprisey. Often they are classified for “Mixed Use” (read and write) or “Read Intensive”. Other interesting metrics are the Drive Writes Per Day (DWPD) and obviously TBW and IOPS.
In the end I went with used Samsung PM883s.
But before you fall into this rabbit hole, you might check if you really need an enterprise SSD. If all you’re doing is running a few VMs in a homelab, I would expect consumer SSDs to work just fine.
Well I have the exact same use case and I just checked and yup, 3 out of 4 drives failed in a year. Those were shitty WD blues though, so I think it’s time to shell out real money
To expand on @doeknius_gloek’s comment, those categories usually directly correlate to a range of DWPD (endurance) figures. I’m most familiar with buying servers from Dell, but other brands are pretty similar.
Usually, the split is something like this:
(Consumer SSDs frequently have endurances only in the 0.1 - 0.3 DWPD range for comparison, and I’ve seen as low as 0.05)
You’ll also find these tiers roughly line up with the SSDs that expose different capacities while having the same amount of flash inside; where a consumer drive would be 512GB, an enterprise RI would be 480GB, and a MU/WI only 400GB. Similarly 1TB/960GB/800GB, 2TB/1.92TB/1.6TB, etc.
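To put numbers on those capacity tiers, here is a rough sketch of how much flash each tier holds back as spare area. It assumes the consumer capacity is a fair stand-in for the raw flash in the drive, which is a simplification; the function name is just made up for illustration.

```python
def reserved_fraction(raw_gb: float, exposed_gb: float) -> float:
    """Fraction of the flash held back as spare area.

    Assumption: raw_gb (the consumer capacity) approximates the raw flash.
    """
    return (raw_gb - exposed_gb) / raw_gb


# (consumer, read intensive, mixed use / write intensive) capacities in GB,
# taken from the tiers listed above
tiers = [(512, 480, 400), (1000, 960, 800), (2000, 1920, 1600)]

for consumer, ri, mu_wi in tiers:
    print(
        f"{consumer} GB class: "
        f"RI reserves {reserved_fraction(consumer, ri):.1%}, "
        f"MU/WI reserves {reserved_fraction(consumer, mu_wi):.1%}"
    )
```

So the RI drives give up roughly 4 to 6 percent of the advertised consumer capacity and the MU/WI drives about 20 percent, and that extra spare area is part of where the higher endurance comes from.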
If you only get a TBW figure, just divide it by the capacity and by the length of the warranty in days to get DWPD. For instance, a 1.92TB 1 DWPD drive with a 5-year warranty might list 3.5PBW (1.92 TB × 365 days × 5 years ≈ 3,504 TB).
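If you want to run that conversion on a few data sheets, a quick sketch would be something like this. The function name is made up, and the second drive is a hypothetical consumer SSD with invented numbers, not a real product.

```python
def dwpd_from_tbw(tbw_tb: float, capacity_tb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a TBW rating.

    DWPD = TBW / (capacity * warranty length in days)
    """
    warranty_days = warranty_years * 365
    return tbw_tb / (capacity_tb * warranty_days)


# The 1.92TB example: 3.5 PBW (3500 TBW) over a 5-year warranty
print(round(dwpd_from_tbw(3500, 1.92, 5), 2))  # 1.0

# Hypothetical consumer drive: 1 TB, 300 TBW, 5-year warranty
print(round(dwpd_from_tbw(300, 1.0, 5), 2))  # 0.16
```

The second result lands in the 0.1 - 0.3 DWPD range mentioned for consumer drives, which makes the gap to a 1 DWPD enterprise drive easy to see.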
Got it. So I’m thinking my ZFS is what killed these poor drives, who didn’t sign up for that sort of life. I think short term I’ll run over to Best Buy and get a decent 1 or 2 TB drive to migrate things to just to keep it running (and not use ZFS). From what I’m reading on other forums, yeah, ZFS was the killer here.
Long term, maybe enterprise drives, or really deciding if my app server even needs a pool. I did that last time as an “I don’t want to run out of storage for a while” move, but I’m seeing 4TB drives now for a few hundred bucks. Not cheap, but much cheaper than the 2k they were just a few years ago. I don’t store anything on the app servers, just containers and VMs.
Read the data sheets.
You’re mostly going to be concerned with IOPS and endurance for VM hosting.
Endurance rating in TBW or PBW is the main indicator I look for. That and a DRAM cache.