The compression IBM claims for the FlashSystem V9000 is a lot higher than what EMC (XtremIO) and Hitachi offer.
IBM claims around 4.5:1, whilst EMC and Hitachi guarantee around 2:1 for Oracle workloads on AIX.
The IBM guarantee is for 90 days from implementation, whilst the EMC guarantee lasts for the duration of the maintenance contract.
How realistic is this IBM compression figure for general Oracle on AIX?
To IT Central Station and prospective all-flash array (AFA) buyers:
Thank you for reaching out on this topic; these vendors' claims are confusing to many uninitiated buyers. At face value, it appears from their messaging that some technologies perform much better than others. I posted a blog on this topic here: community.hds.com
Note that the bases for vendors' data reduction claims vary greatly, as some vendors choose to include the benefits of thin provisioning and snapshots in their factoids (aka alternative facts). Keep in mind that any "up to" reduction figure is just that: a value achieved in a lab or on a unique workload, not representative of the average.
In this case, both EMC and Hitachi chose to represent average compression results from prior deployments; no deduplication value is included. IBM provided a best-case figure, and since this V9000 model does not support deduplication, it creates the impression that the V9000 delivers superior results even against systems with deduplication support!
The variation in data reduction (compression and/or deduplication) results is mainly a function of the data set, not the vendor's technology, since all engineering teams are constrained by similar latency overheads. Here is a sample of typical Hitachi Storage Virtualization Operating System (SVOS) compression and deduplication results by workload that Gartner validated:
[Image: table of Hitachi SVOS compression and deduplication ratios by workload, not reproduced here]
Hitachi internally benchmarked both IBM and EMC AFAs on eight different workloads, and the data reduction results came within 5% of each other. Data reduction performance is also a function of the data chunk size; for example, a 4 KB chunk-size engine can achieve about 5-10% better results than an 8 KB engine. Note that this extra saving comes at a performance and memory cost.
So how do you proceed as a prospective buyer to assess the value of each technology? I recommend running each vendor's data reduction estimator tool on your own data set and drawing your own conclusions. The Hitachi estimator tool can be downloaded here: hcpanywhere.hds.com ; these estimators are built with the same algorithms as their AFA counterparts.
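If you want a rough sanity check before engaging any vendor tool, the do-it-yourself estimate below is a small Python sketch, illustrative only: the file path is a placeholder for a copy of a representative datafile, and zlib's DEFLATE is merely a stand-in for the LZ-style engines the arrays actually use. It compresses the data in fixed-size chunks and reports the ratio at 4 KB and 8 KB chunk sizes, so you can see both your data's compressibility and the chunk-size effect mentioned above.

```python
import zlib

def chunked_ratio(path: str, chunk_size: int) -> float:
    """Compress a file chunk by chunk and return original_size / compressed_size."""
    original = compressed = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            original += len(chunk)
            # Each chunk is compressed independently, as an inline engine would do.
            compressed += len(zlib.compress(chunk, 6))
    return original / compressed if compressed else 1.0

if __name__ == "__main__":
    sample = "/path/to/representative_datafile.dbf"   # placeholder, not a real path
    for size in (4 * 1024, 8 * 1024):
        print(f"{size // 1024} KB chunks: ~{chunked_ratio(sample, size):.2f}:1")
```

On most datasets the 4 KB run comes out slightly ahead of the 8 KB run, which is the 5-10% effect described above; either way, treat the output as a ballpark figure, not a guarantee.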
As for performance impact, I would recommend reading this blog: community.hds.com . In the real world, you can't have your cake and eat it too…
Full disclosure: I work for Hitachi Data Systems and support the Hitachi enterprise flash business.
I hope this information helps.
Patrick Allaire
The IBM V9000 is just a bundled SVC + FlashSystem 900 flash. We have been using SVC Real-time Compression for several years now, mainly with Oracle databases on AIX and Solaris. We do not use the V9000 because SVC is a more stable and flexible solution.
We recently purchased an HDS G600, so I will have a direct comparison of SVC RTC versus the G600 within two months.
You may find this helpful:
IBM uses an improved compression algorithm based on Lempel-Ziv LZ78, while HDS uses a derivative of LZ77. Since the compression algorithms are closely related, you should expect a fairly similar compression ratio for the same kind of data.
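As a toy illustration of the "same family, similar behaviour" point (this is not the vendors' actual code; the sample file path is a placeholder and the bit accounting is deliberately crude), the Python sketch below runs the same data through zlib, whose DEFLATE is LZ77-based, and through a minimal LZW coder from the LZ78 family. On compressible data both give a multiple-to-one ratio, and on already-compressed data both collapse towards 1:1, which is why the data set matters far more than the vendor.

```python
import math
import zlib

def lzw_compress(data: bytes):
    """Minimal LZW (an LZ78-family dictionary coder). Returns (codes, dict_size)."""
    dictionary = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)
            w = bytes([b])
    if w:
        codes.append(dictionary[w])
    return codes, len(dictionary)

def lzw_ratio(data: bytes) -> float:
    codes, dict_size = lzw_compress(data)
    # Crude fixed-width coding: every code costs ceil(log2(dict_size)) bits.
    out_bytes = len(codes) * math.ceil(math.log2(dict_size)) / 8
    return len(data) / out_bytes

def zlib_ratio(data: bytes) -> float:
    return len(data) / len(zlib.compress(data, 6))

if __name__ == "__main__":
    with open("/path/to/sample_block.bin", "rb") as f:   # placeholder sample of your data
        data = f.read(1024 * 1024)
    print(f"LZ77 family (zlib): ~{zlib_ratio(data):.2f}:1")
    print(f"LZ78 family (LZW):  ~{lzw_ratio(data):.2f}:1")
```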
IBM Real-time Compression is very handy, and we use it widely for warehouse databases with ratios around 3:1. It is a very flexible and reliable solution, but I would never put OLTP systems on it.
The main reason is that IBM RTC adds quite a significant delay (1-2 ms at least, depending on the actual load and the data itself) and is a performance bottleneck for the IBM nodes. With SVC DH8 nodes the difference was 50k IOPS with RTC versus 250k without; however, since IBM uses dedicated hardware for compression, the RTC traffic does not affect the rest of the traffic, so you can run 50k RTC and 200k non-RTC IOPS simultaneously. The V9000 will surely behave similarly; check it during the POC.
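If you do run a POC, measure the latency penalty yourself rather than trusting the datasheet. A proper test should be driven by fio or vdbench at realistic queue depths; the rough Linux-only Python sketch below (device paths are hypothetical, run it as root against dedicated test volumes) just times single-threaded random reads on a compressed and an uncompressed LUN so you can compare the added per-I/O delay.

```python
import mmap
import os
import random
import statistics
import time

BLOCK = 8192          # 8 KB reads, roughly a database block size
SAMPLES = 2000

def read_latency_ms(device: str) -> list:
    """Time SAMPLES random block-aligned reads (queue depth 1), in milliseconds."""
    fd = os.open(device, os.O_RDONLY | os.O_DIRECT)   # O_DIRECT bypasses the page cache
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        buf = mmap.mmap(-1, BLOCK)                    # page-aligned buffer, required for O_DIRECT
        lats = []
        for _ in range(SAMPLES):
            offset = random.randrange(0, size // BLOCK) * BLOCK
            start = time.perf_counter()
            os.preadv(fd, [buf], offset)
            lats.append((time.perf_counter() - start) * 1000)
        return lats
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Placeholder paths: one test LUN with RTC enabled, one without, same backend.
    for name, dev in (("compressed", "/dev/mapper/rtc_test"),
                      ("uncompressed", "/dev/mapper/plain_test")):
        lats = read_latency_ms(dev)
        print(f"{name:>12}: median {statistics.median(lats):.2f} ms, "
              f"p99 {statistics.quantiles(lats, n=100)[98]:.2f} ms")
```

Queue depth 1 isolates per-I/O latency; to reproduce the IOPS ceilings mentioned above you need a real load generator.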
IBM RTC is a per-LUN approach and has a limitation of a maximum of 512 volumes per SVC I/O group. You should check how this applies to the V9000.
The HDS G-series has compression only on FMD DC2 modules, while IBM compression is "disk independent".
HDS compression is "global" and you cannot disable it.
Because HDS compression is always on, its influence on G-series performance is always present too. The good news is that the HDS FMD DC2 compression is built into the flash modules and, unlike IBM RTC, introduces very little delay. Hitachi claims the G-series can deliver hundreds of thousands of IOPS with 0.2-0.5 ms response times. That is what we are going to test over the next two months.
Good luck with your POC.
Vinesh,
I held off on responding to this question initially, but wanted to weigh in after seeing some of the comments that have come up.
We do not have an Oracle workload, so I cannot compare apples to apples; we are a pure SQL Server shop for our ERP and other database workloads. So I will refrain from replying to the Oracle-specific part of the question and respond more to the V9000 aspect you have raised.
Before I continue, I'd like to approach the question from the perspective of your true end goal, if you are able to discuss it. What is more important: data reduction, latency, burst I/O, or a balance of all three? There was no mention of performance in your question, but comparing a V9000 to an XtremIO makes me ask how you are comparing two differently purposed arrays. The V9000 is an all-purpose AFA with limited data reduction features; the XtremIO is a niche product designed for extremely low latency, but it has many limitations outside of smaller datasets and with data that does not dedupe well.
We were in the middle of a SAN refresh and going through the motions of finding the right fit for our specific workload on an AFA. IBM initially approached us with a V7000 as their recommendation, EMC with the XtremIO. After the initial environmental questions IBM went on their merry way, then came promptly back pitching us a V9000 after speaking to some of their more seasoned engineers (so they claimed).
IBM's claim on V9000 compression is quite boastful, but unrealistic if you have a disparate dataset; many have already spoken to this. The V9000 was pitched to us to replace an aging EMC VNX, and EMC pitched the XtremIO. NEITHER of them was the chosen successor, because of issues with scalability (XtremIO) or performance based on poor sizing recommendations from their SEs (V9000).
The V9000 is heavily reliant on appropriate spec'ing, and there is no true way of answering the generic question here without understanding your dataset. You have to get a signed-off guarantee after they run the "Comprestimator" against your workload (which you mentioned they did). We also checked with them for SQL for our ERP and for our VM workload. And as mentioned, they cover you only for the first 90 days; then you are on your own. Sight unseen, they promise the same 2:1 as the competitors mentioned in other posts. But here are the issues that came up long before they could even offer a PoC in our datacenter:
• They miscalculated our entire workload, multiple times
o They asked for forgiveness and promised to get it straight; they still got it wrong, and the deal never left the runway.
o They got the data reduction right but not the performance: no dedupe, only compression, on a V9000, with a promise that dedupe would come soon, at the cost of waiting for the next generation to mitigate the latency impact to our workload. Their own sizing showed expected service times of 2.5-3 ms for our ETL job and some other regular tasks, while the sales team kept promising <1 ms. This would have badly impacted our transacting ERP users and was only caught by my company's due diligence; they were hoping it would hold for 90 days and then be out of the guarantee. This was the immediate deal breaker, BUT THEY GUARANTEED their data efficiency, so be cautious for your enterprise's sake.
o Reliability has come up as a contentious point in many discussions with my peers in the industry locally; there have been a few recorded large outages, and there is now an issue of brand damage locally.
Otherwise, their generalizations are just that. EMC and Hitachi play the other game of assuming the worst, where anything better is gravy, but they seem to keep performance in mind.
As to the EMC (now Dell) XtremIO, be VERY wary of that solution.
Capacity:
• For datasets under the size of an EMC half or full brick, it will give amazingly low latency, but nothing else if your data does not fit the XtremIO's requirements.
o Upgrades are super expensive and come in half-brick and full brick sizes.
Scalability:
• EMC has been pushing alternatives since the Dell acquisition; we even had people pushing the Unity AFA since it is more scalable and is essentially a Dell-rebranded VNX (if you take the time to look under the hood). You cannot scale to larger drives with the XtremIO, and EMC has been exceptionally tight-lipped about this.
• Scale-out is your only option; there is no scale-up as a result.
• Attempts to change the architecture to newer NAND capacities to address scalability have issues.
o Some code upgrades have been data-destructive, and last I checked they still cannot move the XtremIO to larger-capacity drives.
• Take into account the 8 Gb Fibre Channel connectivity and the lack of support for larger TLC flash drives due to the above.
ROI and TCO:
• This is a very pointed niche device for small, specific datasets; it will offer extremely low latency and high performance at the cost of zero scale-up and extremely expensive scale-out.
• TCO is extremely high in cost per TB as a result, and even the ROI is hard to justify unless you have a dataset that fits on it and genuinely needs that latency.
I have no experience with Toshiba to speak on their behalf and am neutral on most items regarding this.
So, without a more intimate understanding of your working set and of whether data efficiency, overall capacity, or performance matters most to you, this is the best answer I can give. But I can tell you that for a SQL workload less burdensome than most Oracle loads, we still chose a different vendor, neither IBM nor EMC, based on the information gathered in the industry and on candid talks with their SEs during our selection process.
If you aren't as worried about total cost of ownership or return on investment, or just need killer speed, there are many other solutions out there, for a price. My enterprise is exceptionally TCO-focused, and while latency is quite sensitive for our end users, cost per TB is extremely important, so I understand where the data efficiency question comes in. Without dedupe, the V9000 really destroyed the cost case, as it required many more TB of physical disk than an XtremIO would for us; but our entire load didn't fit into a full brick, and the scale-out made the XtremIO cost-prohibitive and completely unjustified for what we would receive.
I hope my two cents helped. If you have any further questions about my specific experience, including our chosen solution, do not hesitate to ask.
I do not work for any storage vendor and have nothing to gain by giving honest experiences of any array.
Jason Melmoth
Infrastructure Architect
Worst storage I have ever tested in 15 years.
Very slow with compression enabled (300,000 IOPS without compression, 150,000 IOPS with compression).
It was faster with the write cache disabled (!!!???).
We also had a crash and eight hours of downtime!
Be careful with the V9000... Do not buy it without a POC.
Hi,
We have run the Comprestimator tool, and this is how IBM came up with the guarantee. The POC is in the process of being set up.
This is the first time I have posted on this site, and I am truly grateful for all of the feedback. If there is anything more regarding the V9000, please feel free to share your experiences: likes, dislikes, gotchas, etc.
Thank you.
I agree with Martin that every database is going to give you a different result, so I would recommend using the Comprestimator tool to check what you can expect. I have just done a sizing exercise for a customer using this tool and it is very accurate. It was not an AIX environment, but 3 x SQL databases where we are getting 4:1. I have only ever seen 4.5:1 on a Pure m20 array with Oracle on Windows.
www-304.ibm.com
IBM doesn't support deduplication in the V9000 yet; it's on the roadmap for 2017. As Yannis said, you can run Comprestimator (a free tool) on servers attached to the LUNs and check the compression/thin provisioning ratio achieved. The ratios reported by this tool are treated like a contract: IBM guarantees these ratios in a real SVC or V9000 deployment.
The only thing I've heard of the V9000 doing well is latency (microseconds) versus the XtremIO's "sub-millisecond" latency. IBM's previous compression was based on RISC cards, and compression alone was about 2.3-3:1 for my RAC.
Nimble's compression + dedupe works very well for us at 2.4:1 (we moved off an older IBM FlashSystem 840). The features alone (zero-copy clones, snapshots, etc.) increased our efficiency enough to be worth giving up some latency. info.nimblestorage.com If you dedupe your RAC, you really need to buy multiple Nimble arrays to maintain fault zones (if that was the goal of your RAC).
I totally agree that it is difficult to estimate the reduction level, and I agree with Mr. Martin Pouillaude that several factors can influence it. What I would recommend is that either an IBM Business Partner or an IBM engineer run the IBM tool/utility called Comprestimator, which will give a more accurate prediction of the reduction level. The tool does not install anything on the production Oracle system; it simply analyzes the filesystem and uses the same algorithms that run when compression is enabled on the hardware (V9000). The numbers the customer gets from Comprestimator are the values that IBM will commit to in writing as well.
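For anyone curious what such an estimator does conceptually: it samples blocks from the volume, compresses them with the same style of algorithm the array uses, and extrapolates. The Python sketch below is not Comprestimator (the device path, sample count and zlib stand-in are my own assumptions); it only shows the sampling idea, which is why such a tool can run against production volumes without installing anything in the database.

```python
import os
import random
import zlib

BLOCK = 32 * 1024       # sample granularity (arbitrary for this sketch)
SAMPLES = 500           # number of random blocks to sample

def estimate_ratio(device: str) -> float:
    """Sample random blocks from a device/file and estimate a compression ratio."""
    fd = os.open(device, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        raw = packed = 0
        for _ in range(SAMPLES):
            offset = random.randrange(0, size // BLOCK) * BLOCK
            block = os.pread(fd, BLOCK, offset)
            raw += len(block)
            packed += len(zlib.compress(block, 6))
        return raw / packed if packed else 1.0
    finally:
        os.close(fd)

if __name__ == "__main__":
    dev = "/dev/sdX"    # placeholder: a LUN holding representative Oracle datafiles
    print(f"Estimated compression ratio: ~{estimate_ratio(dev):.2f}:1")
```

The real tool is obviously far more sophisticated, and its output is what IBM will actually stand behind, so use it for the contract figure; the sketch just shows why sampling makes a non-intrusive estimate possible.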
It's always hard to precisely predict the data reduction level that can be obtained on Oracle databases. One key factor is whether the database is already compressed or even encrypted; if it is, the 4.5:1 ratio would be extremely hard to achieve.
IBM's documentation states a reduction factor of 4:1 for databases:
www.ibm.com
A 4.5:1 figure seems a bit optimistic. Generally I would recommend doing a POC on the production data, if possible, to establish an accurate number, and of course not counting thin provisioning as data reduction.