I have experience working as an equity research associate.
Question 1: What is the best storage system, or combination of storage systems, to use for AI and machine learning? I've been asking around and hearing that using Pure Storage and NetApp together as a hybrid provides the best combination from a performance, scalability, and high-availability standpoint, with Pure Storage providing the first and NetApp providing the other two. Do you agree? If not, what do you recommend?
Question 2: I'm not a Pure Storage user, but I have been hearing that performance becomes subpar after two years of use once storage usage rises above 50% of capacity. From your experience, can you confirm this claim?
Thanks! I appreciate the help.
Your questions lack context. What is the best storage for AI and machine learning?
For what purpose are you using AI and machine learning? As a tool to place data better? As a tool to build algorithms for an application?
In storage, AI and machine learning are primarily tools that let the storage system:
- Predict drive failures or other issues
- Place data based on history (machine learning)
- Troubleshoot problems
Based on field history, HPE Nimble has been doing the best job at this.
"I’m not a Pure Storage user but have been hearing performance becomes subpar after two years of usage and storage increasing to 50%+ of capacity. From your experience, can you attest to this claim?"
Again, your question lacks context. How are you measuring performance: latency, IOPS, throughput, or the number of concurrent applications pounding on the storage? As with any storage system, it depends.
Write-heavy applications will wear out flash drives faster. Is this a failure issue? Are users noticing a decline in performance?
What policies were set in the storage system? How is performance prioritized? Is deduplication turned on?
When it is on, it slows performance, especially as the amount of data increases.
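To illustrate why deduplication adds work that grows with the data, here is a minimal sketch of block-level deduplication using content fingerprints. The fixed 4 KB block size, SHA-256 fingerprints, and in-memory index are assumptions for illustration only, not how any particular array implements it: every incoming block costs a hash plus an index lookup, and the index grows with the amount of unique data stored.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, in bytes


class DedupStore:
    """Toy block store that keeps one copy of each unique block."""

    def __init__(self) -> None:
        self.index: dict[str, bytes] = {}  # fingerprint -> block contents
        self.logical_blocks = 0  # total blocks written by clients

    def write(self, data: bytes) -> list[str]:
        """Split data into blocks, store only unseen blocks, return fingerprints."""
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            # Each block is hashed and looked up; only new content is stored,
            # so the index grows with unique data, not with total writes.
            if fp not in self.index:
                self.index[fp] = block
            refs.append(fp)
            self.logical_blocks += 1
        return refs

    def reduction_ratio(self) -> float:
        """Logical blocks written divided by unique blocks stored."""
        return self.logical_blocks / len(self.index) if self.index else 0.0


store = DedupStore()
store.write(b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE)
print(len(store.index), store.reduction_ratio())
```

Three identical "A" blocks and one "B" block leave two unique blocks in the index, a 2:1 reduction.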
As you can see, broad questions are difficult to answer. To get useful answers, you need to qualify your questions with much more detail.
I find that Pure provides all three: performance, scalability, and HA. Our Pure arrays show no performance reduction at 50%+ capacity or after two years of use.
- First, Pure Storage is not hybrid, so the two cannot be compared directly: they use different strategies and technologies.
I would choose Pure Storage because it operates at a different level from other brands.
- Pure Storage's business model is Evergreen, which makes your storage last longer than other brands'. With compression and inline dedupe applied from the moment data is first written, we have experienced optimal data reduction.
- Based on my experience with storage, Pure Storage works very differently. It is simpler than the other options, and I have confirmed that upgrading Pure Storage software and hardware involves no downtime.
Question 1: Pure Storage would without a doubt be my go-to for machine learning and AI initiatives. FlashArray//X and FlashBlade provide scalability, performance, and high availability, 1000%.
Question 2: This is 1000% untrue. Performance is not related to capacity until you approach the system's space limit. You would need to exceed 80-85% capacity before system actions begin impacting performance. I have customers using FlashArray//M arrays that are 6 years old and still scream.
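Taking the 80-85% range mentioned here at face value, an admin could flag arrays approaching it with a simple check like the one below. This is a hypothetical sketch: the function name and the exact warning and critical cut-offs are assumptions, not a vendor tool or a Pure Storage setting.

```python
def capacity_status(used_tb: float, total_tb: float,
                    warn_pct: float = 80.0, crit_pct: float = 85.0) -> str:
    """Classify array fullness against a hypothetical 80-85% threshold range."""
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    pct = 100.0 * used_tb / total_tb
    if pct >= crit_pct:
        return "critical"  # past the range where performance may suffer
    if pct >= warn_pct:
        return "warning"  # entering the 80-85% range
    return "ok"


print(capacity_status(50, 100), capacity_status(82, 100), capacity_status(90, 100))
```

A usage check at 50% reports "ok", which matches the poster's point that 50% utilization on its own is not the concern.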
Based on my experience with Pure Storage, EMC, IBM, and HPE, working in financial services with VMware hypervisors, Microsoft, open source, and data management:
1) Pure Storage: price, simplicity, easy administration.
2) Fake news. We have Pure Storage:
- //X10: 56% capacity, 5.1:1 data reduction (VMware)
- //X20: 83% capacity, 7.2:1 data reduction (Oracle VM / AIX VIOS)
- //X50: 72% capacity, 4.4:1 data reduction (Hyper-V / VMware)
Performance has not decreased, capacity has not increased
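For readers unfamiliar with the ratios listed above: a 5.1 to 1 data reduction ratio means each unit of physical flash in use holds roughly 5.1 units of logical (pre-reduction) data. The sketch below does that conversion. The 10 TB physical figure is hypothetical, since the post does not give the arrays' raw sizes.

```python
def logical_tb(physical_used_tb: float, reduction_ratio: float) -> float:
    """Logical (pre-reduction) data represented by physically used capacity."""
    return physical_used_tb * reduction_ratio


# Hypothetical: 10 TB of physical flash in use at each ratio listed above.
for ratio in (5.1, 7.2, 4.4):
    print(f"{ratio}:1 -> {logical_tb(10.0, ratio):.1f} TB logical")
```

At these ratios, 10 TB of physical flash would hold about 51, 72, and 44 TB of logical data respectively.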
Question 1: It's not only the storage make that is important but, even more, the type of disks being used. Because user demand is so unpredictable, most companies now go for two-tier or three-tier storage. Automatic tiering distributes the load where it is needed.
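Automatic tiering can be sketched as ranking data extents by access frequency and filling the fastest tier first. This is a simplified illustration, not any vendor's algorithm: the access-log format, the per-tier extent capacities, and the function name are all hypothetical, and real arrays also weigh recency, I/O size, and movement cost.

```python
from collections import Counter


def assign_tiers(access_log: list[str], tier_capacities: list[int]) -> dict[str, int]:
    """Place the most frequently accessed extents on the fastest tier first.

    access_log: extent IDs, one entry per access (hypothetical input format).
    tier_capacities: number of extents each tier can hold, fastest tier first.
    Returns extent ID -> tier number (1 = fastest). Extents that do not fit
    in any tier are left out of the result.
    """
    counts = Counter(access_log)
    ranked = [extent for extent, _ in counts.most_common()]  # hottest first
    placement: dict[str, int] = {}
    start = 0
    for tier, capacity in enumerate(tier_capacities, start=1):
        for extent in ranked[start:start + capacity]:
            placement[extent] = tier
        start += capacity
    return placement


log = ["a", "a", "a", "b", "b", "c", "d"]
print(assign_tiers(log, [1, 2, 10]))
```

Here the hottest extent "a" lands on tier 1, "b" and "c" on tier 2, and "d" on tier 3.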
Now, specifically for AI, I do not see the need for anything special unless you are an advanced research center. Basically, AI involves a database and can be supported by Tier 2!
Question 2: This is strange, and the easiest thing to blame is a device that cannot fight back!!
It depends not only on the storage itself but also on storage management on the admin side.
It also depends on the threshold configurations set up during installation.
I'm wondering: who said Pure is hybrid?