What is our primary use case?
- Research data
- Departmental file shares
- Data centre storage: NFS
We have two data centres in our university. We have Cisco UCS, Pure Storage, and are heavily virtualised with VMware. PowerScale is our unstructured data storage platform. It provides scale-out storage and serves NFS to our applications. It also provides all the storage for our researchers, business areas, and students on the network.
With the exception of block workloads, which are primarily VMware, Oracle databases, etc., everything else is on PowerScale. It has definitely allowed us to consolidate and made management much easier.
How has it helped my organization?
Because quotas let us work with a few large pools of storage in the data centres, we typically only have one or two Isilon clusters. That gives us the ability to multi-tenant, allocate data to different applications, and isolate workloads. It is very efficient when managing that volume of storage. We are not tuning it every day or week. The only time we are really doing anything with it is when we're planning an upgrade of some sort, several times a year. Outside of that, it just does what we want it to do.
We automate the vast majority of the things that we do on the Isilon clusters: provisioning storage, allocating storage, managing quotas for tens of thousands of students, and managing permissions. That is all possible because of the level of support they have for their built-in APIs, which has probably been a huge game changer for us in the way that we manage the storage. It makes managing PowerScale far more efficient.
What we have been able to automate using the API saves us at least tens of hours a month compared to when we handled everything manually through service requests. We have even been able to delegate to different areas. If we host file shares for an area, we delegate the ability for them to create new shares and manage their permissions themselves.
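To give a flavour of what that automation looks like, below is a minimal sketch of provisioning a quota through the OneFS REST API. The endpoint path, port, payload fields, and the service account shown are assumptions based on how we recall the platform API working, and they vary by OneFS version, so treat it as illustrative rather than as our production tooling.

```python
"""Minimal sketch of the kind of provisioning we automate against the
OneFS REST API. Endpoint path, port, and field names are assumptions
based on the OneFS platform API and may differ by version; verify them
against your cluster's API reference before use."""
import requests

CLUSTER = "https://isilon.example.edu:8080"   # hypothetical cluster address
AUTH = ("svc_provisioning", "secret")          # hypothetical service account

def create_directory_quota(path: str, hard_limit_gb: int) -> None:
    """Create an enforced SmartQuotas directory quota on an /ifs path."""
    payload = {
        "path": path,
        "type": "directory",
        "enforced": True,
        "include_snapshots": False,
        "thresholds": {"hard": hard_limit_gb * 1024**3},  # hard limit in bytes
    }
    resp = requests.post(
        f"{CLUSTER}/platform/1/quota/quotas",
        json=payload,
        auth=AUTH,
        verify=False,   # sketch only; use proper CA verification in production
        timeout=30,
    )
    resp.raise_for_status()

# e.g. provision a 50 GB home area for a new student
create_directory_quota("/ifs/home/student123", 50)
```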
The solution allows us to manage storage without managing RAID groups or migrating volumes between controllers. We saw this in the big refresh that we did earlier in the year. After you click the "Join" button and the new node has joined, you go to the old node, click remove, and wait for it to finish. You don't have to configure anything when you add new node types; they are configured automatically. You can tune them and override things if you want, but there is no configuration required.
PowerScale has enabled us to maximise the business value of our data and gain new insights from it. It gives us the ability to have our data stored and presented via whatever protocol is required. Now, we can look at all these different protocols without having to move or duplicate the data.
The solution allows you to focus on data management, rather than storage management, so you can get the most out of your data. We looked at the types of data that we have on the cluster, then we just target it based on the requirements. We don't have to worry about building up different capabilities, arrays, RAID types, etc. We just have the nodes, and through simple policy, can manage it as data rather than managing it as different RAID pools and capacity levels. If someone needs some data storage, then we ask what their requirements are and we just target based on that. Therefore, we manage it as a workload rather than a disk type.
What is most valuable?
Their SmartQuotas feature is probably the thing that we use most heavily and consistently. Because it is a scaled-out NAS product, you end up with clusters of multiple petabytes. This allows you to have quotas for people and present smaller chunks of storage to different users and applications, managing oversubscription very easily.
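As a rough illustration of how that quota data can be used to keep an eye on oversubscription, the sketch below sums the hard thresholds of all quotas and compares them with raw capacity. Again, the endpoint and response fields are assumptions to be checked against your OneFS API reference, and the account and addresses are hypothetical.

```python
"""Rough sketch of gauging oversubscription from SmartQuotas: sum the hard
thresholds of all quotas and compare with cluster capacity. The quota
listing endpoint and response fields are assumptions based on the OneFS
platform API and should be checked against your version."""
import requests

CLUSTER = "https://isilon.example.edu:8080"   # hypothetical
AUTH = ("svc_reporting", "secret")             # hypothetical

def oversubscription_ratio(cluster_capacity_bytes: int) -> float:
    """Return total quota hard limits divided by usable cluster capacity."""
    resp = requests.get(
        f"{CLUSTER}/platform/1/quota/quotas",
        auth=AUTH, verify=False, timeout=30,
    )
    resp.raise_for_status()
    quotas = resp.json().get("quotas", [])
    allocated = sum(
        q["thresholds"]["hard"]
        for q in quotas
        if q.get("thresholds", {}).get("hard")
    )
    return allocated / cluster_capacity_bytes

# e.g. against a 2 PB cluster
print(f"Oversubscribed {oversubscription_ratio(2 * 1024**5):.1f}x")
```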
We use policy-based file placement across multiple pools of storage, placing, e.g., less-frequently accessed or replicated data onto archive nodes and high-performance research data onto our high-performance nodes. It is very easy to use and very straightforward.
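On the cluster, the placement rule itself is just a SmartPools file pool policy; the snippet below is only a conceptual illustration of the kind of rule we express (old, rarely accessed data to the archive pool, everything else to the performance pool). The tier names, path, and the 90-day threshold are hypothetical examples, not our actual policy.

```python
"""Conceptual illustration of our file placement rule. On the cluster this
is configured as a SmartPools file pool policy, not run as a script. The
tier names, path, and 90-day threshold are hypothetical."""
import time
from pathlib import Path

ARCHIVE_AGE_DAYS = 90   # assumed cut-off for "less-frequently accessed"

def target_tier(path: Path) -> str:
    """Return which node pool a file would land on under the placement rule."""
    age_days = (time.time() - path.stat().st_atime) / 86400
    if age_days > ARCHIVE_AGE_DAYS:
        return "archive_nodes"        # e.g. an A200-class pool
    return "performance_nodes"        # e.g. an H500-class pool

# e.g. preview where files under a research project would be placed
for f in Path("/ifs/research/projectX").rglob("*"):
    if f.is_file():
        print(f, "->", target_tier(f))
```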
The node pools give us the ability to non-disruptively replace the whole cluster. With our most recent Gen6 upgrade, we moved from the Gen5 nodes to the Gen6 nodes. In January this year, we ended up doing a full replacement of every component in the system. That included storage nodes, switching, etc., which we were able to replace non-disruptively and without any outages to our end users or applications.
We use the InsightIQ product, which they are now deprecating and moving into CloudIQ. InsightIQ has been very good. You can break performance down right to protocol latency by workstation. On the infrequent occasions when we do have issues, we use it to track them down. It also has very good file system reporting.
For maximising storage utilisation, it is very good. As you add more nodes to a cluster, you typically get more effective utilisation. It is incredibly flexible in that you can select different protection levels for different files, not just for file systems or blocks of storage, but on a per-file basis. Occasionally, if we have some data that is not important, we might use a lower protection level; for other data that is important, we can increase it. Overall, we have been very happy with the utilisation.
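For those per-file protection changes, we occasionally drop to the CLI. The sketch below wraps the `isi set -p` command as we recall it; the exact flags are an assumption, so check `isi set --help` on your cluster, and the paths and protection levels shown are hypothetical.

```python
"""Sketch of adjusting per-file requested protection from an admin session.
The `isi set -p` invocation reflects how we recall the OneFS CLI working
and should be treated as an assumption; the paths are hypothetical."""
import subprocess

def set_requested_protection(path: str, protection: str, recursive: bool = False) -> None:
    """Request a specific protection level (e.g. '+2:1', '+3') for a path."""
    cmd = ["isi", "set", "-p", protection]
    if recursive:
        cmd.append("-R")   # apply to the directory tree
    cmd.append(path)
    subprocess.run(cmd, check=True)

# e.g. drop scratch data to a lower protection, raise critical research data
set_requested_protection("/ifs/research/scratch", "+1", recursive=True)
set_requested_protection("/ifs/research/critical", "+3", recursive=True)
```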
Dell EMC keeps adding more features to the solution’s OneFS operating system. We have used it for about 13 years, and the core feature set has largely stayed the same over that time, while also being greatly improved. It has always been SMB and NFS storage, and they have broadened the scope to NFS v4, SMB3 Multichannel, etc. They are always bringing in newer protocols, such as S3. Typically, those new features, such as S3, don't require new licensing. They are just included, which is nice.
Over the years, the improvements to existing protocols have been important to us. When we first started using it, they were running open source Samba for their SMB implementation under the covers, and they used the built-in NFS server in FreeBSD. The new implementations that they introduced in OneFS 7 brought huge increases in performance and have been very good, even though they didn't necessarily add new features. We even use HDFS on the Isilons at the moment. The continued improvement has been really beneficial.
It is incredibly easy to use the solution for deploying and managing storage at the petabyte scale. Compared with CIFS file servers and IBM Spectrum Scale, there is just no comparison. I couldn't think of an easier way to deploy petabyte-scale NAS storage than Dell EMC PowerScale.
What needs improvement?
The replication could lend itself to some improvement around encryption in transit and managing the re-syncing of large volumes of data. The failover and failback process can be tedious. Hopefully, you never end up going into a DR; if you do, you know the data is there on the remote site. However, the process of setting up the replication, failing over, and failing back is very tedious and could definitely do with some improvement.
There is a lack of object support, which they have only just rectified.
For how long have I used the solution?
We have been using the solution for about 13 years.
What do I think about the stability of the solution?
The stability has been exceptional. I've been very happy with the stability of it. In the last six years, we have pretty much been disruption free. Prior to that, we have had one or two issues, which we worked with their support to fix.
We had a major refresh at the start of the year, when we replaced one petabyte at one site and half a petabyte at another site. This completely replaced everything and took us about a month. It was finished with one staff member overseeing the process and moving the data, roping in one or two other staff at different times to help with the physical racking.
The nodes are quite heavy, so you always want two or three people involved in the racking. Beyond that, very little staff time is required: once the hardware is racked, it needs just one operator to join the nodes and wait for the data to move over. The whole process is non-disruptive to users.
Retiring the old nodes afterwards is more of a management task than a technical one.
What do I think about the scalability of the solution?
Pretty much everyone touches the solution in some way or another. It has been a bit different right now with COVID-19, since a lot of people have recently been working remotely. On any given day, probably 12,000 people use it. That is just going by the number of active connections that we have from staff, students, and researchers at any time.
We can't see any way that we would ever reach the limits of the product in terms of scalability and our workloads. We have no concerns around scalability.
It has a back-end network, and if you want to go big, the most complicated part is getting switches with enough ports to plug the nodes in, not the actual management of the storage. As you add more nodes, the management overhead remains largely the same.
For larger scalability, I would be very comfortable with it. We would just have to do some good site planning to ensure that we have enough room for it.
Our usage is pretty extensive. It touches on almost every area of our organization. With the introduction of object support and support for Red Hat OpenShift, which they're releasing in OneFS 9.0, we are very keen to explore and extend our usage in those areas. That is part of the reason why we are upgrading our test cluster to OneFS 9.0: specifically to evaluate its use with Red Hat OpenShift and Kubernetes in the cloud. It definitely has a very strong place in the data centre now, and we don't see it going away anytime soon, as we see more workloads going onto it.
How are customer service and technical support?
The support has been mixed. If you get through to the right engineers, you can get problems resolved incredibly quickly. If you don't, you can go around in circles for a long time. We do typically have to escalate support tickets through account managers to get them positioned correctly. However, once that happens, issues are resolved pretty quickly and we're generally happy.
The technical support is average. They are certainly not the best that we have ever dealt with, but far from the worst. I would not recommend the product based on their tech support alone.
Which solution did I use previously and why did I switch?
Going back 13 years prior, we used to have a lot of Microsoft and Linux-based file servers all over the place. They were all siloed with a lot of wasted capacity. Consolidating all those down into a small handful of Isilon clusters has dramatically reduced the amount of silos that we have in the organization. In terms of reducing waste from having storage stuck in one silo or isolated area, it has made a huge improvement.
We previously used IBM Spectrum, which I don't think you can buy anymore. Briefly, eight years ago, we moved a large portion of the workload off Isilon onto Spectrum. That was the biggest regret of my career; we couldn't get back onto Isilon fast enough. It was a commercial decision to move away from Isilon, which wasn't the cheapest, but it was far more mature than the IBM product. Spectrum cost us so much that what we saved in capital expenditure we then lost in productivity, overhead, and maintenance. It was just a disaster. The support that we received from IBM was the worst support I have ever received. I've been in this industry and job for about 17 years now, and I have never had a worse support experience than the one I had with IBM. It was a nightmare.
When we needed to get away from the issues with Spectrum, there was no doubt about going back to PowerScale. We just made that happen, and as soon as we did, all the fires were put out.
About 13 years ago, we were using six-terabyte nodes; now, they're obviously a lot bigger than that. While scalability was definitely a key interest, the main driver for us was the ease of management: consolidating all the separate file servers, with their own operating systems and RAID arrays, into one pool of storage where we could allocate quotas and still manage capacity effectively, but centralise it and reduce waste. The ability to scale out was just icing on the cake, and definitely something we were very interested in. It's something we've utilised quite heavily over time, but the ease of management was the main driver.
How was the initial setup?
The initial setup has always been straightforward. The process of creating a new cluster is largely the same now as it was 13 years ago. You get your first node, then connect the serial port to it. You answer about 10 questions, then you're ready to go. The rest of the nodes are added by clicking a button. It's incredibly easy to set up, and it says a lot that the process has been the same for about 13 years. There's not really much to improve or simplify, because it is already incredibly simple.
Assuming the hardware was racked, you could have the cluster setup and your minimum three nodes joined within half an hour to 45 minutes.
The process of adding a node is very straightforward: It is pressing a button. This can take five minutes, then the process is complete. Once you have added new nodes, you can then remove old nodes.
Understand your workload. Make sure you size and cost it correctly for the amount of metadata you expect to see on it. Don't undersize your SSD.
For the whole replacement this year, I got one of our junior staff members, who had never actually used PowerScale, to do the whole upgrade process. I just pointed him in the right direction. Because it is very easy, he managed to do it without any issues.
What about the implementation team?
We don't use any professional services. We always do it in-house.
Two people are needed for racking hardware. Only one person is needed to deploy it, as that process is very straightforward.
What was our ROI?
The solution has simplified management by consolidating our workloads. Rather than managing all the different workloads on different storage arrays, Windows Servers, etc., we just have one place per data centre where we manage all their unstructured data, saving us time.
PowerScale has reduced the number of admins that we need. It has allowed our admins to focus on adding value through automating tasks and streamlining operations for our customers, rather than focusing on the day-to-day and tuning RAID profiles. We can use our APIs to automate workflows for customers and have quicker turnaround times.
What's my experience with pricing, setup cost, and licensing?
The solution is expensive; it is not the cheapest solution out there. If you look at it from a total cost of ownership perspective, then it is a very compelling solution. However, if you're looking at just dollar per terabyte and not looking at the big picture, then you could be distracted by the price. It is not an amazing price, but it's pretty good. It is also very good when you consider the total cost of ownership and ease of management.
We added on a deduplication license. That is the only thing that we have added. That was a decision where it was cheaper for us to license the deduplication than it was to buy more storage, so we went with that approach. We just did an analysis and found this was the case.
We haven't really hit a workload or situation that we have had any issues catering for. Certainly, with the huge number of different node types now, we could position any sort of performance, from very cheap, deep archive through to high-performance, random workloads. I feel we could respond very quickly to any business requirement that came up, assuming there was budget. Even without budget, with the way our clusters are configured, we typically mix in high and low performance. We won't buy top-of-the-line, high-performance nodes; we buy basic H500 nodes, which have a large number of spinning disks. That is what we standardise on for our high-performance tier.
Which other solutions did I evaluate?
13 years ago, it was called Isilon Systems. They were a start-up in Seattle, while we are in Australia, so we were importing the hardware directly. At that time, there was nothing else that we were really looking at; we were just caught up in revolutionising the way we would manage storage as one pool. Then, six to eight years ago, when we had that little stint on IBM Spectrum, we didn't go to market. We very heavily evaluated the IBM product and NetApp in cluster mode as alternatives. We ruled out NetApp as far too difficult to manage. The Spectrum product that we saw on paper, and from our evaluation of loaned hardware, seemed like it was going to be on par with Isilon. Little did we know the nightmare that would ensue.
The biggest lesson we learned came from moving away from it onto the IBM product: the maturity of a product is directly reflected in how little time you spend managing it. PowerScale is a very mature product. We have been using it for 13 years, and the core has a very solid, mature foundation that has been built up over that time.
We have dealt with Nimble Storage in the past. I would recommend Nimble Storage based on their support (at that time), as they had exceptional support. However, Dell EMC support is no worse than Cisco or any of the other vendors that we have had to deal with, but it is nothing special.
What other advice do I have?
Just don't underestimate how important a mature product is compared to something leading edge or new.
PowerScale is positioned primarily in the core, within the data centre. We have it heavily centralised, both in our IT department and across our campuses. We don't really have any PowerScale storage in the cloud or at the edge because we have very good network connectivity. In terms of having the right tiers of storage, the flexibility we have for adding different types of storage with different characteristics to our existing cluster is now the best it has ever been in the 13 years that we've managed it.
Between CloudIQ and DataIQ, they're replacing their legacy InsightIQ product. We haven't moved to CloudIQ yet to start looking at it.
Early on (we have been using the solution for 13 years), if you added a new node type, you had to add three physical nodes to start a new pool, and you would only end up with 66 percent utilisation on that storage pool. With the Gen6 hardware, you can have more, smaller nodes in one rack-mount chassis, so you can now add a new storage type and get much better storage efficiency right off the bat.
The S3 protocol specifically comes in OneFS 9.0. We have a test cluster for it, which we are in the process of upgrading to have a look at their S3 support. However, I haven't used it yet. Typically, we use something like MinIO, which is an open source object gateway, and put that in front of the PowerScale cluster.
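For completeness, this is roughly how an application talks S3 to the MinIO gateway we put in front of the cluster today, ahead of native S3 in OneFS 9.0. The endpoint, bucket name, and credentials are made up for the example.

```python
"""Sketch of an application using S3 via a MinIO gateway in front of the
PowerScale cluster. The endpoint, bucket, and credentials are hypothetical."""
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio-gw.example.edu:9000",  # MinIO in front of /ifs
    aws_access_key_id="RESEARCH_APP_KEY",
    aws_secret_access_key="RESEARCH_APP_SECRET",
)

# Objects written here land as files on the PowerScale-backed export.
s3.put_object(Bucket="research-data", Key="runs/2020-08/results.csv",
              Body=b"sample,value\n1,0.42\n")

# List what has been written under the prefix.
listing = s3.list_objects_v2(Bucket="research-data", Prefix="runs/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])
```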
On the archive side, we still have the A200 nodes. While you can go with the A2000s, or even deeper archive than that, by not going too extreme with our pools and by positioning data effectively, we can manage pretty much anything thrown our way. I think it's very good.
I would rate the solution as a nine out of 10.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.