PowerScale (formerly Isilon) is effectively a giant NAS. We run two clusters: one for production workloads and one for Disaster Recovery and Business Continuity (DR/BCP). The clusters are installed in separate data centers in different parts of the country. Both were deployed at the same time, when we first adopted the solution, and we have grown them at an almost equal rate ever since.
Our production cluster is attached to our High-Performance Computing (HPC) environment, and that was the primary use case at the beginning: providing scale-out storage for the Bioinformatics team, who perform omics analyses on the plant and seafood organisms we study. As time went on, we expanded the platform to other user groups in the organization.
Eventually, PowerScale became the de facto solution for anything related to unstructured data or file-based storage. Today, we also use the platform to host users’ home directories, large media files, and any data that doesn't fit anywhere else, such as in a SharePoint library or a structured database. Nowadays, almost everyone in the organization is a direct or indirect user of the platform. The bulk of the storage, however, continues to be consumed by our HPC environment, and bioinformaticians remain our largest users. But we also have data scientists, system modellers, chemists, and machine-learning engineers, to name a few.
Our company has multiple sites throughout the country and overseas, with the two primary data centers supporting our Head Office and most of the smaller sites. Some of those sites, however, need local storage, so our DR/BCP PowerScale cluster receives replicated data from our production cluster as well as from these other file servers.
Before PowerScale we used a different EMC product, I believe a VNX 5000, which is primarily a block storage array with some NAS functionality. We did not have an HPC environment, but we did have a group of servers that performed approximately the same function.
Back in those days, raw storage had to be partitioned into multiple LUNs and presented as several independent block devices because of the storage array's size limitations. When one of these devices started to run out of space, shifting data away from it was extremely cumbersome and time-consuming, which slowed down our science. We wanted a solution that would free our users from the overhead of all that data wrangling. Isilon was a good fit because it let us consolidate five separate data stores into a single filesystem, giving all of our users a single point of entry to our data.
PowerScale helped us consolidate our former block storage into a full-fledged, scale-out file storage platform, with great success. We then decided to expand our use cases further, replacing some of the ancillary Windows file servers that provided network file shares in our Head Office. We now have a single platform for all our unstructured data needs at our main locations.
We have not explored PowerScale's cloud-enabling features yet, but they are on our roadmap. The fact that those features exist out of the box and can be enabled as required is another reason the platform is so versatile.
The switch to PowerScale was transformative. Before we implemented it, users had to constantly move their data between different storage platforms, which was time-consuming and a high barrier to getting the most out of our centralized compute. Distributed, parallel processing is challenging enough; adding data wrangling on top of it created massive cognitive overload. Scientists are always under pressure to deliver on time, and deadlines are unforgiving. The easier we can make leveraging technology for them, the better.
We officially launched our current HPC environment shortly after we introduced Isilon, supporting approximately 20 users. Today, that number has grown more than seventeen-fold, to over 350 users across all of our sites. In an organization with nearly 1,000 employees, that's more than a third of our workforce! I credit PowerScale as one of the critical factors behind that growth. PowerScale simplified data management because it can present the same data via multiple protocols (e.g., SMB, NFS, FTP, and HTTP), tremendously reducing our users’ cognitive overhead.
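For readers unfamiliar with OneFS, multiprotocol access is configured per directory under /ifs. The following is a rough sketch using the OneFS CLI; the share name and path are hypothetical, and exact flag syntax varies between OneFS releases, so verify against the CLI reference for your version:

```shell
# Hypothetical example: expose the same /ifs directory over both SMB and NFS.
# Name and path are made up; flag syntax may differ between OneFS releases.

# Create an SMB share pointing at a project directory.
isi smb shares create bioinformatics --path=/ifs/data/bioinformatics

# Export the same directory over NFS for the HPC compute nodes.
isi nfs exports create --paths=/ifs/data/bioinformatics

# Because both protocols point at the same OneFS path, Windows desktop users
# and Linux compute nodes see the same files without any copying.
```

This is exactly what removes the data-wrangling step described above: a dataset lands once and is immediately visible to both desktop and cluster workflows.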
Before adopting PowerScale, we also faced capacity constraints in our environment. I had to constantly ask end users to clean up and remove files they no longer needed, as our block data stores sat at around 90% utilization. Expanding the storage array was not only expensive: every time we wanted to provision additional space, we had to decide whether it was justified to re-architect the environment or to add yet another data store. And the latter option meant going back to our users again to free up space before more capacity could be added. All of this wasted massive amounts of time that could otherwise have been spent running jobs and doing science.
Once we introduced scale-out storage, capacity upgrades and expansions became straightforward. The procurement process was simplified because we can now easily project when we will hit 90% storage utilization, and our users can see how much storage they are individually consuming thanks to accounting-only quotas, which helps keep usage down. PowerScale provides a lot of metrics out of the box, which are easy to navigate and visualize using InsightIQ and, more recently, DataIQ.
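As a concrete illustration of accounting-only quotas: on OneFS these are directory quotas with enforcement disabled, so usage is tracked and reported without ever blocking writes. The sketch below uses a hypothetical home-directory path, and the exact flag names may vary by OneFS release, so check your version's CLI guide:

```shell
# Hypothetical example: track usage on a user's home directory without
# enforcing a limit. Path is made up; verify flags for your OneFS version.

# Create an accounting-only (non-enforced) directory quota.
isi quota quotas create /ifs/home/alice directory --enforced=false

# List quotas with current usage so users and admins can see consumption.
isi quota quotas list --verbose
```

Because nothing is enforced, users never hit a hard wall mid-job; the visibility alone is usually enough to prompt cleanup.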
I can certainly recommend PowerScale for mission-critical workloads; it is a powerful yet simple platform with little administration overhead. We use it in production for a variety of use cases, and it would be hard for our organization to operate effectively without it.