What are the benefits and drawbacks of using cloud storage?
Cloud storage has become increasingly popular compared to local storage. What are the pros and cons of using cloud storage? Are there scenarios where local storage is preferable?
UNIX Security Consultant at a retailer with 1,001-5,000 employees
Jun 18, 2020
There are quite a few. Classical storage is either block/SAN (iSCSI/FC/direct-attached) or shared/NAS (NFS/CIFS). Cloud storage differs in several ways:
1) It is generally distributed/scale-out as opposed to scale-up. You can increase capacity and performance by simply adding nodes. In SAN/NAS there are limits to scaling: you can add more disk shelves, but the controllers may become saturated, at which point you scrap the whole array and migrate to a newer, more powerful one. With cloud storage, you just add nodes as needed.
2) Cloud storage is generally offered via HTTP. Since most modern business applications communicate via HTTP anyway (think SOAP/REST/etc.), this comes naturally. HTTP is a very well understood protocol, and there is much better tooling for it: decent load balancers exist, it works very well in a distributed fashion, and it is native to the internet. In some scenarios, you can even expose the cloud storage directly to the customer without anything in front of it. Think of a billing platform that stores the PDFs of all the bills, a collaboration platform with contact photos, or a simple website that keeps the customer-preferences JSON directly in an object instead of a database. You don't have to write a PHP or Java module that reads the file from a filesystem and serves it to the customer; it's already available via HTTP in the cloud storage, and you just include a link to it (see the presigned-URL sketch after this list).
3) Software-defined storage. This is where things get really interesting. Where I work, we have 30 employees managing 500 physical servers, 40 all-flash storage systems, and more than 30k VMs. We are the team behind 700 developers, and everything runs on the infrastructure that we provide, with hardware from dozens of major brands. Each stack deployment (dev, QA, prod) of up to 500 VMs would require the creation of 500 VMs, the allocation of 1,000 LUNs, 20 VLANs, and 1,000 firewall rules. Nobody can do that in a matter of minutes, and nobody can choose the most appropriate storage for a given deployment in a matter of minutes; something like this would take as much as a week in classic ITOps. SDS lets the developers claim (within a budget) the hardware they need (storage being the point here), without caring about balancing the load between storage arrays/clusters, without caring about anything. They just put their PersistentVolumeClaims in the YAML file (a sample claim follows this list) and it happens without the intervention of the sysadmins. For us it's also great; we just give them reports on their hardware resource usage every few weeks.
4) As a result of the points above, you get the best one of all: write once, run anywhere. A well-written application designed to be cloud-agnostic can be migrated wherever your business needs dictate. Today internally, tomorrow in AWS, and in Azure the week after (one way to structure this is sketched after this list).
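To make point 2 concrete, here is a minimal sketch of handing a customer a direct, time-limited link to an object instead of proxying the file through application code. It uses boto3 against S3; the bucket and key names are invented for illustration.

```python
import boto3

# S3 client; credentials come from the environment or ~/.aws/credentials.
s3 = boto3.client("s3")

# Hypothetical bucket/key for a customer's bill.
bucket, key = "billing-pdfs", "2020/06/customer-42.pdf"

# Generate a time-limited link the customer can fetch directly over HTTP,
# with no PHP/Java module in between.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": key},
    ExpiresIn=3600,  # link is valid for one hour
)
print(url)  # embed this link in the billing page or email
```

For genuinely public assets (contact photos, a preferences JSON), you can skip the signing entirely and link to the object's URL itself.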
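And here is roughly what the PersistentVolumeClaim from point 3 looks like. A minimal sketch; the claim name and storage class are placeholders for whatever your SDS layer advertises.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # placeholder: whatever class the SDS layer advertises
  resources:
    requests:
      storage: 100Gi           # the developer claims capacity, not a LUN
```

The developer never learns which array or cluster backs the volume; the SDS layer picks one and binds it.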
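For point 4, cloud-agnosticism mostly comes down to keeping storage behind a narrow interface of your own. A minimal sketch, not a prescription:

```python
from typing import Protocol

import boto3


class ObjectStore(Protocol):
    """The only storage surface the rest of the application sees."""

    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class S3Store:
    """One interchangeable backend; an Azure or on-prem one has the same shape."""

    def __init__(self, bucket: str):
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)
```

Swap the backend at deployment time and the rest of the code never notices: internal today, AWS tomorrow, Azure the week after.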
There are also disadvantages to scale-out storage. Let's take them one at a time:
1) Performance. A properly tuned non-shared FS on a modern all-flash array over NVMe will be spectacular: easily 400% better performance (response times and IOPS) than HTTP-based object storage. There are a number of factors at play, but suffice it to say that NVMe-oF, and even SCSI/FC, will always beat TCP/IP over Ethernet by large margins. Obviously, as applications get more and more modular and microservices take over, latency will matter more and more. SSL/TLS has latency implications as well (about 15% in our experience).
2) Integrations. SAN and NAS have been well-understood subjects for more than 20 years (even 30 if you count NFS). The operating system handles them flawlessly and transparently to the application. With cloud storage, your application has to implement the access itself: instead of doing fopen()/mmap() on a file, you resolve hostnames, open sockets, use SSL libraries, layer HTTP abstractions over abstractions, allocate buffers, etc. (the sketch after this list contrasts the two). Some of the complexity and responsibility moves to the application developer, to the code quality, and to the wisdom of their library choices. But by now, in 2020, this is a well-understood stage.
3) Cost. As a result of the performance point, cloud might be more expensive. But if you have a small deployment, a small application, or just want to experiment with new technologies, cloud can be a lot cheaper too. I have a friend who wrote an application for forecasting natural gas usage nationwide. It is used by most of the gas traders in Romania and most of the suppliers. It is quite intensive when he trains the model, but otherwise the resource usage is small, since there are about 500 traders in roughly 40 companies. His cloud bill is less than $1,000/month. He doesn't need routers, servers, or hosting; he doesn't manage anything. He writes code and publishes it. It makes a lot of sense for him. OTOH, at my current employer, we buy hardware worth around $5-10M/year (we do get massive discounts, but that's another story). If we were to move to the cloud, the bills would probably be around $20M/year, and a lot of performance-sensitive applications would require rewrites due to the increased storage latency. It's a mixed bag. That's why you can also run cloud storage (generally S3-compatible) on your own premises, with spectacular results, using vendors like MinIO, OpenIO, and Pure Storage (see the endpoint example after this list).
4) Security. Since you can expose the storage directly, it's very easy to forget to secure it. Quite a few incidents have been reported in recent years where entire databases or buckets were left exposed online, accidentally discovered, and stolen (a minimal safeguard is sketched below).
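To illustrate points 1 and 2 above: the same logical read, done locally versus over HTTP. The timing is only a crude illustration of where the extra latency comes from, and the bucket and file paths are invented.

```python
import time

import boto3


def read_local(path: str) -> bytes:
    # The OS does everything: one call, page cache, no network.
    with open(path, "rb") as f:
        return f.read()


def read_object(bucket: str, key: str) -> bytes:
    # The application does everything: DNS, TCP, TLS, HTTP, buffers.
    s3 = boto3.client("s3")
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


# Hypothetical local file and object; compare wall-clock time for each path.
for label, fn in [("local", lambda: read_local("/data/report.pdf")),
                  ("object", lambda: read_object("reports", "report.pdf"))]:
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {(time.perf_counter() - t0) * 1000:.1f} ms")
```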
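On the on-premises option in point 3: S3-compatible storage such as MinIO is consumed with the exact same client code; you only point it at your own endpoint. The hostname and credentials here are placeholders.

```python
import boto3

# Same S3 API, but served from hardware in your own datacenter.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.internal.example:9000",  # placeholder host
    aws_access_key_id="ACCESS_KEY",                      # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)
print(s3.list_buckets()["Buckets"])
```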
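And for point 4, the forgetting-to-secure-it problem: on AWS S3 at least, you can make "accidentally public" structurally impossible per bucket. A minimal sketch with a hypothetical bucket name; presigned links keep working because they are authenticated requests.

```python
import boto3

s3 = boto3.client("s3")

# Refuse public ACLs and public policies on this bucket outright,
# so a later misconfiguration cannot expose its contents.
s3.put_public_access_block(
    Bucket="billing-pdfs",  # hypothetical bucket
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```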