What is our primary use case?
The primary use case is data platforms, specifically data warehousing. It involves storing and moving data within the platform to prepare it for analysis, reporting, or serving as the backbone for applications.
Snowflake also advertises different workstreams, but my customers mostly use it as their core platform to ingest data and serve the onward goals of the wider company.
What is most valuable?
The most valuable feature of Snowflake is consumption-based costs, which means that you only pay for the storage and compute you use. There's a complete separation of storage and computing, so you don't need to add another server to increase storage or computing. From a costing perspective, it's well-positioned.
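To make the pay-for-what-you-use point concrete, here is a rough arithmetic sketch. The credits-per-hour scale by warehouse size and the per-second billing with a 60-second minimum follow Snowflake's published model as I understand it, but the two dollar prices are purely assumed figures for illustration; real rates depend on edition, cloud provider, and region.

```python
# Credits consumed per hour by warehouse size (doubles with each size step).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}

# ASSUMED prices, for illustration only; not real quotes.
PRICE_PER_CREDIT_USD = 3.00
PRICE_PER_TB_MONTH_USD = 23.00


def compute_cost(size: str, seconds_running: int) -> float:
    """Cost of one warehouse run: billed per second, 60-second minimum."""
    billed_seconds = max(seconds_running, 60)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return credits * PRICE_PER_CREDIT_USD


def storage_cost(terabytes: float) -> float:
    """Monthly storage cost; it does not depend on compute usage at all."""
    return terabytes * PRICE_PER_TB_MONTH_USD


def monthly_bill(size: str, seconds_running: int, terabytes: float) -> float:
    """Storage and compute are simply added; neither forces the other up."""
    return round(compute_cost(size, seconds_running) + storage_cost(terabytes), 2)
```

Under these assumed prices, a Medium warehouse that runs for 30 minutes costs the same whether you store 2 TB or 200 TB; only the storage line of the bill changes.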
Snowflake's time travel is also incredibly useful, and they have a command called "UNDROP," which lets you undo a table drop. Data sharing and replication in Snowflake are strong, and they have a data marketplace with public and private data sets available for sharing. Companies can put their data on the marketplace, and anyone can use it once they sign up to the payment model. The data is provided live straight to you, and it appears as if it were just another database in your own environment.
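As a small illustration of what time travel and UNDROP look like in practice, the sketch below builds the two SQL statements as plain strings. It does not connect to Snowflake, and the table name `sales.orders` is hypothetical. The statements only succeed if the table is still within its data retention period.

```python
def undrop_table(table: str) -> str:
    """SQL to restore the most recently dropped version of a table."""
    return f"UNDROP TABLE {table};"


def select_as_of(table: str, seconds_ago: int) -> str:
    """SQL to query a table as it looked `seconds_ago` seconds in the past."""
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago});"


# Hypothetical table name, used only for the example.
print(undrop_table("sales.orders"))
print(select_as_of("sales.orders", 3600))
```

You would run the resulting statements in a worksheet or through whichever client you already use.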
What needs improvement?
The main thing I'm excited to see at some point with Snowflake, though I've not seen anything announced yet, is Git integration into the worksheets and the UI. Sometimes it can be tricky to manage multiple environments if you're purely using Snowflake as your scripting and pipeline environment. This is manageable, as third-party tools like dbt, Matillion, etc. can help. But if you're looking purely within Snowflake itself, it'd be great to have some form of Git support.
For future releases, I would love it if they one day decided to implement their own GUI-based transformation tool environment. I know that competitors like Azure have Synapse, and Azure Data Factory can sit in. However, Azure is a very different beast that serves all sorts of different processes, and an argument could be made for whether it's the best at each of those or not. Specifically within Snowflake, I would love it if they could get some form of orchestration built in for transformation that doesn't have to be controlled directly through code all the time.
For how long have I used the solution?
I have been using Snowflake for five years.
What do I think about the stability of the solution?
It is an incredibly stable solution. It will only go down if your cloud provider itself goes down. So, let's say your Snowflake account is hosted in Azure London. I would only see Snowflake going down if the Azure London data center goes down. If that does happen, Snowflake has plenty of options for failover, replication, and rollover backups.
So we have quite a few customers that, for example, need their data stored in AWS London, and they've got a backup or a replica stored in Azure London. If AWS London goes down, then the Azure London one will kick in and become the primary account, and all of the URLs, etc., remain the same because they've set up failover URLs and connections for it. At least for the end customer, there's no change. It's only the architects and developers behind the scenes who then have to double-check things and do all the normal due diligence. But it runs very smoothly.
What do I think about the scalability of the solution?
It is a highly scalable solution. There is no limit on storage or compute. Everything is on consumption-based pricing, and you can also have what's known as a multi-cluster warehouse. Warehouses are what you use for compute.
A multi-cluster warehouse will start out as a single cluster. But then, if there are enough concurrent queries taking place in that warehouse, it can spin up another cluster, and another, and another, as needed to meet the concurrency demand. And as soon as the load dies down again, it can switch those clusters off one by one. You can create as many clusters and warehouses as you need. There is no scaling issue at all. I've seen it handle something like 10,000 queries a second, and it ran very, very smoothly.
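The scale-out and scale-in behavior described above can be sketched as a toy model. This is a simplification, not Snowflake's actual scaling policy: the figure of 8 concurrent queries per cluster and the cluster limits are assumed defaults for the example, loosely mirroring the warehouse's minimum and maximum cluster count settings.

```python
import math


def clusters_needed(
    concurrent_queries: int,
    queries_per_cluster: int = 8,  # ASSUMED capacity per cluster
    min_clusters: int = 1,         # ASSUMED lower bound
    max_clusters: int = 4,         # ASSUMED upper bound
) -> int:
    """Clusters a multi-cluster warehouse would run for a given load.

    Adds clusters as concurrency grows and releases them as it falls,
    always staying between min_clusters and max_clusters.
    """
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    return max(min_clusters, min(wanted, max_clusters))


# A load that ramps up and then dies down again.
load = [3, 20, 40, 10, 0]
for queries in load:
    print(queries, "concurrent queries ->", clusters_needed(queries), "cluster(s)")
```

The point of the sketch is that you pay for extra clusters only while the concurrency is actually there; once the load drops, the count falls back toward the minimum.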
How are customer service and support?
The customer service and support team is very helpful and strong. They've got support built directly into the Snowflake UI. So wherever you are on the platform, if you see an issue, you can click into the support area and submit your ticket, including details like the query ID that you're using or multiple query IDs and all that stuff.
I find Snowflake to be very responsive, and if you submit a top-level ticket, you can get a response very quickly. The lowest tier of tickets might take 48 hours sometimes, but overall, they are very helpful.
Which solution did I use previously and why did I switch?
I personally don't see any of the competing cloud platforms coming close right now to what Snowflake offers. An argument could be made that GCP and Databricks are getting closer. Also, a new AWS Redshift is on the horizon, like a whole new AWS Redshift 2.0. But right now, I've not seen anything that comes close. Snowflake, to my understanding, is the only platform that fully separates your storage and compute. And it's the only platform I've seen with things like time travel. It has a whole bunch of great features, though I don't know whether other tools have them too: it supports semi-structured data, and it supports automated tasks, alerts, and reporting.
Data sharing is a massive one. GCP now also has its own data-sharing capability, where you can share data with other GCP accounts. I've not used it myself, but to my knowledge, whilst they have the sharing, they don't have anything that even comes close to the Snowflake data marketplace, which allows customers to sell or share their data with the wider world. And it doesn't have anything that comes close to the kind of private exchange where customers might share their own data internally or to their own.
Snowflake also has some really good support for Python, Scala, and Java through what they call Snowpark, which was launched last year. More recently, this year, it was announced they're really pushing forward with their Streamlit integration. It will allow customers to host applications on Snowflake and share those applications with other users in a very similar kind of marketplace environment to the one they use for data sharing. I don't think any of the other competitors have anything like it right now.
How was the initial setup?
The deployment model is delivered as a service. The most deployment you have to do yourself is deciding which cloud provider and region you want it hosted in. Snowflake actually hosts it themselves, so there's no deployment beyond selecting from a dropdown and clicking OK, and it'll magically appear.
Moreover, it's very easy to maintain because it's delivered entirely as a service. Snowflake takes care of all the patches, upgrades, maintenance, security tweaks, etc.
What was our ROI?
We have many long-term customers who have been using Snowflake for years, and they wouldn't continue to use it if they weren't seeing a strong return on investment.
What other advice do I have?
There are many options for starting a Snowflake deployment, but I recommend working with a partner who can provide best practices and guidance. It could be through Snowflake directly or another service partner. Working with a partner can save you time and prevent mistakes down the road.
Overall, I would rate the solution a ten out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner