One of the most popular comparisons on IT Central Station is Apache Hadoop vs Snowflake.
People like you are trying to decide which one is best for their company. Can you help them out?
What is the biggest difference between Apache Hadoop and Snowflake? Which of these two solutions would you recommend to a colleague evaluating data warehouse systems and why?
Thanks for helping your peers make the best decision!
Interactive querying as a consumption pattern is something Snowflake handles much better than Hadoop and the related query-engine options (Impala, Presto, Drill, etc.). On the other hand, heavy data-science query workloads can become an expensive pattern on Snowflake, where Hadoop can provide a more cost-efficient solution. Hadoop also remains relevant as a back-end data processing engine; using Snowflake for data transformation is held back by its higher cost as well as its limited procedural language capabilities (JavaScript-based stored procedures). In terms of administrative complexity, Snowflake fares much better than Hadoop.
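To make that division of labour concrete, here is a minimal sketch of the "transform in Hadoop, serve from Snowflake" pattern, assuming a PySpark job running on the Hadoop side; the paths, table, and column names are hypothetical placeholders, not anything prescribed by either product.

```python
# Minimal sketch: do the heavy transformation on the Hadoop/Spark side,
# then write a curated extract that can later be loaded into Snowflake.
# Paths, table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("backend-transform").getOrCreate()

# Raw events sitting in the data lake on HDFS (hypothetical path).
events = spark.read.parquet("hdfs:///datalake/raw/events")

# Scan-heavy aggregation that would be expensive to run on Snowflake credits.
daily_summary = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "customer_id")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Land the curated result back in the lake; a separate ingest step
# would push this into Snowflake for interactive querying.
daily_summary.write.mode("overwrite").parquet("hdfs:///datalake/curated/daily_summary")

spark.stop()
```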
Apache Hadoop is for data lake use cases, but getting data out of Hadoop for meaningful analytics does need quite a lot of work, whether through Spark, Hive, Presto, and so on. The way I look at Snowflake and Hadoop is that they complement each other: companies can use Hadoop for the data lake and Snowflake for the data warehouse. Depending on the size of the company, you can turn Snowflake into the data lake as well. Snowflake is SQL-friendly, and you don't need to perform any circus to get the data in and out of Snowflake.
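As a rough illustration of how little "circus" is involved on the Snowflake side, here is a sketch using the snowflake-connector-python package: one COPY INTO to load the curated files, plain SQL to read them back. The account, warehouse, stage, and table names are made up for the example, and credentials would normally come from configuration rather than being hard-coded.

```python
# Minimal sketch of loading and querying data in Snowflake with plain SQL.
# Account, warehouse, database, stage, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # assumption: replace with your account identifier
    user="my_user",
    password="my_password",
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()

    # Getting data in: a single COPY INTO from a (pre-created) external stage
    # holding the curated Parquet files produced in the data lake.
    cur.execute("""
        COPY INTO daily_summary
        FROM @curated_stage/daily_summary/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)

    # Getting data out: ordinary SQL, no extra query engines required.
    cur.execute("""
        SELECT event_date, SUM(total_amount) AS revenue
        FROM daily_summary
        GROUP BY event_date
        ORDER BY event_date
    """)
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```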