
Databricks vs Google Cloud Dataflow comparison

Comparison Buyer's Guide

Executive Summary
Updated on Oct 8, 2024

Categories and Ranking

Databricks
Ranking in Streaming Analytics: 1st
Average Rating: 8.2
Number of Reviews: 82
Ranking in other categories: Data Science Platforms (1st)

Google Cloud Dataflow
Ranking in Streaming Analytics: 8th
Average Rating: 7.8
Number of Reviews: 10
Ranking in other categories: none

Mindshare comparison

As of November 2024, Databricks holds a 14.0% mindshare in the Streaming Analytics category, up from 9.6% a year earlier. Google Cloud Dataflow holds 8.3%, up from 6.6% a year earlier. Mindshare is calculated from PeerSpot user engagement data.

Featured Reviews

Dunstan Matekenya - PeerSpot reviewer
Jul 10, 2024
Process large-scale data sets and integrates with Apache Spark with notebook environment
I primarily use Databricks to process large-scale data sets with Apache Spark. My main use case is processing large data sets, such as 600 GB or 800 GB. Databricks integrates natively with Apache Spark, which I use as a processing engine for large-scale datasets. This native integration is one of…
Tamer Talal - PeerSpot reviewer
Feb 14, 2024
A tool useful for data transmission and data storage that needs to improve its authentication area
I use the solution in my company for data transmission and data storage. One of the good features of the product is the overall capacity that it provides to its users. Though the speed of the product is good, the main feature of the product that I like is its capacity. The authentication part of…

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:

Pros

Databricks
"The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle."
"The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of the research and the development and also the final deployment. I can just declare the spark jobs by the load tables. It's quite convenient."
"Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great."
"The load distribution capabilities are good, and you can perform data processing tasks very quickly."
"Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours."
"Databricks is a unified solution that we can use for streaming. It is supporting open source languages, which are cloud-agnostic. When I do database coding if any other tool has a similar language pack to Excel or SQL, I can use the same knowledge, limiting the need to learn new things. It supports a lot of Python libraries where I can use some very easily."
"The solution offers a free community version."
"Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform."

Google Cloud Dataflow
"The product's installation process is easy...The tool's maintenance part is somewhat easy."
"It is a scalable solution."
"The most valuable features of Google Cloud Dataflow are scalability and connectivity."
"I don't need a server running all the time while using the tool. It is also easy to setup. The product offers a pay-as-you-go service."
"The most valuable features of Google Cloud Dataflow are the integration, it's very simple if you have the complete stack, which we are using. It is overall very easy to use, user-friendly, and cost-effective if you know how to use it. The solution is very flexible for programmers, if you know how to do scripts or program in Python or any other language, it's extremely easy to use."
"The best feature of Google Cloud Dataflow is its practical connectedness."
"The solution allows us to program in any language we desire."
"Google Cloud Dataflow is useful for streaming and data pipelines."

Cons

Databricks
"The solution could improve by providing better automation capabilities. For example, working together with more of a DevOps approach, such as continuous integration."
"The query plan is not easy with Databricks' job level. If I want to tune any of the code, it is not easily available in the blogs as well."
"I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement."
"The connectivity with various BI tools could be improved, specifically the performance and real time integration."
"I would like it if Databricks adopted an interface more like R Studio. When I create a data frame or a table, R Studio provides a preview of the data. In R Studio, I can see that it created a table with so many columns or rows. Then I can click on it and open a preview of that data."
"Databricks would benefit from enhanced metrics and tighter integration with Azure's diagnostics."
"Databricks could improve in some of its functionality."
"Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it, however, they need to implement it to help the solution run more effectively."

Google Cloud Dataflow
"Google Cloud Data Flow can improve by having full simple integration with Kafka topics. It's not that complicated, but it could improve a bit. The UI is easy to use but the experience could be better. There are other tools available that do a better job."
"I would like Google Cloud Dataflow to be integrated with IT data flow and other related services to make it easier to use as it is a complex tool."
"The technical support has slight room for improvement."
"The deployment time could also be reduced."
"When I deploy the product locally, a lot of errors pop up which are not always caught. The solution's error logging is bad. It can take a lot of time to debug the errors. It needs to have better logs."
"Google Cloud Dataflow should include a little cost optimization."
"They should do a market survey and then make improvements."
"The authentication part of the product is an area of concern where improvements are required."

Pricing and Cost Advice

Databricks
"The price is okay. It's competitive."
"Databricks' cost could be improved."
"We find Databricks to be very expensive, although this improved when we found out how to shut it down at night."
"It is an expensive tool. The licensing model is a pay-as-you-go one."
"The billing of Databricks can be difficult and should improve."
"We have only incurred the cost of our AWS cloud services. This is because during this period, Databricks provided us with an extended evaluation period, and we have not spent much money yet. We are just starting to incur costs this month, I will know more later on the full cost perspective."
"We implement this solution on behalf of our customers who have their own Azure subscription and they pay for Databricks themselves. The pricing is more expensive if you have large volumes of data."
"The cost for Databricks depends on the use case. I work on it as a consultant, so I'm using the client's Databricks, so it depends on how big the client is."

Google Cloud Dataflow
"On a scale from one to ten, where one is cheap, and ten is expensive, I rate Google Cloud Dataflow's pricing a four out of ten."
"Google Cloud Dataflow is a cheap solution."
"The price of the solution depends on many factors, such as how they pay for tools in the company and its size."
"On a scale from one to ten, where one is cheap, and ten is expensive, I rate the solution's pricing a seven to eight out of ten."
"Google Cloud is slightly cheaper than AWS."
"The solution is not very expensive."
"The tool is cheap."
"The solution is cost-effective."
814,649 professionals have used our research since 2012.

Top Industries

By visitors reading reviews
Databricks: Financial Services Firm 16%, Computer Software Company 12%, Manufacturing Company 9%, Healthcare Company 6%
Google Cloud Dataflow: Financial Services Firm 17%, Computer Software Company 12%, Retailer 12%, Manufacturing Company 11%

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business

Questions from the Community

Which do you prefer - Databricks or Azure Machine Learning Studio?
Databricks gives you the option of working with several different languages, such as SQL, R, Scala, Apache Spark, or Python. It offers many different cluster choices and excellent integration with ...
How would you compare Databricks vs Amazon SageMaker?
We researched AWS SageMaker, but in the end, we chose Databricks. Databricks is a Unified Analytics Platform designed to accelerate innovation projects. It is based on Spark so it is very fast. It...
Which would you choose - Databricks or Azure Stream Analytics?
Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analytics teams that have to interpret data to further the business goals of their orga...
What do you like most about Google Cloud Dataflow?
The product's installation process is easy...The tool's maintenance part is somewhat easy.
What needs improvement with Google Cloud Dataflow?
The authentication part of the product is an area of concern where improvements are required. For some common users, the solution's authentication part is difficult to use. The scalability of the p...

Also Known As

Databricks: Databricks Unified Analytics, Databricks Unified Analytics Platform, Redash
Google Cloud Dataflow: Google Dataflow

Sample Customers

Databricks: Elsevier, MyFitnessPal, Sharethrough, Automatic Labs, Celtra, Radius Intelligence, Yesware
Google Cloud Dataflow: Absolutdata, Backflip Studios, Bluecore, Claritics, Crystalloids, Energyworx, GenieConnect, Leanplum, Nomanini, Redbus, Streak, TabTale
Find out what your peers are saying about Databricks vs. Google Cloud Dataflow and other solutions. Updated: October 2024.