
Cloudera DataFlow vs Databricks comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Cloudera DataFlow
Ranking in Streaming Analytics: 14th
Average Rating: 7.2
Number of Reviews: 4
Ranking in other categories: None

Databricks
Ranking in Streaming Analytics: 1st
Average Rating: 8.2
Number of Reviews: 82
Ranking in other categories: Data Science Platforms (1st)
 

Mindshare comparison

As of November 2024, in the Streaming Analytics category, the mindshare of Cloudera DataFlow is 1.4%, down from 1.6% a year earlier. The mindshare of Databricks is 14.0%, up from 9.6% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
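
To put the two shifts on a comparable footing, the short sketch below derives both the percentage-point change and the relative change from the mindshare figures quoted above. The function name `mindshare_change` and the rounding to one decimal place are illustrative assumptions; how PeerSpot itself derives mindshare from engagement data is not described here.

```python
def mindshare_change(previous: float, current: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    point_change = current - previous
    relative_change = point_change / previous * 100
    # Round to one decimal place, matching the precision of the quoted figures.
    return round(point_change, 1), round(relative_change, 1)


# (previous year, current) mindshare in percent, as quoted in the text.
figures = {
    "Cloudera DataFlow": (1.6, 1.4),
    "Databricks": (9.6, 14.0),
}

for product, (previous, current) in figures.items():
    points, relative = mindshare_change(previous, current)
    print(f"{product}: {points:+.1f} points ({relative:+.1f}% relative)")
```

The percentage-point change shows how much share moved in absolute terms, while the relative change shows how large that move is against each product's own starting share.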
 

Featured Reviews

Júlio César Gomes Fonseca - PeerSpot reviewer
Jun 23, 2023
A stable solution that helps develop quality modules but needs to improve its programming language
Sometimes I need this workflow to make my modules, not for campaign preparation. It is solely focused on developing quality modules for direct telecommunication companies. In Cloudera DataFlow, I can't say which is the most valuable feature because we use all modules. We need to compare each…
Dunstan Matekenya - PeerSpot reviewer
Jul 10, 2024
Processes large-scale data sets and integrates with Apache Spark in a notebook environment
I primarily use Databricks to process large-scale data sets with Apache Spark. My main use case is processing large data sets, such as 600 GB or 800 GB. Databricks integrates natively with Apache Spark, which I use as a processing engine for large-scale datasets. This native integration is one of…

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The most effective features are data management and analytics."
"DataFlow's performance is okay."
"This solution is very scalable and robust."
"The initial setup was not so difficult."
"Databricks is hosted on the cloud. It is very easy to collaborate with other team members who are working on it. It is production-ready code, and scheduling the jobs is easy."
"Easy to use and requires minimal coding and customizations."
"I like how easy it is to share your notebook with others. You can give people permission to read or edit. I think that's a great feature. You can also pull in code from GitHub pretty easily. I didn't use it that often, but I think that's a cool feature."
"Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things."
"Can cut across the entire ecosystem of open source technology to give an extra level of getting the transformatory process of the data."
"Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client."
"One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often."
"Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform."
 

Cons

"It's an outdated legacy product that doesn't meet the needs of modern data analysts and scientists."
"Although their workflow is pretty neat, it still requires a lot of transformation coding; especially when it comes to Python and other demanding programming languages."
"It is not easy to use the R language. Though I don't know if it's possible, I believe it is possible, but it is not the best language for machine learning."
"There would also be benefits if more options were available for workers, or the clusters of the two points."
"It's not easy to use, and they need a better UI."
"The biggest problem associated with the product is that it is quite pricey."
"Databricks could improve in some of its functionality."
"A lot of people are required to manage this solution."
"The integration and query capabilities can be improved."
"The solution could improve by providing better automation capabilities. For example, working together with more of a DevOps approach, such as continuous integration."
"There could be more support for automated machine learning in the database. I would like to see more ways to do analysis so that the reporting is more understandable."
 

Pricing and Cost Advice

"DataFlow isn't expensive, but its value for money isn't great."
"I would rate the tool’s pricing an eight out of ten."
"Databricks' cost could be improved."
"The solution uses a pay-per-use model with an annual subscription fee or package. Typically this solution is used on a cloud platform, such as Azure or AWS, but more people are choosing Azure because the price is more reasonable."
"The billing of Databricks can be difficult and should improve."
"There are different versions."
"The price is okay. It's competitive."
"We have only incurred the cost of our AWS cloud services. This is because during this period, Databricks provided us with an extended evaluation period, and we have not spent much money yet. We are just starting to incur costs this month, I will know more later on the full cost perspective."
"The licensing costs of Databricks is a tiered licensing regime, so it is flexible."
814,649 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

Computer Software Company: 19%
Financial Services Firm: 16%
University: 11%
Manufacturing Company: 8%

Financial Services Firm: 16%
Computer Software Company: 12%
Manufacturing Company: 9%
Healthcare Company: 6%
 

Company Size

By reviewers (Large Enterprise, Midsize Enterprise, Small Business): No data available
 

Questions from the Community

What do you like most about Cloudera DataFlow?
The most effective features are data management and analytics.
What is your primary use case for Cloudera DataFlow?
We use Cloudera DataFlow for stream analytics.
Which do you prefer - Databricks or Azure Machine Learning Studio?
Databricks gives you the option of working with several different languages, such as SQL, R, Scala, Apache Spark, or Python. It offers many different cluster choices and excellent integration with ...
How would you compare Databricks vs Amazon SageMaker?
We researched AWS SageMaker, but in the end, we chose Databricks. Databricks is a Unified Analytics Platform designed to accelerate innovation projects. It is based on Spark so it is very fast. It...
Which would you choose - Databricks or Azure Stream Analytics?
Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analytics teams that have to interpret data to further the business goals of their orga...
 

Also Known As

Cloudera DataFlow: CDF, Hortonworks DataFlow, HDF
Databricks: Databricks Unified Analytics, Databricks Unified Analytics Platform, Redash
 

Overview

 

Sample Customers

Cloudera DataFlow: Clearsense
Databricks: Elsevier, MyFitnessPal, Sharethrough, Automatic Labs, Celtra, Radius Intelligence, Yesware
Find out what your peers are saying about Cloudera DataFlow vs. Databricks and other solutions. Updated: October 2024.