
Cloudera Data Science Workbench vs Databricks comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Dec 5, 2024
 

Categories and Ranking

IBM SPSS Statistics (Sponsored)
Ranking in Data Science Platforms: 9th
Average Rating: 8.0
Reviews Sentiment: 6.9
Number of Reviews: 37
Ranking in other categories: Data Mining (3rd)

Cloudera Data Science Workbench
Ranking in Data Science Platforms: 22nd
Average Rating: 7.0
Reviews Sentiment: 6.9
Number of Reviews: 2
Ranking in other categories: No ranking in other categories

Databricks
Ranking in Data Science Platforms: 1st
Average Rating: 8.2
Reviews Sentiment: 7.0
Number of Reviews: 84
Ranking in other categories: Streaming Analytics (1st)
 

Mindshare comparison

As of December 2024, in the Data Science Platforms category, IBM SPSS Statistics has a mindshare of 2.7%, roughly the same as its 2.7% the previous year. Cloudera Data Science Workbench has a mindshare of 1.5%, down from 1.8% the previous year. Databricks has a mindshare of 19.2%, up from 18.7% the previous year. Mindshare is calculated from PeerSpot user engagement data.
 

Featured Reviews

Md Masudul Hassan - PeerSpot reviewer
Comprehensive data analysis capabilities with a user-friendly interface, providing an efficient and reliable platform for researchers and analysts
I believe that offering short-term SPSS licenses, perhaps when customer sourcing is available, could make it more affordable. These licenses shouldn't include features tailored for universities or large sales organizations. Instead, they could offer discounts or additional facilities for smaller entities to access the software. In developing countries, it would be beneficial to provide certain features to users at no cost initially, while also customizing pricing options. For example, offering basic features to the first hundred users can help them become familiar with the software and its capabilities. This approach encourages users to upgrade to higher tiers as they become more experienced and require additional functionality.
Ismail Peer - PeerSpot reviewer
Useful for data science modeling but improvement is needed in MLOps and pricing
If you don't configure CDSW well, it might not be useful for you. Deploying the tool can vary in complexity, but most of the time it's relatively simple and straightforward. Triggering a job from data to production is easy, as the platform automates the deployment process. However, ensuring optimal resource allocation is essential for smooth operations.
Dunstan Matekenya - PeerSpot reviewer
Processes large-scale datasets and integrates with Apache Spark in a notebook environment
Databricks integrates natively with Apache Spark, which I use as a processing engine for large-scale datasets. This native integration is one of its strengths. Another strength is that the platform makes it very easy to manage resources. For example, setting up a cluster of five or fifteen nodes is straightforward with Databricks. The notebook environment is also excellent, making it easy to perform various tasks.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"SPSS is quite robust and quicker in terms of providing you the output."
"The most valuable features are the solution is easy to use, training new users is not difficult, and our usage is comprehensive because the whole service is beneficial."
"Capability analysis is one of the main and valuable functions. We also do some hypothesis testing in Minitab and summary stats. These are the functions that we find very useful."
"The most valuable feature is the user interface because you don't need to write code."
"It is a modeling tool with helpful automation."
"IBM SPSS Statistics depends on AI."
"Most of the product features are good but I particularly like the linear regression analysis."
"The most valuable features mainly include factor analysis, correlation analysis, and geographic analysis."
"I appreciate CDSW's ability to logically segregate environments, such as data, DR, and production, ensuring they don't interfere with each other. The deployment of machine learning is fast and easy to manage. Its API calls are also fast."
"The Cloudera Data Science Workbench is customizable and easy to use."
"Databricks provides a consistent interface for data engineers to work with data in a consistent language on a single integrated platform for ingesting, processing, and serving data to the end user."
"It's very simple to use Databricks Apache Spark."
"The most valuable feature of Databricks is the integration with Microsoft Azure."
"Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great."
"It's great technology."
"The tool helps with data processing and analytics with large-scale data or big data since it is associated with managing data at a large scale."
"Databricks helps crunch petabytes of data in a very short period of time."
"I like how easy it is to share your notebook with others. You can give people permission to read or edit. I think that's a great feature. You can also pull in code from GitHub pretty easily. I didn't use it that often, but I think that's a cool feature."
 

Cons

"IBM SPSS Statistics could improve the visual outputs where you are producing, for example, a graph for a company board of directors, or an advert."
"The solution needs to improve forecasting using time series analysis."
"Needs more statistical modelling functions."
"SPSS is a tool that's been around since the late 60s, and it's the universal worldwide standard for quantitative social science data analysis. That said, it does seem a bit strange to me that the graphical output functions are so clunky after all these years. The output of charts and graphs that SPSS produces is hideous."
"Each algorithm could be more adaptable to some industry-specific areas, or, in some cases, adapted for maintenance."
"If there is any self-generation data collection plan (DCP), it would be helpful in gathering data. It would also be useful if there is a function to scale it up to, let's say, UiPath and have it consolidate and integrate into a UiPath solution."
"The product should provide more ways to import data and export results that are user-friendly for high-level executives."
"I'd like to see them use more artificial intelligence. It should be smart enough to do predictions and everything based on what you input."
"Running this solution requires a minimum of 12GB to 16GB of RAM."
"The tool's MLOps is not good. Its pricing also needs to improve."
"Databricks would benefit from enhanced metrics and tighter integration with Azure's diagnostics."
"Scalability is an area with certain shortcomings. The solution's scalability needs improvement."
"It would be great if Databricks could integrate all the cloud platforms."
"Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster."
"The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with the language R, but Databricks requires the use of Jupyter Notebooks which primarily supports Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code and execute it within a cell. This feature is available in other Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to using Databricks much smoother for our team."
"I would like it if Databricks adopted an interface more like R Studio. When I create a data frame or a table, R Studio provides a preview of the data. In R Studio, I can see that it created a table with so many columns or rows. Then I can click on it and open a preview of that data."
"Implementation of Databricks is still very code heavy."
"Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it, however, they need to implement it to help the solution run more effectively."
 

Pricing and Cost Advice

"Our licence is on a yearly renewal basis. While pricing is not the primary concern in our evaluation, as products are assessed by whether they can meet our user needs and expertise, the cost can be a limiting factor in the number of licences we procure."
"The price of this solution is a little bit high, which was a problem for my company."
"It's quite expensive, but they do a special deal for universities."
"While the pricing of the product may be higher, the accompanying service and features justify the investment."
"SPSS is an expensive piece of software because it's incredibly complex and has been refined over decades, but I would say it's fairly priced."
"We think that IBM SPSS is expensive for this function."
"The pricing of the modeler is high and can reduce the utility of the product for those who cannot afford to adopt it."
"The price of IBM SPSS Statistics could improve."
"The product is expensive."
"The price of Databricks is reasonable compared to other solutions."
"The solution is based on a licensing model."
"The cost is around $600,000 for 50 users."
"The billing of Databricks can be difficult and should improve."
"I do not exactly know the costs, but one of our clients pays between $100 USD and $200 USD monthly."
"The basic version of this solution is now open-source, so there are no license costs involved. However, there is a charge for any advanced functionality and this can be quite expensive."
"The pricing depends on the usage itself."
"The solution uses a pay-per-use model with an annual subscription fee or package. Typically this solution is used on a cloud platform, such as Azure or AWS, but more people are choosing Azure because the price is more reasonable."
824,053 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

IBM SPSS Statistics
Financial Services Firm: 17%
Computer Software Company: 9%
University: 8%
Manufacturing Company: 8%

Cloudera Data Science Workbench
Financial Services Firm: 36%
Manufacturing Company: 12%
Healthcare Company: 9%
Computer Software Company: 6%

Databricks
Financial Services Firm: 16%
Computer Software Company: 11%
Manufacturing Company: 9%
Healthcare Company: 6%
 

Company Size

By reviewers: no data available
 

Questions from the Community

What do you like most about IBM SPSS Statistics?
The software offers consistency across multiple research projects helping us with predictive analytics capabilities.
What is your experience regarding pricing and costs for IBM SPSS Statistics?
The cost of IBM SPSS Statistics is managed by organizations, not individual researchers. It is a very expensive produ...
What needs improvement with IBM SPSS Statistics?
IBM SPSS Statistics does not keep you close to your data like KNIME. In KNIME, at every stage, you can see the result...
What do you like most about Cloudera Data Science Workbench?
I appreciate CDSW's ability to logically segregate environments, such as data, DR, and production, ensuring they don'...
What needs improvement with Cloudera Data Science Workbench?
The tool's MLOps is not good. Its pricing also needs to improve.
What is your primary use case for Cloudera Data Science Workbench?
We have different use cases. Our banking use case uses machine learning to identify customer life events and recommen...
Which do you prefer - Databricks or Azure Machine Learning Studio?
Databricks gives you the option of working with several different languages, such as SQL, R, Scala, Apache Spark, or ...
How would you compare Databricks vs Amazon SageMaker?
We researched AWS SageMaker, but in the end, we chose Databricks. Databricks is a Unified Analytics Platform designe...
Which would you choose - Databricks or Azure Stream Analytics?
Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analyti...
 

Also Known As

IBM SPSS Statistics: SPSS Statistics
Cloudera Data Science Workbench: CDSW
Databricks: Databricks Unified Analytics, Databricks Unified Analytics Platform, Redash
 



Sample Customers

IBM SPSS Statistics: LDB Group, RightShip, Tennessee Highway Patrol, Capgemini Consulting, TEAC Corporation, Ironside, nViso SA, Razorsight, Si.mobil, University Hospitals of Leicester, CROOZ Inc., GFS Fundraising Solutions, Nedbank Ltd., IDS-TILDA
Cloudera Data Science Workbench: IQVIA, Rush University Medical Center, Western Union
Databricks: Elsevier, MyFitnessPal, Sharethrough, Automatic Labs, Celtra, Radius Intelligence, Yesware
Find out what your peers are saying about Cloudera Data Science Workbench vs. Databricks and other solutions. Updated: December 2024.