Databricks is very useful and can handle thousands of different use cases. The use cases are all over the place.
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Saves time and effort; thousands of applicable use cases
Pros and Cons
- "Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things."
- "In the next release, I would like to see more optimization features."
What is our primary use case?
How has it helped my organization?
Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things.
What is most valuable?
The most valuable Databricks feature for us is that it does not require us to configure clusters. It automatically configures the clusters to the right size, the right number of clusters, the right number of nodes per cluster, et cetera.
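As a rough illustration of the sizing behaviour described above: one way automatic sizing shows up in practice is an autoscaling cluster definition, where you give a worker range instead of a fixed size. The sketch below builds such a definition as a plain Python dictionary. The field names follow the Databricks Clusters API as commonly documented (`autoscale`, `min_workers`, `max_workers`, `autotermination_minutes`), but the helper function, the cluster name, and the placeholder runtime and node-type values are assumptions for illustration, not the reviewer's setup.

```python
import json


def autoscaling_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Build a cluster definition that lets the platform pick the worker count.

    Hypothetical helper: it only assembles a dictionary and calls no API.
    """
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder, depends on the workspace
        "node_type_id": "<node-type>",          # placeholder, depends on the cloud provider
        # A worker range instead of a fixed size: the cluster grows and shrinks within it.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Stop the cluster after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = autoscaling_cluster_spec("etl-nightly", min_workers=2, max_workers=8, idle_minutes=30)
print(json.dumps(spec, indent=2))
```

A definition like this would normally be submitted through the workspace UI, the REST API, or an infrastructure tool; that step is omitted here because it depends on the workspace.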
What needs improvement?
The area in which this product can be improved is optimization. In the next release, I would like to see more optimization features.
Buyer's Guide
Databricks
November 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
For how long have I used the solution?
I have been using Databricks for a couple of years.
What was our ROI?
I would say the ROI for this solution is expressed mainly in terms of effort and time.
What's my experience with pricing, setup cost, and licensing?
I would advise that they train themselves before using Databricks. They should figure out which advantages Databricks has over just plain Spark and use it to the best advantage that they can.
What other advice do I have?
I am currently implementing the latest version of Databricks.
The Databricks solution is deployed through Cloud.
I would rate the Databricks solution a nine.
Which deployment model are you using for this solution?
Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Vice President at a tech services company with 51-200 employees
Very easy to use and requires minimal coding and customizations
Pros and Cons
- "Easy to use and requires minimal coding and customizations."
- "Doesn't provide a lot of credits or trial options."
What is our primary use case?
Our primary use case of this product is for our customers who are running large systems and looking for an API -- a quick, easy integration with their own system. We use Databricks to create a secure API interface. I'm vice president of data science and we are customers of Databricks.
What is most valuable?
Databricks is quite easy to use and requires less coding and customization than a solution like AWS SageMaker, which I'd previously used on a lot of projects. Databricks enables more people to efficiently build and host their ML code. Another great aspect is that MLflow is already integrated with Databricks, which makes a big difference. It enables us to track and monitor all our different experiments. We have mostly used the MLflow part and generic notebooks for building machine learning models, as well as PyTorch for some of our medical imaging work. We were able to quickly deploy both of these features without requiring anything extra.
What needs improvement?
I'm struggling a little because I wanted to do some POC solutions. I present a lot of projects in various forums and seminars and there aren't a lot of credits and trial options with Databricks. Even if we want to explore, we're not able to and that's a challenge. The solution is quite expensive.
For how long have I used the solution?
I've been using this solution for a year.
What do I think about the stability of the solution?
It's currently stable although we have not yet tested it with a huge volume of data. We'll focus on the performance and model serving capability in the near future. We're still carrying out performance testing, developing the models and figuring out the infrastructure.
What do I think about the scalability of the solution?
Scalability is quite good because we just used 128 GB of resources. It's quite easy to scale.
How was the initial setup?
It was relatively simple; we didn't face any challenges. Deployment takes around two days.
Which other solutions did I evaluate?
We did a POC in Azure ML Studio, which is quite a good solution and easy to deploy and use. It's almost a no-code platform. We've also found Azure ML Studio to be quite cost-effective.
What other advice do I have?
I would recommend trying Databricks because it's cloud agnostic. A lot of customers currently use Azure but want to build something on their own down the track. Databricks makes that easy with its integration with other cloud providers. If somebody wants to build something on their infrastructure or their own virtual cloud, this is a good platform.
I rate the solution eight out of 10 because of the issue I'm having with a lack of trial options.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Enterprise Data Architect at a financial services firm with 51-200 employees
Assists with quickly computing a considerable amount of historical data and helps us with data ingestion
Pros and Cons
- "Its lightweight and fast processing are valuable."
- "The Databricks cluster can be improved."
What is our primary use case?
Our primary use case for this solution is for data ingestion and the DQ rules we are implementing. We deploy the solution on Azure cloud.
How has it helped my organization?
Whenever we send data to downstream applications for creating a file, multiple business rules are involved, and this solution assists with quickly computing a considerable amount of historical data.
What is most valuable?
Its lightweight and fast processing are valuable.
What needs improvement?
The product could include some UI features to improve the ease of use, like drag and drop for a few aggregated functions. Additionally, the Databricks cluster can be improved.
For how long have I used the solution?
We have been using Databricks for approximately two years and are currently using the latest version.
What do I think about the stability of the solution?
The solution is very stable. However, it occasionally restarts. I rate the stability an eight out of ten.
What do I think about the scalability of the solution?
The solution is scalable, and we are trying to implement more use cases with Databricks in our organization as we advance. I rate the scalability an eight out of ten.
How are customer service and support?
I rate customer service and support a nine out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup was not very complex. We deploy the solution manually and the time required depends on the complexity of the business logic. I rate it an eight out of ten.
What about the implementation team?
We implemented the solution through an in-house team.
What other advice do I have?
I rate the solution an eight out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Gold Partners
Data Engineer Analyst at Metyis
Highly scalable, easy to use, and performs well
Pros and Cons
- "The most valuable features of Databricks are the notebook, data factory, and ease of use."
- "When I used the support, I had communication problems because of the language barrier with the agent. The accent was difficult to understand."
What is our primary use case?
I am using Databricks in my company.
What is most valuable?
The most valuable features of Databricks are the notebook, data factory, and ease of use.
For how long have I used the solution?
I have been using Databricks for approximately nine months.
What do I think about the stability of the solution?
The performance and stability of Databricks are good. It is quick and I have not had problems.
What do I think about the scalability of the solution?
Databricks is highly scalable.
We have 200 people using the solution in my organization.
How are customer service and support?
When I used the support, I had communication problems because of the language barrier with the agent. The accent was difficult to understand.
Which solution did I use previously and why did I switch?
I have not worked with another solution prior to Databricks.
What's my experience with pricing, setup cost, and licensing?
The price of Databricks is reasonable compared to other solutions.
What other advice do I have?
I rate Databricks an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Global Data Architecture and Data Science Director at FH
Flexible with support for several programming languages, good visualization and workload management functionality
Pros and Cons
- "Databricks gives you the flexibility of using several programming languages independently or in combination to build models."
- "Databricks requires writing code in Python or SQL, so if you're a good programmer then you can use Databricks."
What is our primary use case?
The primary use is for data management and managing workloads of data pipelines.
Databricks can also be used for data visualization, as well as to implement machine learning models. Machine learning development can be done using R, Python, and Spark programming.
What is most valuable?
Databricks gives you the flexibility of using several programming languages independently or in combination to build models.
The quick visualization of the data is very good.
The workload management functionality works well.
What needs improvement?
Databricks requires writing code in Python or SQL, so if you're a good programmer then you can use Databricks.
For how long have I used the solution?
I have been using Databricks since 2017. I am no longer using it personally, although my team is, and will continue to do so in the future.
What do I think about the stability of the solution?
Databricks is quite popular these days and it appears to be stable. I have not found any issues with stability.
What do I think about the scalability of the solution?
Databricks is scalable, regardless of which cloud provider is being used. It is supported on Microsoft Azure, AWS, and they have their own cloud as well.
For a small workload, Databricks may not be worth the costs. However, for larger workloads, Databricks is a very good solution.
In my previous organization, there were between 10 and 15 users.
How are customer service and technical support?
The technical support is handled by Microsoft partners and because we had premium support, it was easy to get. That said, I did not require any support.
Which solution did I use previously and why did I switch?
I have not used tools that are similar to Databricks for workload management, but Azure ADFv2, Google BigQuery, and SAS are some of the most powerful tools in this space that I have used in the past. I have also heard of Dataiku and other tools but I have not used them. The only things that I have used are tools written in Python or scripting languages.
How was the initial setup?
There is no installation required.
What's my experience with pricing, setup cost, and licensing?
Databricks uses a pay-per-use model, where you can use as much compute as you need. I think that the cost can be reduced, given that there are more users on the platform, although it is not as expensive as some other solutions like SAS.
What other advice do I have?
As we transition to the Azure cloud, I expect that we will be using Databricks for workloads.
This is a product that I recommend for those who want to scale and have a good budget. It is good for automating a data pipeline and managing workloads. My advice for anybody who is starting to use it is to take the proper training.
Overall, based on my uses, I think that this product is pretty good.
I would rate this solution an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Architect at Birlasoft India Ltd.
Data analytics platform that supports large volumes of data and related activities
Pros and Cons
- "This solution offers a lakehouse data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities."
- "The connectivity with various BI tools could be improved, specifically the performance and real-time integration."
What is most valuable?
This solution offers a lakehouse data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities. All ACID compliance properties are available, and this is very useful to ensure the quality of all data.
What needs improvement?
The connectivity with various BI tools could be improved, specifically the performance and real-time integration. There is also some improvement required in the semantic layers to manage the data marts as well as the data warehouse features.
In a future release, we would like to have features to better manage all ML development activities.
For how long have I used the solution?
I have been using this solution for three years.
What do I think about the stability of the solution?
This is a stable solution, especially compared to other technology on the market.
What do I think about the scalability of the solution?
It is a scalable solution but this depends on the platform that is being used. If you use a cloud platform such as Azure, it offers scalability. However, some platforms will not support scalability using Databricks.
We have around 20 users in our development team using Databricks.
How are customer service and support?
The customer service and support for this solution is good.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup is pretty simple and requires minimal configuration compared to other technology.
What's my experience with pricing, setup cost, and licensing?
I would rate the pricing for this solution a four out of five. This does depend on the environment or the infrastructure that one is using. There is a difference in pricing between using Azure or being on-premises.
Which other solutions did I evaluate?
Azure Synapse is a competitor that we evaluated, but it is not mature enough to provide better performance than Databricks. We chose Databricks due to the ability to have a lot of data in data lakes and the data warehouse. We are also able to run data science activities using MLflow.
What other advice do I have?
If you are looking for custom model development and a lot of data management in a cloud agnostic manner, then Databricks is a good solution.
I would rate this solution an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Pre-sale Leader, Big Data Enterprise Solutions at Ness Technologies
Easy to load and query data with SQL support, but it is difficult to deploy and the interface could be improved
Pros and Cons
- "The most valuable feature is the ability to use SQL directly with Databricks."
- "I have seen better user interfaces, so that is something that can be improved."
What is our primary use case?
My division works with Big Data and Data Science, and Databricks is one of the tools for Big Data that we work with. We are partners with Microsoft and we began working with this solution for one specific project in the financial industry.
What is most valuable?
The most valuable feature is the ability to use SQL directly with Databricks. That is the most relevant thing for my current project.
After deployment, it is easy to load files and query data.
What needs improvement?
I have seen better user interfaces, so that is something that can be improved.
It was quite hard to deploy.
For how long have I used the solution?
I have been using Databricks for about one year.
What do I think about the stability of the solution?
We have not found any bugs yet, although it is only the beginning of our work. I do not have enough information to say for sure.
What do I think about the scalability of the solution?
We have about 200 employees but it is only a small group using Databricks. We are at the beginning so scaling is not something we have had to do.
How are customer service and technical support?
We have not had to contact technical support because we are Microsoft partners and I can call a colleague of mine who helps me directly.
Which solution did I use previously and why did I switch?
I have used Snowflake and one of the differences is that Snowflake is much easier to deploy.
How was the initial setup?
The first deployment is difficult. It is not straightforward and you have to think about a lot of stuff. It is not really like a SaaS deployment and there are a lot of steps that you have to take.
What about the implementation team?
We have our own team, which includes colleagues from Microsoft. Because the current project is for a large client, they would like to see this project succeed.
What's my experience with pricing, setup cost, and licensing?
We find Databricks to be very expensive, although this improved when we found out how to shut it down at night.
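To put a rough number on the effect of shutting clusters down overnight, here is a back-of-the-envelope sketch. The hourly rate and the 12-hour working window are hypothetical figures chosen for illustration; they are not Databricks prices and do not come from this review.

```python
def monthly_cluster_cost(hourly_rate, hours_per_day, days_per_month=30):
    """Estimated monthly spend for a cluster billed per hour of uptime."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days_per_month


HOURLY_RATE = 10.0  # hypothetical combined compute and platform cost per hour

always_on = monthly_cluster_cost(HOURLY_RATE, hours_per_day=24)
daytime_only = monthly_cluster_cost(HOURLY_RATE, hours_per_day=12)
savings_fraction = 1 - daytime_only / always_on

print(f"always on:    {always_on:.2f} per month")
print(f"daytime only: {daytime_only:.2f} per month")
print(f"savings:      {savings_fraction:.0%}")
```

Under these assumed figures, running only during a 12-hour window halves the monthly bill; actual savings depend on the real rate and on how many hours the cluster is genuinely needed.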
What other advice do I have?
Our client is a bank and some of the information can be shared outside of the organization, whereas some of the data is confidential and private. Using a purely on-premises solution would have made it more difficult to share information with the outside, which is one of the reasons that they wanted a cloud-based deployment.
My advice for anybody who is considering this solution is that it is very good for unstructured or semi-structured data. If, however, you have structured data then I would recommend a columnar database like Snowflake or Vertica. These solutions are easier to deploy.
This is a good solution that is working well, but I don't think that it is really a SaaS.
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
Practice Head, Data & Analytics at a tech vendor with 10,001+ employees
Key feature is ability to make changes in structure or data size and align for subsequent consumption
Pros and Cons
- "Can cut across the entire ecosystem of open source technology to give an extra level of getting the transformatory process of the data."
- "Implementation of Databricks is still very code heavy."
What is our primary use case?
We have a team that works on Databricks for our clients. We are customers of Databricks.
What is most valuable?
Databricks can cut across the entire ecosystem of open source technology, which adds an extra level of capability to the data transformation process. The solution is primarily open source and they have bolstered its components to make it more fit for purpose for a complete Azure data platform. The solution is responsible for the core transformation activities. While Azure Data Factory is very good for pulling in the data and doing the basic standardization and profiling, Databricks is more about making fundamental changes in the structure or size of the data and aligning it for subsequent consumption, or for the final layer on Synapse. It also has the power to complement and work with Spark and elements related to Python.
What needs improvement?
In my view, the fundamental approach of implementing Databricks is still very code heavy, more than you find in Azure Data Factory and other technologies like Informatica or SQL Server Integration Services. From my perspective, that could be improved. I'd also like to have the ability to facilitate predictive analytics within the solution.
For how long have I used the solution?
I've been using the solution for a year and a half.
What do I think about the stability of the solution?
Stability of the product is good, whether it's handling large volumes, diverse elements of data or processing data at speed. It has stood the test of time. It's a solution that really lends itself to that higher level of stability, versatility and diversity in terms of its capability to process different forms of data.
What's my experience with pricing, setup cost, and licensing?
The cost of the solution is slightly on the high side so it's important to use it efficiently.
What other advice do I have?
Use the solution wisely and in tandem with Azure Data Factory. Keep this in mind in the overall design of your pipelines so that you use it to its potential. Databricks adds significant transformation and data tranching capability, across a diverse variety of data, to the Azure data stack. In terms of the license, ensure that the customer is getting what they paid for so that the value for money is realized.
I rate the solution eight out of 10.
Disclosure: I am a real user, and this review is based on my own experience and opinions.