My division works with Big Data and Data Science, and Databricks is one of the tools for Big Data that we work with. We are partners with Microsoft and we began working with this solution for one specific project in the financial industry.
Pre-sale Leader, Big Data Enterprise Solutions at Ness Technologies
Easy to load and query data with SQL support, but it is difficult to deploy and the interface could be improved
Pros and Cons
- "The most valuable feature is the ability to use SQL directly with Databricks."
- "I have seen better user interfaces, so that is something that can be improved."
What is our primary use case?
What is most valuable?
The most valuable feature is the ability to use SQL directly with Databricks. That is the most relevant thing for my current project.
After deployment, it is easy to load files and query data.
What needs improvement?
I have seen better user interfaces, so that is something that can be improved.
It was quite hard to deploy.
For how long have I used the solution?
I have been using Databricks for about one year.
Buyer's Guide
Databricks
February 2025

Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
838,713 professionals have used our research since 2012.
What do I think about the stability of the solution?
We have not found any bugs yet, although it is only the beginning of our work. I do not have enough information to say for sure.
What do I think about the scalability of the solution?
We have about 200 employees but it is only a small group using Databricks. We are at the beginning so scaling is not something we have had to do.
How are customer service and support?
We have not had to contact technical support because we are Microsoft partners and I am calling a colleague of mine who is helping me directly.
Which solution did I use previously and why did I switch?
I have used Snowflake and one of the differences is that Snowflake is much easier to deploy.
How was the initial setup?
The first deployment is difficult. It is not straightforward and you have to think about a lot of stuff. It is not really like a SaaS deployment and there are a lot of steps that you have to take.
What about the implementation team?
We have our own team, which includes colleagues from Microsoft. Because the current project is a large client, they would like to see this project succeed.
What's my experience with pricing, setup cost, and licensing?
We find Databricks to be very expensive, although this improved when we found out how to shut it down at night.
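To make the night-time shutdown point concrete, here is a minimal Python sketch of the arithmetic.

- The hourly rate and the schedules are hypothetical placeholders, not Databricks prices.
- Real billing also depends on DBUs and the underlying VM costs.
- In practice the saving comes from stopping clusters when they are idle. Databricks clusters can be configured to auto-terminate (the `autotermination_minutes` field in the Clusters API).

```python
def monthly_cost(hourly_rate: float, hours_per_day: float, days: int = 30) -> float:
    """Estimated cost of a cluster that is billed only while it is running."""
    return hourly_rate * hours_per_day * days


# Hypothetical figures for illustration only.
HOURLY_RATE = 12.0  # assumed all-in cost per cluster-hour
ALWAYS_ON = monthly_cost(HOURLY_RATE, hours_per_day=24)
BUSINESS_HOURS = monthly_cost(HOURLY_RATE, hours_per_day=10)  # shut down overnight

saving = ALWAYS_ON - BUSINESS_HOURS
print(f"always on: {ALWAYS_ON:.0f}, shut down at night: {BUSINESS_HOURS:.0f}, "
      f"saving: {saving:.0f} ({saving / ALWAYS_ON:.0%})")
```

Because cost scales linearly with running hours, cutting a 24-hour day to 10 hours removes 14/24 (about 58%) of the compute bill, whatever the actual rate is.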
What other advice do I have?
Our client is a bank and some of the information can be shared outside of the organization, whereas some of the data is confidential and private. Using a purely on-premises solution would have made it more difficult to share information with the outside, which is one of the reasons that they wanted a cloud-based deployment.
My advice for anybody who is considering this solution is that it is very good for unstructured or semi-structured data. If, however, you have structured data then I would recommend a columnar database like Snowflake or Vertica. These solutions are easier to deploy.
This is a good solution that is working well, but I don't think that it is really a SaaS.
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer

Data Scientist at an energy/utilities company with 10,001+ employees
Has a good feature set but it needs samples and templates to help invite users to see results
Pros and Cons
- "Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
- "The product needs samples and templates to help invite users to see results and understand what the product can do."
What is our primary use case?
I am a data scientist here and that is my official role. I own the company. Our team is quite small at this point. We have around five people on the team and we are working with about five different businesses. The projects we get from them are massive undertakings. Each of us on the team takes multiple roles in our company and we use multiple tools to help best serve our clients. We are trying to look at creative ways that different solutions can be integrated and we try to understand what products we can use to create solutions for client companies that will be effective in meeting their needs.
We are personally using Databricks for certain projects where we want to consider creating intelligent solutions. I have been working on Databricks as part of my role in this company, trying to see if there are any kind of standard products that we can use with it to create solutions. We know that Databricks integrates with Airflow, so that is something that we are exploring right now as a potential solution for enabling a creative response. We are exploring the cloud as an option. Databricks is available in Azure and we are currently figuring out the viability of using that as a cloud platform. So we are exploring the way Databricks and Azure integrate at the same time to give us this type of flexibility.
What we use it for right now is more like asset management. If we have a lot of assets and we get a lot of real-time data, we certainly want to do some processing on some of this data, but you do not want to have to work on all of it in real-time. That is why we use Databricks. We push the data from Azure through Databricks and work on the data algorithm in Databricks and execute it from Azure with probably an RPA (Robotic Process Automation) or something of that sort. It intelligently offloads real-time processing.
What is most valuable?
Of the available feature set, I like the Imageflow feature a lot. It is very interesting. It gives me clarity on the execution of a process. I can draw the complete flow from start to finish in the exact way that I want it to execute. It is more visual and it is also easier for the people in businesses where I make presentations to understand.
When I demonstrate a process to a business and show them the approach I am taking using code and technical language, then of course not many are going to understand that. But when I show them the process in terms of the graphical layout Imageflow helps provide, then they will be able to understand it much easier. They understand why I am choosing a particular way of executing the process and why I am taking certain steps in the way I have chosen to do it. The point is to help other people understand the solution more clearly.
What needs improvement?
I think the automatic categorization of variables needs to be improved. The current functionality is not always efficiently identifying the features of the data that is collected. Probably that is the only thing I can think of. Apart from that, I have not explored the product enough yet to go into more depth because there is only one asset project that I have taken on right now. Because I own this company, I have been doing more to run it than to explore this product very deeply. But when you get any form of data inside there, if it could understand what type of variables there are and what features the data has, it would help massively in taking processing to the next step. If it does not exactly identify the variables you may have to modify them a little. Apart from working with Databricks to understand its capabilities, I am also trying to learn Apache Spark right now. Some members of my team want to work with Apache Spark as a solution and at this point, we are evaluating both and we are planning to use Spark or Databricks.
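The reviewer wants the product to infer variable types from incoming data automatically. A minimal plain-Python sketch of that kind of heuristic is below.

- It tries each column as integer, then float, then ISO date.
- Otherwise it labels the column categorical or free text, based on how many distinct values it has.
- The `infer_type` helper and the `max_categories` threshold are our own assumptions. This is not how Databricks infers schemas.

```python
from datetime import date


def infer_type(values: list[str], max_categories: int = 10) -> str:
    """Guess a column's variable type from its raw string values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "empty"

    def all_parse(parser) -> bool:
        # True only if every value can be parsed by the given function.
        try:
            for v in non_empty:
                parser(v)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(date.fromisoformat):
        return "date"
    # Few distinct labels suggests a categorical variable; otherwise free text.
    if len(set(non_empty)) <= max_categories:
        return "categorical"
    return "text"


# Hypothetical asset-monitoring columns.
columns = {
    "asset_id": ["101", "102", "103"],
    "temperature": ["21.5", "22.0", "19.8"],
    "reading_date": ["2024-01-05", "2024-01-06", "2024-01-07"],
    "status": ["ok", "fault", "ok"],
}
for name, vals in columns.items():
    print(name, infer_type(vals))
```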
As far as what might be added, some custom algorithm samples would be useful. All of the other products of this type — Azure, AWS, SageMaker — they all have customizable algorithms. You have the capability to implement a sort of workflow from that by modifying things in the sample and changing it to fit your purposes. Probably that is something that might help in doing some small NDP (Near-Data Processing) development. It might not help in the project directly, but it will help while we work on some NDP development of our own so that we can quickly evaluate how something is going to work. Templates or other samples could make working on things easier.
That would also help massively in getting people to understand the potential of what the product can actually do. But I also think not many people would strongly agree with this. Many people go to the first solution they can think of that they know very well already in the IT field even if they could imagine that something could be better.
To get the value out of this technology, people will need to come to accept it. Technical people will accept Databricks more if they understand that this is something that they can use and start working on without a lot of experience. Adopting it will take time for new users who have no experience. But to feel like they can have success with a product, they have to execute something in a very short time and see how it can work. When you talk about AI — or really when you talk about anything new — people do not initially want to invest the time in discovery. These processes do take time to learn, but with templates or samples, you get to see immediately what the possibilities are and what you might get out of it. Then when they try something of their own and are able to get it working in less than a week's time, they will be encouraged to look into the product and the technology some more.
For how long have I used the solution?
We have been using the Databricks product for approximately three months.
What do I think about the stability of the solution?
It is very hard to comment on the stability right now. We will need more time to experience the product in actual usage to render any opinions about stability accurately at that level.
What do I think about the scalability of the solution?
We have not really gotten to the point of scaling and testing scalability at this point. We only have two people involved with the product. One is a data scientist and one is a data engineer.
How was the initial setup?
The initial setup was not complex at all. The documentation is good. It is clear and not very difficult to understand. Because the documentation is good, the installation is fine.
We did the implementation by ourselves — within our team and with the help of the documentation. But I would not say that we have already deployed the model yet. This is an ongoing process, as there are certain inputs that changed over time.
So we have not implemented the product completely, but we have advanced with the product and our understanding of it. It is good, but our company is still trying to get much better data from it. At this point, it is like the data is just junk and more junk. So we are now working toward that goal of improving the result. Whenever the data result gets better, we'll try to implement the workflow to see how it performs. I would say it will probably take two to three months more before we actually get good data.
Which other solutions did I evaluate?
I did have some experience with SageMaker before looking at Databricks, but apart from that, we have not been looking into any of the other solutions that are available. We were just exploring a few of the different solutions that the members of the team already have experience with. Most of the team came to our company with some experience using Azure, most of them came with experience in EBS (Elastic Block Store), and some of them came with experience on various other platforms. We wanted to mine that knowledge and just explore some of these possibilities to see which one works for all of us as a team.
What other advice do I have?
On a scale from one to ten where one is the worst and ten is the best, I would rate Databricks overall as around a 7 or 7.5. If we had more experience with it and could be sure we had a solid understanding of what it could do and the reliability, I might recommend it with a better score. I do not think I should give it more than a seven for now.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Analytics at a manufacturing company with 10,001+ employees
Useful machine learning and easy to scale
Pros and Cons
- "In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks, such as product analysis or predictive maintenance."
- "The stability of the Databricks clusters and instances could be better. We've had issues with crashes."
What is our primary use case?
Our team is currently utilizing machine learning for various applications, and a few members are also exploring the use of Databricks for ML operations.
What is most valuable?
In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks, such as product analysis or predictive maintenance.
For how long have I used the solution?
I have been using Databricks for approximately six months.
What do I think about the stability of the solution?
The stability of the Databricks clusters and instances could be better. We've had issues with crashes.
What do I think about the scalability of the solution?
The scalability of Databricks is good as long as you have a data lake, and it's easy to scale.
We have approximately 50 users using this solution in my company.
How are customer service and support?
We have a different team who handles the support. I do not have contact with Databricks support.
Which solution did I use previously and why did I switch?
I have not used a similar solution to Databricks.
What was our ROI?
I have seen an ROI using Databricks.
What's my experience with pricing, setup cost, and licensing?
I rate the price of Databricks as eight out of ten.
What other advice do I have?
Having a good understanding of physical security in relation to cybersecurity in an OT (Operational Technology) environment would be beneficial, and utilizing an existing data lake prior to implementing a Databricks initiative would greatly aid in its success.
I rate Databricks an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Architect at Birlasoft India Ltd.
Data analytics platform that supports large volumes of data and related activities
Pros and Cons
- "This solution offers a lakehouse data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities."
- "The connectivity with various BI tools could be improved, specifically the performance and real time integration."
What is most valuable?
This solution offers a lakehouse data concept that we have found exciting. We are able to keep a large amount of data in a data lake and can manage all relational activities. All ACID properties are available, which is very useful for ensuring the quality of all data.
What needs improvement?
The connectivity with various BI tools could be improved, specifically the performance and real-time integration. Some improvement is also required in the semantic layer for managing data marts, as well as in the data warehouse features.
In a future release, we would like to have features to better manage all ML development activities.
For how long have I used the solution?
I have been using this solution for three years.
What do I think about the stability of the solution?
This is a stable solution, especially compared to other technology on the market.
What do I think about the scalability of the solution?
It is a scalable solution but this depends on the platform that is being used. If you use a cloud platform such as Azure, it offers scalability. However, some platforms will not support scalability using Databricks.
We have around 20 users in our development team using Databricks.
How are customer service and support?
The customer service and support for this solution is good.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup is pretty simple and requires minimal configuration compared to other technology.
What's my experience with pricing, setup cost, and licensing?
I would rate the pricing for this solution a four out of five. This does depend on the environment or the infrastructure that one is using. There is a difference in pricing between using Azure or being on-premises.
Which other solutions did I evaluate?
Azure Synapse is a competitor that we evaluated, but it is not mature enough to provide better performance than Databricks. We chose Databricks due to the ability to keep a lot of data in data lakes and the data warehouse. We are also able to run data science activities using MLflow.
What other advice do I have?
If you are looking for custom model development and a lot of data management in a cloud agnostic manner, then Databricks is a good solution.
I would rate this solution an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Owner at a marketing services firm with 1-10 employees
The data governance has been absolutely efficient compared to other kinds of solutions
Pros and Cons
- "Databricks' Lakehouse architecture has been most useful for us. The data governance has been absolutely efficient compared to other kinds of solutions."
- "I would like it if Databricks made it easier to set up a project."
What is our primary use case?
We use Databricks for video streaming and security purposes.
What is most valuable?
Databricks' Lakehouse architecture has been most useful for us. The data governance has been absolutely efficient compared to other kinds of solutions.
What needs improvement?
I would like it if Databricks made it easier to set up a project. The use case determines which services we are going to use. You have the application engine, and you generate a potential budget for your workloads, so you can understand what you are going to do, what you are going to use, and what you will invest in.
Because I'm deploying on the Google Cloud Platform, measuring the investment, value, and use case is extremely difficult. So I leave it and move on without the risk. It would be easier if I had one page where you can see three columns: one for the use cases of a specific architecture, a second one for the prices based on the volume of data or machine time, and the third column for the budget. That would make it easier to know if I am using the appropriate architecture for the right solution.
I have seen something like that in Microsoft Azure, but obviously Microsoft Azure costs a lot of money. Amazon has something like that, but it's very complicated to use.
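The one-page view the reviewer asks for can be sketched as a small table with three columns: use case, estimated price, and budget.

- Everything here is hypothetical: the `UseCase` class, the use cases, the machine-hour figures, and the rates.
- Price is modeled on machine time only. The reviewer also mentions pricing by data volume, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str             # column 1: the use case for a given architecture
    machine_hours: float  # expected monthly compute time
    rate_per_hour: float  # assumed price per machine-hour
    budget: float         # column 3: budget allocated to this use case

    @property
    def estimated_cost(self) -> float:
        # Column 2: price based on machine time.
        return self.machine_hours * self.rate_per_hour

    @property
    def within_budget(self) -> bool:
        return self.estimated_cost <= self.budget


# Hypothetical use cases and prices, for illustration only.
plan = [
    UseCase("video stream ingestion", machine_hours=300, rate_per_hour=2.5, budget=1000),
    UseCase("security analytics", machine_hours=120, rate_per_hour=4.0, budget=400),
]

print(f"{'use case':<24}{'est. cost':>10}{'budget':>10}  ok?")
for uc in plan:
    print(f"{uc.name:<24}{uc.estimated_cost:>10.2f}{uc.budget:>10.2f}  {uc.within_budget}")
```

Putting cost and budget side by side per use case makes it obvious which architecture choice overshoots. In this example that is the second row (480 against 400).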
For how long have I used the solution?
We've been using Databricks for about five years.
What do I think about the stability of the solution?
Databricks is very stable and powerful.
What do I think about the scalability of the solution?
It was simple to make Databricks scalable. We found that we could set up an alert to tell us if we needed more resources, money, or time from our team. We're alerted when the system detects a trigger for any use of the instance. More alerts provided from Databricks' side would be extremely useful, because it takes a lot of time to develop that kind of trigger ourselves.
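The kind of spend trigger described above can be approximated with a threshold check on cumulative spend.

- This sketch is hypothetical. The `budget_alerts` helper, the thresholds, and the daily spend figures are our own illustration.
- It is not a Databricks feature or API.

```python
def budget_alerts(daily_spend: list[float], monthly_budget: float,
                  thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)) -> list[tuple[int, float]]:
    """Return (day, threshold) for the first day cumulative spend crosses each threshold."""
    alerts = []
    remaining = sorted(thresholds)
    cumulative = 0.0
    for day, spend in enumerate(daily_spend, start=1):
        cumulative += spend
        # Several thresholds can fire on the same day if spend jumps sharply.
        while remaining and cumulative >= remaining[0] * monthly_budget:
            alerts.append((day, remaining.pop(0)))
    return alerts


# Hypothetical daily spend against a budget of 1,000.
print(budget_alerts([100, 150, 200, 300, 400], monthly_budget=1000))
```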
How are customer service and support?
Databricks technical support was lovely. We don't need it so much, but the few questions we had were answered immediately.
How was the initial setup?
I am not a data engineer, as I only recently started doing data science at the company, but the setup was straightforward and clear for our architect. He provided me with that idea because he realized it would take time if we had use cases. You can select and change the data or add some modules or products. You have all the technology to do so.
What other advice do I have?
I rate Databricks eight out of 10. I like to move my customers into Databricks, but I take care of the internal system infrastructure so they can continue to use familiar software or operating systems and databases. They have a lot of doubts because they don't know the solution. We need to train them, explain things, and show the solution's potential value.
Generally, companies try to keep the same flavor when they migrate. For example, if they are using many Microsoft products, they want to work with Azure. If they are open to other options, they go with GCP or AWS. However, Databricks doesn't have enough customers here in my market because it's not a visible brand. Azure, GCP, and AWS are highly visible here, so the local teams are friendly with the three brands.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Cloud & Infra Security, Group Manager at a tech vendor with 10,001+ employees
A scalable solution to quickly process and analyze streams of information
Pros and Cons
- "Databricks helps crunch petabytes of data in a very short period of time."
- "Costs can quickly add up if you don't plan for it."
What is our primary use case?
We are working with Databricks and SMLS in the financial sector for big data and analytics. There are a number of business cases for analysis related to debt there. Several clients are working with it, analyzing data collected over a period of time and planning the next steps in multiple business divisions.
My organization is a professional consulting service. We provide services for the other organizations, which implement and use them in a production environment. We manage, implement, and upgrade those services, but we don't use them.
What is most valuable?
Databricks helps crunch petabytes of data in a very short period of time for data scientists or business analysts. It helps with fraud analysis, finance, projections, etc. I like it.
This is exactly the purpose of big data and analytics. It provides the mechanism to process and analyze a stream of information. It's best for share analysis and stream analysis.
What needs improvement?
Costs can quickly add up if you don't plan for it.
For how long have I used the solution?
I've been using Databricks for just over a year.
What do I think about the stability of the solution?
Databricks is stable. It also helps that their support is included as part of the service.
What do I think about the scalability of the solution?
Databricks is scalable. The only issue is how much money you have for it. For example, if you need to run 100 servers, each an eight-core machine with 256 gigabytes of RAM, you run out of money easily. It's charged to your credit card or your account, and you'll have to pay for it if you don't plan for that in advance.
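A quick sizing sketch shows how fast the fleet in that example adds up.

- The server count and specs come from the review.
- The per-server hourly rate is a hypothetical placeholder. Real prices vary by cloud, region, instance type, and Databricks tier.

```python
def fleet_cost(servers: int, rate_per_server_hour: float, hours: float) -> float:
    """Compute cost scales linearly with servers x hours."""
    return servers * rate_per_server_hour * hours


SERVERS = 100
CORES_PER_SERVER = 8
RAM_GB_PER_SERVER = 256
RATE = 2.0  # hypothetical cost per server-hour

total_cores = SERVERS * CORES_PER_SERVER    # 800 cores
total_ram_gb = SERVERS * RAM_GB_PER_SERVER  # 25,600 GB of RAM
per_day = fleet_cost(SERVERS, RATE, hours=24)
per_month = fleet_cost(SERVERS, RATE, hours=24 * 30)
print(f"{total_cores} cores, {total_ram_gb:,} GB RAM: {per_day:,.0f}/day, {per_month:,.0f}/month")
```

Even at a modest assumed rate, a fleet left running around the clock costs 720 times its hourly price each month. That is why planning ahead matters.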
How are customer service and technical support?
Databricks technical support is excellent. They provided their responses on time, and they're useful. However, I don't have extensive experience with them.
Which solution did I use previously and why did I switch?
I have used different Microsoft solutions before.
How was the initial setup?
The initial setup depends on the readiness of the team working with Databricks. There is no single template that makes it easy, and it isn't easy. It can be complex to set up if you don't have a really good plan.
For a test, you can set up the environment in a lab by following the steps one by one, which takes about an hour. The difficulty depends on the business requirements.
If it's for training purposes, you can do it in about half an hour, and you're good to go. If you need it to support a business, it will be much more rigorous because multiple divisions would be interested in running their own environment, working with their data.
What's my experience with pricing, setup cost, and licensing?
The price is okay. It's competitive.
What other advice do I have?
If you're thinking of implementing Databricks, I would recommend working with professionals. It'll help you save time. Also, plan the work and work the plan. Otherwise, it'll be a waste of time and money.
On a scale from one to ten, I would give Databricks a nine.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Associate Manager at a consultancy with 501-1,000 employees
Efficient, high data volume processing, and easy to use
Pros and Cons
- "The main feature of the solution is its efficiency."
- "There should be better integration with other platforms."
What is our primary use case?
We use this solution to process data, for example, data validation.
What is most valuable?
The main feature of the solution is its efficiency.
We were trying to process 300 million records spanning 10 years. Processing that many records through an Azure Data Factory (ADF) pipeline took approximately six hours. To reduce the burden on our ADF pipeline, we wrote simple code in this solution that reads and writes the files to temporary storage. With this solution, we were able to complete the processing of the data in half an hour.
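Using only the approximate figures quoted above, the throughput and speedup work out as follows. The inputs are the reviewer's rounded numbers, so the results are rough.

```python
RECORDS = 300_000_000
ADF_SECONDS = 6 * 3600         # about six hours through the ADF pipeline
DATABRICKS_SECONDS = 30 * 60   # about half an hour in Databricks

adf_rate = RECORDS / ADF_SECONDS          # records per second via ADF
dbx_rate = RECORDS / DATABRICKS_SECONDS   # records per second via Databricks
speedup = ADF_SECONDS / DATABRICKS_SECONDS

print(f"ADF: {adf_rate:,.0f} records/s; Databricks: {dbx_rate:,.0f} records/s; "
      f"speedup: {speedup:.0f}x")
```

That is roughly 13,900 versus 166,700 records per second, a 12x improvement for this job.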
The ability to write scripts within the solution is extremely beneficial. If I know, for example, SQL, R, Scala, Apache Spark, or Python, I can use that knowledge to write a script in this solution. It is very user-friendly, and you can also process the records from a validation point of view.
The ability to migrate from one environment to another is useful.
What needs improvement?
There should be better integration with other platforms.
For how long have I used the solution?
I have been using this solution for two years.
What do I think about the scalability of the solution?
I have approximately 20 users using this solution in my organization. We have plans to increase our usage in the future.
How was the initial setup?
There is no installation required. It is easy to use: in Azure, for example, it is available, you subscribe, and you use it.
What's my experience with pricing, setup cost, and licensing?
The solution uses a pay-per-use model with an annual subscription fee or package. Typically this solution is used on a cloud platform, such as Azure or AWS, but more people are choosing Azure because the price is more reasonable.
What other advice do I have?
I rate Databricks a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Data Architect at a government with 1,001-5,000 employees
Good integration with the majority of data sources through Databricks notebooks using Python, Scala, SQL, and R
Pros and Cons
- "The initial setup is pretty easy."
- "Overall it's a good product, however, it doesn't do well against any individual best-of-breed products."
What is our primary use case?
We used Databricks in AWS on top of S3 buckets as a data lake. The primary use case was providing consistent, ACID-compliant data sets with full history and time series that could be used for analytics.
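Delta Lake provides full history through a transaction log. Each commit is recorded as a new table version, and earlier versions can still be queried ("time travel").

The toy class below only illustrates the idea of immutable, versioned snapshots in plain Python.

- `VersionedTable` is a hypothetical simplification, not Delta Lake's implementation or API.
- It ignores concurrency, file storage, and compaction.

```python
from copy import deepcopy
from typing import Optional


class VersionedTable:
    """Toy table that keeps every committed version (illustration only)."""

    def __init__(self) -> None:
        self._versions: list[list[dict]] = [[]]  # version 0 is the empty table

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def append(self, rows: list[dict]) -> int:
        # Build the full new snapshot first, then publish it in one step,
        # so a reader never sees a half-applied commit (single-threaded toy).
        snapshot = deepcopy(self._versions[-1]) + deepcopy(rows)
        self._versions.append(snapshot)
        return self.latest_version

    def read(self, version: Optional[int] = None) -> list[dict]:
        """Return the latest snapshot, or the snapshot as of an earlier version."""
        v = self.latest_version if version is None else version
        return deepcopy(self._versions[v])


table = VersionedTable()
table.append([{"ts": "2021-01-01", "value": 10}])
table.append([{"ts": "2021-01-02", "value": 12}])
print(len(table.read()), len(table.read(version=1)))  # history remains readable
```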
How has it helped my organization?
Databricks (Delta Lake) and the underlying file storage (data lake) are at the centre of the organisation's enterprise data hub. Most of our data is structured (CSV files) and some is semi-structured (JSON files), but we are beginning to ingest unstructured data (PDF files) and use natural language processing (Textract) to obtain insights driven by keywords.
What is most valuable?
The Databricks notebooks with SQL and Python provide a good, intuitive development environment. Delta Lake, which reads the underlying file storage through delta tables mounted on top of the data lake (AWS in our case), provides full ACID compliance, good connectivity, and interoperability.
The initial setup is fairly straightforward. The stability is good.
What needs improvement?
The product is quite ambitious. It's trying to become a centralized platform for all data ingestion, transformation, and analytics needs. It may encounter a stiff competition from best of breed solutions powered by open source software.
Overall, it's a good product; however, it might get challenged over time by individual best-of-breed products.
For example, in the area of data science, RStudio seems to be the industry standard at the moment. The RStudio IDE is richer and offers more out-of-the-box functionality, such as push-button publishing. It's more difficult to run R within Databricks, and it lags behind, especially when it comes to synchronizing R packages. It's not even supporting the latest version of R 1.3. I believe all analytics will eventually converge into data science. The analytics of the future will be data science, because predicting the future will be one of the most prevalent use cases. The things we used to do, such as slicing and dicing, drilling through, and trend analysis, will become redundant once analytics toolsets are powered by AI/ML and fully automated. Unless organisations acquire platforms that can cater for machine learning and artificial intelligence, including natural language processing, they will have a hard time surviving.
With Databricks, I would like to see more integration with and accommodation of open-source products. This could be controversial, as it could question the whole configuration and purpose of the product. I'm pretty sure Microsoft is trying to position it in a monopoly market, as they did with Windows and MS Office, so that they can charge a premium. We are beginning to see a similar product strategy behind Databricks.
For how long have I used the solution?
I've been working with Databricks for about two years.
What do I think about the stability of the solution?
From what I know and from what I've heard, talking to our data operations team, it is stable and it's quite powerful.
What do I think about the scalability of the solution?
Running on top of Spark ensures fast processing and elasticity for coping with big data volumes, up to 2 petabytes. You can spin up a cluster very quickly and shut it down. It's elastic.
How are customer service and technical support?
Excellent customer service from Databricks. Very proactive, constantly attuned to customer needs, even connected us with other customers for knowledge exchange.
Which solution did I use previously and why did I switch?
I am an IT consultant, and in the past I have used different solutions for ETL on top of databases, particularly for data warehousing. However, in the last six years I have seen large clients using a mixture of open-source and proprietary technologies, such as the Informatica stack with a data lake in AWS, Confluent Kafka with MQ Series on top of mainframes and a data lake in AWS, Databricks with Azure Data Lake, etc.
How was the initial setup?
It was pretty easy to set up. At least, that is my understanding. I'm not the data engineer though. I don't actually do installs and configurations. I explore features and build them in my architecture designs.
What about the implementation team?
We implemented Databricks through vendor, and the vendor was pretty good.
What was our ROI?
Don't really know.
What's my experience with pricing, setup cost, and licensing?
I can't speak on pricing of the solution. It's not an aspect of the solution I deal with directly.
Which other solutions did I evaluate?
The options were Talend, EMC Isilon, native AWS services, and others.
What other advice do I have?
In my current capacity as an architect and end user of Databricks, I would say I am confident that Databricks can provide a wealth of functionality to start with.
My advice to future adopters of Databricks would be to be careful about the overall architectural roadmap for this application. Adopt a flexible, modular, microservices-like architecture whose components could be replaced in the future, should they be deemed inadequate for evolving business needs.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.

Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros
sharing their opinions.
Updated: February 2025