Data Science Developer at a tech services company with 501-1,000 employees
Good performance and support for big data, built-in machine learning libraries are powerful
Pros and Cons
- "Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great."
- "It should have more compatible and more advanced visualization and machine learning libraries."
What is our primary use case?
We use this solution for streaming analytics. We use machine learning functions that output to the API and work directly with the database.
How has it helped my organization?
Prior to using Azure Databricks in the cloud, we had Databricks installed in clusters. Since our implementation, the performance has increased and our cost has been reduced.
What is most valuable?
Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great.
This solution has very good machine learning libraries built-in.
The support for big data is good.
What needs improvement?
Databricks should have more libraries for predictive analysis and machine learning.
It should have more compatible and more advanced visualization and machine learning libraries. As it is now, I have to try a custom algorithm in order for things to be compatible.
I would like to see more deep learning analytics.
Buyer's Guide
Databricks
November 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
For how long have I used the solution?
I have been using Databricks for about one year.
What do I think about the stability of the solution?
This is a cluster-based solution, so it is stable.
What do I think about the scalability of the solution?
We started using Databricks with a small PoC application, and then we developed it into a larger one. It's scalable, and it's a simple process to scale.
We have eight people in our team who are using this solution. We do not plan to increase usage at this time.
How are customer service and support?
I did not contact technical support myself, but when one of our team members contacted them they were given good answers. I would say that the support is good.
How was the initial setup?
It is not difficult to deploy this solution because it is well documented. We followed the normal steps that included all of the APIs.
What's my experience with pricing, setup cost, and licensing?
I do not exactly know the costs, but one of our clients pays between $100 USD and $200 USD monthly.
What other advice do I have?
Databricks has been good and I like it. However, it would be improved with the enhancement of the machine learning libraries, and with the inclusion of visualization libraries.
I would rate this solution an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Advanced Analytics Lead at a pharma/biotech company with 1,001-5,000 employees
Better tailored code and automation capabilities needed, but easy to use
Pros and Cons
- "The solution is easy to use and has a quick start-up time due to being on the cloud."
- "The solution could improve by providing better automation capabilities. For example, working together with more of a DevOps approach, such as continuous integration."
What is our primary use case?
Databricks can be used for large-scale data pre-processing and data transformations.
What is most valuable?
The solution is easy to use and has a quick start-up time due to being on the cloud.
What needs improvement?
The solution could improve by providing better automation capabilities. For example, working together with more of a DevOps approach, such as continuous integration. There is a lot of code from places, such as GitHub, but it is not tailored for Databricks. It requires a lot of effort to bring the code to a level where it can be used with Databricks capabilities.
For how long have I used the solution?
I have been using Databricks for two months.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
Databricks is scalable.
How are customer service and technical support?
We did not have a need to use technical support.
How was the initial setup?
The installation is straightforward, and it took approximately one hour.
What about the implementation team?
We did the implementation and maintenance of the solution ourselves using approximately three engineers.
What's my experience with pricing, setup cost, and licensing?
The solution requires a subscription.
Which other solutions did I evaluate?
We are evaluating other solutions.
What other advice do I have?
I would recommend this solution for those wanting to process large data sets, but if it is to be used for smaller data sets, I would not recommend it.
I rate Databricks a five out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Associate Manager at a consultancy with 501-1,000 employees
Efficient, high data volume processing, and easy to use
Pros and Cons
- "The main feature of the solution is its efficiency."
- "There should be better integration with other platforms."
What is our primary use case?
We use this solution to process data, for example, data validation.
What is most valuable?
The main feature of the solution is its efficiency.
We needed to process 300 million records spanning 10 years. Processing that many records through an Azure Data Factory (ADF) pipeline took approximately six hours. To reduce the burden on our ADF pipeline, we wrote some simple code in this solution that reads the files and writes them to temporary storage. By going through this solution, we were able to complete the processing of the data in half an hour.
The ability to write scripts within the solution is extremely beneficial. If, for example, I can script in SQL, R, Scala, Apache Spark, or Python, I can use that knowledge to write a script in this solution. It is very user-friendly, both for processing records and from a validation point of view.
The ability to migrate from one environment to another is useful.
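The review does not show the validation script itself. As a rough illustration only, the read-validate-write pattern described above might look like the following plain-Python sketch. The column names (`id`, `amount`) and the validation rules are hypothetical, and on Databricks the same logic would more likely be written against Spark DataFrames than with the `csv` module.

```python
import csv
import io


def is_valid(row):
    """Hypothetical rules: a non-empty id and a numeric, non-negative amount."""
    if not row.get("id"):
        return False
    try:
        return float(row["amount"]) >= 0
    except (KeyError, TypeError, ValueError):
        return False


def validate_records(source, valid_out, rejected_out):
    """Stream rows from source and route each to the valid or rejected sink."""
    reader = csv.DictReader(source)
    fields = reader.fieldnames or []
    # extrasaction="ignore" keeps malformed rows with extra columns from raising.
    valid_writer = csv.DictWriter(valid_out, fieldnames=fields, extrasaction="ignore")
    rejected_writer = csv.DictWriter(rejected_out, fieldnames=fields, extrasaction="ignore")
    valid_writer.writeheader()
    rejected_writer.writeheader()
    counts = {"valid": 0, "rejected": 0}
    for row in reader:
        if is_valid(row):
            valid_writer.writerow(row)
            counts["valid"] += 1
        else:
            rejected_writer.writerow(row)
            counts["rejected"] += 1
    return counts


if __name__ == "__main__":
    # In-memory stand-ins for the input file and the temporary output files.
    sample = io.StringIO("id,amount\n1,10.5\n,3\n2,abc\n")
    print(validate_records(sample, io.StringIO(), io.StringIO()))
```

Streaming row by row keeps memory use flat regardless of input size; a distributed engine such as Spark gets its speed-up by running the same kind of per-record check across many partitions in parallel.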
What needs improvement?
There should be better integration with other platforms.
For how long have I used the solution?
I have been using this solution for two years.
What do I think about the scalability of the solution?
I have approximately 20 users using this solution in my organization. We have plans to increase our usage in the future.
How was the initial setup?
There is no installation required. It is easy to use; in Azure, for example, it is already available, so you simply subscribe and start using it.
What's my experience with pricing, setup cost, and licensing?
The solution uses a pay-per-use model with an annual subscription fee or package. Typically this solution is used on a cloud platform, such as Azure or AWS, but more people are choosing Azure because the price is more reasonable.
What other advice do I have?
I rate Databricks a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Data Architect at a government with 1,001-5,000 employees
Good integration with majority of data sources through Databricks Notebooks using Python, Scala, SQL, R.
Pros and Cons
- "The initial setup is pretty easy."
- "Overall it's a good product, however, it doesn't do well against any individual best-of-breed products."
What is our primary use case?
We used Databricks in AWS on top of S3 buckets as a data lake. The primary use case was providing consistent, ACID-compliant data sets with full history and time series that could be used for analytics.
How has it helped my organization?
Databricks (Delta Lake) and the underlying file storage (data lake) are at the centre of the organisation's enterprise data hub. Most of our data is structured (CSV files) and some is semi-structured (JSON files), but we are beginning to ingest unstructured data (PDF files) and use Natural Language Processing (Textract) to obtain insights driven by keywords.
What is most valuable?
The Databricks notebooks with SQL and Python provide a good, intuitive development environment. Delta Lake, the reading of the underlying file storage, and the Delta tables mounted on top of the data lake (AWS in our case) provide full ACID compliance, good connectivity, and interoperability.
The initial setup is fairly straightforward. The stability is good.
What needs improvement?
The product is quite ambitious. It's trying to become a centralized platform for all data ingestion, transformation, and analytics needs. It may encounter stiff competition from best-of-breed solutions powered by open-source software.
Overall it's a good product; however, it might be challenged over time by individual best-of-breed products.
For example, in the area of data science, RStudio seems to be the industry standard at the moment. The RStudio IDE is richer, with more out-of-the-box functionality such as push-button publishing. It is more difficult to run R within Databricks. Especially when it comes to synchronizing R packages, it lags behind. It does not even support the latest version of R 1.3.
I believe eventually all analytics will converge into data science. The analytics of the future will be data science, because predicting the future will be one of the most prevalent use cases. The things we used to do before, such as slicing and dicing, drilling through, and trend analysis, will become redundant once analytics toolsets are powered by AI/ML and fully automated. Unless organisations acquire platforms that can cater for machine learning and artificial intelligence, including natural language processing, they will have a hard time surviving.
With Databricks, I would like to see more integration with and accommodation of open-source products. This could be controversial, as it could call into question the whole configuration and purpose of the product. I'm pretty sure Microsoft is trying to position it in a monopoly market, as they did with Windows and MS Office, so that they could charge a premium. We are beginning to see a similar product strategy behind Databricks.
For how long have I used the solution?
I've been working with Databricks for about two years.
What do I think about the stability of the solution?
From what I know and from what I've heard, talking to our data operations team, it is stable and it's quite powerful.
What do I think about the scalability of the solution?
Running on top of Spark ensures fast processing and elasticity for coping with big data volumes, up to 2 petabytes. You can spin up a cluster very quickly and shut it down again. It's elastic.
How are customer service and technical support?
Excellent customer service from Databricks. Very proactive, constantly attuned to customer needs, even connected us with other customers for knowledge exchange.
Which solution did I use previously and why did I switch?
I am an IT consultant and in the past have used different solutions for ETL on top of databases, particularly for data warehousing. However, in the last six years I have seen large clients using a mixture of open-source and proprietary technologies, such as the Informatica stack with a data lake in AWS, Confluent Kafka with MQ Series on top of mainframes and a data lake in AWS, or Databricks with Azure Data Lake.
How was the initial setup?
It was pretty easy to set up. At least, that is my understanding. I'm not the data engineer though. I don't actually do installs and configurations. I explore features and build them in my architecture designs.
What about the implementation team?
We implemented Databricks through a vendor, and the vendor was pretty good.
What was our ROI?
Don't really know.
What's my experience with pricing, setup cost, and licensing?
I can't speak on pricing of the solution. It's not an aspect of the solution I deal with directly.
Which other solutions did I evaluate?
The options were Talend, EMC Isilon, native AWS services, and others.
What other advice do I have?
In my current capacity as an architect and end user of Databricks, I am confident that Databricks can provide a wealth of functionality to start with.
My advice to future adopters of Databricks would be to be careful about the overall architectural roadmap for this application. Adopt a flexible, modular, microservices-like architecture whose components can be replaced in the future should they be deemed inadequate for evolving business needs.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Business Intelligence and Analytics Consultant at a tech services company with 201-500 employees
Easy to switch loads between clusters and automation is easy using the API
Pros and Cons
- "Automation with Databricks is very easy when using the API."
- "Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems."
What is our primary use case?
I am a developer and I do a lot of consulting using Databricks.
We have been primarily using this solution for ETL purposes. We also do some migration of on-premises data to the cloud.
What is most valuable?
The most valuable feature is the ability to switch loads between multiple clusters.
Automation with Databricks is very easy when using the API.
The ability to write code and SQL in the same interface is useful.
It is easy to connect notebooks to a cluster.
There are a large number of inbuilt functions that help to make things easier.
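The review does not say which API calls were automated. As one possible illustration, assuming the Databricks Jobs REST API and its `run-now` endpoint (version 2.1 is an assumption here), triggering an existing job from a script might be sketched as follows. The workspace URL, token, and job ID are hypothetical placeholders, and the function only builds the request rather than sending it.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, token, job_id):
    """Build (but do not send) a POST request that triggers an existing job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        headers={
            # Personal access token passed as a bearer token (placeholder value).
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical workspace, token, and job ID, for illustration only.
    req = build_run_now_request(
        "https://example.cloud.databricks.com", "dapi-EXAMPLE", 123
    )
    print(req.get_method(), req.full_url)
    # To actually trigger the job, pass the request to urllib.request.urlopen(req).
```

Separating request construction from sending makes the automation easy to unit test and to reuse from a scheduler or CI pipeline.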
What needs improvement?
Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems. As it is now, we have to go into the driver logs to identify the error messages properly.
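The review mentions only the driver logs as a workaround. A general Python technique that can sometimes help in notebook code, offered here as an assumption rather than anything Databricks-specific, is to walk the exception chain so that a vague top-level message is reported together with its underlying causes. The helper name `describe_exception_chain` is hypothetical.

```python
def describe_exception_chain(exc):
    """Return the exception and its chained causes as one readable string."""
    parts = []
    seen = set()
    current = exc
    # Follow explicit causes (raise ... from ...) first, then implicit context.
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        parts.append(f"{type(current).__name__}: {current}")
        current = current.__cause__ or current.__context__
    return " <- ".join(parts)


if __name__ == "__main__":
    try:
        try:
            int("not-a-number")
        except ValueError as inner:
            # Simulate a vague wrapper error around a specific root cause.
            raise RuntimeError("unknown exception") from inner
    except RuntimeError as outer:
        print(describe_exception_chain(outer))
```

This only helps when the underlying cause is attached to the exception that reaches the notebook; errors raised on the cluster may still need to be looked up in the driver logs.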
There is not much information about Databricks available online, such as cost. Whenever we want to find the actual costing, we have to send an email to Databricks, so having the information available on the internet would be helpful.
I would like to see integration with Power BI or Tableau for the business users. They may use Databricks to check on things, but it will be a little bit complicated for them. The GUI interfaces for Tableau and Power BI are ones that they are used to, so the integration would help.
For how long have I used the solution?
I have been using Databricks for about five and a half years.
What do I think about the stability of the solution?
We have found that in the development environment, Databricks is pretty stable. We have had problems where something works in development but does not work in production, and this can happen when the version is updated and certain features have been deprecated. This means that more testing is required before moving to production, but this is the only drawback that we have seen.
Basically, when we move across versions we have found issues, but otherwise, it's pretty stable.
What do I think about the scalability of the solution?
Scalability is one of the main features of Databricks. We have used datasets that are one hundred megabytes in size up to one terabyte, and we can manage, so it's easily scalable.
We have a large company with between 400 and 500 people using this solution.
How are customer service and technical support?
We have not reached out for technical support on Databricks.
How was the initial setup?
I found the initial setup easy because I had previously worked on Spark.
If somebody goes through the training, which is available on the website, then it should be straightforward. I don't think that it is very hard.
When it comes to developing things based on use cases, it can take between three days and two weeks, plus two to three days for testing and deploying it. I would say that for an entire use case, it will take a maximum of three weeks.
What other advice do I have?
My advice for developers who are interested in working with this solution is to first go through the Spark architecture.
I would rate this solution a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Architect at a tech services company with 201-500 employees
A reliable solution for processing and transforming data
Pros and Cons
- "The fast data loading process and data storage capabilities are great."
- "There are no direct connectors — they are very limited."
What is our primary use case?
We specialize in project consulting for our clients. Whenever we get the opportunity, we recommend Databricks to them.
What is most valuable?
The fast data loading process and data storage capabilities are great.
Based on the data loads and the performance, you can easily scale up the clusters.
What needs improvement?
Sometimes we experience issues connecting our database to Databricks. There are no direct connectors — they are very limited. This should be addressed and corrected in the next release.
Reading past data can also be tricky as there is no data spectrum like you would find with Snowflake and other solutions.
For how long have I used the solution?
We have been using Databricks for one and a half years.
What do I think about the scalability of the solution?
Both the scalability and the stability of Databricks are good.
How are customer service and technical support?
Technical support is good but I have not interacted with them directly. We have a point of contact. We used to interact with tech support on a regular basis and they would respond quickly. We would get a response on the same day based on the priority level. Keep in mind, my company is in a partnership with them which could be a factor in their quick response time.
How was the initial setup?
The initial setup was not very complex. We had it up and running in no time; it's a quick process.
What about the implementation team?
We have just one solution architect and one data architect who handle all maintenance-related issues.
What other advice do I have?
I would recommend purchasing a package that includes technical support. Compared to other companies, they offer great support to their clients.
On a scale from one to ten, I would give Databricks a rating of eight.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Chief Data Scientist at Ngenux
Effective integration, helpful support, and simple cloud implementation
Pros and Cons
- "Databricks integrates well with other solutions."
- "Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools it would be an advantage."
What is our primary use case?
We use Databricks for experimentation. For example, we do ML model building and training that connects to our data, which resides in Azure. It offers very good integration with Azure. We've deployed some of our model inference tools in Databricks.
What is most valuable?
Databricks integrates well with other solutions.
What needs improvement?
Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools it would be an advantage.
Along with having connections to different databases for Git tools, adding libraries for easy access would be a benefit. As data scientists, we connect to different databases and different sources of data, so having a library would be useful.
For how long have I used the solution?
I have been using Databricks for approximately one year.
What do I think about the stability of the solution?
The solution is stable. We did not face any downtime.
What do I think about the scalability of the solution?
Databricks is scalable. It operates three times faster than any of the other ecosystems which we have experimented on.
We have approximately five data scientists using this solution in my organization. We are a small company, and as we grow, all our data scientists will be using this platform. We plan to increase usage.
How are customer service and support?
The technical support is good. We didn't need a lot of support. There were a few times we needed some help on how to do certain operations.
How was the initial setup?
The installation was straightforward because it is on the cloud. The full deployment took approximately one week.
What about the implementation team?
We did the implementation of Databricks in-house. It only requires one person for the maintenance of the solution.
What other advice do I have?
My advice to others wanting to implement this solution is to use a cloud environment. For example, we are using Azure with Databricks. It is much better than doing an on-premises implementation.
I rate Databricks an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Platform Architect at a tech services company with 51-200 employees
Provides seamless integration capabilities, but the cluster management features need improvement
Pros and Cons
- "Databricks is a robust solution for big data processing, offering flexibility and powerful features."
- "The product could be improved regarding the delay when switching to higher-performing virtual machines compared to other platforms."
What is our primary use case?
We use the product as a data science platform that enables me to handle and analyze large datasets efficiently.
What is most valuable?
Databricks can switch easily between cloud providers, such as Azure and GCP. It allows seamless integration with various data platforms and cloud providers, facilitating better data handling and analysis.
What needs improvement?
The product could be improved regarding the delay when switching to higher-performing virtual machines compared to other platforms like Snowflake. The ease and speed of managing clusters can also be enhanced, especially when scaling up resources. They could add more advanced data storage solutions like Iceberg and Delta files.
For how long have I used the solution?
I have been using Databricks for approximately two years.
What do I think about the stability of the solution?
I rate the product stability a seven out of ten.
What do I think about the scalability of the solution?
I rate the product scalability an eight.
How are customer service and support?
The technical support services are good.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup was straightforward. However, configuring policies could have been simpler.
What's my experience with pricing, setup cost, and licensing?
The product pricing is moderate.
Which other solutions did I evaluate?
I evaluated other options, including Snowflake, before choosing Databricks.
What other advice do I have?
Databricks is a robust solution for big data processing, offering flexibility and powerful features. While there are areas for improvement, especially in performance and cluster management, it remains a highly valuable tool in my data science toolkit.
I rate it a seven.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Last updated: Jul 16, 2024