Data engineer
A stable solution that can be scaled depending on the project, but the price could be cheaper
Pros and Cons
- "The setup was straightforward."
- "The pricing of Databricks could be cheaper."
What is our primary use case?
I primarily use the solution in two areas: machine learning and big data computing.
What needs improvement?
The pricing of Databricks could be cheaper. The solution could also be improved by providing more intelligent assistance to the coder.
For how long have I used the solution?
I have been using Databricks for the past two years.
What do I think about the stability of the solution?
The solution is stable. I would rate the stability a seven out of ten.
Buyer's Guide
Databricks
November 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
What do I think about the scalability of the solution?
The scalability depends on the project. At present, around 20 people use the solution in my company.
How are customer service and support?
How was the initial setup?
The setup was straightforward, although it also depends on the project.
What about the implementation team?
The deployment process was automated.
Which other solutions did I evaluate?
Evaluating solutions is not part of my role. I rely on Databricks.
What other advice do I have?
I rate Databricks a seven out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Director of Data (Engineering & Science) at a tech services company with 11-50 employees
An easy-to-use solution useful for running batch jobs
Pros and Cons
- "The ease of use and its accessibility are valuable."
- "The integration and query capabilities can be improved."
What is our primary use case?
Our primary use case for the solution is to run batch jobs.
What is most valuable?
The ease of use and its accessibility are valuable.
What needs improvement?
The solution can be improved by expanding its integration capabilities and providing the ability to query external vendors directly.
For how long have I used the solution?
We have been using the solution for a little less than a year, and we deploy it on the Amazon cloud.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
The solution is scalable, and there are approximately seven developers and two DevOps employees utilizing the solution.
How are customer service and support?
We have had a good experience with customer service and support. I rate them a nine out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup for the solution is a bit complex.
What's my experience with pricing, setup cost, and licensing?
I wouldn't consider it a costly solution. As with any other solution, the cost depends on how you use it. If you provision Spark clusters much larger than what you need, it becomes costly. It is not more costly than EMR, the AWS equivalent, and from my perspective, it is much better.
What other advice do I have?
I rate the solution a nine out of ten. The solution is good, but the integration and query capabilities can be improved.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Manager at MAQ Software
Integrates well, is scalable, and has high availability
Pros and Cons
- "The most valuable feature of Databricks is the integration with Microsoft Azure."
- "Databricks can improve by making the documentation better."
What is our primary use case?
I am using Databricks for creating business intelligence solutions.
What is most valuable?
The most valuable feature of Databricks is the integration with Microsoft Azure.
What needs improvement?
Databricks can improve by making the documentation better.
For how long have I used the solution?
I have been using Databricks for approximately one year.
What do I think about the stability of the solution?
Databricks is stable.
What do I think about the scalability of the solution?
The scalability of Databricks is good.
We have approximately 500 users using this solution in my organization.
How are customer service and support?
I have not used the support from Databricks.
Which solution did I use previously and why did I switch?
We previously used the Microsoft stack. We chose Databricks because the processing power was better and it was a better fit for our use case.
How was the initial setup?
The initial setup of Databricks was not straightforward. We had to do trial and error and we learned as we went along.
I rate the initial setup of Databricks a four out of five.
What about the implementation team?
We did the implementation of Databricks in-house. The solution requires ongoing maintenance.
What other advice do I have?
I would recommend this solution to others.
My advice to others is to first do a small proof of concept, see how it works out, and then take it from there.
I rate Databricks an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Co-Founder & Chief Data Officer (CDO) at Data360
Allows us to automate the creation of a cluster, optimized for machine learning, and construct AI machine learning models for the client
Pros and Cons
- "Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client."
- "There could be more support for automated machine learning in the database. I would like to see more ways to do analysis so that the reporting is more understandable."
What is our primary use case?
I use this for database machine learning, to construct different models for supermarkets, drug store management, and market involvement to identify business opportunities for clients.
We provide different statistical models and use different algorithms depending on the client.
I have worked as a Lead Data Scientist at different companies. I implement data solutions and build and optimize processes using machine learning techniques, supported by data science and advanced analytics.
What is most valuable?
Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client.
What needs improvement?
There could be more support for automated machine learning in the database. I would like to see more ways to do analysis so that the reporting is more understandable.
What do I think about the stability of the solution?
It's stable.
What do I think about the scalability of the solution?
It's scalable.
How are customer service and support?
I would rate technical support 4 out of 5.
How was the initial setup?
Setup isn't difficult. About 15 people handled deployment and maintenance. We have data scientists and statisticians using this solution and doing different analyses.
What other advice do I have?
I would rate this solution 9 out of 10.
My advice is to use the different advanced analytics methodologies, plan the project, and identify the different activities involved in the design.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor. The reviewer's company has a business relationship with this vendor other than being a customer: Partner
Head of Data & Analytics at a tech services company with 11-50 employees
Helpful integration with Python and notebooks, but it should be more user-friendly and less complicated to use
Pros and Cons
- "The integration with Python and the notebooks really helps."
- "Databricks is not geared towards the end-user, but rather it is for data engineers or data scientists."
What is our primary use case?
We are a consulting house and we employ solutions based on our customers' needs. We don't generally use products internally.
I am a certified data engineer from Microsoft and have worked on the Azure platform, which is why I have experience with Databricks. Now that Microsoft has launched Synapse, I think that there will be more use cases.
What is most valuable?
You can spin up an Azure Databricks cluster, and integrating with it is seamless.
The integration with Python and the notebooks really helps.
What needs improvement?
There is definitely room for improvement.
This is the type of solution where you need people with technical expertise to use it. Other products are self-service and can be used by end-users. Databricks is not geared towards the end-user, but rather it is for data engineers or data scientists. I'm not sure whether Databricks is working towards that or not.
It would be nice if it were more user-friendly, so that you don't have to rely on Power BI or another visualization tool. I know that there is integration in the notebook where you can do it, but the relationships and semantics still make it more difficult. It would be better to do it right in Databricks. If the visualizations were within the portal, I wouldn't have to log out, bring the data into Power BI, and then visualize it.
What do I think about the stability of the solution?
We have not done any major implementation yet, although I think it's stable to an extent. I can't comment on it in terms of benchmarks or on experiencing any issues. It works seamlessly in the places where I've used it.
What do I think about the scalability of the solution?
Our implementations have been small and we haven't needed to scale as of yet.
Databricks can help you build a data lake, and that is something they need to make more popular. People are slowly coming to understand it; if you look, there are lots of data lakes that people are trying to create. I'm not deeply familiar with it, but the concept seems complicated. I think they need to produce material, such as videos, that explains it better. What I have seen on YouTube is quite complicated for an end-user to understand.
How was the initial setup?
The initial setup is easy. It's not difficult when you are used to Azure.
What's my experience with pricing, setup cost, and licensing?
I am based in South Africa, where adapting to the cloud is expensive, and then there is the price of the tool itself.
The cost is difficult to estimate. I have customers who moved to the cloud and then realized that the costs were higher than what they used to be on-premises. Also, because our exchange rate is so weak, I would always advocate for lower prices, although I don't know how feasible that is.
What other advice do I have?
From a purely technical perspective, I would rate Databricks an eight out of ten. However, it falls short in terms of user adoption. When I look at other products, including Synapse, they are better in that respect. I still feel that Databricks is quite complicated for the average person.
I would rate this solution a five out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineer at a tech services company with 10,001+ employees
An easy initial setup with a good time travel feature, but needs better model scoring
Pros and Cons
- "The time travel feature is the solution's most valuable aspect."
- "Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with."
What is our primary use case?
We use the solution for multiple items. We use lots of data crunching, development, and algorithms on it.
What is most valuable?
The time travel feature is the solution's most valuable aspect.
What needs improvement?
The management of the solution needs to be modernized. Managing the radius data is hard.
The solution requires model scoring. There's not a good way of knowing how the models are performing from a data science perspective. The solution needs more model scoring abilities. It doesn't necessarily need more model monitoring, but more model scoring and performance measurement from a data science perspective.
Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with.
For how long have I used the solution?
I've been using the solution for one year so far.
What do I think about the stability of the solution?
The solution is not exactly stable. We've faced a few bugs which have really affected it. There are bugs especially when it comes to connecting with Spark.
What do I think about the scalability of the solution?
It's hard to say how scalable the solution is. The scalability comes into play on the Spark side, not on the Databricks side.
We have about 20 people on the solution right now.
How are customer service and technical support?
We've never been in touch with technical support, so I don't have any experience in terms of dealing with them.
How was the initial setup?
The initial setup is straightforward. I wouldn't say that it's complex in any way.
Deployment times vary and really depend on multiple factors. It can take anywhere from a few weeks to a few months to deploy the solution. In our case, it took us about three months to fully deploy it.
It takes two to three people to deploy the solution.
What about the implementation team?
I deployed the solution with the help of my team.
What's my experience with pricing, setup cost, and licensing?
I'm not sure what the licensing costs are on the solution.
Which other solutions did I evaluate?
We did evaluate Amazon SageMaker before ultimately choosing Databricks. It's the only other solution we evaluated at the time.
What other advice do I have?
We're partners with Databricks.
We're using the latest version of the solution, but I can't recall what version number we are on.
I'd advise others considering the solution to look carefully at their intended usage. They shouldn't adopt the solution blindly. How the implementation and usage go will depend on the skill of the data engineer and on your requirements.
I'd rate the solution seven out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Data Science Consultant at Syniti
Good performance, easy to set up, and easy to use if you have a Python background
Pros and Cons
- "I work in the data science field and I found Databricks to be very useful."
- "It would be very helpful if Databricks could integrate with platforms in addition to Azure."
What is our primary use case?
We are building internal tools and custom models for predictive analysis. We are currently building a platform where we can integrate multiple data sources, such as data that is coming from Azure, AWS, or any SQL database. We integrate the data and run our models on top of that.
We primarily use Databricks for data processing and for SQL databases.
What is most valuable?
I found that PySpark is the most useful tool. It uses in-memory computation, so when you want to run a model, it runs very quickly. We used to use Python, and when we migrated to PySpark, the performance was much better.
What needs improvement?
It would be very helpful if Databricks could integrate with platforms in addition to Azure.
Having an open-source version or having the option to get a trial version of Databricks would be very helpful.
It would be very useful for beginners if there were tutorials and examples on how to write code for PySpark, R, or Scala. Having examples would give people something to refer to and play with.
For how long have I used the solution?
We have been using Databricks for the past two or three years.
What do I think about the stability of the solution?
A couple of times I faced an issue where a long-running process was consuming a lot of time and then stopped abruptly, and I had to start the process again.
What do I think about the scalability of the solution?
We are in the prototyping stage so we do not plan on increasing our usage yet.
How are customer service and technical support?
We have not been in contact with technical support.
Which solution did I use previously and why did I switch?
Before using Databricks, we were running our own cluster with a web server that executed our Python queries.
How was the initial setup?
The initial setup is straightforward. With respect to deployment, the development environment can be set up within half an hour, and we can write code and deploy from there.
What about the implementation team?
We implemented Databricks on our own. We haven't deployed as such, as we are just running our queries and it is not in production yet.
What other advice do I have?
I work in the data science field and I found Databricks to be very useful. If I want to run any models then I can code them in PySpark. If you are coming from a Python background then you can write code in PySpark and it runs quickly. This is a good solution in terms of performance.
I would rate this solution a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Machine Learning Engineer at a tech vendor with 51-200 employees
A convenient notebook, good stability, and a straightforward setup
Pros and Cons
- "The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and for the final deployment; I can just declare the Spark jobs by the load tables. It's quite convenient."
- "The solution could be improved by integrating it with data packets. Right now, the load tables provide a function such as team collaboration. Still, it's unclear whether there's a function to create different or additional branches. Our team had used data packets before; however, I feel it's difficult to integrate the current setup with the previous data packets."
What is our primary use case?
We primarily use the solution to run current jobs, that is, to run Spark jobs as the current job.
What is most valuable?
The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and for the final deployment; I can just declare the Spark jobs by the load tables. It's quite convenient.
What needs improvement?
The solution could be improved by integrating it with data packets. Right now, the load tables provide a function such as team collaboration. Still, it's unclear whether there's a function to create different or additional branches. Our team had used data packets before; however, I feel it's difficult to integrate the current setup with the previous data packets.
The support could be improved a bit around the database. When we stream it to Data Lake, some data cannot be loaded. It should be a priority to fix this.
For how long have I used the solution?
I've been using the solution for half a year.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
The solution is scalable. However, it still requires us to manually set the number of nodes in a cluster, and the right number really depends on the application. Sometimes, when the tasks are bigger, it gets a little difficult for us to define the number of nodes in a cluster. If the solution could set up the clusters for users, I think that would be good.
Currently, we have three people using the solution. We may increase usage in the future.
How are customer service and technical support?
The technical support is quite good. In the beginning, when we had a few POC projects, they were very supportive.
Which solution did I use previously and why did I switch?
We didn't previously use a different solution; instead, we built our own from scratch. This is the first unified platform that we've used.
How was the initial setup?
The initial setup is very straightforward. We just use their job functions. Deploying as a Spark job is quite straightforward.
In our use case, we also had some external databases to handle the deployment. For example, we generated some prediction results and saved them into an external database. Deploying to the external database takes time, but the Spark job itself is quite easy.
What other advice do I have?
I'm a software development engineer. I'm working with the latest version.
As long as the developers understand Spark and some of its technical tricks, it's very fast in terms of using the database.
I'd rate the solution eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros sharing their opinions.
Updated: November 2024
Popular Comparisons
Microsoft Azure Machine Learning Studio
KNIME
Alteryx
Amazon SageMaker
Dataiku
RapidMiner
IBM SPSS Statistics
Dremio
IBM Watson Studio
IBM SPSS Modeler
Anaconda
Domino Data Science Platform
Starburst Enterprise
H2O.ai
Cloudera Data Science Workbench
Quick Links
Learn More: Questions:
- Which do you prefer - Databricks or Azure Machine Learning Studio?
- How would you compare Databricks vs Amazon SageMaker?
- Which would you choose - Databricks or Azure Stream Analytics?
- Which product would you choose for a data science team: Databricks vs Dataiku?
- Which are the best end-to-end data science platforms?
- What enterprise data analytics platform has the most powerful data visualization capabilities?
- What Data Science Platform is best suited to a large-scale enterprise?
- How can ML platforms be used to improve business processes?
- When evaluating Data Science Platforms, what aspect do you think is the most important to look for?