I recently started using Databricks and have trained one model on it. I chose it for its hardware support and the other things it provides, and because it is easy to get into. Earlier, when I had to test part of my code, not a full production run but just a quick check that it worked, I had to raise a request for a machine and go through the whole process. With Databricks, I can simply create one myself. I can get whatever resources are required, test everything there, and then go ahead, and that is why I have been using it primarily.
Associate Machine Learning Engineer at a tech services company with 501-1,000 employees
Provides resources to users quickly without much hassle
Pros and Cons
- "The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle."
- "I think setting up the whole account for one person and giving access are areas that can be difficult to manage and should be made a little easier."
What is our primary use case?
What is most valuable?
The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle.
What needs improvement?
I think setting up the whole account for one person and giving access are areas that can be difficult to manage and should be made a little easier.
For how long have I used the solution?
I have experience with Databricks.
Buyer's Guide
Databricks
November 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
What do I think about the stability of the solution?
I think there's a period of inactivity after which our training session expires, which I think is fair, and that is the only point where things stop. I haven't come across many problems with Databricks.
What do I think about the scalability of the solution?
The tool is not used as frequently as PyTorch. I don't know why I am comparing Databricks to PyTorch, but I think around five people use it.
How are customer service and support?
I have not contacted the solution's technical support team.
Which solution did I use previously and why did I switch?
Before Databricks, I used to use a cloud support platform.
How was the initial setup?
The solution is deployed on the cloud.
Which other solutions did I evaluate?
I chose Databricks over other products, considering the hardware support it offers.
What other advice do I have?
A little bit of time will be needed to get comfortable with Databricks.
I rate the tool an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Jul 23, 2024
Lead Analytics at a manufacturing company with 10,001+ employees
Useful machine learning and easy to scale
Pros and Cons
- "In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks such as product analysis or predictive maintenance."
- "The clusters or instances of Databricks could be much more stable. We've had issues with crashes."
What is our primary use case?
Our team is currently utilizing machine learning for various applications, and a few members are also exploring the use of Databricks for ML operations.
What is most valuable?
In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks such as product analysis or predictive maintenance.
For how long have I used the solution?
I have been using Databricks for approximately six months.
What do I think about the stability of the solution?
The clusters or instances of Databricks could be much more stable. We've had issues with crashes.
What do I think about the scalability of the solution?
The scalability of Databricks is good as long as you have a data lake, and it's easy to scale.
We have approximately 50 users using this solution in my company.
How are customer service and support?
We have a different team who handles the support. I do not have contact with Databricks support.
Which solution did I use previously and why did I switch?
I have not used a similar solution to Databricks.
What was our ROI?
I have seen an ROI using Databricks.
What's my experience with pricing, setup cost, and licensing?
I rate the price of Databricks as eight out of ten.
What other advice do I have?
Having a good understanding of physical security in relation to cybersecurity in an OT (Operational Technology) environment would be beneficial, and utilizing an existing data lake prior to implementing a Databricks initiative would greatly aid in its success.
I rate Databricks an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Associate Principal - Data Engineering at a tech services company with 10,001+ employees
It's a unified platform that lets you do streaming and batch processing in the same place
Pros and Cons
- "I like that Databricks is a unified platform that lets you do streaming and batch processing in the same place. You can do analytics, too. They have added something called Databricks SQL Analytics, allowing users to connect to the data lake to perform analytics. Databricks also will enable you to share your data securely. It integrates with your reporting system as well."
- "Databricks may not be as easy to use as other tools, but if you simplify a tool too much, it won't have the flexibility to go in-depth. Databricks is completely in the programmer's hands. I prefer flexibility rather than simplicity."
What is our primary use case?
We build data solutions for the banking industry. Previously, we worked with AWS, but now we are on Azure. My role is to assess the current legacy applications and provide cloud alternatives based on the customers' requirements and expectations.
Databricks is a unified platform that provides features like streaming and batch processing. All the data scientists, analysts, and engineers can collaborate on a single platform. It has all the features you need, so you don't need to go for any other tool.
What is most valuable?
I like that Databricks is a unified platform that lets you do streaming and batch processing in the same place. You can do analytics, too. They have added something called Databricks SQL Analytics, allowing users to connect to the data lake to perform analytics. Databricks also will enable you to share your data securely. It integrates with your reporting system as well.
The Unity Catalog provides you with data lineage and metadata capabilities. These are some of the unique features that fulfill all the requirements of the banking domain.
What needs improvement?
Every tool has room for improvement. What normally happens is that a solution claims it can do ETL and everything else, but you encounter some limitations when you actually start. Then you keep interacting with the vendor, and they continue to upgrade it. For example, we haven't fully implemented Databricks Unity Catalog, a newly introduced feature. We need to check how it works, and there may be room for improvement there as well.
Databricks may not be as easy to use as other tools, but if you simplify a tool too much, it won't have the flexibility to go in-depth. Databricks is completely in the programmer's hands. I prefer flexibility rather than simplicity.
For how long have I used the solution?
I have been using Databricks for a year.
What do I think about the scalability of the solution?
Databricks relies on scalability and performance. Every cloud vendor prioritizes scalability, high availability, performance, and security. These are the most important reasons to move to the cloud.
How was the initial setup?
Deploying Databricks on the cloud is straightforward. It's not like an on-premise solution, where you must create a cluster and all those other prerequisites for big data.
I don't think it's challenging to maintain, but you need an expert programmer because Databricks isn't GUI-based. With GUI-based tools, building ETLs is drag-and-drop. Databricks relies entirely on coding, so you need skilled programmers to build your code, ETLs, etc.
What's my experience with pricing, setup cost, and licensing?
The price of Databricks is based on the computing volume. You also need to pay storage costs for the cloud where you're hosting Databricks, whether it is AWS, Azure, or Google.
What other advice do I have?
I rate Databricks nine out of 10. Databricks is one of the best tools on the market.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
Chief Data-strategist and Director at Theworkshop.es
Flexible, stable, and reasonably priced
Pros and Cons
- "The solution is very easy to use."
- "The integration of data could be a bit better."
What is our primary use case?
We primarily use the solution for retail and manufacturing companies. It allows us to build data lakes.
What is most valuable?
The solution is very easy to use.
The storage on offer is very good.
The solution is perfect for dealing with big data.
The artificial intelligence on offer is very good.
The product is quite flexible.
We have found the solution to be stable.
The cloud services on offer are very reasonably priced.
Technical support is very good. They also have very good documentation on offer to help you navigate the product and learn about its offerings.
What needs improvement?
The solution works very well for us. I can't recall any missing features or anything the solution really lacks. It's very complete.
It would help if there were different versions of the solution on offer.
The integration of data could be a bit better.
For how long have I used the solution?
I've worked for about 20 to 25 years in business intelligence analytics and have worked with Databricks for about four years at this point.
What do I think about the stability of the solution?
The stability of the solution is very good. It doesn't crash or freeze. There are no bugs or glitches. Its performance is very good.
What do I think about the scalability of the solution?
The scalability is quite good. A company that needs to expand it can do so with ease.
We only have four people on the solution at this time. The front-end users never use the product directly. The companies aren't that big here. If the economy improves, we'll likely have more of a need for the product.
How are customer service and technical support?
I've dealt with technical support in the past and have found them to be very good. They are helpful and responsive. We are satisfied with their level of service.
Which solution did I use previously and why did I switch?
I work with Databricks, Cloudera and Snowflake.
How was the initial setup?
The solution is on the cloud and therefore there isn't really an installation process that you need to go through. You only really need to configure the clusters.
Within the clusters, you configure according to how many platforms you need, or if you want to, you can build a cluster for artificial intelligence. You just configure it as required.
What's my experience with pricing, setup cost, and licensing?
The pricing of the product is very reasonable. The fact that it is on the cloud makes it a less expensive option. Other solutions that are on-premises are quite expensive.
What other advice do I have?
We are customers and end-users.
Databricks is on the cloud and therefore, we're always on the latest version of the solution. It's constantly updated for us so that we have access to the latest updates and upgrades.
I'd rate the solution at a nine out of ten. The capability of the product is quite good and we are very satisfied with it overall.
I'd recommend the solution to other companies and organizations.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Engineer at TCS
Supports multiple languages, plenty of Python libraries, but user-interface could improve
Pros and Cons
- "Databricks is a unified solution that we can use for streaming. It supports open-source languages, which are cloud-agnostic. When I do database coding, if another tool uses a similar language, such as Excel or SQL, I can reuse the same knowledge, which limits the need to learn new things. It supports a lot of Python libraries, some of which I can use very easily."
- "The query plan is not easy with Databricks' job level. If I want to tune any of the code, it is not easily available in the blogs as well."
What is our primary use case?
We are using Databricks to receive data from the data lake, where we process it with transformation and cleansing steps. Once it is processed, we send the data to the Azure SQL database.
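The flow described above (read from the lake, cleanse, write to Azure SQL) could look roughly like the sketch below. This is only an illustration, not the reviewer's actual code: the Parquet source format, the function names, the choice of cleansing steps, and all connection values are assumptions, and it expects a Spark session such as the one a Databricks notebook provides.

```python
def build_jdbc_url(server, database):
    """Build a SQL Server JDBC URL for an Azure SQL database (hypothetical helper)."""
    return f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}"


def cleanse(df, required_columns):
    """Drop exact duplicate rows, then rows missing any required column."""
    return df.dropDuplicates().na.drop(subset=required_columns)


def run_pipeline(spark, source_path, jdbc_url, table, user, password, required_columns):
    """Read from the lake, cleanse, and append the result to an Azure SQL table."""
    # Assumption: the lake files are Parquet; swap the format if yours differ.
    raw = spark.read.format("parquet").load(source_path)
    clean = cleanse(raw, required_columns)
    (
        clean.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .mode("append")
        .save()
    )
    return clean
```

In a notebook this would be called with the existing `spark` session, a lake path, and credentials read from a secret store rather than typed inline.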
What is most valuable?
Databricks is a unified solution that we can use for streaming. It supports open-source languages, which are cloud-agnostic. When I do database coding, if another tool uses a similar language, such as Excel or SQL, I can reuse the same knowledge, which limits the need to learn new things. It supports a lot of Python libraries, some of which I can use very easily.
What needs improvement?
The query plan is not easy with Databricks' job level. If I want to tune any of the code, it is not easily available in the blogs as well.
For how long have I used the solution?
I have been using Databricks for approximately three years.
What do I think about the stability of the solution?
Databricks is stable.
What do I think about the scalability of the solution?
The scalability of Databricks is good. However, if you want to use larger clusters or high-concurrency clusters, you will need to wait longer for the clusters to spin up.
We have different teams. I'm part of the data analytics team, where almost 10 people are using it, but I'm not sure about the rest of the teams.
We are using Databricks extensively.
How was the initial setup?
The initial setup of Databricks is not straightforward. You need to create VLANs, VPNs, and networks. We have two ways of deployment: the legacy PowerShell method and the template method for deploying the Databricks code to higher levels.
We have not integrated Databricks directly into the DevOps architecture. We are downloading the notebooks manually and we are uploading them.
What's my experience with pricing, setup cost, and licensing?
The billing of Databricks can be difficult and should improve.
Which other solutions did I evaluate?
We have evaluated Azure Synapse and SQL. Databricks and Azure Synapse are similar; the UI is the only difference. SQL and Databricks are also alike, and one of the largest setbacks is that processing a lot of data takes a long time.
What other advice do I have?
I rate Databricks a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Owner at a marketing services firm with 1-10 employees
The data governance has been absolutely efficient in between other kinds of solutions
Pros and Cons
- "Databricks' Lakehouse architecture has been most useful for us. The data governance has been absolutely efficient in between other kinds of solutions."
- "I would like it if Databricks made it easier to set up a project."
What is our primary use case?
We use Databricks for video streaming and security purposes.
What is most valuable?
Databricks' Lakehouse architecture has been most useful for us. The data governance has been absolutely efficient in between other kinds of solutions.
What needs improvement?
I would like it if Databricks made it easier to set up a project. The use case determines which services we are going to use. You have the application engine, and you generate a potential budget for your workloads, so you can understand what you are going to do, what you are going to use, and what you will invest in.
Because I'm deploying on the Google Cloud Platform, measuring the investment, value, and use case is extremely difficult. So I leave it and move on without the risk. It would be easier if I had one page where you can see three columns: one for the use cases of a specific architecture, a second one for the prices based on the volume of data or machine time, and the third column for the budget. That would make it easier to know if I am using the appropriate architecture for the right solution.
I have seen something like that in Microsoft Azure, but obviously Microsoft Azure costs a lot of money. Amazon has something like that, but it's very complicated to use.
For how long have I used the solution?
We've been using Databricks for about five years.
What do I think about the stability of the solution?
Databricks is very stable and powerful.
What do I think about the scalability of the solution?
It was simple to make Databricks scalable. We found that we could set up an alert to tell us if we needed more resources, money, or time from our team. We're alerted when the system detects some trigger for any use of the instance. If you have another alert from your side, that would be extremely useful because it takes a lot of time to develop that kind of trigger.
How are customer service and support?
Databricks technical support was lovely. We don't need it so much, but the few questions we had were answered immediately.
How was the initial setup?
I am not a data engineer because I just started data science at the company, but it was straightforward and clear for the architect to set up. He provided me with that idea because he realized it would take time if we had use cases. You can select and change the data or add some modules or products. You have all the technology to do so.
What other advice do I have?
I rate Databricks eight out of 10. I like to move my customers into Databricks, but I take care of the internal system infrastructure so they can continue to use familiar software or operating systems and databases. They have a lot of doubts because they don't know the solution. We need to train them, explain things, and show the solution's potential value.
Generally, companies try to keep the same flavor when they migrate. For example, if they are using many Microsoft products, they want to work with Azure. If they are open to other options, they go with GCP or AWS. However, Databricks doesn't have enough customers here in my market because it's not a visible brand. Azure, GCP, and AWS are highly visible here, so the local teams are friendly with the three brands.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Coordenador Financeiro at Icatu
Good technical support, but is difficult to set up and integrate
Pros and Cons
- "The technical support is good."
- "The initial setup is difficult."
What is our primary use case?
I believe we are using the new version.
Our company makes comprehensive use of the solution to consolidate data and do a certain amount of reporting and analytics. All the data consumers use Databricks to develop the information.
What needs improvement?
Data governance should be addressed. We have some trouble connecting all the governance solutions with Databricks. This means the integrative capabilities are problematic.
The initial setup is difficult.
For how long have I used the solution?
We have been using Databricks for a year-and-a-half.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
The solution is scalable.
How are customer service and support?
The technical support is good.
Which solution did I use previously and why did I switch?
As we are talking about a corporate solution, the deployment of Databricks lasted longer than the one day it took for Alteryx.
We used Alteryx prior to Databricks and continue to do so, it being the only other solution we have employed. We use the two with different software.
How was the initial setup?
The initial setup is difficult.
While I don't know exactly how long the deployment took, I do know that it lasted longer than the one day needed for Alteryx.
What about the implementation team?
I believe we used a partner for the deployment, although I cannot say for certain, as this is not within my purview.
I don't know how many people are needed for maintenance and deployment.
What's my experience with pricing, setup cost, and licensing?
As the licensing is not within my purview, I am not in a position to comment on this.
What other advice do I have?
My company makes use of the solution. It is employed by my data team and the technology one. I do not have personal experience using the solution.
The solution is deployed on base, on data.
I am not aware of how many people make use of it.
I rate Databricks as a seven out of ten.
Which deployment model are you using for this solution?
Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Scientist at a computer software company with 501-1,000 employees
Good built-in optimization, easy to use with a great user interface
Pros and Cons
- "The built-in optimization recommendations halved the run time of queries and allowed us to reach decision points and deliver insights very quickly."
- "The product could be improved by offering an expansion of their visualization capabilities, which currently assist in development in their notebook environment."
What is our primary use case?
We are using this solution to run large analytics queries and prepare datasets for SparkML and ML using PySpark.
We ran on multiple clusters set up for a minimum of three and a maximum of nine nodes having 16GB RAM each.
For one ad hoc requirement, a 32-node cluster was required.
Databricks clusters were set for autoscaling and to time out after forty minutes of inactivity. Multiple users attached their notebooks to a cluster. When some workloads required different libraries, a dedicated cluster was spun up for that user.
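As a rough illustration of the setup described above, a cluster definition with autoscaling and an inactivity timeout could be assembled like this. It is a sketch under assumptions, not the configuration we actually used: the helper name and cluster name are made up, the `spark_version` and `node_type_id` values are placeholders you would replace with what your workspace offers, and the three-to-nine range is applied to worker nodes.

```python
import json


def build_cluster_spec(name, min_workers=3, max_workers=9, idle_minutes=40):
    """Return a cluster definition with autoscaling and auto-termination."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        # Placeholders: pick a runtime and a node type available in your workspace.
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type-with-16gb-ram>",
        # Scale between the minimum and maximum number of workers as load changes.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many minutes without activity.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("shared-analytics")
print(json.dumps(spec, indent=2))
```

A dedicated cluster for a user who needs different libraries would simply be a second spec with its own name and sizes.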
How has it helped my organization?
Databricks took care of all the underlying cluster management seamlessly. We could configure our clusters to run and deliver results without any delays due to hardware configuration or installation issues.
Databricks allowed us to go from non-existent insights (because the datasets were just too large) to immediate and rich insights once the datasets were ingested into our PySpark notebooks.
What is most valuable?
Immense ease in running very large scale analytics, with a convenient and slick UI. This saved us from having to tweak, tune, dive into deeper abstractions, get involved in procurement, and also having to wait for other workloads to run.
The built-in optimization recommendations halved the run time of queries and allowed us to reach decision points and deliver insights very quickly.
The Delta data format proved excellent. Databricks had already done the heavy lifting and optimized the format for large scale interactive querying. They saved us a lot of time.
What needs improvement?
The product could be improved by offering an expansion of their visualization capabilities, which currently assist in development in their notebook environment. Perhaps a few connectors that auto-deploy to a reporting server?
More parallelized Machine Learning libraries would be excellent for predictive analytics algorithms.
For how long have I used the solution?
I have been using this solution for three years.
What do I think about the stability of the solution?
This solution is stable and proved very robust. When very obvious programmatic recommendations were not followed, causing memory overruns on a driver, the clusters required restarting.
What do I think about the scalability of the solution?
Absolutely, seamlessly, and massively scalable, within only budgetary limits. Also, the product itself offers real-time efficiency and optimization recommendations.
How are customer service and support?
So brilliant, it was never required. Their documentation is comprehensive, clear, simple, and thorough.
Which solution did I use previously and why did I switch?
Previously I used Hive and Livy in Zeppelin on an in-house Hadoop installation. The queries constantly threw exceptions and timeouts and the necessary configuration changes proved time-consuming and problematic. Databricks, on the other hand, simply made all those problems vanish.
How was the initial setup?
Setup and Support are single-click.
What about the implementation team?
We used an in-house team for implementation.
What was our ROI?
Our ROI was on the order of USD 75k per year for one deployment. We were able to switch our workloads from an onsite Hadoop cluster, billed to our department for more than USD 100k per year, to a Databricks workspace in the cloud for a quarter of that expenditure.
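The arithmetic behind that figure can be checked in a few lines. This is a back-of-the-envelope sketch that takes USD 100k as the on-site cost (the actual bill was higher, so the real saving would be at least this much); the function name is just for illustration.

```python
def annual_saving(on_prem_cost, cloud_fraction):
    """Saving per year when the cloud bill is a given fraction of the on-site bill."""
    cloud_cost = on_prem_cost * cloud_fraction
    return on_prem_cost - cloud_cost


# On-site Hadoop at USD 100k per year, Databricks at a quarter of that.
saving = annual_saving(100_000, 0.25)
print(f"Estimated annual saving: USD {saving:,.0f}")
```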
Further, we were able to transparently and efficiently scale our queries to run in under fifteen minutes per major analytics use case, whereas the in-house Hadoop cluster had subjected us to unstable queries and highly brittle data flows.
We are further reducing spending on our traditional RDBMS solution by offloading reporting workloads to the Databricks PySpark notebooks, which is reducing our expensive datacenter resources and freeing up RDBMS resources for OLTP loads.
What's my experience with pricing, setup cost, and licensing?
Set up a cluster in your cloud of choice, but Databricks' service might also be very competitive as their pricing units will be built in.
Licensing on site I would counsel against, as on-site hardware issues tend to really delay and slow down delivery.
Which other solutions did I evaluate?
I evaluated Hortonworks, Livy, and Zeppelin. These were unsuitable due to the unavailability of sufficiently skilled personnel.
What other advice do I have?
By investing in people skilled in data querying, Python coding, and even basic Data Science, a Databricks setup will reward the business.
Once the Databricks data flows are established, it is a matter of a few incremental steps to opening up streaming and running up-to-the-minute queries, allowing the business to build its data-driven processes.
Databricks continues to advance the state-of-the-art and will be my go-to choice for mission-critical PySpark and ML workflows.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.