MILTON FERREIRA - PeerSpot reviewer
Co-founder/Senior Data Scientist at Hence
Real User
Responsive support, integrates and scales well
Pros and Cons
  • "The most valuable feature of Databricks is the integration of the data warehouse and data lake, and the development of the lake house. Additionally, it integrates well with Spark for processing data in production."
  • "The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with R, but Databricks requires the use of Jupyter Notebooks, which primarily support Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code within a cell and execute only that selection. This feature is available in Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to Databricks much smoother for our team."

What is our primary use case?

We are using Databricks for machine learning workloads specifically.

Databricks aligns well with our skillset and overall approach. We sought out their solution specifically for a big data application we are currently working on, as we needed a platform capable of handling large amounts of data and building models. Additionally, the fact that they use open-source software and can integrate data warehouse and data lake systems was particularly appealing, as we have encountered such issues in the past. We determined that Databricks would be an effective solution for our needs.

What is most valuable?

The most valuable feature of Databricks is the integration of the data warehouse and data lake, and the development of the lake house. Additionally, it integrates well with Spark for processing data in production. 

What needs improvement?

The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with R, but Databricks requires the use of Jupyter Notebooks, which primarily support Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code within a cell and execute only that selection. This feature is available in Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to Databricks much smoother for our team.

Apart from improvements to the Jupyter interface, the most important feature would be an RStudio interface inside Databricks. This would be perfect.

For how long have I used the solution?

We have been using Databricks for approximately one year.

Buyer's Guide
Databricks
February 2025
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
838,713 professionals have used our research since 2012.

What do I think about the stability of the solution?

The stability of Databricks is good.

I rate the stability of Databricks a nine out of ten.

What do I think about the scalability of the solution?

Databricks is scalable.

I rate the scalability of Databricks a nine out of ten.

How are customer service and support?

I have been receiving responsive answers from Databricks's support. I have been pleased with the support.

I rate the support from Databricks a ten out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup of Databricks is simple. I did not experience any challenges. The time it takes for the deployment is approximately four hours.

I rate the initial setup of Databricks.

What about the implementation team?

We did the deployment of the solution in-house. There were three people involved in the deployment: a data engineer, a data analyst, and a machine learning engineer.

What's my experience with pricing, setup cost, and licensing?

So far, we have only incurred the cost of our AWS cloud services. This is because Databricks provided us with an extended evaluation period, so we have not spent much money yet. We are just starting to incur costs this month, and I will know more about the full cost later on.

We only pay standard fees for the solution. 

What other advice do I have?

We use a data engineer, a data analyst, and a machine learning engineer to maintain the solution.

I rate Databricks a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Trond Jensen - PeerSpot reviewer
Data Analyst at Eviny
Real User
Fast and does what it needs to but customer service should be improved upon
Pros and Cons
  • "It is fast, it's scalable, and it does the job it needs to do."
  • "I would like to see the integration between Databricks and MLflow improved. It is quite hard to train multiple models in parallel in a distributed fashion. You hit client rate limits very quickly."

What needs improvement?

I would like to see the integration between Databricks and MLflow improved. It is quite hard to train multiple models in parallel in a distributed fashion. You hit client rate limits very quickly.
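One common client-side mitigation for the rate limits described above is to retry throttled calls with exponential backoff and random jitter, so that parallel training jobs do not all retry at the same moment. This is a generic sketch under stated assumptions, not an MLflow or Databricks feature: `RateLimitError` is a hypothetical stand-in for whatever exception your tracking client raises on an HTTP 429 response, and the wrapped function stands in for a call such as logging a metric.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a tracking client raises when throttled."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with capped exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            # The delay ceiling doubles each attempt (capped at max_delay); a random
            # wait below that ceiling spreads out workers that were throttled together.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: a hypothetical logging call that is throttled twice, then succeeds.
attempts = {"n": 0}


def flaky_log_metric() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "logged"


print(call_with_backoff(flaky_log_metric, sleep=lambda s: None))  # prints "logged"
```

Injecting `sleep` keeps the helper testable; in a real job you would leave the default `time.sleep`.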

For how long have I used the solution?

I have been using Databricks for three years.

What do I think about the stability of the solution?

I would rate the stability of this solution a nine out of 10, with one being not stable and 10 being very stable.

What do I think about the scalability of the solution?

I would rate the scalability of this solution an eight out of 10, with one being not scalable and 10 being very scalable.

There are three people using this solution in our organization.

How are customer service and support?

I would rate the available customer service a three. It's worth mentioning that this is Microsoft and not Databricks itself. I haven't spoken to Databricks people directly, but I know the people who have and they have been a lot more pleased.

How would you rate customer service and support?

Negative

What's my experience with pricing, setup cost, and licensing?

I would rate their pricing plan a six (on a scale of one to 10, with one being cheap and 10 being expensive). I think the prices could be lowered a little bit.

What other advice do I have?

Overall, I would rate this solution an eight out of 10, with one being quite poor and 10 being excellent. It is fast, it's scalable, and it does the job it needs to do.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Head of Credit Risk and Data at Cegid Invoice and Financing
Real User
It's a reasonably priced all-in-one platform that enables us to build a lakehouse framework
Pros and Cons
  • "Databricks gives us the ability to build a lakehouse framework and do everything inherent to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform."
  • "I'm not the one working with Databricks on a daily basis; I'm on the management team. However, my team tells me there are limitations with streaming events. The connectors work with a small set of platforms. For example, we can work with Kafka, but if we want to move to an event-driven solution from AWS, we cannot do it. We cannot connect to all the streaming analytics platforms, so we are limited in choosing the best one."

What is our primary use case?

We primarily use Databricks for reporting and machine learning.

What is most valuable?

Databricks gives us the ability to build a lakehouse framework and do everything inherent to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform.

What needs improvement?

I'm not the one working with Databricks on a daily basis; I'm on the management team. However, my team tells me there are limitations with streaming events. The connectors work with a small set of platforms. For example, we can work with Kafka, but if we want to move to an event-driven solution from AWS, we cannot do it. We cannot connect to all the streaming analytics platforms, so we are limited in choosing the best one.

Also, this is an all-in-one platform, but it might be preferable if there were an a la carte model where we could select the best tool in each class for reporting, machine learning, etc. I'm not yet sure if this strategy is the best one. 

For how long have I used the solution?

We've been using Databricks since the start of the year.

What do I think about the stability of the solution?

Databricks is quite stable. We haven't had any issues with stability. It's always working perfectly with no downtime.

What do I think about the scalability of the solution?

Databricks is based on Spark, which is based on Scala. These languages aren't easy to handle, and it's challenging to find people who know them well. At the same time, a couple of other vendors that work on top of Databricks offer low-code platforms. We have to work around Databricks' lack of scalability by using low-code platforms that work on top of Databricks to give us scalability.

How are customer service and support?

I'll give Databricks support 10 out of 10. They are always prompt even though we didn't buy a support package. They have done an excellent job.

How would you rate customer service and support?

Positive

How was the initial setup?

Setting up Databricks is a bit complex, and the initial deployment took a few days—closer to a week. Of course, not everyone is working full-time on this. There are intervals when people are doing other stuff. 

What was our ROI?

It's too soon to tell what kind of return we're getting because we just started using it, and we're still migrating.

What's my experience with pricing, setup cost, and licensing?

The cost of Databricks is in the lower range compared to other solutions. That was one of the main reasons we chose Databricks over other vendors and platforms.  

We pay as we go, so there isn't a fixed price. It's charged by the unit. I don't have any details about how they measure this, but it should be a mix of processing and quantity of data handled. We run a simulation based on our use cases, which gives us an estimate. We've been monitoring this, and the costs have met our expectations. 
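A simulation like the one described above can be sketched in a few lines. This is an illustration under stated assumptions, not Databricks' pricing formula: it assumes monthly spend is platform units consumed per node-hour (Databricks bills in DBUs, Databricks Units) times a per-unit price, plus the cloud provider's VM cost, which is billed separately. It models compute only and leaves out storage and data-transfer charges. Every workload, rate, and price below is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One recurring job in the simulation. All figures are hypothetical inputs."""
    name: str
    nodes: int                    # cluster size while the job runs
    hours_per_run: float          # wall-clock hours per run
    runs_per_month: int
    dbu_per_node_hour: float      # assumed consumption rate for the chosen instance type
    vm_cost_per_node_hour: float  # assumed cloud-provider compute price, billed separately


def monthly_cost(workloads: list[Workload], price_per_dbu: float) -> float:
    """Estimate monthly compute spend as platform units plus underlying cloud VMs."""
    total = 0.0
    for w in workloads:
        node_hours = w.nodes * w.hours_per_run * w.runs_per_month
        platform = node_hours * w.dbu_per_node_hour * price_per_dbu
        infra = node_hours * w.vm_cost_per_node_hour
        total += platform + infra
    return total


# Usage with made-up numbers: a nightly ETL job and a weekly model-training job.
workloads = [
    Workload("nightly_etl", nodes=4, hours_per_run=1.5, runs_per_month=30,
             dbu_per_node_hour=1.0, vm_cost_per_node_hour=0.50),
    Workload("weekly_model_training", nodes=8, hours_per_run=2.0, runs_per_month=4,
             dbu_per_node_hour=2.0, vm_cost_per_node_hour=1.00),
]
print(f"{monthly_cost(workloads, price_per_dbu=0.40):.2f}")  # prints 277.20
```

Comparing an estimate like this against the monthly bill is one way to do the kind of monitoring the reviewer mentions.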

What other advice do I have?

I give Databricks nine out of 10. The solution has met all our expectations. I'd recommend it to a friend. It's a reasonably priced all-in-one solution that gives us data lake and lakehouse capabilities. Those were the primary reasons we chose Databricks.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1702092 - PeerSpot reviewer
Machine Learning Engineer at a mining and metals company with 10,001+ employees
Real User
Highly scalable, stable and good technical support
Pros and Cons
  • "Databricks is a scalable solution. It is the largest advantage of the solution."
  • "The interface of Databricks could be easier to use compared to other solutions. It is not easy for non-data scientists. The user interface is important: before, we had to write code manually, and as solutions move to 'no-code AI,' it is critical that the interface be very good."

What is our primary use case?

We were using Databricks to build an AI solution. We were only evaluating it, and approximately three people tried it out. We later chose another solution and did not fully deploy Databricks.

How has it helped my organization?

Before I used Databricks, some functions took me a long time to do; now, with Databricks, I can do them much more quickly. It scales very well.

What needs improvement?

The interface of Databricks could be easier to use compared to other solutions. It is not easy for non-data scientists. The user interface is important: before, we had to write code manually, and as solutions move to "no-code AI," it is critical that the interface be very good.

For how long have I used the solution?

I have used Databricks within the last 12 months.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

Databricks is a scalable solution. It is the largest advantage of the solution.

How are customer service and support?

We have been in contact with the technical support of Databricks, they were good.

Which solution did I use previously and why did I switch?

We have used a lot of different solutions, such as Watson and DataIQ.

How was the initial setup?

The initial setup is easy. However, I do not know much about the implementation because the company does it.

What about the implementation team?

We did the implementation of the solution.

What other advice do I have?

If companies want scalability, they should choose Databricks.

I rate Databricks a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Oscar Estorach - PeerSpot reviewer
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
Flexible, stable, and reasonably priced
Pros and Cons
  • "The solution is very easy to use."
  • "The integration of data could be a bit better."

What is our primary use case?

We primarily use the solution for retail and manufacturing companies. It allows us to build data lakes.

What is most valuable?

The solution is very easy to use. 

The storage on offer is very good. 

The solution is perfect for dealing with big data.

The artificial intelligence on offer is very good.

The product is quite flexible.

We have found the solution to be stable. 

The cloud services on offer are very reasonably priced.

Technical support is very good. They also have very good documentation on offer to help you navigate the product and learn about its offerings. 

What needs improvement?

The solution works very well for us. I can't recall any missing features or anything the solution really lacks. It's very complete. 

It would help if there were different versions of the solution on offer.

The integration of data could be a bit better.

For how long have I used the solution?

I've worked for about 20 to 25 years in business intelligence analytics and have worked with Databricks for about four years at this point. 

What do I think about the stability of the solution?

The stability of the solution is very good. It doesn't crash or freeze. There are no bugs or glitches. Its performance is very good.

What do I think about the scalability of the solution?

The scalability is quite good. A company that needs to expand it can do so with ease.

We only have four people on the solution at this time. The front-end users never use the product directly. The companies aren't that big here. If the economy improves, we'll likely have more of a need for the product.

How are customer service and technical support?

I've dealt with technical support in the past and have found them to be very good. They are helpful and responsive. We are satisfied with their level of service.

Which solution did I use previously and why did I switch?

I work with Databricks, Cloudera, and Snowflake.

How was the initial setup?

The solution is on the cloud and therefore there isn't really an installation process that you need to go through. You only really need to configure the clusters. 

Within the clusters, you configure according to how many platforms you need, or if you want to, you can build a cluster for artificial intelligence. You just configure it as required. 

What's my experience with pricing, setup cost, and licensing?

The pricing of the product is very reasonable. The fact that it is on the cloud makes it a less expensive option. Other solutions that are on-premises are quite expensive.

What other advice do I have?

We are customers and end-users. 

Databricks is on the cloud, and therefore we're always on the latest version of the solution. It's constantly updated, so we have access to the latest updates and upgrades. 

I'd rate the solution at a nine out of ten. The capability of the product is quite good and we are very satisfied with it overall. 

I'd recommend the solution to other companies and organizations.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
PeerSpot user
Global Data Architecture and Data Science Director at FH
Real User
ExpertModerator
Flexible with support for several programming languages, good visualization and workload management functionality
Pros and Cons
  • "Databricks gives you the flexibility of using several programming languages independently or in combination to build models."
  • "Databricks requires writing code in Python or SQL, so if you're a good programmer then you can use Databricks."

What is our primary use case?

The primary use is for data management and managing workloads of data pipelines.

Databricks can also be used for data visualization, as well as to implement machine learning models. Machine learning development can be done using R, Python, and Spark programming.

What is most valuable?

Databricks gives you the flexibility of using several programming languages independently or in combination to build models.

The quick visualization of the data is very good.

The workload management functionality works well.

What needs improvement?

Databricks requires writing code in Python or SQL, so if you're a good programmer then you can use Databricks.

For how long have I used the solution?

I have been using Databricks since 2017. I am no longer using it personally, although my team is, and will continue to do so in the future.

What do I think about the stability of the solution?

Databricks is quite popular these days and it appears to be stable. I have not found any issues with stability.

What do I think about the scalability of the solution?

Databricks is scalable, regardless of which cloud provider is being used. It is supported on Microsoft Azure and AWS, and they have their own cloud as well.

For a small workload, Databricks may not be worth the costs. However, for larger workloads, Databricks is a very good solution.

In my previous organization, there were between 10 and 15 users.

How are customer service and technical support?

The technical support is handled by Microsoft partners and because we had premium support, it was easy to get. That said, I did not require any support.

Which solution did I use previously and why did I switch?

I have not used tools that are similar to Databricks for workload management, but Azure ADFv2, Google BigQuery, and SAS are some of the most powerful tools in this space that I have used in the past. I have also heard of Dataiku and other tools, but I have not used them. The only other things that I have used are tools written in Python or scripting languages.

How was the initial setup?

There is no installation required.

What's my experience with pricing, setup cost, and licensing?

Databricks uses a pay-per-use model, where you can use as much compute as you need. I think the cost could be reduced, given that there are more users on the platform, although it is not as expensive as some other solutions like SAS.

What other advice do I have?

As we transition to the Azure cloud, I expect that we will be using Databricks for workloads.

This is a product that I recommend for those who want to scale and have a good budget. It is good for automating a data pipeline and managing workloads. My advice for anybody who is starting to use it is to take the proper training.

Overall, based on my uses, I think that this product is pretty good.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Tristan Bergh - PeerSpot reviewer
Data Scientist at a computer software company with 501-1,000 employees
Real User
Top 10
Good built-in optimization, easy to use with a great user interface
Pros and Cons
  • "The built-in optimization recommendations halved our query times and allowed us to reach decision points and deliver insights very quickly."
  • "The product could be improved by offering an expansion of their visualization capabilities, which currently assists in development in their notebook environment."

What is our primary use case?

We are using this solution to run large analytics queries and prepare datasets for SparkML and ML using PySpark.

We ran on multiple clusters set up for a minimum of three and a maximum of nine nodes having 16GB RAM each.

For one ad hoc requirement, a 32-node cluster was required.

Databricks clusters were set for autoscaling and to time out after forty minutes of inactivity. Multiple users attached their notebooks to a cluster. When some workloads required different libraries, a dedicated cluster was spun up for that user.
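The shared-cluster setup described above maps onto a few fields of a Databricks Clusters API create request: `autoscale` with `min_workers`/`max_workers`, and `autotermination_minutes` for the idle timeout. The sketch below only builds that JSON body and does not call the API. Two assumptions are labeled in the code: the node type and runtime version strings are placeholders, and the reviewer's "nodes" are read as worker nodes (the driver is counted separately).

```python
import json


def autoscaling_cluster_spec(
    name: str,
    min_workers: int = 3,
    max_workers: int = 9,
    idle_minutes: int = 40,
    node_type_id: str = "m5.xlarge",          # placeholder: an AWS type with 16 GiB RAM
    spark_version: str = "13.3.x-scala2.12",  # placeholder: any supported runtime key
) -> dict:
    """Build a Clusters API create payload mirroring the reviewer's shared-cluster setup."""
    # Local sanity check only; the service applies its own validation rules.
    if not 0 <= min_workers <= max_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Worker count scales between these bounds; the driver node is separate.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Terminate the cluster after this many minutes without activity.
        "autotermination_minutes": idle_minutes,
    }


shared = autoscaling_cluster_spec("shared-analytics")
print(json.dumps(shared, indent=2))
```

A dedicated cluster for a user who needs different libraries would simply be another payload with its own (hypothetical) name, for example `autoscaling_cluster_spec("dedicated-user-a")`, with the extra libraries installed on that cluster only.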

How has it helped my organization?

Databricks took care of all the underlying cluster management seamlessly. We could configure our clusters to run and deliver results without any delays due to hardware configuration or installation issues.

Databricks allowed us to go from non-existent insights (because the datasets were just too large) to immediate and rich insights once the datasets were ingested into our PySpark notebooks.

What is most valuable?

Running very large-scale analytics was immensely easy, with a convenient and slick UI. This saved us from having to tweak and tune, dive into deeper abstractions, get involved in procurement, or wait for other workloads to run.

The built-in optimization recommendations halved our query times and allowed us to reach decision points and deliver insights very quickly. 

The Delta data format proved excellent. Databricks had already done the heavy lifting and optimized the format for large scale interactive querying. They saved us a lot of time.

What needs improvement?

The product could be improved by offering an expansion of their visualization capabilities, which currently assists in development in their notebook environment. Perhaps a few connectors that auto-deploy to a reporting server?

More parallelized Machine Learning libraries would be excellent for predictive analytics algorithms.

For how long have I used the solution?

I have been using this solution for three years.

What do I think about the stability of the solution?

This solution is stable and proved very robust. When very obvious programmatic recommendations were not followed, causing memory overruns on a driver, the clusters required restarting.

What do I think about the scalability of the solution?

It is absolutely, seamlessly, and massively scalable, limited only by budget. The product itself also offers real-time efficiency and optimization recommendations. 

How are customer service and support?

The product worked so well that support was never required. Their documentation is comprehensive, clear, simple, and thorough. 

Which solution did I use previously and why did I switch?

Previously I used Hive and Livy in Zeppelin on an in-house Hadoop installation. The queries constantly threw exceptions and timeouts and the necessary configuration changes proved time-consuming and problematic. Databricks, on the other hand, simply made all those problems vanish. 

How was the initial setup?

Setup and support are single-click.

What about the implementation team?

We used an in-house team for implementation.

What was our ROI?

Our ROI was on the order of USD 75k per year for one deployment. We were able to move our workloads from an on-site Hadoop cluster, billed to our department at more than USD 100k per year, to a Databricks workspace in the cloud for a quarter of that expenditure. 
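The savings figure follows from the two numbers given. Taking the on-site bill as USD 100k per year (the reviewer says "more than", so this is a lower-bound reading) and the Databricks workspace at a quarter of that:

```latex
\begin{aligned}
\text{Databricks cost} &\approx \tfrac{1}{4} \times \$100\text{k} = \$25\text{k per year} \\
\text{Annual saving} &\approx \$100\text{k} - \$25\text{k} = \$75\text{k per year}
\end{aligned}
```

With a larger on-site bill $B$, the same ratio gives a saving of $0.75B$, so USD 75k is the floor of that estimate.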

Further, we were able to transparently and efficiently scale our queries to run in under fifteen minutes per major analytics use case, whereas on the in-house Hadoop cluster we had been subject to unstable queries and highly brittle data flows.

We are further reducing spending on our traditional RDBMS solution by offloading reporting workloads to Databricks PySpark notebooks, which reduces our use of expensive datacenter resources and frees up RDBMS resources for OLTP loads. 

What's my experience with pricing, setup cost, and licensing?

You can set up a cluster in your cloud of choice, but Databricks' own service may also be very competitive, as its pricing units are built in. 

I would counsel against on-site licensing, as on-site hardware issues tend to really delay and slow down delivery.

Which other solutions did I evaluate?

I evaluated Hortonworks, Livy, and Zeppelin. These were unsuitable due to the unavailability of sufficiently skilled personnel.

What other advice do I have?

By investing in people skilled in data querying, Python coding, and even basic Data Science, a Databricks setup will reward the business. 

Once the Databricks data flows are established, it is a matter of a few incremental steps to opening up streaming and running up-to-the-minute queries, allowing the business to build its data-driven processes. 

Databricks continues to advance the state-of-the-art and will be my go-to choice for mission-critical PySpark and ML workflows. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2514822 - PeerSpot reviewer
Associate Machine Learning Engineer at a tech services company with 501-1,000 employees
Real User
Top 10
Provides resources to users quickly without much hassle
Pros and Cons
  • "The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle."
  • "I think setting up the whole account for one person and giving access are areas that can be difficult to manage and should be made a little easier."

What is our primary use case?

I have recently gotten into Databricks and trained one model on it. I started using Databricks because of its hardware support and everything else it provides, and because it is easy to get into. Earlier, when I had to test part of my code to see whether it worked (a quick test, not a full production run), I had to get a machine, raise a request, and go through the whole process. With Databricks, I can simply create one myself, get whatever resources are required, test everything there, and then proceed. That is why I have been using it primarily.

What is most valuable?

The most valuable features of the solution are the hardware and the resources it quickly provides without much hassle.

What needs improvement?

I think setting up the whole account for one person and giving access are areas that can be difficult to manage and should be made a little easier.

For how long have I used the solution?

I have experience with Databricks.

What do I think about the stability of the solution?

I think there's a duration after which a training session without any activity expires, which I think is fair, and that is the only point where I have seen it stop. I haven't come across a lot of problems with Databricks.

What do I think about the scalability of the solution?

The tool is not used as frequently as PyTorch. I don't know why I am comparing Databricks to PyTorch, but I think around five people use it.

How are customer service and support?

I have not contacted the solution's technical support team.

Which solution did I use previously and why did I switch?

Before Databricks, I used to use a cloud support platform.

How was the initial setup?

The solution is deployed on the cloud.

Which other solutions did I evaluate?

I chose Databricks over other products, considering the hardware support it offers.

What other advice do I have?

A little bit of time will be needed to get comfortable with Databricks.

I rate the tool an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user