reviewer1334334 - PeerSpot reviewer
Data Scientist at a retailer with 5,001-10,000 employees
Real User
Quick development, reliable, has interactive clusters, and is priced per usage
Pros and Cons
  • "One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often."
  • "I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases."

What is our primary use case?

Currently, I am using this solution for a forecasting project.

What is most valuable?

One of the features provides nice interactive clusters, or compute instances, that you don't really need to manage often. You can just spin one up and use it for a lot of your pre-processing, which is very convenient.

The normal features are very good in terms of doing some quick development or doing some EDA.

Also, one of the newest features provides a way to train and deploy models using the platform itself. Alternatively, it can connect to Azure Machine Learning to train, deploy, and productionize machine learning models.

What needs improvement?

Since the Databricks community is not that old, there is not a lot of information about some of the issues that we face. We have to go back to the Databricks team to get resolutions for some of those issues.

As time passes and more people start putting more information out there about this technology, it will be helpful.

Even with the features that we currently have, they are still optimizing the clusters and trying to parallelize reads from other types of data. That is going really well: one of the features they recently introduced, a new data format, was really good and speeds up a lot of the processes.

I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases.

For how long have I used the solution?

I have been using Databricks on a daily basis for over a year.

It's deployed on the cloud, so it's always up to date.

Buyer's Guide
Databricks
February 2025
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
838,713 professionals have used our research since 2012.

What do I think about the stability of the solution?

It's definitely quite stable for an enterprise solution.

You can run clusters on demand, and you can also set up scheduled clusters to run your production jobs.

What's my experience with pricing, setup cost, and licensing?

The pricing depends on the usage itself. They measure the cost of the companies in town. It also depends on the type of cluster that you are using. If you are using a very heavy cluster, it would be the price per CPU.
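The usage-based pricing described above can be illustrated with a rough back-of-the-envelope sketch. It assumes, as the reviewer suggests, that cost scales with running time and with the number of CPU cores in the cluster. The `ClusterSpec` type, the `estimate_cost` function, and the rate of 0.10 per core-hour are hypothetical and are not actual Databricks or Azure prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a cluster for cost estimation."""
    nodes: int
    cores_per_node: int
    rate_per_core_hour: float  # assumed illustrative rate, not a real price


def estimate_cost(spec: ClusterSpec, hours: float) -> float:
    """Estimate cost as nodes x cores x rate x hours (assumed pricing model)."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return spec.nodes * spec.cores_per_node * spec.rate_per_core_hour * hours


# Two hypothetical cluster sizes billed at the same assumed per-core rate.
small = ClusterSpec(nodes=2, cores_per_node=4, rate_per_core_hour=0.10)
heavy = ClusterSpec(nodes=8, cores_per_node=16, rate_per_core_hour=0.10)

for name, spec in (("small", small), ("heavy", heavy)):
    print(f"{name}: {estimate_cost(spec, hours=10):.2f} for 10 hours")
```

Under these assumptions, a heavier cluster costs more for the same running time simply because it has more cores, which matches the reviewer's point that the cluster type drives the bill.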

What other advice do I have?

I would rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Manager: User Support at a financial services firm with 10,001+ employees
Real User
Great technology that helps us decrease costs
Pros and Cons
  • "It's great technology."
  • "A lot of people are required to manage this solution."

What is our primary use case?

Our primary use case is to decrease costs and prevent any security breaches affecting our data. I'm an IT manager and we are customers of Databricks.

What is most valuable?

I think what I value is more about the technology itself because you don't need to have too much knowledge to be able to use the solution. 

What needs improvement?

I think we are using a lot of people to manage this solution. I'd like to see the people using this solution sharing their knowledge. 

For how long have I used the solution?

We've been using this solution for around two years. 

What do I think about the stability of the solution?

The stability is okay now, although a month after the data load we hit a limitation for the first time on the project. That sorted itself out.

What do I think about the scalability of the solution?

It's a scalable solution. 

How are customer service and technical support?

We have a good connection with technical support. 

What other advice do I have?

I think the point is that because we'll be working collaboratively in the future, internally and externally, we should compare experiences and exchange knowledge. 

I would rate this solution an eight out of 10. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user1050483 - PeerSpot reviewer
CEO at Inosense
Real User
Great for dealing with huge amounts of data and it is easy to connect to different sources of data
Pros and Cons
  • "We are completely satisfied with the ease of connecting to different sources of data or Parquet files in the search."
  • "The integration features could be more interesting, more involved."

What is our primary use case?

Our primary use case is really DevOps, for integration and continuous development. We've combined Databricks with some components from Azure to deploy elements in a sandbox for our data scientists and our data engineers.

What is most valuable?

Valuable features would have to include the Notebook for piping some models and the feature for executing notebooks in parallel, in batches, which is also something that we use. We use the Notebook on Spark with Python.
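Running notebooks in parallel, in batches, could look roughly like the sketch below. It uses Python's standard `concurrent.futures` module, and `run_notebook` is a hypothetical stand-in so that the example runs anywhere. Inside a Databricks notebook, that stub would typically be replaced by a call such as `dbutils.notebook.run(path, timeout_seconds, arguments)`. The notebook paths and the parameter are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebook(path: str, params: dict) -> str:
    """Hypothetical stand-in for launching a notebook and returning its result."""
    return f"{path} finished with {sorted(params.items())}"


def run_in_batches(paths: list, params: dict, batch_size: int = 2) -> dict:
    """Run notebooks batch by batch; notebooks within a batch run in parallel."""
    results = {}
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map preserves input order, so results line up with paths.
            outputs = pool.map(lambda p: run_notebook(p, params), batch)
            results.update(zip(batch, outputs))
    return results


# Hypothetical notebook paths and parameters.
notebooks = ["/models/prep", "/models/train_a", "/models/train_b"]
results = run_in_batches(notebooks, {"run_date": "2020-01-01"})
for path, output in results.items():
    print(path, "->", output)
```

Threads are a reasonable choice here because each worker mostly waits on a remote notebook run rather than doing local computation, and the batch size caps how many runs hit the cluster at once.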

What needs improvement?

Improvements could include the pricing. The product is a little expensive, although I think it is comparable to other similar options. The integration features could be more interesting, more involved. For example, we use the Databricks Notebook, which is not as good as Jupyter Notebook at providing a great user experience. The look and feel are not the same, and we've had complaints from some of our users. They say that it's easier and more productive for them to use Jupyter Notebook.

Then there is the integration feature for connecting to data sources, for example, from Jupyter Notebook through Databricks Connect. The problem is that when you do that, you don't get all the Jupyter features, which is a shame for us.

For additional features, having some PyTorch or TensorFlow type features built in would definitely be great. For now, my users are developing for themselves by importing their libraries into their Notebook and then creating models based on TensorFlow or PyTorch. That requires a lot of imports, particularly library imports, something that is now available in the new version of Machine Learning services. These things are very important because the data science community has shifted from the traditional way of preparing models to deep learning. It's now more common to have those features.

For how long have I used the solution?

I've been using the product inside Azure for about six months now. 

What do I think about the stability of the solution?

Given my experience, the product is very stable. 

What do I think about the scalability of the solution?

The product is quite easy to scale and increasing the number of users is quite simple. 

Which solution did I use previously and why did I switch?

We previously used the earlier version of Azure Machine Learning services and we decided to move over because over time it became more difficult to deploy. That was two years ago, but now with the new version, it's much easier to deploy Machine Learning.

How was the initial setup?

The setup is straightforward, I did it myself. 

What other advice do I have?

The product has improved and I'm sure this will continue in the next versions. We are completely satisfied with it, particularly the ease of connecting to different sources of data or Parquet files in the search.

I think it could be very interesting for users looking for a framework to use Databricks. I would, however, recommend a more complicated architecture for using Databricks and achieving a great result for end-users. 

I would rate this product an eight out of 10. 

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2058633 - PeerSpot reviewer
Data Engineering Manager at a pharma/biotech company with 10,001+ employees
Real User
A great and easy-to-use platform for data engineers and data scientists who rely on a large dataset to do advanced analytics reporting
Pros and Cons
  • "The most valuable feature is the Spark cluster, which is very fast for heavy loads, big data processing, and PySpark."
  • "It would be great if Databricks could integrate all the cloud platforms."

What is our primary use case?

We use Databricks for data science work in projects that involve creating data pipelines, pre-processing, data wrangling, big data cluster management, and machine learning and deep learning tasks.

How has it helped my organization?

Databricks works very well with the Azure platform and with Dataiku, an enterprise AI tool. Databricks gives us a new connection to pull data or connect to the Spark cluster. It helps us distribute the load across the Spark cluster, and it is very user-friendly.

What is most valuable?

The most valuable feature is the Spark cluster, which is very fast for heavy loads, big data processing, and PySpark.

What needs improvement?

Databricks as a solution is integrated with Azure, but Google Cloud has some restrictions. I'm not sure about AWS Cloud, but it would be great if Databricks could integrate all the cloud platforms. Regarding additional features, we would like to see them mostly on the data engineering side, where we have a Spark cluster and some inbuilt ML. In addition, pre-processing steps will be useful.

For how long have I used the solution?

We have been using this solution for two years and are using the latest update.

What do I think about the stability of the solution?

It is a stable solution as long as the Microsoft Azure Platform is stable too.

What do I think about the scalability of the solution?

It is a scalable solution, both vertically and horizontally, which is good. My organization is big, and we have a lot of users. In my department, we have about 15 people using Databricks.

How are customer service and support?

We have not escalated any issues to technical support, but we initially struggled with configuration and the settings of Hive metastore, but we resolved it. I rate the technical support a nine out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We were using EMR (Elastic MapReduce) from AWS before using Databricks. We switched to Databricks because the whole platform changed from AWS to Azure, and Databricks comes as part of the package.

How was the initial setup?

The initial setup was easy to complete and not complex. It may initially be challenging for a new user, but it gets easier over time. The CI/CD pipeline works well with the Microsoft Azure platform because continuous integration, development, and deployment come with the Git integration, which makes CI/CD easier for Databricks. The deployment should be improved from the perspective of AutoML functionality, as it doesn't have extensive automated machine learning capability.

We don't use Databricks directly because we work on a data science project that requires AutoML and built-in machine learning capabilities. We found that capabilities such as large language models for NLP and other deep learning models are not that extensive. It is meant for data engineering rather than data science. It would be great if Databricks were more extensive for data science.

We used a third-party platform, Dataiku, for the deployment, where we connected to Databricks and completed the MLOps. We required about three people for deployment, and the solution is easy to maintain.

What was our ROI?

We have seen an ROI but cannot differentiate because it also comes with the Azure platform.

What's my experience with pricing, setup cost, and licensing?

I do not have details about the pricing.

What other advice do I have?

I rate this solution a nine out of ten. Regarding advice, Databricks is a very good platform, popular and easy to use daily for data engineers and data scientists who rely on a large dataset to do advanced analytics reporting. It's a very good tool.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
Saves time and effort; thousands of applicable use cases
Pros and Cons
  • "Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things."
  • "In the next release, I would like to see more optimization features."

What is our primary use case?

Databricks is very useful and can handle thousands of different use cases. The use cases are all over the place.

How has it helped my organization?

Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things.

What is most valuable?

The most valuable Databricks feature for us is that it does not require us to configure clusters. It automatically configures the clusters to the right size, the right number of clusters, the right number of nodes per cluster, et cetera.

What needs improvement?

The area in which this product can be improved is optimization. In the next release, I would like to see more optimization features.

For how long have I used the solution?

I have been using Databricks for a couple of years.

What was our ROI?

I would say the ROI for this solution is expressed mainly in terms of effort and time.

What's my experience with pricing, setup cost, and licensing?

I would advise new users to train themselves before using Databricks. They should figure out which advantages Databricks has over plain Spark and use them as fully as they can.

What other advice do I have?

I am currently implementing the latest version of Databricks.

The Databricks solution is deployed in the cloud.

I would rate the Databricks solution a nine.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Diego Henrique Da Silva Bastos - PeerSpot reviewer
Data Engineer Analyst at Metyis
Real User
Highly scalable, easy to use, and performs well
Pros and Cons
  • "The most valuable features of Databricks are the notebook, Data Factory, and ease of use."
  • "When I used the support, I had communication problems because of the language barrier with the agent. The accent was difficult to understand."

What is our primary use case?

I am using Databricks in my company.

What is most valuable?

The most valuable features of Databricks are the notebook, Data Factory, and ease of use.

For how long have I used the solution?

I have been using Databricks for approximately nine months.

What do I think about the stability of the solution?

The performance and stability of Databricks are good. It is quick and I have not had problems.

What do I think about the scalability of the solution?

Databricks is highly scalable.

We have 200 people using the solution in my organization.

How are customer service and support?

When I used the support, I had communication problems because of the language barrier with the agent. The accent was difficult to understand.

Which solution did I use previously and why did I switch?

I have not worked with another solution prior to Databricks.

What's my experience with pricing, setup cost, and licensing?

The price of Databricks is reasonable compared to other solutions.

What other advice do I have?

I rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sarbani Maiti - PeerSpot reviewer
Vice President at a tech services company with 51-200 employees
Real User
Very easy to use and requires minimal coding and customizations
Pros and Cons
  • "Easy to use and requires minimal coding and customizations."
  • "Doesn't provide a lot of credits or trial options."

What is our primary use case?

Our primary use case of this product is for our customers who are running large systems and looking for an API -- a quick, easy integration with their own system. We use Databricks to create a secure API interface. I'm vice president of data science and we are customers of Databricks. 

What is most valuable?

Databricks is quite easy to use and requires less coding and customization than a solution like AWS SageMaker, which I'd previously used on a lot of projects. Databricks enables more people to efficiently build and host their ML code. Another great aspect is that MLflow is already integrated with Databricks, which makes a big difference. It enables us to track and monitor all our different experiments. We have mostly used the MLflow part and generic notebooks for building machine learning models, as well as PyTorch for some of our medical imaging. We were able to quickly deploy both of these features without requiring anything extra.

What needs improvement?

I'm struggling a little because I wanted to do some POC solutions. I present a lot of projects in various forums and seminars and there aren't a lot of credits and trial options with Databricks. Even if we want to explore, we're not able to and that's a challenge. The solution is quite expensive.

For how long have I used the solution?

I've been using this solution for a year. 

What do I think about the stability of the solution?

It's currently stable although we have not yet tested it with a huge volume of data. We'll focus on the performance and model serving capability in the near future. We're still carrying out performance testing, developing the models and figuring out the infrastructure.

What do I think about the scalability of the solution?

Scalability is quite good because we just used 128 GB of resources. It's quite easy to scale.

How was the initial setup?

It was relatively simple, we didn't face any challenges. Deployment takes around two days. 

Which other solutions did I evaluate?

We did a POC in Azure ML Studio, which is quite a good solution, easy to deploy and use. It's almost a no-code platform. We've also found Azure ML Studio to be quite cost-effective.

What other advice do I have?

I would recommend trying Databricks because it's cloud agnostic. A lot of customers currently use Azure but want to build something on their own down the track. Databricks makes that easy with its integration with other cloud providers. If somebody wants to build something on their own infrastructure or their own virtual cloud, this is a good platform.

I rate the solution eight out of 10 because of the issue I'm having with a lack of trial options.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Data Engineer at TCS
Real User
Supports multiple languages, plenty of Python libraries, but user-interface could improve
Pros and Cons
  • "Databricks is a unified solution that we can use for streaming. It supports open-source languages, which are cloud-agnostic. When I do database coding, if any other tool has a similar language pack to Excel or SQL, I can use the same knowledge, limiting the need to learn new things. It supports a lot of Python libraries, which I can use very easily."
  • "The query plan is not easy to work with at the Databricks job level. If I want to tune any of the code, guidance is not easily available in blogs either."

What is our primary use case?

We are using Databricks to receive the data from Data Lake where we are processing it and doing the transformation, and cleansing. Once it is processed, we are sending the data to the Azure SQL database.

What is most valuable?

Databricks is a unified solution that we can use for streaming. It supports open-source languages, which are cloud-agnostic. When I do database coding, if any other tool has a similar language pack to Excel or SQL, I can use the same knowledge, limiting the need to learn new things. It supports a lot of Python libraries, which I can use very easily.

What needs improvement?

The query plan is not easy to work with at the Databricks job level. If I want to tune any of the code, guidance is not easily available in blogs either.

For how long have I used the solution?

I have been using Databricks for approximately three years.

What do I think about the stability of the solution?

Databricks is stable.

What do I think about the scalability of the solution?

The scalability of Databricks is good. However, if I want to use larger clusters or high-concurrency clusters, I need to wait longer for the clusters to spin up.

We have different teams. I'm part of the data analytics team, where about 10 people use Databricks, and we use it extensively. I'm not sure about the rest of the teams.

How was the initial setup?

The initial setup of Databricks is not straightforward. You need to create VLANs, VPNs, and networks. We have two ways of deploying: the legacy PowerShell method and the template method, which we use to deploy the Databricks code to higher levels.

We have not integrated Databricks directly into the DevOps architecture. We are downloading the notebooks manually and we are uploading them.

What's my experience with pricing, setup cost, and licensing?

The billing of Databricks can be difficult and should improve.

Which other solutions did I evaluate?

We have evaluated Azure Synapse and SQL. Databricks and Azure Synapse are similar; the UI is the only difference. SQL and Databricks are the same, and one of the largest setbacks is that processing a lot of data takes a long time.

What other advice do I have?

I rate Databricks a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user