Nabil Fegaiere1 - PeerSpot reviewer
Chief Executive Officer at dotFIT, LLC
Real User
Top 10
A powerful solution that is easily integrated into a variety of platforms
Pros and Cons
  • "It's very simple to use Databricks Apache Spark."
  • "I would like more integration with SQL for using data in different workspaces."

What is our primary use case?

I am a Databricks service partner, and my customers use Azure Databricks and Data Factory.

What is most valuable?

It's very simple to use Databricks Apache Spark. It's really good for parallel execution to scale up the workload. In this context, the usage is more about virtual machines.

Using meta-stores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.

What needs improvement?

I would like more integration with SQL for using data in different workspaces. We use the user interface for some functionalities, while for others, we have to use SQL to create data sets and grant permissions. For example, when creating a cluster, we have to create it with some API or user interface. Creating a cluster with some properties using SQL grants the possibility of using SQL syntax. Integration with SQL will make Databricks easier to use by people who have experience with databases like Lakehouse, and they would be able to use the data lake and BI. More integration will help have one point of view for everyone using SQL syntax.
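As an illustration of the API route mentioned above, here is a minimal sketch of the kind of JSON body a client might send to the Databricks Clusters REST API (`POST /api/2.0/clusters/create`). The helper name `build_cluster_spec` and every value in it (cluster name, runtime version, node type, worker count) are placeholders chosen for the example, not settings taken from this review, and no request is actually sent.

```python
import json


def build_cluster_spec(name: str, workers: int) -> dict:
    """Build a request body for the Clusters API create call.

    Hypothetical helper: the field names follow the public Clusters API,
    but every value here is an illustrative placeholder.
    """
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
        "num_workers": workers,
        "autotermination_minutes": 30,        # stop idle clusters to limit cost
    }


if __name__ == "__main__":
    spec = build_cluster_spec("etl-demo", workers=2)
    # Print the body that would be sent; authentication and the HTTP call are omitted.
    print(json.dumps(spec, indent=2))
```

Today a specification like this goes through the API or the user interface, which is the split between SQL and non-SQL workflows that the reviewer describes.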

Integration with Kubernetes could also be good for minimizing the price because you can use Kubernetes instead of virtual machines. But that won't be easy.

For how long have I used the solution?

I have worked with the solution for four or five years, with some experience since 2016.

Buyer's Guide
Databricks
November 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.

What do I think about the stability of the solution?

The solution is stable. The only problem with stability would be that people are not using it efficiently.

What do I think about the scalability of the solution?

The solution is good for scalability.

How was the initial setup?

When we have administration experience, the solution is not difficult to deploy. Technically, however, it's difficult because governance is more complex. For example, I have two warehouses on Databricks, which are clusters in this workspace, and we have to switch from workspace to workspace to have all this information. There is a system table that has all this, but I don't know if everyone can use these tables.

What's my experience with pricing, setup cost, and licensing?

Databricks is not costly compared with other solutions.

Which other solutions did I evaluate?

Databricks' functionality is as good as that of solutions like Snowflake, BigQuery, and Redshift.

What other advice do I have?

People sometimes do not use the solution efficiently. They misunderstand databases, the usage of tables, and the performance. Many data engineers are very junior and don't have skills in that. Stability is more a customer problem than a problem with the product itself. One possible problem with the product is that there's no method to pause the usage of something. For example, we have to use the meta server or the data catalog in Synapse. But in Databricks, we have a choice to use a catalog or not, or Hive, which is always integrated, but we have to choose whether to use it or not. Many customers directly use the passes on Databricks, which causes performance and governance problems.

I can offer a lot of advice on Databricks, and one piece is to use a metastore such as Unity Catalog or Hive Metastore. For new use cases, it's better to use Unity Catalog.

I rate Databricks a nine out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Anand Sharma - PeerSpot reviewer
Sr Data Engineer at PIMCO
Real User
Supports several coding languages, good performance, and facilitates team collaboration
Pros and Cons
  • "The load distribution capabilities are good, and you can perform data processing tasks very quickly."
  • "In the future, I would like to see Data Lake support. That is something that I'm looking forward to."

What is our primary use case?

Our primary use case is ETL.

How has it helped my organization?

Using Databricks enables us to use the Data Mesh methodology, where every team performs their own ETL.

What is most valuable?

The most valuable feature is the versatility of the ecosystem. You can write code in SQL, Python, or Java.

The load distribution capabilities are good, and you can perform data processing tasks very quickly.

You can save and share notebooks between different teams.

The interface is easy to use.

What needs improvement?

The cost of this solution is high, on the expensive side.

In the future, I would like to see Data Lake support. That is something that I'm looking forward to.

For how long have I used the solution?

I worked with Databricks for approximately two years in my previous company.

What do I think about the scalability of the solution?

This is a very scalable solution. We have twenty-five data engineers that use it, and we may grow our usage.

How are customer service and support?

The technical support is okay. I would rate them a seven out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not use another similar solution prior to Databricks.

How was the initial setup?

The cloud-based deployment is simple.

If you use an on-premises deployment then there is more to do.

What about the implementation team?

We deployed it with our in-house team.

There is no maintenance required.

What was our ROI?

We have seen a return on our investment with Databricks.

What's my experience with pricing, setup cost, and licensing?

Price-wise, I would rate Databricks a three out of five.

Which other solutions did I evaluate?

When we looked into Databricks, we evaluated Azure Data Factory and some of the others on the market. We found that Databricks was one of the easiest ones to use.

What other advice do I have?

My advice for anybody that is looking into Databricks is not to use the on-premises deployment. Instead, use the cloud-based setup.

In summary, this is a good product.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Heba Ismail - PeerSpot reviewer
Senior Data Engineer at a computer software company with 1,001-5,000 employees
Real User
Enhancing data integration and processing across cloud services with seamless transformations
Pros and Cons
  • "It helps integrate data science and machine learning capabilities."
  • "Performance could be improved."

What is our primary use case?

I work on a project where I build data pipelines using Azure Data Factory. I ingest data from on-premises systems into Azure Data Lake. After that, I perform transformations using Databricks notebooks and Spark, building the Databricks bronze, silver, and gold layers. We export reports from the gold layer.
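To make the bronze, silver, and gold layering concrete, here is a minimal sketch in plain Python. In Databricks these steps would normally run as Spark transformations over tables; the lists of dictionaries, the function names, and the sample order records below are all assumptions made for illustration.

```python
from collections import defaultdict

# Bronze: raw records as ingested, including duplicates and bad values.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "1", "region": "EU", "amount": "10.5"},  # duplicate row
    {"order_id": "2", "region": "US", "amount": "n/a"},   # unparseable amount
    {"order_id": "3", "region": "US", "amount": "4.5"},
    {"order_id": "4", "region": "EU", "amount": "2.0"},
]


def to_silver(rows):
    """Clean bronze rows: drop duplicates and rows whose amount is not numeric."""
    seen, cleaned = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"],
            "amount": amount,
        })
    return cleaned


def to_gold(rows):
    """Aggregate silver rows into a report-ready total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    silver = to_silver(bronze)
    gold = to_gold(silver)
    print(gold)
```

Each layer reads only from the one before it, so reports built on the gold layer never touch the raw ingested data directly.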

How has it helped my organization?

Recently, we started using Databricks in our organization. It helps integrate data science and machine learning capabilities.

What is most valuable?

The most valuable features are Unity Catalog, which provides central governance for all data across workspaces, and Databricks' integration capabilities with cloud services like Azure Event Hubs and Azure Data Factory. It is user-friendly for data processing, and Spark is a strong engine for big data processing.

What needs improvement?

Performance could be improved. It is crucial to check coding, configure Spark correctly, implement caching, and monitor performance metrics to enhance performance.

For how long have I used the solution?

I have used Databricks for over two years.

What do I think about the stability of the solution?

I would rate stability as eight out of ten. It is quite stable.

What do I think about the scalability of the solution?

Databricks is perfect for scalability. It is easy to scale clusters.

How are customer service and support?

I haven't faced any issues requiring customer support, so I don't have experience with their customer support.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used Informatica before, which is perfect for data management solutions. We started using Databricks for its capabilities in data science and machine learning.

How was the initial setup?

I would rate the initial setup as nine out of ten. It is quite easy for someone experienced with Spark.

What's my experience with pricing, setup cost, and licensing?

For my company, it's okay to upgrade to Databricks because it's comparable in price to Informatica. It is not considered expensive for the company.

Which other solutions did I evaluate?

For machine learning, I used Python and its libraries manually. Prior to Databricks, there was no special tool used for these purposes.

What other advice do I have?

If a company focuses on data science and machine learning, I recommend using Databricks. It's a great solution in this field. For data management needs, Informatica is advantageous due to its comprehensive tools.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Karan Sharma - PeerSpot reviewer
Data Analyst at Allianz
Real User
Top 10
An easy-to-set-up tool that provides its users with an insight into the metadata of the data they process
Pros and Cons
  • "The initial setup phase of Databricks was good."
  • "Scalability is an area with certain shortcomings. The solution's scalability needs improvement."

What is our primary use case?

My company uses Databricks to process real-time and batch data with its streaming analytics capabilities. We run Databricks' Unified Data Analytics Platform on Azure, which gives us a unified architecture to handle the streaming load for our platform.

What is most valuable?

The most valuable feature of the solution is that it is quite fast, especially in computation and in the atomicity of reading data. We have a storage account and can read the data on the go, and Unity Catalog in Databricks now gives good insight into the metadata of the data you're going to process. There are a lot of things that are quite nice with Databricks.

What needs improvement?

Scalability is an area with certain shortcomings. The solution's scalability needs improvement.

For how long have I used the solution?

I have been using Databricks for a few years. I use the solution's latest version. Though currently my company is a user of the solution, we are planning to enter into a partnership with Databricks.

What do I think about the stability of the solution?

It is a stable solution. Stability-wise, I rate the solution an eight to nine out of ten.

What do I think about the scalability of the solution?

It is a scalable solution. Scalability-wise, I rate the solution an eight to nine out of ten.

My company has a team of 50 to 60 people who use the solution.

How are customer service and support?

Sometimes, my company does need support from the technical team of Databricks. The technical team of Databricks has been good and helpful. I rate the technical support an eight out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup phase of Databricks was good. You can spin up clusters and integrate those with DevOps as well. Databricks is quite nice owing to its user-friendly UI, DPP, and workspaces.

The solution is deployed on the cloud.

The time taken for the deployment depends on the workload.

What's my experience with pricing, setup cost, and licensing?

I cannot judge whether the product is expensive or cheap since I am unaware of the prices of competing products. The licensing costs of Databricks depend on how many licenses we need, and Databricks provides substantial discounts depending on that number.

What other advice do I have?

It is a state-of-the-art product revolutionizing data analytics and machine learning workspaces. Databricks is a complete solution when it comes to working with data.

I rate the overall product an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Avadhut Sawant - PeerSpot reviewer
Consulting Architect at a computer software company with 10,001+ employees
Real User
Ahead of the competition in building data ecosystems, but needs to improve ease-of-use
Pros and Cons
  • "A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem."
  • "Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster."

What is our primary use case?

I worked with Databricks pretty recently. The particular design processes involved in Databricks were also a part of that specific design/architectural process.

We have used the solution for the overall data foundation ecosystem for processing and storage on a Delta format. We have also seen use cases where we were trying to establish advanced analytics models and data sharing where we leverage the Delta Sharing capabilities from Databricks.

What is most valuable?

A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem.

What needs improvement?

There are some aspects of Databricks, like generative AI, where they are positioning things like Dolly. They're a little bit late to the game, but I think there are some things that they are working on. Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster, and even though they are fast, I'm not sure how they'll catch up and get adopted because there are strong players in the market.

Databricks is coming up with a few good things in terms of integration. But I have to put one point forward that covers multiple aspects: the ease of use for the end user operating the tool. For example, a tool like ADS gives you GUI-based development, which is good for the end user who does development or maintenance. Given the complexities of data integration, a GUI might not be easy, but Databricks should embrace some form of graphical development because it is currently notebook-driven.

In terms of data access for the end user, Databricks has an SQL interface, similar to earlier tools like SQL Server Management Studio (SSMS). Since many people are already comfortable with SSMS, Databricks could build integrations with known tools for data access, which would help alongside what they're already doing. I would like to see improvements in user enablement, which is a good part of enterprise strategy.

I would also like to see integration with a broader ecosystem of products. If you have to do data governance in tools like Microsoft Purview, it's manual and difficult. I'm unsure whether that momentum must come from Databricks or Microsoft, but it would be good if Databricks had some open interfaces to share metadata, which could be viewed in data governance tools like Collibra, Purview, or Informatica. In short, the improvements concern user enablement and metadata integration with other tools.

For how long have I used the solution?

I've worked with Databricks for over five or six years, but it's been on and off.

What do I think about the scalability of the solution?

The solution is scalable. In this particular ecosystem, there is no one else who can catch up with Databricks for now.

How are customer service and support?

Databricks' customer support is very good. They have a lot of ways in which they interact with vendors and service partners across the globe. They have periodic touch-up sessions with vendors, where their engineers answer your questions.

How was the initial setup?

The implementation is not challenging because the solution integrates well with the platforms on which they are established, whether it's Azure, AWS, or GCP. The solution is not difficult to set up, but you'd probably need a technical user to operate it.

It's the same story with maintenance, where you'd need a technically proficient person with programming knowledge to maintain it.

What other advice do I have?

Databricks integrates many enterprise processes because data processing and AI/ML are a small part of a larger ecosystem. Databricks has been a part of other platforms, and they are trying to establish their own platform, which is a good direction.

Most of the capabilities of the underlying platform can be leveraged there. But the setup isn't difficult if the database lacks some capability, you can't find it in the database, or you're not comfortable with a certain feature in the database. It integrates well with the underlying platform. For example, with scheduling, let's say you are uncomfortable with workflow management. You can utilize integrations with EDA for any other tool and probably perform scheduling. Even if what you're trying to do is not easy, it is enabled with integration. Either they build a required feature in their tool later on, like a GUI, or you perform integrations to make the features possible.

We did evaluate licensing costs, but it had more to do with the Azure ecosystem pricing since whatever we are doing has more to do with Azure Databricks. Many optimizations are recommended, but we haven't exercised those for now. But considering that the processing is a bit more efficient, the overall price won't be much different from what it could be for any other similar component or technology. We haven't had specific discussions with Databricks' folks on pricing.

My advice to users who would like to start working with Databricks is that it is a good solution to work with for data integration and machine learning. Databricks is maturing for other use cases, so there are two points to be considered. One is that you need to evaluate how they will mature, which will be on a case-to-case basis. Second, how will it align with the overall platform story? There will be many overlapping aspects over there as Databricks expands its capabilities. In that case, it must be considered that if those capabilities overlap, how will the underlying platform vendors handle it? How would that interplay happen if many of Databricks' new capabilities align with Microsoft Fabric? That has to be very carefully considered. Otherwise, if you utilize those new capabilities, there might be a discontinuity where you cannot use Databricks because the platform does not support that.

If I specifically talk about Spark-based processing transformations, the data integration story, and advanced stability, I would rate Databricks around eight out of ten. However, with respect to new capabilities like cataloging, data governance, and security integration, I rate Databricks around five because it has to establish these features. And since Databricks integrates with platforms, we must see the interplay with the platforms' capabilities.

Overall, I rate Databricks a seven out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Shiva Prasad ELLUR - PeerSpot reviewer
Vice President - Data Engineering and Analytics at a financial services firm with 10,001+ employees
Real User
Top 10
A good, but expensive, web-based platform for automated cluster management with some coding limitations
Pros and Cons
  • "We like that this solution can handle a wide variety and velocity of data engineering, either in batch mode or real-time."
  • "This solution only supports queries in SQL and Python, which is a bit limiting."

What is our primary use case?

We use this solution for advanced civilization power.

What is most valuable?

We like that this solution can handle a wide variety and velocity of data engineering, either in batch mode or real-time.

This product allows us to write the email models in a way that allows us to take the advantage of the parallel scaling computer window backend on any of the satellite services.

What needs improvement?

This solution only supports queries in SQL and Python, which is a bit limiting. 

This is a fairly expensive solution for any service outside of the basic package, and costs can add up quite quickly if there are large scaling requirements.

What do I think about the stability of the solution?

This is a stable solution in our experience.

What do I think about the scalability of the solution?

We have found that part of the beauty of this platform is that it is easy to scale and expand.

How are customer service and support?

Support for this product goes through Microsoft as a middleman, and because of this there have been times when we experienced communication delays, as well as misunderstandings of what our issues are.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup for this solution is very simple.

What's my experience with pricing, setup cost, and licensing?

The basic version of this solution is now open-source, so there are no license costs involved. However, there is a charge for any advanced functionality and this can be quite expensive.

Which other solutions did I evaluate?

We looked at both Snowflake and BigQuery as a comparison with this solution. We chose this product as it offered more scalability and a higher level of security, which is extremely important in our banking environment.

What other advice do I have?

We would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sr. BigData Architect at ITC Infotech
MSP
Very elastic, easy to scale, and a straightforward setup
Pros and Cons
  • "It's easy to increase performance as required."
  • "Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it, however, they need to implement it to help the solution run more effectively."

What is our primary use case?

We work with clients in the insurance space mostly. Insurance companies need to process claims. Their claim systems run under Databricks, where we do multiple transformations of the data. 

What is most valuable?

The elasticity of the solution is excellent.

The storage, etc., can be scaled up quite easily when we need it to.

It's easy to increase performance as required.

The solution runs on Spark very well.

What needs improvement?

Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it, however, they need to implement it to help the solution run more effectively.

They're currently coming out with a new feature, which is Delta Lake. It will come with a new layer of data compliance.

For how long have I used the solution?

We've been using the solution for two years.

What do I think about the stability of the solution?

I don't see any issues with stability going down to the cluster. It would certainly be fine if it's maintained. It's highly available even if things are dropped. It will still be up and running. I would describe it as very reliable. We don't have issues with crashing. There aren't bugs and glitches that affect the way it works.

What do I think about the scalability of the solution?

The system is extremely scalable. It's one of its greatest features and a big selling point. If a company needs to scale or expand, they can do so very easily.

We require daily usage from the solution even though we don't directly work with Databricks on a day to day basis. Due to the fact that we schedule everything we need and it will trigger work that needs to be done, it's used often. Do you need to log into the database console every day? No. You just need to configure it one time and that's it. Then it will deliver everything needed in the time required.

How are customer service and technical support?

We use Microsoft support, so we are enterprise customers for them. We raise a service request for Databricks, however, we use Microsoft. Overall, we've been satisfied with the support we've been given. They're responsive to our needs.

Which solution did I use previously and why did I switch?

We work with multiple clients and this solution is just one of the examples of products we work with. We use several others as well, depending on the client.

They are all wrappers around the same underlying systems, for example, Spark. It's all open-source. We've worked with those systems as well as the wrappers around them, whether the company was labeled Databricks, IBM insights, Cloudera, etc. These wrappers are all built on the same open-source system.

If we work with Azure data, we go with Databricks. Otherwise, we have to create a VM separately. Those things are not needed because Azure is already providing them for us.

How was the initial setup?

The situation may have been a bit different for me than for many users or organizations. I've been in this industry for more than 15 or 17 years. I have a lot of experience. I also took the time to do some research and preparation for the setup. It was straightforward for me.

The deployment with Microsoft usually can be done in 20 minutes. However, it can take 40 to 45 minutes to complete. An organization only requires one person to upload the data and have complete access to the account.

What about the implementation team?

I deployed the solution myself. I didn't require any assistance, so I didn't enlist any resellers or consultants to help with the process.

What's my experience with pricing, setup cost, and licensing?

The solution is expensive. It's not like a lot of competitors, which are open-source.

What other advice do I have?

There isn't really a version, per se. 

It's a popular service. I'd recommend the solution. The solution is cloud-agnostic right now, so it really can go into any cloud. It's the users who will be leveraging installed environments that can have these services, no matter if they are using Azure or Ubiquiti, or other systems.

I don't think you can find any other tool or any other service that is faster than Databricks. I don't see that right now. It's your best option.

Overall, I'd rate the solution eight out of ten. The reason I'm not giving it full marks is that it's expensive compared to open source alternatives. Also, the configuration is difficult, so sometimes you need to spend a couple of hours to get it right.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer2041779 - PeerSpot reviewer
Principal at a computer software company with 5,001-10,000 employees
Real User
Top 20
Has advanced modeling and machine-learning features; highly scalable, with no stability issues
Pros and Cons
  • "What I like about Databricks is that it's one of the most popular platforms that give access to folks who are trying not just to do exploratory work on the data but also go ahead and build advanced modeling and machine learning on top of that."
  • "I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement."

What is our primary use case?

I've worked with Databricks primarily in the pharmaceuticals and life sciences space, which means a lot of work on patient-level data and the predictive analytics around that.

Another use case for Databricks is in the manufacturing industry. I'm a consultant, so the use cases for the product vary, but my primary use case for it is in the pharma space.

What is most valuable?

From a data science and applied analytics perspective, what I like about Databricks is that it's probably one of the most popular platforms that give access to folks who are trying not just to do exploratory work on the data but also go ahead and build advanced modeling and machine learning on top of that, and then go ahead and make that available for dissemination of insights. For example, you can save all data and build out endpoints, so business analysts and users can access that data through a dashboard.

During the process, I also like that Databricks allows you to do version control to keep track of your operations on the data and maintain that lineage to create reproducible results.

The most significant Databricks advantage is that you can do everything within the platform. You don't need to exit the platform because it's a one-stop shop that can help you do all processes.

The solution is top-notch from a data science, applied ML, or advanced analytics perspective.

What needs improvement?

I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement. Still, I am generally unaware of any super-critical issues.

For how long have I used the solution?

My experience with Databricks is two and a half years.

What do I think about the stability of the solution?

Databricks stability is an eight out of ten because I never had issues with its stability.

What do I think about the scalability of the solution?

Databricks has high scalability. Most of my work on the solution has been in the pharma space, which has massive data sets, so it's a nine out of ten, scalability-wise.

How are customer service and support?

I've never dealt with the Databricks technical support team.

How was the initial setup?

I don't have experience setting up Databricks because that's generally taken care of by the IT, data, or software engineering team before the data science team comes in and starts leveraging the platform. I have yet to experience setting up the Databricks environment personally. However, I have had experience setting up clusters, which was pretty straightforward. Still, in the overall environment of an enterprise-wide system, I have yet to gain experience setting Databricks up.

What's my experience with pricing, setup cost, and licensing?

The cost for Databricks depends on the use case. I work on it as a consultant, so I'm using the client's Databricks, so it depends on how big the client is. If it's a global organization, that cost varies versus a smaller organization that has just adopted the platform and is trying to onboard a small team of five people. It depends.

What other advice do I have?

I'm a data scientist, so I frequently use Databricks and Domino Data Science Platform.

I'm a consultant, so every client has a different version or a different runtime in Databricks, so the versions used would vary per client.

The deployment for the solution is on the cloud, predominantly on AWS or Azure.

My clients adopted Databricks as the platform of choice, and with different use cases and more teams coming on board, the usage of Databricks will increase. I don't see that going down. It can only go up.

My advice to anyone looking into implementing Databricks is that it should be one of your top choices, especially if you're looking to focus on data processing, standard ETL operations, advanced analytics, or the ML type of work.

I'd rate the solution as nine out of ten. It checks almost all the boxes that modern applications need to have.

My organization is an active partner and implementer of Databricks, but it doesn't resell the solution.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner