reviewer1438992 - PeerSpot reviewer
Cloud & Infra Security, Group Manager at a tech vendor with 10,001+ employees
MSP
A scalable solution to quickly process and analyze streams of information
Pros and Cons
  • "Databricks helps crunch petabytes of data in a very short period of time."
  • "Costs can quickly add up if you don't plan for it."

What is our primary use case?

We are working with Databricks and SMLS in the financial sector for big data and analytics. There are a number of business cases for analysis related to debt there. Several clients are working with it, analyzing data collected over a period of time and planning the next steps in multiple business divisions.

My organization is a professional consulting service. We provide services for the other organizations, which implement and use them in a production environment. We manage, implement, and upgrade those services, but we don't use them.

What is most valuable?

Databricks helps crunch petabytes of data in a very short period of time for data scientists or business analysts. It helps with fraud analysis, finance, projections, etc. I like it.

This is exactly the purpose of big data and analytics. It provides the mechanism to process and analyze a stream of information. It's best for share analysis and stream analysis.

What needs improvement?

Costs can quickly add up if you don't plan for it. 

For how long have I used the solution?

I've been using Databricks for just over a year.

Buyer's Guide
Databricks
November 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.

What do I think about the stability of the solution?

Databricks is stable. It also helps that their support is included as part of the service.

What do I think about the scalability of the solution?

Databricks is scalable. The only issue is how much money you have for it. For example, if you need to run 100 servers, each an eight-core machine with 256 gigabytes of RAM, you can run out of money easily. It's charged to your credit card or your account, and you'll have to pay for it even if you didn't plan for that in advance.
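The budgeting point above comes down to simple arithmetic: number of nodes, times hours of use, times an hourly rate. The sketch below is only an illustration. The function name and the rate of 2.50 per node-hour are assumptions made up for the example, not Databricks or cloud-provider pricing.

```python
def estimate_cluster_cost(nodes, hours, hourly_rate_per_node):
    """Return the estimated spend for a fixed-size cluster."""
    return nodes * hours * hourly_rate_per_node


# Hypothetical figures: 100 nodes running 8 hours a day for 22 working days,
# at an assumed all-in rate of 2.50 (currency units) per node-hour.
monthly_cost = estimate_cluster_cost(nodes=100, hours=8 * 22, hourly_rate_per_node=2.50)
print(f"Estimated monthly cost: {monthly_cost:,.2f}")
```

Running the numbers before the cluster is started, with the real rates from your own account, is the kind of planning meant here.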

How are customer service and support?

Databricks technical support is excellent. They provided their responses on time, and they're useful. However, I don't have extensive experience with them.

Which solution did I use previously and why did I switch?

I have used different Microsoft solutions before.

How was the initial setup?

The initial setup depends on the readiness of the team working with Databricks. There is no one template saying that it's easy, and it isn't easy. It can be complex to set up if you don't have a really good plan.

You can get in this environment at least for a test. You can do it in the lab, follow it step by step, and that'll take about an hour. The difficulty depends on the business requirements. 

If it's for training purposes, you can do it in about half an hour, and you're good to go. If you need it to support a business, it will be much more rigorous because multiple divisions would be interested in running their own environment, working with their data.

What's my experience with pricing, setup cost, and licensing?

The price is okay. It's competitive. 

What other advice do I have?

If you're thinking of implementing Databricks, I would recommend working with professionals. It'll help you save time. Also, plan the work and work the plan. Otherwise, it'll be a waste of time and money.

On a scale from one to ten, I would give Databricks a nine.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
IT Manager: User Support at a financial services firm with 10,001+ employees
Real User
Great technology that helps us decrease costs
Pros and Cons
  • "It's great technology."
  • "A lot of people are required to manage this solution."

What is our primary use case?

Our primary use case is to decrease costs and prevent any security breaches on data. I'm an IT manager and we are customers of Databricks. 

What is most valuable?

I think what I value is more about the technology itself because you don't need to have too much knowledge to be able to use the solution. 

What needs improvement?

I think we are using a lot of people to manage this solution. I'd like to see the people using this solution sharing their knowledge. 

For how long have I used the solution?

We've been using this solution for around two years. 

What do I think about the stability of the solution?

The stability is okay now although a month after the data load there was a limitation for the first time on the project. That sorted itself out. 

What do I think about the scalability of the solution?

It's a scalable solution. 

How are customer service and technical support?

We have a good connection with technical support. 

What other advice do I have?

I think the point is that because we'll be working collaboratively in the future, internally and externally, we should compare experiences and exchange knowledge. 

I would rate this solution an eight out of 10. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user1050483 - PeerSpot reviewer
CEO at Inosense
Real User
Great for dealing with huge amounts of data and it is easy to connect to different sources of data
Pros and Cons
  • "We are completely satisfied with the ease of connecting to different sources of data or Parquet files in the search"
  • "The integration features could be more interesting, more involved."

What is our primary use case?

Our primary use case is really DevOps, for integration and continuous development. We've combined our database with some components from Azure to deploy elements in Sandbox for our data scientists and for our data engineers. 

What is most valuable?

Valuable features would have to include the Notebook for piping some models and the feature of executing notebooks in parallel, in batches, which is also something that we use. We use the Notebook on Spark with Python. 
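Running a batch of notebooks in parallel can be sketched roughly as follows. This is a minimal illustration, not the reviewer's actual setup: the helper `run_notebooks_in_parallel`, the stand-in `fake_run`, and the notebook paths are all hypothetical. Inside a Databricks notebook, the stand-in could be swapped for a call such as `dbutils.notebook.run(path, 600)`, which is only available in that environment.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebooks_in_parallel(paths, run_notebook, max_workers=4):
    """Run each notebook path through run_notebook concurrently; return {path: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = pool.map(run_notebook, paths)
        return dict(zip(paths, results))


def fake_run(path):
    # Stand-in launcher so the sketch runs anywhere (assumption for illustration).
    # On Databricks this could be: lambda p: dbutils.notebook.run(p, 600)
    return f"finished {path}"


batch = ["/models/train_a", "/models/train_b", "/models/train_c"]
outcomes = run_notebooks_in_parallel(batch, fake_run)
print(outcomes)
```

Threads are enough here because each worker mostly waits for a remote notebook run to finish, and `max_workers` caps how many runs are in flight at once.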

What needs improvement?

Improvements could include the pricing. The product is a little expensive, although I think it is comparable to other similar options. The integration features could be more interesting, more involved. For example, we use the Databricks Notebook, which is not as good as Jupyter Notebook at providing a great user experience. The look and feel are not the same, and we've had complaints from some of our users. They say that it's easier and more productive for them to use Jupyter Notebook.

And then there is the integration feature for connecting to data sources, for example, Jupyter Notebook through Databricks Connect. The problem is that when you do that, you don't get all the Jupyter features, which is a shame for us. 

For additional features, having some PyTorch or TensorFlow type features inside would definitely be great. For now, my users are developing for themselves by importing their libraries into their Notebook and then creating models based on TensorFlow or PyTorch. That requires a lot of imports, particularly library imports, something that is now available in the new version of Machine Learning services. These things are very important because the data science community has shifted from the traditional way of preparing models to deep learning. It's now more common to have those features. 

For how long have I used the solution?

I've been using the product inside Azure for about six months now. 

What do I think about the stability of the solution?

Given my experience, the product is very stable. 

What do I think about the scalability of the solution?

The product is quite easy to scale and increasing the number of users is quite simple. 

Which solution did I use previously and why did I switch?

We previously used the earlier version of Azure Machine Learning services and we decided to move over because over time it became more difficult to deploy. That was two years ago, but now with the new version, it's much easier to deploy Machine Learning.

How was the initial setup?

The setup is straightforward, I did it myself. 

What other advice do I have?

The product has improved and I'm sure this will continue in the next versions. We are completely satisfied with it, especially the ease of connecting to different sources of data or Parquet files in the search. 

I think it could be very interesting for users looking for a framework to use Databricks. I would, however, recommend a more complicated architecture for using Databricks and achieving a great result for end-users. 

I would rate this product an eight out of 10. 

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1334334 - PeerSpot reviewer
Data Scientist at a retailer with 5,001-10,000 employees
Real User
Quick development, reliable, has interactive clusters, and is priced per usage
Pros and Cons
  • "One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often."
  • "I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases."

What is our primary use case?

Currently, I am using this solution for a forecasting project.

What is most valuable?

One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often. You can just spin it off and use that for a lot of your pre-processing, which is very convenient. 

The normal features are very good in terms of doing some quick development or doing some EDA.

Also, one of the newest features brought into this solution provides you with a way to solve, deploy, and train models using the platform itself. Or, it can connect to your Azure Machine Learning in order to train, deploy, and productionalize some of the machine learning models.

What needs improvement?

Since the Databricks community is not that old, there is not a lot of information about some of the issues that we face. We have to go back to the Databricks stream to get some of the issue resolutions from there. 

As time passes and more people start putting more information out there about this technology, it will be helpful.

I think even with the features that we currently have, they're still optimizing some of the clusters and trying to parallelize to better read from other types of data. So, that's going really well in terms of one of the features that they recently came up with to include the data format for data, which was really good, and that speeds up a lot of the processes.

I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases.

For how long have I used the solution?

I have been using Databricks on a daily basis for over a year.

It's deployed on the cloud, so it's always up to date.

What do I think about the stability of the solution?

It's definitely quite stable, in terms of an enterprise solution. 

I'd say that it's pretty stable. 

You have these clusters running on-demand, and you can also come up with these clusters that are scheduled, and that can be run for your production jobs.

What's my experience with pricing, setup cost, and licensing?

The pricing depends on the usage itself. They measure the cost of the companies in town. It also depends on the type of cluster that you are using. If you are using a very heavy cluster, it would be the price per CPU.

What other advice do I have?

I would rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1276782 - PeerSpot reviewer
Data Scientist at an energy/utilities company with 10,001+ employees
Real User
Has a good feature set but it needs samples and templates to help invite users to see results
Pros and Cons
  • "Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
  • "The product needs samples and templates to help invite users to see results and understand what the product can do."

What is our primary use case?

I am a data scientist here and that is my official role. I own the company. Our team is quite small at this point. We have around five people on the team and we are working with about five different businesses. The projects we get from them are massive undertakings. Each of us on the team takes multiple roles in our company and we use multiple tools to help best serve our clients. We are trying to look at creative ways that different solutions can be integrated and we try to understand what products we can use to create solutions for client companies that will be effective in meeting their needs.  

We are personally using Databricks for certain projects where we want to consider creating intelligent solutions. I have been working on Databricks as part of my role in this company, trying to see if there are any kind of standard products that we can use with it to create solutions. We know that Databricks integrates with Airflow, so that is something that we are exploring right now as a potential solution for enabling a creative response. We are exploring the cloud as an option. Databricks is available in Azure and we are currently figuring out the viability of using that as a cloud platform. So we are exploring the way Databricks and Azure integrate at the same time to give us this type of flexibility.  

What we use it for right now is more like asset management. If we have a lot of assets and we get a lot of real-time data, we certainly want to do some processing on some of this data, but you do not want to have to work on all of it in real-time. That is why we use Databricks. We push the data from Azure through Databricks and work on the data algorithm in Databricks and execute it from Azure with probably an RPA (Robotic Process Automation) or something of that sort. It intelligently offloads real-time processing.  

What is most valuable?

Of the available feature set, I like the Imageflow feature a lot. It is very interesting. It gives me clarity on the execution of a process. I can draw the complete flow from start to finish in the exact way that I want it to execute. It is more visual and it is also easier for the people in businesses where I make presentations to understand.  

When I demonstrate a process to a business and show them the approach I am taking using code and technical language, then of course not many are going to understand that. But when I show them the process in terms of the graphical layout Imageflow helps provide, then they will be able to understand it much easier. They understand why I am choosing a particular way of executing the process and why I am taking certain steps in the way I have chosen to do it. The point is to help other people understand the solution more clearly.  

What needs improvement?

I think the automatic categorization of variables needs to be improved. The current functionality is not always efficiently identifying the features of the data that is collected. Probably that is the only thing I can think of. Apart from that, I have not explored the product enough yet to go into more depth because there is only one asset project that I have taken on right now. Because I own this company, I have been doing more to run it than to explore this product very deeply. But when you get any form of data inside there, if it could understand what type of variables there are and what features the data has, it would help massively in taking processing to the next step. If it does not exactly identify the variables you may have to modify them a little. Apart from working with Databricks to understand its capabilities, I am also trying to learn Apache Spark right now. Some members of my team want to work with Apache Spark as a solution and at this point, we are evaluating both and we are planning to use Spark or Databricks.  

As far as what might be added, some custom algorithm samples would be useful. All of the other products of this type — Azure, AWS, SageMaker — they all have customizable algorithms. You have the capability to implement a sort of workflow from that by modifying things in the sample and changing it to fit your purposes. Probably that is something that might help in doing some small NDP (Near-Data Processing) development. It might not help in the project directly, but it will help while we work on some NDP development of our own so that we can quickly evaluate how something is going to work. Templates or other samples could make working on things easier.  

That would also help massively in getting people to understand the potential of what the product can actually do. But I also think not many people would strongly agree with this. Many people go to the first solution they can think of that they know very well already in the IT field even if they could imagine that something could be better.  

To get the value out of this technology, people will need to come to accept it. Technical people will accept Databricks more if they understand that this is something that they can use and start working on without a lot of experience. Adopting it will take time for new users who have no experience. But to feel like they can have success with a product, they have to execute something in a very short time and see how it can work. When you talk about AI — or really when you talk about anything new — people do not initially want to invest the time in discovery. These processes do take time to learn, but with templates or samples, you get to see immediately what the possibilities are and what you might get out of it. Then when they try something of their own and are able to get it working in less than a week's time, they will be encouraged to look into the product and the technology some more.  

For how long have I used the solution?

We have been using the Databricks product for approximately three months.  

What do I think about the stability of the solution?

It is very hard to comment on the stability right now. We will need more time to experience the product in actual usage to render any opinions about stability accurately at that level.  

What do I think about the scalability of the solution?

We have not really gotten to the point of scaling and testing scalability at this point. We only have two people involved with the product. One is a data scientist and one is a data engineer.  

How was the initial setup?

The initial setup was not complex at all. The documentation is good. It is clear and not very difficult to understand. Because the documentation is good, the installation is fine.  

We did the implementation by ourselves — within our team and with the help of the documentation. But I would not say that we have already deployed the model yet. This is an ongoing process, as there are certain inputs that changed over time.  

So we have not implemented the product completely, but we have gotten to advance with the product and our understanding of it. It is good, but our company is still trying to get much better data from it. At this point, it is like the data is just junk and more junk. So we are now working toward that goal of improving the result. Whenever the data result gets better, we'll try to implement the workflow to see how it performs. I would say it will probably take two to three months more before we actually get good data.  

Which other solutions did I evaluate?

I did have some experience with SageMaker before looking at Databricks, but apart from that, we have not been looking into any of the other solutions that are available. We were just exploring a few of the different solutions that the members of the team already have experience with. Most of the team came to our company with some experience using Azure, most of them came with experience in EBS (Elastic Block Store), and some of them came with experience on various other platforms. We wanted to mine that knowledge and just explore some of these possibilities to see which one works with all of us as a team.  

What other advice do I have?

On a scale from one to ten where one is the worst and ten is the best, I would rate Databricks overall as around a 7 or 7.5. If we had more experience with it and could be sure we had a solid understanding of what it could do and the reliability, I might recommend it with a better score. I do not think I should give it more than a seven for now.  

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1901577 - PeerSpot reviewer
Cloud Administrator at a retailer with 5,001-10,000 employees
Real User
Top 20
A simple and stable solution that can help with business engineering
Pros and Cons
  • "The solution is very simple and stable."
  • "The tool should improve its integration with other products."

What is our primary use case?

We use the solution for business engineering.

What is most valuable?

The solution is very simple and stable.

What needs improvement?

The tool should improve its integration with other products.

For how long have I used the solution?

I have been using the solution for around two years.

What do I think about the stability of the solution?

I would rate the product’s stability a seven out of ten.

What do I think about the scalability of the solution?

I would rate the tool’s scalability a seven out of ten.

How was the initial setup?

The solution is very easy to set up. I would rate its setup a ten out of ten.

What's my experience with pricing, setup cost, and licensing?

I would rate the tool’s pricing an eight out of ten.

What other advice do I have?

The tool’s performance is great. I would rate it an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Business Intelligence Coordinator Latam at a construction company with 5,001-10,000 employees
Real User
The capacity of use of the different types of coding is valuable
Pros and Cons
  • "The capacity of use of the different types of coding is valuable. Databricks also has good performance because it is running in spark extra storage, meaning the performance and the capacity use different kinds of codes."
  • "There would also be benefits if more options were available for workers, or the clusters of the two points."

What is our primary use case?

My company is a customer of Databricks. We use Data Science products for machine learning, engineering, and data preparation.

We have between five and eight people working on coding in Databricks. Indirectly, we have 1,500 people consuming the data. We have plans to increase the usage of Databricks by 30% next year.

What is most valuable?

The capacity of use of the different types of coding is valuable. Databricks also has good performance because it is running in spark extra storage, meaning the performance and the capacity use different kinds of codes.

What needs improvement?

Databricks does not always have clear updates. Often we find an update in the tool but we are not really sure what has changed. We would appreciate better communication from Databricks. It could be in the form of a friendly warning that talks about the updates. 

There would also be benefits if more options were available for workers, or the clusters of the two points.

For how long have I used the solution?

I have been using Databricks for two years.

What do I think about the stability of the solution?

Databricks is stable; however, we do find some errors and don't understand what has happened. Usually, they are resolved within a few minutes. I would say it is 95% stable.

What do I think about the scalability of the solution?

Scalability is really good.

How are customer service and support?

I have not had to contact Databricks support other than during the deployment, with which they helped a lot. 

How was the initial setup?

The initial setup of Databricks is straightforward and simple. It is not complex because they provide a lot of documentation. The deployment was fast, it took less than three days with five people assigned to the task.

What about the implementation team?

We implemented in-house. It is difficult to find a good consultant or reseller for Databricks in Brazil.

What's my experience with pricing, setup cost, and licensing?

We pay monthly on a pay-as-you-go plan.

What other advice do I have?

With Databricks, you may have a lot of devices. It is important to use each cluster for each kind of process and then not use the small clusters. Using the bigger cluster you will receive better performance and the use is closer and will save you money. 

It is important to code it in parts because if you code it all in full you could find some problems with performance.

I would rate Databricks a 9 out of 10.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1393860 - PeerSpot reviewer
Chief Research Officer at a consumer goods company with 1,001-5,000 employees
Real User
Ability to work collaboratively without concerns regarding the infrastructure is very beneficial to us
Pros and Cons
  • "Ability to work collaboratively without having to worry about the infrastructure."
  • "Would be helpful to have additional licensing options."

What is our primary use case?

Our primary use case of Databricks is for advanced analytics. I'm the chief research officer of the company and we're customers of Databricks.  

What is most valuable?

I think the features I like the most are the scalability of the solution as well as its ability to share. We work with multiple people on notebooks and it enables us to work collaboratively in an easy way without having to worry about the infrastructure. I think the solution is very intuitive, very easy to use. And that's what you pay for.

What needs improvement?

I'd like to see more licensing options for the solution, the availability of additional pricing tiers. I understand it's not easy to achieve because it's a kind of platform-as-a-service type of solution. If you wanted to be more specific about the parts, and what you might or might not need, then you could save some money, and go for a lower level. Of course, that would then mean you'd have to manage more configurations which, as a user, would make things more complex but it would be good to have that option. The pricing is not the cheapest but it's understandable because it's a very high-end solution and easy to use, there's a lot of complexity masked away.

I would like to see additional monitoring tools and, in general, anything that can improve visualization of data. I know it's not the main point of Databricks and there are other tools that can be used, but anything that facilitates the integration of Databricks with visualization tools could be really useful. Increasing data scalability would also be great. 

For how long have I used the solution?

I've been using this solution for a year. 

What do I think about the stability of the solution?

The solution has been very stable. 

What do I think about the scalability of the solution?

Scalability of the solution seems very easy to achieve. 

How are customer service and technical support?

We haven't had contact with technical support. 

How was the initial setup?

The initial setup was very straightforward because it's also in our Azure cloud, so it was quite easy to set up and configure. Very intuitive.

What other advice do I have?

I would rate this solution an eight out of 10. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros sharing their opinions.
Updated: November 2024