Databricks and Google Cloud Dataflow are competing platforms in data analytics. Databricks has the edge in advanced analytics and machine learning, although its higher cost is a consideration. Google Cloud Dataflow is a more cost-effective option with strong connectivity and scalability.
Features: Databricks offers a unified workspace with extensive support for multiple programming languages, large-scale data processing, and machine learning. It provides built-in optimization recommendations and seamless cloud service integration, which enhances collaboration. Google Cloud Dataflow stands out for its real-time streaming capabilities, strong programming flexibility, and cost-effectiveness. It is built on the open-source Apache Beam framework, which extends its connectivity.
Room for Improvement: Databricks could improve by expanding its machine learning library support, refining error messages, and increasing cost transparency. Deeper integration with visualization tools and broader cloud platform integration would also strengthen its offering. Google Cloud Dataflow needs better error logging, faster job startup times, and smoother integration with additional services to improve the user experience.
Ease of Deployment and Customer Service: Databricks is praised for easy deployment on public and private clouds, a user-friendly interface, and strong documentation. Databricks support is generally well regarded, although response times vary. Google Cloud Dataflow benefits from Google’s extensive documentation, though non-technical users may find it complex to set up and navigate.
Pricing and ROI: Databricks is priced higher but delivers considerable ROI through its flexibility and performance, especially for enterprises migrating from legacy systems. Google Cloud Dataflow is more affordable, particularly for smaller workloads, though more flexible and predictable pricing would enhance its appeal.
It is a good solution for many different tasks, including machine learning.
When it comes to big data processing, I prefer Databricks over other solutions.
Whenever we reach out, they respond promptly.
So far, whenever we raise issues, they provide solutions without any problems.
I rate the technical support as fine because several levels of technical support are available; partners in particular get really good support from Databricks on new features.
I rarely need to interact with support because I don't face issues, which speaks well of their support.
Google's support team is good at resolving issues, especially with large data.
Whenever we have issues, we can consult with Google.
Patches have sometimes caused issues that paused our jobs for about six hours.
Databricks is an easily scalable platform.
I would rate the scalability of this solution as very high, about nine out of ten.
Google Cloud Dataflow has auto-scaling capabilities, allowing me to add different machine types based on pace and requirements.
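The auto-scaling described above is handled by the Dataflow service itself, and its exact algorithm is not described here. In a real Beam/Dataflow job, the bounds and machine shape come from pipeline options such as `--max_num_workers`, `--autoscaling_algorithm`, and `--machine_type`. As a rough illustration only, the Python sketch below sizes a worker pool from a backlog and an assumed per-worker throughput, clamped to configured bounds. The function name, its parameters, and the formula are hypothetical and are not Dataflow's actual autoscaling logic.

```python
import math


def target_workers(backlog_events: int, events_per_worker: int,
                   min_workers: int = 1, max_workers: int = 10) -> int:
    """Hypothetical heuristic: enough workers to drain the backlog, within bounds.

    Not Dataflow's real algorithm; an illustration of bounded auto-scaling.
    """
    if events_per_worker <= 0:
        raise ValueError("events_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # Workers needed to process the current backlog in one scaling interval.
    needed = math.ceil(backlog_events / events_per_worker)
    # Clamp to the configured pool size (an upper bound like max_num_workers).
    return max(min_workers, min(max_workers, needed))


print(target_workers(2_500, 1_000))   # moderate backlog
print(target_workers(12_000, 1_000))  # large backlog, capped at max_workers
```

The clamp mirrors the trade-off reviewers mention: the upper bound caps cost, while the lower bound keeps the job responsive when the backlog is empty.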
As a team lead, I'm responsible for handling five to six applications, and Google Cloud Dataflow seems to handle our use case effectively.
Google Cloud Dataflow can handle large data processing for real-time streaming workloads as they grow, making it a good fit for our business.
They release patches that sometimes break our code.
Although it is too early to judge the platform's stability definitively, we have not encountered any issues so far.
Databricks is definitely a very stable and reliable product.
I have not encountered any issues with the performance of Dataflow, as it is stable and backed by Google services.
The job we built has not failed once over six to seven months.
The automatic scaling feature helps maintain stability.
Adjusting settings such as the number of worker nodes and node utilization during cluster creation could mitigate these failures.
We prefer small to mid-sized clusters for many jobs to keep costs low, but they sometimes cannot handle our workloads properly.
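One way to act on the two comments above is to give a cluster an autoscaling worker range rather than a fixed small size, so it stays small for routine jobs and can grow for heavier ones. The sketch below only builds a request body in the shape used by the Databricks Clusters API create endpoint (`/api/2.0/clusters/create`) and does not send it. The cluster name, Spark runtime string, node type, worker bounds, and the validation rule are placeholder assumptions; replace them with values valid in your workspace.

```python
import json


def cluster_spec(name: str, min_workers: int, max_workers: int,
                 node_type_id: str = "i3.xlarge",          # placeholder node type (AWS example)
                 spark_version: str = "13.3.x-scala2.12",  # placeholder runtime version string
                 autotermination_minutes: int = 30) -> dict:
    """Build a hypothetical Clusters API create payload with an autoscaling range."""
    # Our own sanity check (an assumption, not an API rule): at least one worker, min <= max.
    if not 1 <= min_workers <= max_workers:
        raise ValueError("require 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Autoscaling lets a small cluster grow when a heavier job needs more workers.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Terminate idle clusters automatically to keep costs down.
        "autotermination_minutes": autotermination_minutes,
    }


payload = cluster_spec("etl-small-to-mid", min_workers=2, max_workers=8)
print(json.dumps(payload, indent=2))
```

Keeping `min_workers` low preserves the cost advantage of a small cluster, while `max_workers` sets how far it may grow before cost becomes the limiting factor.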
We use MLflow to manage MLOps; however, further improvement would be beneficial, especially for large language models and related tools.
Outside of Google Cloud Platform, it is hard for others to use, and it may need more promotion as a technology in its own right.
Processing a very large volume of data causes failures related to array size.
I would like to see improvements in consistency and flexibility for schema design for NoSQL data stored in wide columns.
It is not a cheap solution.
It is part of a package we received from Google, and they are not charging us too much.
Databricks' ability to process data in parallel speeds up data processing.
The platform allows us to leverage cloud advantages effectively, enhancing our AI and ML projects.
Unity Catalog is used for data governance, and Delta Lake is used to build the lakehouse.
It supports multiple programming languages such as Java and Python, enabling flexibility without the need to learn something new.
The integration within Google Cloud Platform is very good.
Google Cloud Dataflow's event stream processing features allow us to gain various insights, such as real-time alerts.
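Dataflow runs Apache Beam pipelines, which typically group an unbounded event stream into time windows and aggregate each window; an alert fires when an aggregate crosses a limit. The plain-Python sketch below imitates that pattern on a finite list using fixed (tumbling) windows. It does not use the Beam or Dataflow APIs, ignores late data and watermarks, and the event format, window length, and threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def windowed_alerts(events: Iterable[Tuple[float, str]], window_seconds: float,
                    threshold: int) -> List[Tuple[float, str, int]]:
    """Count events per (window, key) and flag counts at or above a threshold.

    events: hypothetical (timestamp_seconds, key) pairs, e.g. (12.5, "login_failed").
    Returns (window_start, key, count) for each window that breaches the threshold.
    """
    counts: Dict[Tuple[float, str], int] = defaultdict(int)
    for ts, key in events:
        # Assign each event to a fixed window based on its event timestamp.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    # Keep only the (window, key) groups whose count reaches the alert threshold.
    return sorted((start, key, n) for (start, key), n in counts.items() if n >= threshold)


events = [(1.0, "login_failed"), (2.5, "login_failed"), (3.0, "login_failed"),
          (4.0, "checkout"), (61.0, "login_failed")]
alerts = windowed_alerts(events, window_seconds=60, threshold=3)
print(alerts)
```

In a production streaming pipeline the same grouping would run continuously as events arrive, rather than over a complete list as in this sketch.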
Databricks is utilized for advanced analytics, big data processing, machine learning models, ETL operations, data engineering, streaming analytics, and integrating multiple data sources.
Organizations leverage Databricks for predictive analysis, data pipelines, data science, and unifying data architectures. It is also used for consulting projects, financial reporting, and creating APIs. Industries like insurance, retail, manufacturing, and pharmaceuticals use Databricks for data management and analytics due to its user-friendly interface, built-in machine learning libraries, support for multiple programming languages, scalability, and fast processing.
Databricks is implemented in insurance for risk analysis and claims processing; in retail for customer analytics and inventory management; in manufacturing for predictive maintenance and supply chain optimization; and in pharmaceuticals for drug discovery and patient data analysis. Users value its scalability, machine learning support, collaboration tools, and Delta Lake performance but seek improvements in visualization, pricing, and integration with BI tools.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.