Apache Spark vs Azure Stream Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Oct 8, 2024
 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

Azure Stream Analytics
Average Rating: 8.0
Number of Reviews: 24
Ranking in other categories: Streaming Analytics (4th)
 

Mindshare comparison

Apache Spark and Azure Stream Analytics aren’t in the same category and serve different purposes. Apache Spark is categorized under Hadoop and holds a mindshare of 18.2%, down 21.9% compared to last year.
Azure Stream Analytics, on the other hand, is categorized under Streaming Analytics and holds a mindshare of 12.6%, down 13.5% since last year.
 

Featured Reviews

SurjitChoudhury - PeerSpot reviewer
Offers batch processing of data and in-memory processing in Spark greatly enhances performance
Spark supports real-time data processing through Spark Streaming. It allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real-time, Spark Streaming is used. For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated. In-memory processing in Spark greatly enhances performance, making it a hundred times faster than the previous MapReduce methods. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help in optimizing queries for faster processing.
Sudhendra Umarji - PeerSpot reviewer
Easy to set up and user-friendly, but could be priced better
I haven't come across missing items. It does what I need it to do. The pricing is a little bit high. The UI should be a little bit better from a usability perspective. The endpoint, if you are outsourcing to a third party, should have easier APIs. I'd like to have more destination sources available to us.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"We use it for ETL purposes as well as for implementing the full transformation pipelines."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"Provides a lot of good documentation compared to other solutions."
"The solution is very stable."
"Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."
"The most valuable feature of Apache Spark is its ease of use."
"The product's deployment phase is easy."
"I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library."
"I appreciate this solution because it leverages open-source technologies. It allows us to utilize the latest streaming solutions and it's easy to develop."
"Technical support is pretty helpful."
"The way it organizes data into tables and dashboards is very helpful."
"I like the way the UI looks, and the real-time analytics service is aligned to this. That can be helpful if I have to use this on a production service."
"Provides deep integration with other Azure resources."
"The integrations for this solution are easy to use and there is flexibility in integrating the tool with Azure Stream Analytics."
"It provides the capability to streamline multiple output components."
"It's a product that can scale."
 

Cons

"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"It should support more programming languages."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"The collection and analysis of historical data could be better."
"More flexibility in terms of writing queries and accommodating additional facilities would be beneficial."
"Early in the process, we had some issues with stability."
"The UI should be a little bit better from a usability perspective."
"I would like to have a contact individual at Microsoft."
"Sometimes when we connect Power BI, there is a delay or it throws up some errors, so we're not sure."
"The initial setup is complex."
"Azure Stream Analytics is challenging to customize because it's not very flexible."
 

Pricing and Cost Advice

"It is an open-source platform. We do not pay for its subscription."
"They provide an open-source license for the on-premise version."
"The solution is affordable and there are no additional licensing costs."
"The product is expensive, considering the setup."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"Apache Spark is an open-source solution, and there is no cost involved in deploying the solution on-premises."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"Since we are using the Apache Spark version, not the data bricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."
"When scaling up, the pricing for Azure Stream Analytics can get relatively high. Considering its capabilities compared to other solutions, I would rate it a seven out of ten for cost. However, we've found ways to optimize costs using tools like Databricks for specific tasks."
"I rate the price of Azure Stream Analytics a four out of five."
"The current price is substantial."
"We pay approximately $500,000 a year. It's approximately $10,000 a year per license."
"Azure Stream Analytics is a little bit expensive."
"The licensing for this product is payable on a 'pay as you go' basis. This means that the cost is only based on data volume, and the frequency that the solution is used."
"The cost of this solution is less than competitors such as Amazon or Google Cloud."
"The product's price is at par with the other solutions provided by the other cloud service providers in the market."
Use our free recommendation engine to learn which Hadoop solutions are best for your needs.
816,406 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Apache Spark
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
University: 5%

Azure Stream Analytics
Computer Software Company: 15%
Financial Services Firm: 14%
Manufacturing Company: 8%
Insurance Company: 6%
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
Which would you choose - Databricks or Azure Stream Analytics?
Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analytics teams that have to interpret data to further the business goals of their orga...
What is your experience regarding pricing and costs for Azure Stream Analytics?
When scaling up, the pricing for Azure Stream Analytics can get relatively high. Considering its capabilities compared to other solutions, I would rate it a seven out of ten for cost. However, we'v...
What needs improvement with Azure Stream Analytics?
Azure Stream Analytics is challenging to customize because it's not very flexible. It's good for quickly setting up and implementing solutions, but for building complex data pipelines and engineeri...
 

Also Known As

Apache Spark: No data available
Azure Stream Analytics: ASA
 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Azure Stream Analytics: Rockwell Automation, Milliman, Honeywell Building Solutions, Arcoflex Automation Solutions, Real Madrid C.F., Aerocrine, Ziosk, Tacoma Public Schools, P97 Networks
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: November 2024.