
Apache Spark vs Azure Stream Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Oct 8, 2024
 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

Azure Stream Analytics
Average Rating: 8.0
Reviews Sentiment: 6.9
Number of Reviews: 24
Ranking in other categories: Streaming Analytics (4th)
 

Mindshare comparison

Apache Spark and Azure Stream Analytics aren’t in the same category and serve different purposes. Apache Spark is ranked in the Hadoop category and holds a mindshare of 18.0%, down 21.8% compared to last year.
Azure Stream Analytics, on the other hand, is ranked in the Streaming Analytics category and holds a mindshare of 12.5%, down 13.6% compared to last year.
 

Featured Reviews

SurjitChoudhury - PeerSpot reviewer
Offers batch processing of data and in-memory processing in Spark greatly enhances performance
Spark supports real-time data processing through Spark Streaming. It allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real-time, Spark Streaming is used. For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated. In-memory processing in Spark greatly enhances performance, making it a hundred times faster than the previous MapReduce methods. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help in optimizing queries for faster processing.
Sudhendra Umarji - PeerSpot reviewer
Easy to set up and user-friendly, but could be priced better
I haven't come across missing items. It does what I need it to do. The pricing is a little bit high. The UI should be a little bit better from a usability perspective. The endpoint, if you are outsourcing to a third party, should have easier APIs. I'd like to have more destination sources available to us.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"I like Apache Spark's flexibility the most. Before, we had one server that would choke up. With the solution, we can easily add more nodes when needed. The machine learning models are also really helpful. We use them to predict energy theft and find infrastructure problems."
"I found the solution stable. We haven't had any problems with it."
"The scalability has been the most valuable aspect of the solution."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"We use Spark to process data from different data sources."
"Apache Spark can do large volume interactive data analysis."
"The most valuable feature of Apache Spark is its ease of use."
"It provides the capability to streamline multiple output components."
"It's a product that can scale."
"I like all the connected ecosystems of Microsoft, it is really good with other BI tools that are easy to connect."
"The most valuable features of Azure Stream Analytics are the ease of provisioning and the interface is not terribly complex."
"The most valuable features are the IoT hub and the Blob storage."
"We find the query editor feature of this solution extremely valuable for our business."
"I like the IoT part. We have mostly used Azure Stream Analytics services for it."
"The solution's technical support is good."
 

Cons

"The migration of data between different versions could be improved."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"The setup I worked on was really complex."
"The solution needs to optimize shuffling between workers."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"It's not easy to install."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"The solution's interface could be simpler to understand for non-technical people."
"Early in the process, we had some issues with stability."
"More flexibility in terms of writing queries and accommodating additional facilities would be beneficial."
"The solution doesn't handle large data packets very efficiently, which could be improved upon."
"I would like to have a contact individual at Microsoft."
"It is not complex, but it requires some development skills. When the data is sent from Azure Stream Analytics to Power BI, I don't have the access to modify the data. I can't customize or edit the data or do some queries. All queries need to be done in the Azure Stream Analytics."
"Its features for event imports and architecture could be enhanced."
"Azure Stream Analytics is challenging to customize because it's not very flexible."
 

Pricing and Cost Advice

"Apache Spark is an open-source tool."
"It is an open-source solution, it is free of charge."
"Spark is an open-source solution, so there are no licensing costs."
"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."
"We are using the free version of the solution."
"Since we are using the Apache Spark version, not the Databricks version, it is an Apache license version, and the support and resolution of bugs are late or delayed. The Apache license is free."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"The solution is affordable and there are no additional licensing costs."
"When scaling up, the pricing for Azure Stream Analytics can get relatively high. Considering its capabilities compared to other solutions, I would rate it a seven out of ten for cost. However, we've found ways to optimize costs using tools like Databricks for specific tasks."
"We pay approximately $500,000 a year. It's approximately $10,000 a year per license."
"The product's price is at par with the other solutions provided by the other cloud service providers in the market."
"The cost of this solution is less than competitors such as Amazon or Google Cloud."
"The licensing for this product is payable on a 'pay as you go' basis. This means that the cost is only based on data volume, and the frequency that the solution is used."
"I rate the price of Azure Stream Analytics a four out of five."
"Azure Stream Analytics is a little bit expensive."
"The current price is substantial."
824,067 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Apache Spark
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
Retailer: 5%

Azure Stream Analytics
Computer Software Company: 15%
Financial Services Firm: 14%
Manufacturing Company: 9%
Insurance Company: 6%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
Which would you choose - Databricks or Azure Stream Analytics?
Databricks is an easy-to-set-up and versatile tool for data management, analysis, and business analytics. For analytics teams that have to interpret data to further the business goals of their orga...
What is your experience regarding pricing and costs for Azure Stream Analytics?
Stream Analytics is cheaper, especially for small-scale requirements or telemetry needs. However, for enterprise-level data handling, structured streaming with Databricks might be more cost-effecti...
What needs improvement with Azure Stream Analytics?
More flexibility in terms of writing queries and accommodating additional facilities would be beneficial. The complexity of handling messages that need decoding and contain different characters sho...
 

Also Known As

Apache Spark: No data available
Azure Stream Analytics: ASA
 


Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Azure Stream Analytics: Rockwell Automation, Milliman, Honeywell Building Solutions, Arcoflex Automation Solutions, Real Madrid C.F., Aerocrine, Ziosk, Tacoma Public Schools, P97 Networks
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: December 2024.