
Apache Spark vs Pentaho Business Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

Pentaho Business Analytics
Average Rating: 8.0
Reviews Sentiment: 5.8
Number of Reviews: 42
Ranking in other categories: BI (Business Intelligence) Tools (21st), Cloud Operations Analytics (4th), Reporting (15th)
 

Mindshare comparison

Apache Spark and Pentaho Business Analytics aren’t in the same category and serve different purposes. Apache Spark is listed in the Hadoop category and holds a mindshare of 18.2%, down 21.9% compared to last year.
Pentaho Business Analytics, on the other hand, is listed under BI (Business Intelligence) Tools and holds a 0.6% mindshare, down 0.6% since last year.
 

Featured Reviews

SurjitChoudhury - PeerSpot reviewer
Offers batch processing of data and in-memory processing in Spark greatly enhances performance
Spark supports real-time data processing through Spark Streaming. It allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real-time, Spark Streaming is used. For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated. In-memory processing in Spark greatly enhances performance, making it a hundred times faster than the previous MapReduce methods. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help in optimizing queries for faster processing.
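The three optimization techniques named in this review (partitioning, broadcasting, and caching) can be illustrated with a minimal plain-Python sketch. It does not use Spark's API and is not how Spark implements them. The names `hash_partition`, `broadcast_join`, and `CachedDataset`, and the sample chat and user data, are hypothetical and chosen only to show the ideas on a single machine.

```python
def hash_partition(records, num_partitions):
    """Split (key, value) records into partitions by key hash,
    so that all records with the same key land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def broadcast_join(partitions, small_table):
    """Join each partition against a small lookup table.
    The small table is copied ("broadcast") to every partition,
    so the large dataset never has to be moved or re-sorted."""
    lookup = dict(small_table)
    return [
        [(key, value, lookup[key]) for key, value in part if key in lookup]
        for part in partitions
    ]


class CachedDataset:
    """Load a dataset once and keep it in memory for later reuse."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self.loads = 0  # counts how often the loader actually ran

    def get(self):
        if self._data is None:
            self._data = self._loader()
            self.loads += 1
        return self._data


# Hypothetical sample data: (user_id, message) and (user_id, name).
chats = [(1, "hi"), (2, "hello"), (1, "bye"), (3, "ping")]
users = [(1, "ana"), (2, "bo")]

dataset = CachedDataset(lambda: hash_partition(chats, 2))
joined = broadcast_join(dataset.get(), users)
joined_again = broadcast_join(dataset.get(), users)  # reuses the cached partitions
```

In this sketch the second join reuses the in-memory partitions instead of rebuilding them, which is the same intuition behind keeping intermediate data in memory rather than rereading it from disk for each step.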
Sayan König - PeerSpot reviewer
Flexible, easy to understand, and simple to set up
The repository should be improved. There should be the possibility to have versioning, to make it combinable with some Git repositories or something like that, to check out the processes and make sure it has a traceable history. The solution could really be improved. There are too many bugs in our version.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics."
"I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library."
"The scalability has been the most valuable aspect of the solution."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"The solution has been very stable."
"One of Apache Spark's most valuable features is that it supports in-memory processing, the execution of jobs compared to traditional tools is very fast."
"The initial setup is pretty straightforward."
"We were able to install it without any assistance from tech support."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"The most valuable feature of Pentaho is the Tableau report."
"Easy to use components to create the job."
 

Cons

"At the initial stage, the product provides no container logs to check the activity."
"One limitation is that not all machine learning libraries and models support it."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"The product could improve the user interface and make it easier for new users."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"Pentaho Business Analytics' user interface is outdated."
"Another concern is that Pentaho is not customizable or interactive."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Logging capability is needed."
"Version control would be a good addition."
"The repository should be improved."
 

Pricing and Cost Advice

"Apache Spark is an expensive solution."
"On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"Since we are using the Apache Spark version, not the data bricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."
"The product is expensive, considering the setup."
"Spark is an open-source solution, so there are no licensing costs."
"It is an open-source platform. We do not pay for its subscription."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"Free and commercial versions are available."
"We were lucky enough to find a Pentaho OEM partner who offered a data warehouse model and the ETL software for about 60K SGD per year."
"Pentaho is expensive ."
816,406 professionals have used our research since 2012.
 

Comparison Review

it_user6978 - PeerSpot reviewer
Jun 10, 2013
Jaspersoft vs. Pentaho – Which one to use & is there any need to purchase the commercial edition
Any company (be it technology, manufacturing, human resource, ecommerce, SME etc) always has the need for Business Intelligence to some or the other extent. If cost is one of the consideration factors, then the 2 BI tools which are at the forefront are Pentaho and Jaspersoft. But, often the same…
 

Top Industries

By visitors reading reviews
Apache Spark
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
University: 5%

Pentaho Business Analytics
Financial Services Firm: 24%
Computer Software Company: 14%
Government: 8%
Educational Organization: 8%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
Seeking lightweight open source BI software
There are many...It would rather depend what System BI architecture or Enterprise legacy you have at your end...I would recommend as follows: 1) If you have legacies of SAP, Oracle - look for SAP...
What is your experience regarding pricing and costs for Pentaho Business Analytics?
The organization has both options based on their needs and budget constraints. The Enterprise Edition is expensive with references to an added number of features.
What needs improvement with Pentaho Business Analytics?
The product to me is not as user-friendly as other players in the market. It also still needs improvement in the reporting module. You will need to search for deployment examples or need to have a ...
 

Also Known As

Apache Spark: No data available
Pentaho Business Analytics: Pentaho, Kettle, Hitachi Pentaho Business Analytics
 

Overview

 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Pentaho Business Analytics: Cargo 2000 Lufthansa, Marketo, ModCloth, Cardiac Science, Telefonica, ExactTarget, Active Broadband Networks, and Brussels Airport.
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: November 2024.