We performed a comparison between Apache Spark and Pentaho Business Analytics based on real PeerSpot user reviews.
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"Provides a lot of good documentation compared to other solutions."
"It provides a scalable machine learning library."
"Apache Spark provides a very high-quality implementation of distributed data processing."
"ETL and streaming capabilities."
"One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them."
"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"The scalability has been the most valuable aspect of the solution."
"Easy to use components to create the job."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"The most valuable feature of Pentaho is the Tableau report."
"We were able to install it without any assistance from tech support."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, where the organization needs to be able to search and analyze it."
"The initial setup is pretty straightforward."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"There were some problems related to the product's compatibility with a few Python libraries."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"The solution needs to optimize shuffling between workers."
"They could improve the issues related to programming language for the platform."
"Another concern is that Pentaho is not customizable or interactive."
"Pentaho Business Analytics' user interface is outdated."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Logging capability is needed."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"Version control would be a good addition."
"The repository should be improved."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Pentaho Business Analytics is ranked 19th in BI (Business Intelligence) Tools with 42 reviews. Apache Spark is rated 8.4, while Pentaho Business Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Pentaho Business Analytics writes "Flexible, easy to understand, and simple to set up". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Pentaho Business Analytics is most compared with Microsoft Power BI, Databricks, KNIME, SAP Crystal Reports and Microsoft SQL Server Reporting Services.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.