Try our new research platform with insights from 80,000+ expert users

Pentaho Data Integration and Analytics vs StreamSets comparison

 

Comparison Buyer's Guide

Executive SummaryUpdated on Dec 19, 2024

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Pentaho Data Integration an...
Ranking in Data Integration
22nd
Average Rating
8.0
Reviews Sentiment
6.9
Number of Reviews
53
Ranking in other categories
No ranking in other categories
StreamSets
Ranking in Data Integration
15th
Average Rating
8.4
Reviews Sentiment
7.1
Number of Reviews
20
Ranking in other categories
No ranking in other categories
 

Mindshare comparison

As of March 2025, in the Data Integration category, the mindshare of Pentaho Data Integration and Analytics is 1.4%, up from 0.5% compared to the previous year. The mindshare of StreamSets is 1.6%, up from 1.3% compared to the previous year. It is calculated based on PeerSpot user engagement data.
Data Integration
 

Featured Reviews

Ryan Ferdon - PeerSpot reviewer
Low-code makes development faster than with Python, but there were caching issues
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was. It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.
Nantabo Jackie - PeerSpot reviewer
Simplified pipelines and helped us break down data silos within our organization
The design experience when implementing batch streaming or ECL pipelines is very easy and straightforward. When we initially attempted to integrate StreamSets with Kafka, it was somewhat challenging until we consulted the documentation, after which it became straightforward. We use StreamSets to move data into modern analytics platforms. Moving the data into modern analytics platforms is still complex. It requires a lot of understanding of logic. StreamSets enables us to build data pipelines without knowing how to code. StreamSets' ability to build data pipelines without requiring us to know complex programming is very important, as it allows us to focus on our projects without spending time writing code. StreamSets' Transformer for Snowflake is simple to use for designing both simple and complex transformation logic. StreamSets' Transformer for Snowflake is extremely important to me as it helps me to connect external data sources and keep my internal workflow organized. Transformer for Snowflake's functionality is a perfect ten out of ten. It is important and cost-effective that Transformer for Snowflake is a serverless engine embedded within the platform, as without this feature, it would be very expensive. This feature helps us to sell at lower budget costs, which would otherwise be at a high cost with other servers. StreamSets has helped improve our organization. StreamSets simplified pipelines for our organization. It is easier to complete a project when we know where and how to start, and working with the team remotely makes it more efficient. This helps us to save time and be more organized when creating data pipelines. Being a structured company that produces reliable resources for our application benefits both our clients and contacts. StreamSets' built-in data drift resilience plays a part in our ETL operations. With prior knowledge, the built-in data drift resilience is very effective, but it can be challenging to implement without the preexisting knowledge. The built-in data drift resilience reduced the time it takes us to fix data drift breakages by 45 percent. StreamSets helped us break down data silos within our organization. The use of StreamSets to break down data silos enabled us to be confident in the services and products we provide, as well as the real-time streaming we offer. This has had a positive impact on our business, as it allowed us to accurately determine the analytics we need to present to stakeholders, clients, and our sources while ensuring that the process is secure and transparent. StreamSets saved us time because anyone can use StreamSets not just developers. We can save around 40 percent of our time. StreamSets' reusable assets helped us reduce workload by around 25 percent. StreamSets saved us money by not having to hire developers with specialized skills. We saved around $2,000 US. StreamSets helped us scale our data operations. Since StreamSets makes it easy to scale our data operations, it enabled us to know exactly where to start at any time. We are aware of the timeline for completing the project, and depending on our familiarity with the software, we can come up with a solution quickly.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The amount of data that it loads and processes is good."
"One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
"The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
"It's my understanding that the product can scale."
"We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."
"Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that."
"The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive to use for almost any business."
"I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"In StreamSets, everything is in one place."
"The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
"For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"It is really easy to set up and the interface is easy to use."
"The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
 

Cons

"Parallel execution could be better in Pentaho. It's very simple but I don't think it works well."
"The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
"I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."
"Larger data jobs take more time to execute."
"It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for pair tiers."
"​I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support.​"
"It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively."
"In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version."
"They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue. That should be reduced. They are aware of small or generic issues, but not the more technical or deep issues. For those, they require some time, generally 48 to 72 hours to respond. That should be improved."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that."
"Visualization and monitoring need to be improved and refined."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
 

Pricing and Cost Advice

"You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use."
"We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it."
"If a company is looking for an ETL solution and wants to integrate it with their tech stack but doesn't want to spend a bunch of money, Pentaho is a good solution"
"The solution reduced our ETL development time by a lot because a whole project used to take about a month to get done previously. After having Lumada, it took just a week. For a big company in Brazil, it saves a team at least $10,000 a month."
"You need to go through the paid version to have Hitachi Lumada specialized support. However, if you are using the free version, then you will have only the community support. You will depend on the releases from Hitachi to solve some problem or questions that you have, such as bug fixes. You will need to wait for the newest versions or releases to solve these types of problems."
"There is a good open source option (Community Edition)​."
"The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."
"I mostly used the open-source version. I didn't work with a license."
"I believe the pricing is not equitable."
"Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled."
"The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data."
"There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."
"The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good."
"It's not so favorable for small companies."
"The overall cost is very flexible so it is not a burden for our organization... However, the cost should be improved. For small and mid-size organizations it might be a challenge."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
report
Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
842,651 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm
22%
Computer Software Company
15%
Government
8%
Manufacturing Company
5%
Financial Services Firm
14%
Computer Software Company
11%
Manufacturing Company
10%
Insurance Company
8%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition : https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integrations?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integrations for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Also Known As

Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
No data available
 

Overview

 

Sample Customers

66Controls, Providential Revenue Agency of Ro Negro, NOAA Information Systems, Swiss Real Estate Institute
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about Pentaho Data Integration and Analytics vs. StreamSets and other solutions. Updated: February 2025.
842,651 professionals have used our research since 2012.