Aqeel UR Rehman - PeerSpot reviewer
BI Analyst at a computer software company with 51-200 employees
Real User
Top 10
Simple to use, supports custom transformations, and the open-source version can be used free of charge
Pros and Cons
  • "This solution allows us to create pipelines using a minimal amount of custom coding."
  • "I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."

What is our primary use case?

I have used this ETL tool for working with data in projects across several different domains. My use cases include tasks such as transforming data taken from an API like PayPal, and extracting data from different sources, such as Magenta or other databases, and transforming all of that information.

Once the transformation is complete, we load the data into data warehouses such as Amazon Redshift.
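
To give a rough idea of the pattern, here is a minimal sketch in Python (purely for illustration, not the tool itself) of the extract-transform-load flow described above. The API endpoint, field names, and staging file are hypothetical placeholders rather than details from my actual projects.

```python
import csv
import requests

API_URL = "https://api.example.com/transactions"   # hypothetical source endpoint
STAGING_FILE = "transactions_staging.csv"          # file to bulk-load into the warehouse

def extract(url: str) -> list[dict]:
    """Pull raw records from the source API."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()

def transform(records: list[dict]) -> list[dict]:
    """Trim strings, normalize field names, and drop incomplete rows."""
    cleaned = []
    for rec in records:
        if not rec.get("id"):
            continue  # skip rows without a primary key
        cleaned.append({
            "id": rec["id"],
            "amount": round(float(rec.get("amount", 0)), 2),
            "currency": str(rec.get("currency", "")).strip().upper(),
        })
    return cleaned

def load(rows: list[dict], path: str) -> None:
    """Write a staging CSV that a warehouse such as Redshift can bulk-load."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "amount", "currency"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    load(transform(extract(API_URL)), STAGING_FILE)
```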

How has it helped my organization?

There are a lot of different benefits we receive from using this solution. For example, we can easily accept data from an API and create JSON files. The integration is also very good.

I have created many data pipelines and after they are created, they can be reused on different levels.

What is most valuable?

The best feature is that it's simple to use. There are simple data transformation steps available, such as trimming data or performing different types of replacement.

This solution allows us to create pipelines using a minimal amount of custom coding. Anyone in the company can do so, and it's just a simple step. If any coding is required then we can use JavaScript.

What needs improvement?

I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors. If there is a large amount of data then there is definitely a lag.

I would like to see a cloud-based deployment because it will allow us to easily handle a large amount of data.


For how long have I used the solution?

I have been working with Hitachi Lumada Data Integration for almost three years, across two different organizations.

What do I think about the stability of the solution?

There is definitely some lag but with a little improvement, it will be a good fit.

What do I think about the scalability of the solution?

This is a good product for an enterprise-level company.

We use this solution for all of our data integration jobs. It handles the transformation. As our business grows and the demand for data integration increases, our usage of this tool will also increase.

Between versions, they have added a lot of plugins.

How are customer service and support?

The technical support does not reply in a timely manner. I have filled out the support request form one or two times, asking about different things, but I have not received a reply.

The support they have in place does not work very well. I would rate them one or two out of ten.

How would you rate customer service and support?

Negative

Which solution did I use previously and why did I switch?

This business began with this product and did not use another one beforehand. I have also worked with cloud-based integration tools.

How was the initial setup?

The initial setup and deployment are straightforward.

I have deployed it on different servers and, on average, it takes an hour to complete. I have not read any documentation regarding installation; with my experience, we were able to set everything up.

What's my experience with pricing, setup cost, and licensing?

I primarily work on the Community Version, which is available to use free of charge. I have asked for pricing information but have not yet received a response.

What other advice do I have?

We are currently using version 8.3 but version 9 is available. More features to support big data are available in the newest release.

My advice for anybody who is considering this product is if they're looking for any kind of custom transformation, or they're gleaning data from multiple sources and sending it to multiple destinations, I definitely recommend this tool.

Overall, this is a good product and I recommend it.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1772286 - PeerSpot reviewer
Director of Software Engineering at a healthcare company with 10,001+ employees
Real User
Reports on predictions that our product is doing. It would be nice if they could have analytics perform well on large volumes.
Pros and Cons
  • "The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
  • "The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products."

What is our primary use case?

We started using Pentaho for two purposes:

  1. As an ETL tool to bring data in. 
  2. As an analytics tool. 

As our solution progressed, we dropped the ETL piece of Pentaho. We didn't end up using it. What remains in our product today is the analytics tool.

We do a lot of simulations on our data with Pentaho reports. We use Pentaho's reporting capabilities to tell us how contracts need to be negotiated for optimal results by using the analytics tool within Pentaho.

How has it helped my organization?

This was an OEM solution for our product. The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product.

What is most valuable?

There is an end-to-end flow, where a user can say, "I am looking at this field and want to slice and dice my data based on these parameters." That flexibility is provided by Pentaho. This minimal manual coding is important to us.

What needs improvement?

The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products.  

For how long have I used the solution?

I have been using it for eight years.

What do I think about the stability of the solution?

We are on-prem. Once the product was installed and up and running, I haven't had issues with the product going down or not being responsive.

We have one technical lead who is responsible for making sure that we keep upgrading the solution so we are not on a version that is not supported anymore. In general, it is low maintenance.

What do I think about the scalability of the solution?

The only complaint that I have with Pentaho has been with scaling. As our data grew, we tested it with millions of records. When we started to implement it, we had clients that went from 80 million to 100 million. I think scale did present a problem with the clients. I know that Pentaho talks about being able to manage big data, which is much more data than what we have. I don't know if it was our architecture versus the product limitations, but we did have issues with scaling.

Our product doesn't deal with big data at large. There are probably 17 million records. With those 17 million records, it performs well once the data has been internally cached within Pentaho. However, if you are loading the dataset or querying it for the first time, it does take a while. Once it has been cached in Pentaho, the subsequent queries are reasonably fast.

How are customer service and support?

We haven't had a lot of functional issues. We had performance issues, especially early on, as we were trying to spin up this product. The response time from the support group has been a three on a scale of one to five.

We had trouble with the performance and had their engineers come in. We shared our troubles and problems, then those engineers had brainstorming sessions. Their ability to solve problems was really good and I would rate that as four out of five.

A lot of the problems were with the performance and scale of data that we had. It could have been that we didn't have a lot of upfront clean architecture. With the brainstorming sessions, we tried giving two sets of reports to users: 

  1. One was more summary level, which was quick, and that is what 80% of our clients use. 
  2. For 20% of our clients, we provided detailed reports that do take a while. However, you are then not impacting performance for 80% of your clients. 

This was a good solution or compromise that we reached from both a business and technology perspective. 

Now, I feel like the product is doing well. It is almost like their team helped us with rearchitecting and building product expectations.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Previously, we used to have something called QlikView, which is almost obsolete now. We had a lot of trouble with QlikView. Anytime processing was done, it would take a long time for those processed results to be loaded into QlikView's memory. This meant that there was a lot of time spent once an operation was done. Before users could see results or reports, it would take a couple of hours. We didn't want that lag. 

Pentaho offered an option not to have that lag. It did not have its own in-memory database, where everything had to be loaded. That was one of the big reasons why we wanted to switch away from QlikView, and Pentaho fit that need.

How was the initial setup?

I would say the deployment/implementation process was straightforward enough for both data ingestion and analytics.

When we started with the data ingestion, we went with something called Spoon. Then we realized that, while Spoon was a Pentaho product, it was open source. We had integrated with the open-source version of it, but later found that it didn't work for commercialization. 

For us to integrate Pentaho and get it working, it took a couple of months because we needed to figure out authentication with Pentaho. So, learning and deployment within our environment took a couple of months. This includes the actual implementation and figuring out how to do what we wanted to do.

Because this is a licensed product, the deployment for the client was a small part of the product's deployment. So, on an individual client basis, the deployment is easy and a small piece. 

It gives us the flexibility to deploy it in any environment, which is important to us.

If we went to the cloud version of Pentaho, that would be a big maintenance relief. We wouldn't have to worry about getting the latest version, installing it, and sending it out to our clients.

What about the implementation team?

For the deployment, we had people come in from Pentaho for a week or two. They were there with us through the process.

Which other solutions did I evaluate?

We looked at Tableau, Pentaho and an IBM solution. In the absence of Pentaho, we would have gone with either Tableau or building our own custom solution. When we were figuring out what third-party tool to use, we did an analysis and a bunch of other tools were compared. Ultimately, we went with Pentaho because it did have a wide variety of features and functionalities within its reports. Though I wasn't involved, there was a cost analysis done and Pentaho did favorably in terms of cost.

For the product that we use Pentaho for, I think we're happy with their decision. There are a few other products in our product suite. Those products ended up using Tableau. I know that there have been discussions about considering Tableau over Pentaho in the future. 

What other advice do I have?

Engage Pentaho's architects early on, so you know what data architecture works best with the product. We built our database and structures, then had performance issues. However, it was too late when we brought in the Pentaho architects, because our data structure was out in the field with multiple clients. Therefore, I think engaging them early on in the data architecture process would be wise.

I am not very familiar with Hitachi's roadmap and what is coming up for them. I know that they are good with sending out newsletters and keeping their customers in the know, but unfortunately, I am unaware of their roadmap.

I feel like this product is doing well. There haven't been complaints and things are moving along. I would rate it as seven out of 10.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
reviewer1872000 - PeerSpot reviewer
Senior Data Analyst at a tech services company with 51-200 employees
Real User
We're able to query large data sets without affecting performance
Pros and Cons
  • "One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
  • "Parallel execution could be better in Pentaho. It's very simple but I don't think it works well."

What is our primary use case?

I use it for ETL. We receive data from our clients and we join the most important information and do many segmentations to help with communication between our product and our clients.

How has it helped my organization?

Before we used Pentaho, our processes were in Microsoft Excel and the updates from databases had to be done manually. Now all our routines are done automatically and we have more time to do other jobs. It saves us four or five hours daily.

In terms of ETL development time, it depends on the complexity of the job, but if the job is simple it saves two or three hours.

What is most valuable?

One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results.

I'm working with large data sets. One of the clients I'm working with is a large credit card company and the database from this client is very large. Pentaho allows me to query large data sets without affecting its performance.

I use Pentaho with Jenkins to schedule the jobs. I'm using the jobs and transformations in Pentaho to create many links. 

I always find ways to have minimal code and create the processes with many parameters. I am able to reuse processes that I have created before. 
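
As a rough illustration of how a scheduler such as Jenkins can drive a parameterized, reusable transformation, here is a small Python sketch that shells out to pan.sh, PDI's command-line runner for transformations. The paths and parameter names are hypothetical, and the -file, -level, and -param options should be checked against your PDI version.

```python
import subprocess

# Hypothetical transformation and parameters; pan.sh is PDI's command-line
# runner for transformations (kitchen.sh is the equivalent for jobs).
PAN = "/opt/pentaho/data-integration/pan.sh"
TRANSFORMATION = "/srv/etl/campaign_results.ktr"

def run_transformation(client: str, report_date: str) -> None:
    """Invoke a parameterized PDI transformation, as a Jenkins job might."""
    cmd = [
        PAN,
        f"-file={TRANSFORMATION}",
        "-level=Basic",                 # logging level
        f"-param:CLIENT={client}",      # named parameters defined in the .ktr
        f"-param:REPORT_DATE={report_date}",
    ]
    # Raise if PDI exits with a non-zero status so the scheduler marks the run as failed.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_transformation("acme", "2022-01-31")
```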

Creating jobs and putting them into production, as well as the visibility that Pentaho gives, are both very simple.

What needs improvement?

Parallel execution could be better in Pentaho. It's very simple but I don't think it works well.

For how long have I used the solution?

I've been working with Pentaho for four or five years.

What do I think about the stability of the solution?

The stability is good. 

What do I think about the scalability of the solution?

It's scalable.

How are customer service and support?

I find help on the forums.

Which solution did I use previously and why did I switch?

I used SQL Server Integration Services, but I have much more experience with Pentaho. I have also worked with Apache NiFi, but it is more focused on single data processes, whereas I'm always working with batch processes and large data sets.

How was the initial setup?

The first deployment was very complex because we didn't have experience with the solution, but the next deployment was simpler.

We create jobs weekly in Pentaho. The development time takes, on average, one week and the deployment takes just one day or so.

We just put it on Git, pull it onto a server, and schedule the execution.

We use it on-premises while the infrastructure is Amazon and Azure.

What other advice do I have?

I always recommend Pentaho for working with automated processes and to do API integrations.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Architect at a tech services company with 1,001-5,000 employees
Reseller
Helped us to fully digitalize a national census process, eliminating door-to-door interviews
Pros and Cons
  • "One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
  • "I would like to see support for some additional cloud sources. It doesn't support Azure, for example. I was trying to do a PoC with Azure the other day but it seems they don't support it."

What is our primary use case?

We use it as an ETL tool. We take data from a source database and move it into a target database. We do some additional processing on our target databases as well, and then load the data into a data warehouse for reports. The end result is a data warehouse and the reports built on top of that.

We are a consulting company and we implement it for clients.

How has it helped my organization?

As a result of one of the projects that we did in the Middle East, we achieved the main goal of fully digitalizing their population census. They did previous censuses doing door-to-door surveys, but for the last census, using Pentaho Data Integration, we managed to get it all running in a fully digital way, with nothing on paper forms. No one had to go door-to-door and survey the people.

What is most valuable?

One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs.

What needs improvement?

I would like to see support for some additional cloud sources, such as Azure and Snowflake.

For how long have I used the solution?

I have been using Hitachi Lumada Data Integration for four years.

What do I think about the stability of the solution?

There have been some bugs and some weird things every now and then, but it is mostly fairly stable.

What do I think about the scalability of the solution?

If you work with relatively small data sets, it's all okay. But if you are going to use really huge data sets, then you might get into a bit of trouble, at least from what I have seen.

How are customer service and support?

The support from Hitachi is not the greatest; the fixing of bugs can take a really long time.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup is very straightforward compared to many other ETL tools. It takes about half a day.

We have about five users altogether: two to three developers, one to two people on the customer side who run the ETLs, and one to two admins who take care of the environment itself. It doesn't require much maintenance. Occasionally someone has to restart the server or take a look at logs.

What was our ROI?

Because you can basically get Pentaho Data Integration for free, I would give the cost versus performance a pretty good rating.

Taking the census project that I mentioned earlier as an example (Pentaho Data Integration was not the only contributor, though; it was just one part of the whole solution), the statistical authority managed to save a huge amount of money by making the census electronic, versus the traditional approach.

What's my experience with pricing, setup cost, and licensing?

You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use. 

If you pay for licenses, the only thing that you're getting, in addition, is customer support, which is pretty much nonexistent in any case. I would recommend going with the Community Edition.

Which other solutions did I evaluate?

I have had experience with other solutions, but for the last project we did not evaluate other options. Because we had previous experience with Pentaho Data Integration, it was pretty much a no-brainer to use it.

What other advice do I have?

Hitachi Vantara's roadmap is promising. They came up with Lumada and it seems that they do have some ideas on how to make their product a bit more attractive than it currently is.

I'm fairly satisfied with using Pentaho Data Integration. It's more or less okay. When it comes to all the other parts, like Pentaho reports and Pentaho dashboards, things could be better there.

The biggest lesson I've learned from using this solution is that a cheap, open-source tool can sometimes be even more efficient than some of the high-priced enterprise ETL tools. Overall, the solution is okay, considering the low cost. It has all of the main things that you would expect it to have, from a technical perspective.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Analytics Team Leader at HealtheLink
Real User
Enables us to manage our workload and generate a high volume of reporting
Pros and Cons
  • "We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule."
  • "Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."

What is our primary use case?

We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it.

Portions of it are in AWS, but we also have desktop access.

How has it helped my organization?

The solution has allowed us to automate reporting by automating its scheduling. 

It is also important to us that the solution enables you to leverage metadata to automate data pipeline templates and reuse them. It allows us to generate reports with fewer resources.

If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do. It's very important for us that it provides a single, end-to-end data management experience from ingestion to insights. We are a high-volume department and without those features, we wouldn't be able to manage the current workload.

What is most valuable?

We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule.

What needs improvement?

Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in. There's good documentation when you go to the site but the help function within the solution hasn't been as good since Hitachi took over.

For how long have I used the solution?

I've been using Lumada Data Integration since 2016, but the company has been using it much longer.

We are currently on version 8.3, but we're going to be doing an upgrade to 9.2 next month.

What do I think about the stability of the solution?

The stability is good. We haven't had any issues related to Pentaho.

What do I think about the scalability of the solution?

Its scalability is very good. We use it with multiple, large databases. We've added to it over time and it scales.

We have about 10 users of the solution including a data quality manager, clinical analyst, healthcare informatics analysts, senior healthcare informatics analyst, and an analytics team leader. It's used very extensively by all of those job roles in their day-to-day work. When we add additional staff members, they routinely get access to and are trained on the solution.

How are customer service and support?

Their ability to quickly and effectively solve issues we have brought up is very good. They have a ticketing system and they're very responsive to any tickets we enter. And that's true not only for issues but if we have questions about functionality.

How would you rate customer service and support?

Positive

How was the initial setup?

The solution is very flexible. It's pretty easy to set up connections within the solution.

Maintenance isn't required day-to-day. Our technical staff does the upgrades. They also, on occasion, have to do things like restarting the services, but that's typically related to server issues, not Pentaho itself.

What other advice do I have?

My advice would be to take advantage of the training that's offered.

The query performance of Lumada on large data sets is good, but the query performance is really only as good as the server.

In terms of Hitachi's roadmap, we haven't seen it in a little while. We did have a concern that they're going to be going away from Pentaho and rolling it into another product and we're not quite sure what the result of that is going to be. We don't have a good understanding of what's going to change. That's the concern.

We currently only use Pentaho. We don't have other Hitachi products but we're satisfied with it. We would recommend Pentaho.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
reviewer1510395 - PeerSpot reviewer
Technical Manager at a computer software company with 51-200 employees
Real User
Quite simple to learn and there is a lot of information available online
Pros and Cons
  • "Pentaho Data Integration is quite simple to learn, and there is a lot of information available online."
  • "I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing" i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking."

What is our primary use case?

We have an event planning system, from which we obtain large reports. It includes data mart and data warehouse data. We take data from the online IT system and pass it to the data warehouse, and reports are then generated from the data warehouse. We have six developers who are using Pentaho Data Integration, but there are no end users. We deploy the product, and the customer uses it for reporting. We have one person who undertakes regular maintenance activity when it is required.

How has it helped my organization?

As we are a software company, we are using the tools provided with the Pentaho Data Integration for our various teams.

What is most valuable?

Pentaho Data Integration is quite simple to learn, and there is a lot of information available online. It is not a steep learning curve. It also integrates easily with other databases, which is great. We use the provided documentation, and the integration process is simple compared to other proprietary tools.

What needs improvement?

I don't think they market it that well. We can make suggestions for improvements, but they don't seem to take the feedback on board. This contrasts with Informatica, who are really helpful and seem to listen more to their customer feedback.

I would also really like to see improved data capture. At the moment, the emphasis seems to be on data processing. I would like to see a real-time processing data integration tool. This would provide instant reporting whenever the data changes.

I'm still in the very recent stage concerning Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing," i.e. when there is a huge amount of data to process. That is one area where Pentaho is still lacking.

For how long have I used the solution?

We have been using Pentaho Data Integration for 6 years. The customer is using Mirabilis Cloud, which is a public cloud. We are currently using version A.3.

How are customer service and support?

Technical support is really good. It only takes a little bit of time to get our answers.

Which solution did I use previously and why did I switch?

One of our customers was completely invested in the core Microsoft framework. We had to use SSIS because it's readily available with them and is part of the system. We used it for five years. 

As mentioned, one of our teams has worked with Informatica in the past. In terms of integration, Informatica isn't more powerful, but more accurate in some aspects. The community is also quite strong.

How was the initial setup?

The setup of Pentaho Data Integration is straightforward. 

What about the implementation team?

We implemented Pentaho Data Integration in-house. The current deployment took three months for the current set of requirements. We have another deployment in the pipeline where we are connecting other, different data sources. These projects usually take a few months to complete.

What's my experience with pricing, setup cost, and licensing?

Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs.

What other advice do I have?

For newcomers to the product, it is best to start with something simple. You can then scale it up fast, as it is not a steep learning curve. If somebody wants to set up a good inbound integration platform, they can use Pentaho Data Integration. It's really simple and easy to use. The online community really helps you with numerous issues, such as licensing and a lot of other things. I would rate Pentaho Data Integration 8 out of 10.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
PeerSpot user
CEO with 51-200 employees
Vendor
Easy to use and has a nice GUI. The JSON input needs to perform better.

What is most valuable?

Ease of use, stability, graphical interface, small amount of "voodoo" and cost.

What needs improvement?

There are some steps that should perform better, like the JSON input, but because of the flexibility, we at inflow override it by using scripting steps. Of course, it's ideal to use the steps that come with the software, but being able to write your own step is powerful. Also, it would be nice to have the drivers for the data sources shipped with Pentaho Kettle instead of looking for the right ones on the Internet.
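
To show what I mean by handling JSON in a scripting step instead of the JSON input step, here is a minimal sketch of the idea. It is written in Python for illustration only (in Kettle we use its scripting steps), and the payload and field names are made up.

```python
import json

# Hypothetical payload; in Kettle we do the equivalent inside a scripting step
# instead of the JSON input step when that step becomes a bottleneck.
raw = '{"orders": [{"id": 1, "lines": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}]}'

def flatten(payload: str) -> list[dict]:
    """Turn a nested JSON document into flat rows, one per order line."""
    rows = []
    for order in json.loads(payload)["orders"]:
        for line in order["lines"]:
            rows.append({"order_id": order["id"], "sku": line["sku"], "qty": line["qty"]})
    return rows

print(flatten(raw))
```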

What was my experience with deployment of the solution?

In every project there are issues with the deployment, but we were able to overcome them.

What do I think about the stability of the solution?

I think that Pentaho Kettle is stable software; if it wasn't, we wouldn't have used it (because we don't like angry customers).

What do I think about the scalability of the solution?

Actually, Pentaho Kettle comes equipped with the option to scale out, out of the box. And no, we didn't encounter specific scalability problems.

How are customer service and technical support?

Customer Service:

We mainly use material that was written over the years (Pentaho Kettle materials), the forum, Matt Casters' blog, videos, etc. We also try to solve our customers' issues inside the company before contacting customer service. We even developed a full-scale Pentaho Kettle data integration online course and built a website for it.

When we do use customer service, it's very good. There is a large community for the tool, and people gladly help one another.

Technical Support:

Very good support.

Which solution did I use previously and why did I switch?

Before Pentaho Kettle, we used stored procedures, hand-written code, and also Informatica. Informatica is a very good tool, but it is not open source, so it is far more costly compared to Pentaho Kettle. From my perspective, I don't see the difference; we can do almost everything with Pentaho Kettle, and if we need a little extra, we are tech guys and we solve it.

Of course, from the customer's perspective, the cheaper the better, so if the customer has a smaller budget, they get more when using open-source Pentaho Kettle, or even with the Pentaho Kettle Enterprise Edition.

What's my experience with pricing, setup cost, and licensing?

I can say, from the vendor perspective, that the data integration part (from the data source to the warehouse/target) usually takes at least 60% of the whole initial business intelligence project. It depends on the data sources and complexity, for example: big data, NoSQL, XML, web services, "weird" files, and more.

After the data integration project is "live," it will work fine until someone breaks something (network connectivity, servers, a DBA who changes the data source, or any other change that affects the variables the data integration was built upon), but this is true for all data integration software.

The day-to-day costs are very low if there are no new requirements. Luckily for us (as a vendor), once the customer starts and the users get their fancy reports and dashboards, there's no turning back, and the requirements pile up. But these are new requirements, not maintenance.

What other advice do I have?

Instead of trying to decide on a specific data integration tool, pick the right vendor partner, not a biased one. They will be able to recommend the set of tools you need according to your requirements and budget.

Business intelligence projects are made up of at least three components:

  1. Data integration tool
  2. Data warehouse tool
  3. Visualization tool

Several of the software vendors have them all, but not the best solution for each component. From my experience, it's better to combine solutions (unless it is a small project).

For example: data integration from Pentaho Kettle; if it's big data, we need an in-memory/columnar database for the data warehouse, but if it's not, we can use traditional databases (SQL Server, Oracle, or even MySQL for smaller projects) and a BI visualization tool like Yellowfin, Tableau, Sisense, etc.

In the middle, you have tens of software vendors that can be suitable for the customer's needs.

Of course, if the vendor partner is biased, then suddenly Tableau/Sisense/QlikView/etc. become the best data integration tool, or "you don't need a data integration tool at all," even though they don't have the right components. (They are very good tools for visualization but not for extracting and transforming complex data.) We work with several vendors, such as Sisense and Yellowfin, which are great tools for the specific solutions they were made for.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1384743 - PeerSpot reviewer
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees
Real User
Free to use, easy to set up, and has a great metadata injection feature
Pros and Cons
  • "The solution has a free to use community version."
  • "It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for pair tiers."

What is our primary use case?

The most common use for the solution is gathering data from our databases or files in order to bring it together in a different database. Another common use is to compare data between different databases; when integrity is lacking, the differences can be attributed to synchronization issues.
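
As a rough sketch of the comparison use case, the following Python example uses two in-memory SQLite databases as stand-ins for the real source and target systems; the table, keys, and data are invented for illustration.

```python
import sqlite3

# Two in-memory databases stand in for the real source and target systems.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

for db, rows in ((source, [(1,), (2,), (3,)]), (target, [(1,), (3,)])):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    db.executemany("INSERT INTO customers (id) VALUES (?)", rows)
    db.commit()

def key_set(db: sqlite3.Connection) -> set[int]:
    """Collect the primary keys on one side of the comparison."""
    return {row[0] for row in db.execute("SELECT id FROM customers")}

# Rows present in the source but missing from the target usually point to
# synchronization problems between the systems.
missing_in_target = key_set(source) - key_set(target)
print("Missing in target:", sorted(missing_in_target))
```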

What is most valuable?

One important feature, in my opinion, is Metadata Injection. It gives flexibility to the scripts because they don't depend on a fixed structure or a fixed data model. Instead, you can develop transformations that are not dependent on a fixed structure or data model. 

Let me give a couple of examples. Sometimes your tables change, adding fields or dropping some of them. When this happens, if you have a transformation that does not use Metadata Injection, the transformation fails or doesn't handle all of the information from the table. If you use Metadata Injection instead, the new fields are included and the dropped columns are excluded from the transformation. Other times, you have a complex transformation to apply to a lot of different tables. Traditionally, without the Metadata Injection feature, you had to repeat the transformation for each table, adapting it to the concrete structure of each one. Fortunately, with Metadata Injection, the same transformation is valid for all the tables you want to treat. A little bit of effort gives you a great benefit.
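
To illustrate the general idea behind metadata-driven transformations (this is only a conceptual Python sketch, not Metadata Injection itself), the cleaning logic below is driven by per-table metadata instead of hard-coding each table's structure. The tables, columns, and data are hypothetical.

```python
# Hypothetical rows and per-table metadata; the point is that the cleaning
# logic never hard-codes a table structure.
TABLES = {
    "customers": {"string_columns": ["name", "city"]},
    "products":  {"string_columns": ["title"]},
}

DATA = {
    "customers": [{"name": "  Alice ", "city": " Madrid", "age": 40}],
    "products":  [{"title": " Widget ", "price": 9.99}],
}

def clean_table(rows: list[dict], metadata: dict) -> list[dict]:
    """Apply the same generic transformation to any table, driven by metadata."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in metadata["string_columns"]:
            if col in new_row:  # columns added or dropped later are handled gracefully
                new_row[col] = new_row[col].strip()
        cleaned.append(new_row)
    return cleaned

for table, rows in DATA.items():
    print(table, clean_table(rows, TABLES[table]))
```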

Furthermore, the solution has a free to use community version.

The solution is easy to set up, very intuitive, clear to understand and easy to maintain.

What needs improvement?

I'm currently looking at a new competitor that has some interesting features that this solution doesn't have. I have found this competitor has a checkpoint-like mechanism that is not present in the Pentaho Data Integration approach. Their system can keep track of the last executions and store the state, which gives you the ability to resume from the point where it ended the last time. It's very interesting. It would be nice if Pentaho had this type of feature.

Often you are required to install plugins. If you need to have access to, in my case, Neo4j databases, you do need a plugin to do it.

For how long have I used the solution?

Between my current role and the role at my last company, I've been working with the solution for over five years.

What do I think about the stability of the solution?

It's not very stable, at least not in the case of the Community Edition. I'm working with the Community Edition right now, and I think that is perhaps why it is not very stable; it causes the system to sometimes hang. I'm not sure if this is the case for the paid tiers.

What do I think about the scalability of the solution?

I am the only person using the solution currently. There are two other people that occasionally also assist in it. I'm helping them understand the tool and they are beginning to use it. In that sense, we're slowly scaling.

I don't know if the solution scales well on a very large scale, however.

Overall, it scales very well, thanks to the very useful option to run n copies of each step (the "number of copies to start" attribute), though this comes with the side effect of consuming a lot of memory and CPU resources.
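
As a rough illustration of the trade-off between parallel copies and resource usage (a conceptual Python sketch, not how the tool implements it), the example below splits the rows across several worker copies of a step:

```python
from concurrent.futures import ThreadPoolExecutor

# More parallel copies of a step means more throughput, but every copy
# holds its own buffers and consumes CPU.
ROWS = list(range(1_000))
COPIES = 4  # analogous to the "number of copies to start" on a step

def process_chunk(chunk: list[int]) -> int:
    """A stand-in for one copy of a step working on its share of rows."""
    return sum(x * x for x in chunk)

chunks = [ROWS[i::COPIES] for i in range(COPIES)]
with ThreadPoolExecutor(max_workers=COPIES) as pool:
    results = list(pool.map(process_chunk, chunks))

print("Total:", sum(results))
```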

How are customer service and support?

We haven't really contacted technical support in the past. We try to handle any issues ourselves in-house. I can't speak to the quality of the technical support, having never directly dealt with them.

Which solution did I use previously and why did I switch?

We've never really used another solution like this in our organization. This is the first.

How was the initial setup?

The solution is pretty simple to set up. It's not complex.

For us, deployment took about one month.

Maintenance is easy. The only maintenance tasks are upgrading to newer versions and backing up the repository frequently.

What about the implementation team?

I handled the implementation on my own. I didn't need any help from a reseller or consultant.

What's my experience with pricing, setup cost, and licensing?

We're using the community edition, which is free to use. I'm not sure how much their paid services cost. We haven't purchased any licensing.

What other advice do I have?

We're just users of the solution. We don't have a professional relationship with the company.

The solution is great to use and easy to share with teams via the central repository. It's very functional overall. I'd recommend the solution to other companies.

I'd rate the solution eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user