We integrate multiple data sources into a single data warehouse. To do this, we build complex ETL jobs and datasets that bring data from those sources together. These are basically our use cases.
Allows for the integration of multiple data sources into a single data warehouse but there is potential for scalability improvement
Pros and Cons
- "In IBM DataStage, the Transformer is the most valuable feature for me. It enables me to apply complex transformations, generate the gateway key, and map source tables into the session table."
- "So, there are some features that are missing. If I compare DataStage to Talend, Talend allows you to write custom code in Java or use these tools in your applications as well if you are building a job application. But in DataStage, it does not allow you to write custom code for any component."
What is our primary use case?
What is most valuable?
In IBM DataStage, the Transformer is the most valuable feature for me. It enables me to apply complex transformations, generate the gateway key, and map source tables into the session table.
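As a rough illustration of the row-level work described here (deriving columns, attaching a generated key, and mapping source fields to a target layout), the following Python sketch may help. It is not DataStage code and does not reflect the reviewer's actual job; the function name `transform_rows`, the column names, and the key scheme are all hypothetical.

```python
from itertools import count


def transform_rows(source_rows, key_start=1):
    """Map source rows to a target layout, adding a generated key per row.

    Hypothetical stand-in for a transformer step: every column name here
    is an assumption made for illustration only.
    """
    next_key = count(key_start)  # simple incrementing key generator
    target_rows = []
    for row in source_rows:
        first = row["first_name"].strip().title()
        last = row["last_name"].strip().title()
        target_rows.append({
            "row_key": next(next_key),                        # generated key
            "customer_name": first + " " + last,              # derived column
            "amount_usd": round(row["amount_cents"] / 100, 2),  # unit conversion
        })
    return target_rows


if __name__ == "__main__":
    sample = [{"first_name": " ada ", "last_name": "LOVELACE", "amount_cents": 1250}]
    print(transform_rows(sample))
```

Keeping the derivations in one transformation step like this is the same idea the reviewer describes: the logic lives in the mapping rather than in increasingly complicated SQL.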
What needs improvement?
So, there are some features that are missing. If I compare DataStage to Talend, Talend allows you to write custom code in Java or use these tools in your applications as well if you are building a job application. But in DataStage, it does not allow you to write custom code for any component.
Moreover, Talend allows you to extract Java code and call it in your APIs or applications; DataStage does not have this feature.
In future releases, DataStage could benefit from the ability to save metadata into a database. Then, if the database crashes or you lose data in it, you could recover it, whereas files are harder to manage.
For how long have I used the solution?
I have been using this solution for five years.
Buyer's Guide
IBM InfoSphere DataStage
November 2024
Learn what your peers think about IBM InfoSphere DataStage. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,660 professionals have used our research since 2012.
What do I think about the stability of the solution?
I would rate the stability of this solution a seven out of ten.
What do I think about the scalability of the solution?
I would rate the scalability of this solution a five out of ten. It should be improved. We have almost eight end users in our area. Some are engineers, one is an administrator, and two of them monitor the pipelines. The rest are developers.
We plan to increase our usage further.
How are customer service and support?
In terms of documentation and support, IBM is reliable for providing support to its partners or those with licenses. You can easily find problem resolution support online.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I have previously worked on SSIS. After using SSIS, I moved to DataStage. We explored IBM DataStage for our specific needs.
How was the initial setup?
I would rate my experience with the initial setup a seven out of ten, where one is difficult and ten is easy. I have worked both on-premises and deployed it on a private cloud.
The deployment process usually takes a day.
What about the implementation team?
For deployment, you first need to install DataStage on the desired server. Then, you have to take a backup of the development and deploy it on the server. After importing, you need to execute and schedule it through your job application.
The number of people required for the deployment depends on the scenario. Sometimes, one person is more than enough.
What's my experience with pricing, setup cost, and licensing?
Pricing is handled by the procurement department. But compared to other enterprise tools like Informatica or Pentaho, IBM DataStage is considerably cheaper.
What other advice do I have?
I would highly recommend this solution because of the shared-nothing architecture it uses, the capabilities it offers, and the fact that every feature has its own use. For example, it has a Director for creating jobs, clients for monitoring and scheduling jobs, and an Administrative client for administration purposes. This is something well managed by IBM.
Overall, I would rate the solution a seven out of ten. There are certain areas of improvement.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Program Manager at a consultancy with 10,001+ employees
The solution can incorporate very complex business rules, is moderately scalable, and is stable
Pros and Cons
- "The most valuable feature of the solution is the ability to incorporate very complex business rules in DataStage."
- "The solution can be a bit more user-friendly, similar to Informatica."
What is our primary use case?
The solution is mainly used for marketing campaigns, customer segmentation, and home loans.
What is most valuable?
The most valuable feature of the solution is the ability to incorporate very complex business rules in DataStage.
What needs improvement?
The solution can be a bit more user-friendly, similar to Informatica.
I would like the solution to have some basic streaming functionality added.
For how long have I used the solution?
I have been using the solution for one year.
What do I think about the stability of the solution?
We don't currently have much in our production environment. We are gradually moving into production, so whatever small setup we have is okay for now. Taking the overall perspective into account, we do have dependencies on other jobs. Based on the feedback we receive, we are sometimes unable to run our process because of those dependencies, and jobs don't complete on time. We also received feedback that a problem handling data caused jobs to fail, and they needed to be rerun. This could be product-specific, design-specific, or anything else, but I think there is room for improvement in terms of stability. I would give the solution a seven out of ten.
What do I think about the scalability of the solution?
I think that scalable systems should also have good performance. The scalability of this solution in my opinion may not be on the same level as Informatica Power Exchange Data Integration.
I give the scalability of the solution a seven out of ten. We are facing problems whenever we have huge amounts of data and there are job failures. We need to take care of how to tackle that situation.
How was the initial setup?
We didn't need to do anything because the customer with whom we are working on the project had already set everything up for us. The initial setup was not in our purview.
What other advice do I have?
I give the solution a seven out of ten.
We have a separate platform or support team. Any query is routed to this team, which deals internally with the DataStage people.
I'm not a technical expert because I haven't been a developer for 12 years. This is what I understand from the feedback I've received. Informatica Power Exchange Data Integration is much better from a scalability perspective, compared to IBM InfoSphere DataStage. Scalability, user-friendliness, and inclusion of different business rules are all important, but I think Informatica Power Exchange Data Integration goes one step further on that.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: My company has a business relationship with this vendor other than being a customer:
Data engineer at nust
A scalable ETL tool with a slow connection that can make it time-consuming to work on
Pros and Cons
- "The solution's scalability is really good...we are using multi-instance jobs where you can scale them easily."
- "It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming."
What is our primary use case?
Right now, I'm working for a telecom company. We are using IBM InfoSphere DataStage to construct ETL jobs for them so that they can load data from their various sources into their warehouse.
What is most valuable?
The valuable feature of the solution is, I think, its functionality. So, there are a lot of transformations that you can apply by just using a transformer. Also, you don't need to complicate your SQL queries while trying to transform your data. Hence, the transformer is something I like in the solution.
What needs improvement?
I don't know if it's just a problem with me, but the issue I see is that when we connect to the server from the client, especially when you're going to run a job or something, the whole connection is really slow. It takes a lot of time to actually trigger your job and then go into the logs and other stuff. So all of this is really time-consuming.
For how long have I used the solution?
I have been using IBM InfoSphere DataStage for five years. Also, I am using IBM InfoSphere DataStage Version 11.7. My company is a consultant for DataStage.
What do I think about the stability of the solution?
Most of the time, it is stable. Sometimes there are issues that you don't understand and that go away when you have a read-only job, but that is quite rare. Otherwise, it seems quite stable.
What do I think about the scalability of the solution?
The solution's scalability is really good. In terms of parallel jobs, we are using multi-instance jobs where you can scale them easily.
In my company, my team is spread across multiple countries, including Pakistan and India.
How are customer service and support?
I haven't contacted IBM's technical support.
How was the initial setup?
The solution's initial setup is straightforward. Also, it's a one-time activity. It is better to have a competent person for deployment since newbies cannot do it themselves.
When I started using IBM InfoSphere DataStage, it was already deployed on the server. So I did not have to go through the installation phase.
What was our ROI?
ROI is something that the client takes care of, and I think they must be keeping track of it and getting a certain result indicating a good ROI. So, that's why they may have continued using it over the years.
Which other solutions did I evaluate?
Before DataStage, I did not evaluate other options. Our client was already comfortable with DataStage, so that's what we had to use.
What other advice do I have?
I recommend that other people who want to use it go for DataStage on the cloud. The on-prem version of the solution looks and feels old. It is also time-consuming to work with. Overall, I rate the solution a six out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Owner at mip GmbH
Powerful, reliable and the ability to run it in parallel mode makes it very fast
Pros and Cons
- "The product is a stable and powerful data management solution that can run in parallel mode for enhanced speed."
- "The interface needs work to be more user-friendly."
What is our primary use case?
We are a consultancy company, so we are not using DataStage for our own purposes. We deploy it for our customers. We use it to supply data integration and data warehousing solutions based on the specific needs of clients.
How has it helped my organization?
The product has improved our organization by allowing us to provide the product to clients as a reliable data management solution.
What is most valuable?
The product is a very powerful data management tool and the ability to run it in the parallel mode makes it very, very fast. I would say that the ability to use parallel mode would be one of the most valuable features.
What needs improvement?
The features that could be better start with the user interface. It has been getting better in the last releases and in the past few years, and I guess that they will continue to make progress on this front. But even with the improvements that they have made, it could be even better now, and really should be. I think it's a little bit difficult to use because of the interface. Being user-friendly is important for any product and they need to make this adjustment.
In addition to improvements in the base user interface, I would say it would be good to incorporate more interface options for cloud-based systems.
For how long have I used the solution?
The organization has been using this solution for about 10 years.
What do I think about the stability of the solution?
The product is very stable. The stability enhances the fact that it is powerful and fast, so it is a reliable solution with good performance.
What do I think about the scalability of the solution?
I feel that the product has excellent scalability. We currently have more than 15,000 users as clients worldwide and scalability has never presented as an issue. It's used in the U.S., in Asia, and in Europe, so it seems to perform in various markets satisfactorily.
It's an enterprise-enabled product, so with that designation, it really needs to be scalable to satisfy the needs of clients — and those needs change all the time. It has the ability to connect reliably to a lot of sources and this is a very, very important thing for people who are using it.
How are customer service and technical support?
We do have some experience with technical support and customer services. We have access to the IBM software hotline, which is normal for IBM. They work with the FTS (Follow the Sun) support model which means they are available 24/7. The availability is good and the response is as well. I would say it is good support.
How was the initial setup?
The initial setup of the product is straightforward. The first deployment may be more complex in one case than another. It depends on the complexity of the processes an organization is building and what they need to consider in future planning. Normally you can install it without much trouble and have your first processes live within a few days.
After that, the project is ongoing and it becomes more complex as you build it out. Normally everything does not have to be in place from the beginning as the solution may be deployed to solve new or future issues. In other words it is not replacing something that is already functioning, it is providing something new. As you build out, you get more processes going and the setup becomes more complex. But the initial setup is quite simple.
What about the implementation team?
We do the deployment with our own team for the client, but the implementation can change depending on the client needs. When the client has specific things that need to be resolved for their situation or they want to install and to implement additional products to integrate with the base solution, that affects the rollout. It's especially true during the implementation stage. The implementation team can begin with as little as one person and it can end up as a team of five, six, or seven members depending on complexity and needs for rapid deployment. It also depends on whether the product is going to be used throughout the company. There are a lot of customers who deploy globally or selectively because they may have a strategy already in place for certain solutions that they may not want to change. For example, they may already have ETL processes being used without DataStage and it may not make sense to convert these processes.
What's my experience with pricing, setup cost, and licensing?
It is very difficult to say how much the product costs because there are variables depending on the configuration. Normally it's priced according to use, so the price can vary quite a lot. The more you use, the more you pay.
In comparison to other products, I would say it's not as expensive as Informatica, but it is intended to be an enterprise solution, so it's not as cheap to deploy as products that are not enterprise solutions.
The products we offer are really very different in pricing compared to open-source products. With open-source you have only the maintenance cost. For the software products we use, you have to invest in the software and then the maintenance costs are in addition to that.
There are no other costs in addition to the standard licensing fee and the maintenance. With IBM, you typically pay for the licenses and the first 12 months of maintenance is included in that cost. Afterward, you pay for the maintenance year-to-year.
Which other solutions did I evaluate?
We currently also use PowerCenter from Informatica as a solution for some clients. It isn't really a previous solution or a solution we evaluated and discarded but it is one that we sometimes use instead of DataStage. It depends on the needs of our customers.
The decision on which product to pick has partly to do with what the client wants to do and what we believe is the better solution for them and their needs. We have some projects where we use PowerCenter simply because our customer wants to use it; we have other projects where we use DataStage because some of these customers are already using DataStage or prefer it because it is from IBM.
In a similar way, we sometimes use Microsoft Integration Services, which is a very small part of our business. But again it isn't so much that we evaluated the solution and dismissed it or switched from using it. These products are opportunities for us to choose between in order to provide the best solution for clients. We evaluate the options and choose the best fit for the project.
What other advice do I have?
I would rate this particular product as a nine out of ten. It is very powerful and very fast, but the problems with the interface make it less than perfect.
As far as other advice that I would have for other people considering this as a solution, the first and most important is to examine your needs and decide on the processes you want to build. From that, you can immediately have a better idea of the type of solution that might be best for you. Then it is a good idea to get the advice of a consultant — like us.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner.
Senior Data Warehouse Developer at a computer software company with 5,001-10,000 employees
Stable and scalable with a straightforward setup
Pros and Cons
- "Finding logs is very easy on the solution."
- "The template mapping could be easier."
What is our primary use case?
We primarily use the solution for the UTS tool as well as for billing and data. We use it to clean the data from different systems and for pulling in data.
What is most valuable?
The solution is good for bringing in data from third-party systems.
Various aspects of the solution are valuable, but it often depends on the use cases.
Finding logs is very easy on the solution.
Overall, compared to SAS or Informatica, the solution is much easier to navigate.
What needs improvement?
The mod options should be simplified. Some options on DataStage aren't working properly.
The solution needs to lower its price.
The template mapping could be easier.
The solution should allow for compression of data.
For how long have I used the solution?
I've been using the solution for about 12 years.
What do I think about the stability of the solution?
Typically, the solution is stable.
What do I think about the scalability of the solution?
The solution is scalable. We have about six or seven clients using the solution currently.
How are customer service and technical support?
We've never been in direct contact with IBM's technical support. We use a third party, so if we have issues, we turn to them for troubleshooting.
Which solution did I use previously and why did I switch?
We previously used tools such as Informatica. We've also previously used SQL for billing.
How was the initial setup?
The initial setup is easy for the DataStage, but for billing and metadata, it gets more complicated. In our case, before installing the metadata, the proper documentation was not there, which complicated things a bit.
What about the implementation team?
We handled the implementation ourselves.
What's my experience with pricing, setup cost, and licensing?
The solution is quite expensive in comparison to similar solutions.
Which other solutions did I evaluate?
We did approach other vendors before ultimately choosing IBM.
What other advice do I have?
We use the on-premises deployment model.
If you are comparing the solution to Informatica, this solution is much simpler. In Informatica, for example, there might be two to three ways to find a log, but with DataStage, they make it much easier. However, compared to other vendors, IBM's licensing costs are more expensive.
I'd rate the solution eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Managing Partner at a tech services company with 11-50 employees
Easy to understand to monitor the data lineage from source to target but pricing could be better
Pros and Cons
- "IBM is stable and accurate to monitor. It's easy to understand to monitor the data lineage from source to target."
- "DataStage is quite expensive. It is too hard to find a consultant using DataStage in Turkey."
What is our primary use case?
IBM InfoSphere DataStage is a core ETL tool. We use it with source systems like mainframes. DataStage is perfectly suited for extracting data from mainframes.
What is most valuable?
IBM is stable and accurate to monitor. The data lineage from source to target is easy to understand and monitor.
What needs improvement?
DataStage is quite expensive. It is too hard to find a consultant using DataStage in Turkey.
For how long have I used the solution?
I have been using IBM InfoSphere DataStage for three years. I also used this solution for two years back in 2009-10.
What do I think about the stability of the solution?
The product is stable.
I rate the solution’s stability a nine out of ten.
What do I think about the scalability of the solution?
I rate the solution’s scalability an eight out of ten.
How are customer service and support?
The quality and response time of support is fine. It's pretty quick.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Informatica is the first choice for me. It's easy to use and not so expensive compared to DataStage.
How was the initial setup?
You install IBM InfoSphere DataStage once you've set it up properly. It's robust and reliable, but initially configuring it can be challenging.
What other advice do I have?
The first consideration is the type of source system they have, whether it is a mainframe or not. Another key indicator for me to suggest DataStage is if the client has other IBM ecosystems, such as data quality or IBM governance tools. This makes it highly suitable because you can easily establish data lineage.
Overall, I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Partner at Avydium Data LLC
Its parallel processing capability allows you to go through extremely large data sets in no time at all
Pros and Cons
- "Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
- "Working with some of the big data components is good, but I can see improvements are needed."
What is our primary use case?
Complex data integration projects which require integration from multiple data sources.
How has it helped my organization?
I have worked on many implementations using DataStage. All of the projects that I worked on have been successful. This is due mainly to strict discipline around best practices, and to following a set of standards and templates designed to reduce complexity and improve automation, including a strong reference architecture.
What is most valuable?
- Its parallel processing capability allows you to go through extremely large data sets in no time at all, if you do your job right.
- Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job.
- High scalability: Start small and go big with the same job. You just need to adjust the configuration file, no need to recompile.
- Strong metadata management: Business, technical, and process metadata can all be managed from a single place.
- Ease of integration with other tool sets: Easily supports APIs (or build your own) to support data streaming (or batched) from other systems.
- Data Quality Management from within the tool: Supporting data sampling, including profiling of data, directly from the development canvas.
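The "start small and go big with the same job" point can be sketched in a few lines of Python. This is only an illustration of the idea, not DataStage's configuration file format or engine: the dictionary `config`, its `nodes` key, and the functions `partition` and `run_job` are hypothetical names, and threads stand in for processing nodes purely to show the structure. The job logic stays the same while the degree of parallelism comes from configuration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, node_count):
    """Hash-partition rows on `key` into `node_count` buckets."""
    parts = [[] for _ in range(node_count)]
    for row in rows:
        # crc32 gives a stable hash, so a key always lands in the same bucket
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        parts[idx].append(row)
    return parts


def run_job(rows, config):
    """Sum `amount` over all rows, spreading the work over config['nodes'] workers.

    Only `config` changes between a small and a large run; the job logic
    below is untouched.
    """
    node_count = config["nodes"]
    parts = partition(rows, "customer_id", node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_totals = list(pool.map(lambda p: sum(r["amount"] for r in p), parts))
    return sum(partial_totals)


if __name__ == "__main__":
    data = [{"customer_id": i % 7, "amount": i} for i in range(100)]
    print(run_job(data, {"nodes": 1}))  # small run
    print(run_job(data, {"nodes": 4}))  # same job, more workers
```

Both calls return the same total; the second simply spreads the partitions over more workers, which mirrors the reviewer's remark that scaling is a configuration change rather than a recompile.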
What needs improvement?
High-cost of ownership: They could take a page from open source software, such as Talend.
Working with some of the big data components is good, but I can see improvements are needed, such as native support for Spark and HBase.
For how long have I used the solution?
More than five years.
What do I think about the stability of the solution?
No issues.
What do I think about the scalability of the solution?
No issues.
How are customer service and technical support?
Support is always good.
Which solution did I use previously and why did I switch?
Have used quite a few ETL tools in my job.
- Ab Initio: Even pricier, but has a highly competent ETL tool. It is complete, but hard to use.
- Informatica: Not as flexible and does not support the same level of complexity in its maps.
- Talend: It is a good tool suite, extensive, but can be cumbersome to cite all its pieces.
- ODI: For the Oracle centric world.
- SSIS: Weak when compared to any of the above tool sets.
How was the initial setup?
Depends on type of environment that is being installed. I have seen fairly simple to overly complex initial setups due to the environment, not due to the tool.
What about the implementation team?
Both vendor and in-house team implementations:
IBM has top-notch support and tool services along with other partners as well. Depending on the partner, this can go from installation and configuration to solution development, etc.
Most in-house teams that I have seen tend to have good developers, but not always good architects. Like most every data integration project, if you do not have a strong architecture, your solution will eventually fail.
What was our ROI?
Depends on the project.
Which other solutions did I evaluate?
Have done many ETL tool evaluations based on client requirements. DataStage has always been in the top-three. It may not have been selected due to different weights being used for different sections of the evaluation for different clients, but it has always been in the top-three consistently.
What other advice do I have?
If you have the budget and your solution requires industrial/enterprise strength data integration, this product is always a good choice.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Manager at Blue Technology
A highly-stable and scalable solution with seamless integration capabilities
Pros and Cons
- "The concept of integration is a valuable feature of the product."
- "The graphical user interface (GUI) feels a lot like the interfaces from the 1980s."
What is our primary use case?
We use it for many kinds of projects. For example, we use it for business intelligence, master data management, data quality, data governance, data integration to SAP, Oracle retail, and real-time integration.
What is most valuable?
The concept of integration is a valuable feature of the product. The product excels in this area. In fact, it is one of the two leading products in the integration field.
Based on my experience, interacting with data through integration has been more profitable than any other products or projects. Data integration is a reliable feature.
What needs improvement?
The graphical user interface (GUI) feels a lot like the interfaces from the 1980s. Regarding IBM, they initially indicated that version 8.11.7 would be their final release. We require more connectors to establish clear connections to cloud services. Connecting to the cloud is not easy or transparent within the product. Although the product has the potential to connect to the cloud, configuring and setting it up is challenging.
In future releases, connecting to the cloud should be easy and transparent.
For how long have I used the solution?
I’ve been using this solution since 2005. I’m currently using version 11.7.
What do I think about the stability of the solution?
The stability of the solution is ten out of ten.
What do I think about the scalability of the solution?
The scalability of the solution is nine out of ten.
How was the initial setup?
Setting up the system is difficult and requires a lot of technical expertise. It demands a high level of experience during the setup process.
However, despite the difficulty, our team, possibly the only one in Mexico, is good at quickly completing the setup.
Most of our implementations or installations are on-premises, accounting for approximately 90% of the total. The remaining 10% of our implementations are on the cloud.
What was our ROI?
The product has proven to be exceptional and highly competitive in the market. As a result, I have experienced substantial financial success with the product. There are many examples we have for the ROI.
What's my experience with pricing, setup cost, and licensing?
The price of the product is reasonable, especially for mid-sized companies. We have been successfully selling to several mid-sized companies.
Which other solutions did I evaluate?
The market is indeed filled with numerous solutions. Currently, I am exploring Microsoft's offerings, particularly their integration capabilities. Also, the purpose and functionalities of Azure are pretty interesting. I have been working with Quick Link, and specifically, I have been working with Azure Database.
What other advice do I have?
If you want to learn about integrating data, this product offers a valuable learning opportunity. Then you can consider transitioning to the next product in line. The underlying concept remains the same.
I believe this solution is more reliable and provides a better understanding of data integration, data quality management, and master data management. It covers all aspects of data management, making it easier to learn.
This is my perspective, possibly influenced by my extensive experience working with this product for many years. It is the main strength of this solution.
Overall, I would rate the solution a ten out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.