We primarily use the solution for ETL jobs and movement of data.
IT Administrator at a transportation company with 10,001+ employees
A good data solution for ETL jobs that improves task performance
Pros and Cons
- "The solution has improved the time it takes to perform tasks related to batch applications."
- "The solution should be more user-friendly."
What is our primary use case?
How has it helped my organization?
The solution has improved the time it takes to perform tasks related to batch applications.
What is most valuable?
The parallel job scan has been a very valuable feature.
What needs improvement?
The solution should be more user-friendly.
Buyer's Guide
IBM InfoSphere DataStage
October 2024
Learn what your peers think about IBM InfoSphere DataStage. Get advice and tips from experienced pros sharing their opinions. Updated: October 2024.
814,649 professionals have used our research since 2012.
For how long have I used the solution?
I've been using the solution for 10 years.
What do I think about the stability of the solution?
The solution is quite stable.
What do I think about the scalability of the solution?
I've never investigated the scalability, but I can say that it only scales on the machine. I can have three machines running in parallel. We have about 20 people using the solution, and they are mostly developers and admins. We mainly use the solution for ETL jobs.
How are customer service and support?
On a scale of one to ten, I would give technical support an eight.
Which solution did I use previously and why did I switch?
We were mostly using ETLs on mainframe jobs.
How was the initial setup?
It's a straightforward process. In terms of deployment, DataStage takes about a week.
What about the implementation team?
We use an integrator to assist with implementation.
What other advice do I have?
The advice I would give to others is to make sure they define a framework for development and for management. This could be very useful for the future of the product in the company.
I would rate the entire solution eight out of ten. I really like DataStage. The product fits our requirements perfectly.
We are changing the product now, however, to a cloud-based approach for DataStage.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
IT Analyst at Tata Consultancy Services
A stable solution for loading data into our data warehouse
Pros and Cons
- "The most valuable feature is the data integration for data warehousing."
- "The response time from support is slow and needs to be improved."
What is our primary use case?
We primarily use this product for data loading. We take data from different applications and databases and put it into our data warehouse.
What is most valuable?
The most valuable feature is the data integration for data warehousing.
What needs improvement?
The response time from support is slow and needs to be improved.
For how long have I used the solution?
I have been using InfoSphere DataStage for almost eight years.
What do I think about the stability of the solution?
This is a stable product.
What do I think about the scalability of the solution?
We have almost 30 people who use InfoSphere DataStage.
How are customer service and technical support?
Technical support can be slow.
What other advice do I have?
This product has a lot of good features.
I would rate this solution an eight out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
Technical Lead at a tech services company with 5,001-10,000 employees
A powerful solution for complex transactions with an extremely straightforward setup
Pros and Cons
- "DataStage works better with Linux operating systems when the application services are hosted on Linux system equipment, but it's powerful on Windows too."
- "I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers."
What is most valuable?
DataStage itself is a very powerful tool and you have a lot of transformations that you can do. In comparison to Informatica, you can run very complex transactions on it. It's a precise and powerful environment. And when you have ETLs and you have all your data documented on the data manager and you have the rest of management from IBM itself on InfoSphere, it's very powerful, especially when you use the whole suite. It helps with end-user technologies and gives you better imaging.
It's powerful in administration mode. It works well when you're using solutions like DataConnect, IBM CDC, etc.
DataStage works better with Linux operating systems when the application services are hosted on Linux system equipment, but it's powerful on Windows too.
Also, no matter what language you use, you can always transform the data.
I'm not sure about the latest versions, however, as I'm on 11.3, which is about five years old.
What needs improvement?
I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers.
The platform also needs more stability. It caches a lot. It crashes on the application servers that the host allows on the platform.
The solution needs better online tools for data, or for sourcing data on the internet. They have InfoSphere exchange but it's not as useful for DataStage.
For how long have I used the solution?
I've been using the solution for three years.
What do I think about the stability of the solution?
The stability is related to the operating system your company uses. It crashes occasionally when you're hosting the application servers, but only in the servers on the operating system. With Linux, because Linux itself is very robust, it crashes less. For example, we have a telecom with 40 million users and using Linux it crashed maybe two times a year. However, when the solution was hosted on the Windows platform, it crashed two to three times a month.
The crashes are related to memory and if it's not automated, you have to deal with it manually. On Windows, you have to release the cache memories manually, but on Linux, you don't have to do that, which is why you get less crashing.
What do I think about the scalability of the solution?
The solution is very scalable, but at a certain point, it consumes a lot of resources. In order to scale, you need a lot of memory and a lot of people.
How are customer service and technical support?
The quality of technical support you can expect usually depends on the region. In Egypt, it was not great, but in Jordan, Dubai, and Kuwait, it was good. That was a couple of years ago. I'm not using the solution right now, so I can't say for sure if this is still the case.
How was the initial setup?
The solution has one of the most straightforward setups. It's even easier than Office.
What other advice do I have?
The last version I interacted with was 11.3 because the later versions were cloud-based and usually our customers didn't want to use the solution on the cloud.
The advice I would give to anyone trying to implement the solution is this: you have to have accurate sizing. Clients always do the sizing wrong and they need more experience to get the sizing right. Setting up the environments takes sizing into account, but it usually causes a lot of problems if the sizing is poor when it starts to operate. Then you have to re-implement, and it will require an increase in resources that will change your budget.
I would rate this solution nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data/Solution Architect at a computer software company with 51-200 employees
Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data
Pros and Cons
- "As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
- "Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
What is our primary use case?
We use it for creating a pattern for data integration with our data vault. We have also used it for creating APIs.
What is most valuable?
As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing.
Its error logging mechanism is far simpler and easier to understand than other data integration tools.
The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables.
What needs improvement?
Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate.
In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere.
For how long have I used the solution?
It was DataStage previously, and then it became InfoSphere. I have used DataStage for ten years and InfoSphere for one year.
What do I think about the stability of the solution?
It is quite stable. In the newer components of InfoSphere, you have a mapping tool called FastTrack and a metadata generator, which can have issues from time to time, but they get resolved.
What do I think about the scalability of the solution?
It is not that easy to scale on-premises. I have worked on the ones deployed on Windows or Unix, and scalability is often dependent on whether you can add more CPUs or boxes. On the cloud, it would have been easier to scale. However, the current version can only be deployed on Windows or Unix.
How are customer service and technical support?
I have not been in touch with them recently. Earlier, I was in touch with their technical support and had raised tickets because some weird errors, such as a phantom error, were being logged in the error log, which made no sense. We used to get in touch with their support team to understand these.
Which solution did I use previously and why did I switch?
I have used Informatica and SAS CA. IBM InfoSphere has the highest cost of licensing as compared to others. It is not very widely used, and it is very difficult to find people who have this sort of knowledge.
The newer version of Informatica is on the cloud and is much more user-friendly than InfoSphere because it provides profiling information in nice graphs and charts. It also provides a lot of templates. For example, if I want to build a whole dimensional kind of structure, Informatica has a template. I just need to use that template. So, the ease of use is far better in Informatica, and it has everything that InfoSphere has. The only thing is that Informatica comes in bundles. That's the reason sometimes organizations don't go for it. For example, the data integration is a separate section, and the data quality is a separate section. They have separate pricing.
How was the initial setup?
The initial setup is quite simple. It didn't take more than half an hour to set it up on my laptop.
What about the implementation team?
I implemented it myself. In terms of maintenance, a particular version might not require any maintenance. There could be bug fixes and minor versions going in for some versions.
What's my experience with pricing, setup cost, and licensing?
It is quite expensive.
What other advice do I have?
I would recommend this solution for large-scale implementation where you need a complex transformation and data integration to happen according to a structured format, either a data vault or a dimension model. It is suitable for big companies because of the cost. It is a very valuable platform for data in large volumes. For small volumes, you have other open-source tools that can do the same thing for you.
I am part of a consultancy, and I have deployed this product for companies. We have five to eight developers. Because InfoSphere is a licensed product, and its licenses cost a lot, there are not many InfoSphere developers.
I would rate IBM InfoSphere DataStage an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Managing Partner at a tech services company with 11-50 employees
A good platform for integrating data and has good ETL features
Pros and Cons
- "The ETL tools are probably the most valuable feature. It has an IBM tool, a friendly UI and it makes things more comfortable."
- "Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions."
What is our primary use case?
My company is a consulting firm and I'm a managing partner. We consult for large size companies in the telecommunications, banking and insurance sectors. We partner with IBM. We're not using the latest version of DataStage but one of the more recent ones. I use the product for a migration project. It's a DB2 to Oracle migration project and the source database is mainframe. I need DataStage for that purpose.
What is most valuable?
I think the ETL tools are the most valuable features. It has an IBM tool, a friendly UI and it makes things more comfortable. A second good feature is that after you make some ETLs, migrations source to target, DataStage is capable of providing details and extracting data. The push down feature is also a valuable feature.
What needs improvement?
The price would be the first thing I would want to change. Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions. I think it would also be helpful if the product was more adaptable to other platforms and vendors. I would also like to see an improvement in support.
For how long have I used the solution?
I've been using the product for more than two years.
What do I think about the stability of the solution?
The product is stable.
What do I think about the scalability of the solution?
The product is scalable.
How are customer service and technical support?
We do sometimes have issues in terms of support with regard to IBM products in this country. It's difficult to get what we want and it could really be improved and made more efficient.
How was the initial setup?
I recall that the setup was quite difficult. We had to call in some IBM people. It's possible that they have improved that aspect. I know there's a difference in installation depending on whether it's on-prem or cloud. It's possible that the cloud version is easier to install but I don't have experience with that.
What other advice do I have?
I would rate InfoSphere an eight out of 10.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner.
Owner at 7Spring Consult
A powerful tool with parallel data streams
Pros and Cons
- "The data lineage report can be filtered for reporting. The reports are user-friendly and take less time to find what you need."
- "We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
What is our primary use case?
It is in the environment of our client, who is a large Russian bank. They are in the top 20, as of August, and have the re-maintenance project of their data warehouse solution based on IBM technologies. They use IBM BDW, a banking data model, on Netezza, with DataStage as the ETL tool. It is a native case.
We are using the on-premise deployment model.
How has it helped my organization?
Our main goal of this project is to increase the efficiency of the usage of this solution and help the bank to get money from the data.
What is most valuable?
The data lineage report can be filtered for reporting. The reports are user-friendly and take less time to find what you need.
It is a powerful tool with parallel data streams.
What needs improvement?
The previous project was based on Microsoft SQL. It moved huge amounts of data from different data sources and DataStage to a middle stage, then moved it to Netezza. This created a bottleneck in the solution. We are trying to streamline it and create ETL processes. These will take data directly from the data sources and move it to Netezza without using a middle database. The volume of data is quite detailed. We are talking about records in the tens to hundreds of millions.
We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great.
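Until jobs can return several values, one possible workaround is to pack them into the single value a job can return and unpack them downstream. The following is a minimal Python sketch of that idea, not DataStage code; the helper names `pack_params` and `unpack_params`, the sample values, and the choice of JSON as the encoding are all assumptions made for illustration.

```python
import json


def pack_params(**params):
    """Serialize several named values into one string (hypothetical helper)."""
    return json.dumps(params, sort_keys=True)


def unpack_params(packed):
    """Recover the named values from the single returned string."""
    return json.loads(packed)


# Example: a job reports three values through its one return slot.
status = pack_params(rows_read=1200, rows_rejected=3, last_key="2024-10-01")
values = unpack_params(status)
print(values["rows_rejected"])
```

The trade-off is that every consumer of the value has to agree on the encoding, which is part of why native support for several return parameters would be preferable.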
We would be happy if IBM could give us more tolerance for bad networks or VPN channels, as this happens from time to time.
It would be great if we could use more than one SQL operator in the Source DB connector stage. Currently, in the target DB connection stage, we can use several SQL operators, but in the Source DB connector stage we can use only one. It would be better if we could use several.
Data Vault is becoming more popular. It would be great if support for it appeared in the newest versions.
I would like them to have more database procedures.
For how long have I used the solution?
We began using it in September last year.
What do I think about the stability of the solution?
It is quite stable. I haven't seen any pop up errors. It works properly.
They fixed some bugs in version 11.5.02. It works well now.
What do I think about the scalability of the solution?
It is quite scalable.
DataStage is okay, but the problem of scalability is with another component of the solution (Netezza). The main problem is with the client's version of Netezza. IBM has stopped supporting it and tells us that we need to move to the next version of Netezza. However, the price is too high for the client and we need to look for another platform.
The client thinks that DataStage can stay in place with another platform.
There are not more than five data analysts and administrators using DataStage because it works at night with ETL processes. Therefore, end users are not using it. The several people who maintain and administer it are the users.
We have two data specialists who work with it. From the bank, there are about five people who use it.
How are customer service and technical support?
I haven't used the technical support.
Which solution did I use previously and why did I switch?
Our client previously used SSIS from Microsoft. They also used Oracle. However, they did not have a special solution for ETL. Ten years ago, they used another data warehouse solution which used XML files as a transport layer.
DataStage is a directly specialized ETL tool which has instruments built for the ETL process as a stream. It can visualize and can track the ETL process, integrating it with the data governance catalog along with other IBM instruments. Previous solutions, except for SSIS, were just a number of scripts which created a process like peer-to-peer. It wasn't a centralized ETL tool with centralized ETL governance.
How was the initial setup?
It was straightforward technically.
What about the implementation team?
Three to four years ago, they decided to start a new data warehouse project. They were working with another Ukrainian company, which engineered this solution. However, the solution hadn't made it to production because of some problems between the understanding of IT and business. They tried to move it to production several times. After that, they decided to do some technical audits for this solution. They asked us to come and see the solution, then write the audit report, which we did. Then, they asked us what to do with these problems, and this is when we began to help them.
All the components were already in place. We changed it a bit, tweaked the ETL processes, and changed some structures in the data warehouse. This solved the current needs of their business.
The deployment is continuous. We are working on this project currently. It should take another year. At the moment, we have some Agile processes, in which we are finding new business needs. We try to understand them, then deploy the current user story.
What was our ROI?
The main problem of this project is they are trying to move the old solution to production in order to begin getting return on investment.
What's my experience with pricing, setup cost, and licensing?
There were no problems with the licensing model for the bank.
What other advice do I have?
It is the best solution in the IBM environment. It uses IBM data models, such as data quality tools.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
System Engineer at an energy/utilities company with 51-200 employees
Easy-to-deploy product with good scalability
Pros and Cons
- "The product is easy to deploy."
- "There could be more customization options for the product."
What needs improvement?
There could be more customization options for the product.
For how long have I used the solution?
We have been using IBM InfoSphere DataStage for 20 years. At present, we are using version 11.7.
What do I think about the stability of the solution?
I rate IBM InfoSphere DataStage’s stability a five out of ten.
What do I think about the scalability of the solution?
The product is suitable for enterprise companies. We have 100 users for it. I rate the platform’s scalability a seven out of ten. It is easily scalable compared to other systems.
How are customer service and support?
The complexity of the technical support services depends on the contact person. Sometimes it is a good experience, and sometimes it is a poor experience communicating with their executives.
How would you rate customer service and support?
Neutral
How was the initial setup?
The product is easy to deploy. I rate the process an eight or nine. The deployment time depends on the specific requirements of customers. It takes approximately three months to complete. It requires a team of five to 100 people to execute it, depending on the company size.
What's my experience with pricing, setup cost, and licensing?
The product is expensive. I rate its pricing a ten out of ten.
What other advice do I have?
I rate IBM InfoSphere DataStage an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Owner at 7Spring Consult
Reliable, simple to install, and useful
Pros and Cons
- "It is quite useful and powerful."
- "It would be useful to provide support for Python, R, and Java."
What is our primary use case?
I am a consultant. I provide product information for our clients.
What is most valuable?
IBM InfoSphere DataStage is a good product.
It is quite useful and powerful.
What needs improvement?
From a practice point of view, solutions such as IBM InfoSphere DataStage and Oracle Data Integrator are losing ground, whereas open-source solutions are becoming increasingly powerful.
For example, we are currently working hard on several examples, and in a few years, open-source solutions will take the lead in the market. They will be used by large enterprises.
Clients are looking for open-source solutions more and more.
It would be useful to provide support for Python, R, and Java.
For how long have I used the solution?
I have more than 22 years of experience with many different products.
It has been three to four years that we have been using IBM InfoSphere DataStage.
What do I think about the stability of the solution?
I have no issues with the stability of IBM InfoSphere DataStage.
How are customer service and support?
Clients are quite dependent on support from the vendor. For example, if you want to activate a new feature on the product, you must create a ticket. You have no information on when it will be implemented, and the vendor does not know because they have a stream of tickets that are completed by the priority given to the ticket.
Which solution did I use previously and why did I switch?
I am a consultant. I have different projects with different platforms. We are constantly going back and forth to different solutions for different projects.
I have had clients who have used Amazon Redshift.
Over the years, my clients have used many different products. For example, they use IBM Landscape and we use IBM InfoSphere.
How was the initial setup?
The initial setup was straightforward. We did not have issues.
What's my experience with pricing, setup cost, and licensing?
Comparable solutions share a common disadvantage, which is the total cost of the project.
It's quite expensive.
Which other solutions did I evaluate?
From time to time, I evaluate different products for my clients.
What other advice do I have?
We have had different projects with three or four clients. The average term per project has been between nine months and one year.
If you are working with an open-source solution or another solution, you can implement some features by yourself. For example, in the case of Amazon, which has AWS Lambda, you can easily write your code in Python or Java, and it will orchestrate it. You can easily create your features yourself, which gives you more ability to make your solution run quicker and eliminates the dependence on the vendor.
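To make the Lambda point concrete, here is a minimal sketch of a Python function in the `handler(event, context)` form that AWS Lambda invokes. It is an illustration only, not the reviewer's code: the `transform` helper, the `records` payload key, and the sample data are assumptions.

```python
def transform(record):
    """Hypothetical per-record transformation: trim and upper-case the name."""
    return {"id": record["id"], "name": record["name"].strip().upper()}


def lambda_handler(event, context):
    """Entry point in the handler(event, context) form used by AWS Lambda for Python."""
    records = event.get("records", [])  # 'records' is an assumed payload key
    return {"records": [transform(r) for r in records]}


# Local usage: call the handler directly with a sample event.
result = lambda_handler({"records": [{"id": 1, "name": "  alice "}]}, None)
print(result)
```

Because the handler is an ordinary function, a custom step like this can be written and tested locally without raising a ticket with a vendor.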
I would rate IBM InfoSphere DataStage an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.