We performed a comparison between AWS Glue and IBM InfoSphere DataStage based on our users’ reviews in four categories. After reading all of the collected data, you can find our conclusion below.
Comparison Results: For users invested in the AWS ecosystem, AWS Glue is hands down the best choice. Users are happier with its pricing, too. IBM InfoSphere DataStage can handle a significant amount of data quickly and easily. Once IBM InfoSphere DataStage fine-tunes its processes and moves toward a greater focus on cloud technologies, it will become a more desirable solution in today’s cloud-focused marketplace.
"The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."
"I like its integration and ability to handle all data-related tasks."
"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
"AWS Glue is fast and managed by AWS. Hence, you don't have to worry about capacity and the performance of Glue jobs. It has integrations with other data stores of AWS. The product offers metadata management, logging, and ETL processing capabilities. It comes with a powerful feature, Glue Studio, which helps to do queries interactively within the community. It is a managed service and very secure. Another popular and mature service is S3."
"AWS Glue's best features are scalability and cloud-based features."
"I like the fact that AWS Glue works with Python scripts."
"I like that it's flexible, powerful, and allows you to write your own queries and scripts to get the needed transformations."
"The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature."
"When we have needed help from the IBM team, they were helpful. Our company is a premium partner so we get fast responses."
"The most valuable feature of the solution is the ability to incorporate very complex business rules in DataStage."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms."
"The product is a stable and powerful data management solution that can run in parallel mode for enhanced speed."
"IBM is stable and accurate to monitor. It's easy to understand to monitor the data lineage from source to target."
"Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
"The most valuable feature is the ability to transfer information via notes."
"The solution is stable."
"Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
"The product has only a few built-in transformations."
"The interface for AWS Glue could improve; it does not provide a lot of detail. You can write the code in PySpark or in Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment."
"The product is expensive for data streaming. This area needs improvement."
"In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement."
"AWS Glue is more costly compared to other tools like Airflow."
"The setup and installation is a bit complex without advanced knowledge or training."
"Only people who can code, either in Java or Python, can use the product freely. Those who don't know Java or Python might find using AWS Glue difficult."
"The initial setup could be more straightforward."
"The initial setup can be complex."
"I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers."
"The interface needs improvement."
"The solution should be more user-friendly."
"Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions."
"DataStage is quite expensive. It is too hard to find a consultant using DataStage in Turkey."
"The graphical user interface (GUI) feels a lot like the interfaces from the 1980s."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews, while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. Both AWS Glue and IBM InfoSphere DataStage are rated 7.8. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, Informatica Cloud Data Integration, SSIS, and Matillion ETL, whereas IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio, and Oracle GoldenGate. See our AWS Glue vs. IBM InfoSphere DataStage report.
See our list of best Cloud Data Integration vendors.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.