Complementary Worker On Assignment at a manufacturing company with 10,001+ employees
Real User
Top 20
Nov 5, 2024
I'm not confident in highlighting any potential room for improvement with Azure Data Factory at this time. To the best of my knowledge, it is satisfactory as it is.
When we initiated the cluster, it took some time to start the process. Most of our time was spent ensuring the cluster was adequately set up. We transitioned from using the auto integration runtime to a custom integration runtime, which showed some improvement.
Director - Emerging Technologies at Speridian Technologies
Real User
Top 20
Jul 18, 2024
While it has a range of connectors for various systems, such as ERP systems, the support for these connectors can be lacking. Take the SAP connector, for example. When issues arise, it can be challenging to determine whether the problem is on Microsoft's side or SAP's side. This often requires working with both teams individually, which can lead to coordination issues and delays. It would be beneficial if Azure Data Factory provided better support and troubleshooting resources for these connectors, ensuring a smoother resolution of such issues.
The main challenge with implementing Azure Data Factory is that it processes data in batches, not near real-time. To achieve near real-time processing, we need to schedule updates more frequently, which can be an issue. Its interface needs to be lighter. One specific issue is with parallel executions. When running parallel executions for multiple tables, I noticed a performance slowdown.
Solution Architect at a computer software company with 1,001-5,000 employees
Real User
Top 20
Mar 19, 2024
When working with AWS, we have noticed that the difference between ADF and AWS is that AWS is more customer-focused. They're more responsive compared to any other company. ADF is not as good as AWS, but it should be. If AWS is ten out of ten, ADF is around eight out of ten. I think AWS is easier to understand from the GUI perspective compared to ADF.
Azure Data Factory could benefit from improvements in its monitoring capabilities to provide a more robust feature set. Enhancing the ease of deployment to higher environments within Azure DevOps would be beneficial, as the current process often requires extensive scripting and pipeline development. Greater flexibility in the data flow feature, particularly in supporting more dynamic data-driven architectures, would also help. These enhancements would contribute to a more seamless and efficient workflow within GitLab.
Senior Devops Consultant (CPE India Delivery Lead) at a computer software company with 201-500 employees
Real User
Top 20
Mar 13, 2024
The product's technical support has certain shortcomings, making it an area where improvements are required. Instead of sending out documents, I think the tool's support team should focus on how to troubleshoot issues. I want the tool's support team to have real-time interaction with users. The product's price can also be problematic for small businesses and is another area where improvement is required.
Data Governance/Data Engineering Manager at National Bank of Fujairah PJSC
Real User
Top 10
Mar 6, 2024
There is room for improvement primarily in its streaming capabilities. For structured streaming and machine learning model implementation within an ETL process, it lags behind tools like Informatica. Snowflake's own tooling is also more efficient for loading data into Snowflake, whether from Blob storage or AWS. From our experience, ADF is mainly useful for batch processing. I'm not sure how its streaming capabilities compare to others for industry-wide use cases.
Implementing a standard pricing model at a more affordable rate could make it accessible to a larger number of companies. Currently, smaller businesses face a disadvantage in terms of pricing, and reducing costs could address this issue.
Senior Data Engineer at a photography company with 11-50 employees
Real User
Top 5
Jul 17, 2023
There aren't many third-party extensions or plugins available in the solution. The addition of third-party extensions or plugins to Azure Data Factory would be a great improvement to the tool. Creating custom code, custom extensions, or third-party extensions, like a Lookup extension, should be made possible in the tool. I am unsure whether Azure Data Factory bridges the gap between on-premises, cloud, and hybrid solutions. I would like to see a version that works equally well in both on-premises and cloud environments. I would like to see the aforementioned offerings made available to customers as valuable alternatives to the old SSIS tool.
Director of Business Intelligence Analytics at The Gibraltar Group - Insurance Services
Real User
Top 20
Apr 5, 2023
Sometimes when I run some jobs, I have issues with the log flow. I want to see where the data goes. I want to see the data stream. I'd like more integrations with other APIs. Sometimes I need to do some coding, and I'd like to avoid that. I'd like no-code integrations.
Data Strategist, Cloud Solutions Architect at BiTQ
Real User
Top 10
Dec 23, 2022
Improvement could be made around streaming data because I feel that the Data Factory product is mainly geared toward batch processing and doesn't yet have built-in streaming data processing. Perhaps it's on the way and they are making some changes to help facilitate that. If they were to include better monitoring, that would be useful. I'd like to see improved notifications of what the actual errors are.
CTO at a construction company with 1,001-5,000 employees
Real User
Dec 22, 2022
The pricing model should be more transparent and available online. When you start programming, you define the fields, variables, activities, and components but don't know the implication on price. You get a general idea but the more activities you add, the more you pay. It would be better to know price implications up front. There is a calculator you can run to simulate price but it doesn't help a lot. Practically speaking, you have to build your job and run it to see the exact price implications. This is an issue because you might realize you are paying too much so you have to reprogram or change things.
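To make this reviewer's pricing concern concrete: ADF bills along several separate dimensions (orchestration activity runs, DIU-hours for copy activities, and vCore-hours for mapping data flows), which is why the cost of adding activities is hard to see up front. The following is a minimal sketch of a rough estimator; the rate constants are illustrative assumptions, not actual Azure prices, and real bills depend on region and the current rate card.

```python
# Rough, illustrative cost estimator for an ADF pipeline. The rates below are
# placeholder assumptions -- real Azure Data Factory pricing varies by region
# and changes over time, which is precisely the reviewer's complaint.

ASSUMED_RATES = {
    "activity_run": 1.00 / 1000,  # per orchestration activity run (assumed)
    "diu_hour": 0.25,             # per Data Integration Unit-hour of copy (assumed)
    "dataflow_vcore_hour": 0.27,  # per vCore-hour of mapping data flow (assumed)
}

def estimate_cost(activity_runs: int, copy_diu_hours: float,
                  dataflow_vcore_hours: float) -> float:
    """Return a rough cost estimate for one month of pipeline usage."""
    return (
        activity_runs * ASSUMED_RATES["activity_run"]
        + copy_diu_hours * ASSUMED_RATES["diu_hour"]
        + dataflow_vcore_hours * ASSUMED_RATES["dataflow_vcore_hour"]
    )

# Example: 30 activities running hourly for 30 days, plus 10 DIU-hours of
# copy and 50 vCore-hours of data flow execution.
if __name__ == "__main__":
    runs = 30 * 24 * 30
    print(f"Estimated: ~${estimate_cost(runs, 10, 50):.2f}")
```

Even a crude model like this shows why adding activities multiplies cost in ways the built-in calculator doesn't surface until the job actually runs.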
Senior Consultant at a computer software company with 1,001-5,000 employees
Consultant
Nov 25, 2022
I would like to see Snowflake's time travel feature added to Azure Data Factory. In addition, Snowflake takes care of data internally and, instead of using normal indexing, has different mechanisms to execute queries quickly. Both of these would be great to see implemented in future upgrades.
Engineering Manager at an energy/utilities company with 10,001+ employees
Real User
Oct 11, 2022
They introduced the concept of Flowlets, but it has bugs. Flowlets are reusable components that allow you to create data flows. We can configure a Flowlet as a reusable pipeline and plug it inside different data flows, so we don't have to rewrite our code or visual transformations. It works fine when we plug it in and configure it in our data flow, but if we make any minor changes to the data flow, it reverts all our changes to the original state of the Flowlet. It does not retain changes, and we must reconfigure the Flowlets repeatedly. We had these issues three months ago, so things might have changed.
This solution is currently only useful for basic data movement and file extractions, which we would like to see developed to handle more complex data transformations.
The list of issues and gaps in this tool is extensive. It includes:
1) Missing email/SMTP activity.
2) Mapping data flows require significant lag time to spin up Spark clusters.
3) Performance compared to SSIS: expect a copy activity to take ten times what SSIS takes for a simple data flow between tables in the same database.
4) It's missing a debug of a single activity.
5) OAuth 2.0 adapters lack automated support for refresh tokens.
6) Copy activity errors provide no guidance as to which column is causing a failure.
7) There's no built-in pipeline exit activity when encountering an error.
8) The AutoResolveIntegration runtime should never pick a region that you're not using (it should be the default for your tenant).
9) Resolve IR queue time lag. For example, a small table copy activity I just ran took 95 seconds queuing and 12 seconds to actually copy the data.
10) The copy activity upsert option is great if you think watching trees grow is an action movie.
They need to fix the bugs, for example:
1) Debug sometimes stops picking up saved changes for a period of time, rendering this essential tool useless during that time.
2) Enable interactive authoring (a critical tool for development) often doesn't turn on when enabled without going into another part of the tool to enable it. Then you have to wait several minutes before it's enabled, which is time you're blocked from development until it's ready. And then it only stays active for up to 120 minutes before you have to go through this all over again. I think Microsoft is trying to torture developers.
3) Exiting the inside of an activity that contains other activities always causes the screen to jump to the beginning of the pipeline, requiring re-navigating to where you were (greatly slowing development productivity).
4) The AutoResolveIntegration runtime (using default settings) often picks remote regions to operate in, which causes either an unnecessary slowdown or an error message saying it's unable to transfer the volume of data across regions.
5) The copy activity often gets the error "mapping source is empty" for no apparent reason. If you play with the activity, such as importing new metadata, then it's happy again. This sort of thing makes you want to just change careers. Or tools.
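Point 5 in the list above refers to the standard OAuth 2.0 refresh-token grant that the adapters don't automate. As a workaround, the refresh can be scripted outside ADF. Below is a minimal sketch of that flow, assuming a hypothetical token endpoint and placeholder credentials; any real connector would use its provider's actual endpoint.

```python
# Minimal sketch of the OAuth 2.0 refresh-token flow (RFC 6749) that the
# reviewer says ADF's OAuth adapters don't automate. The endpoint and
# credentials below are placeholders, not real values.
import requests

TOKEN_URL = "https://example.com/oauth2/token"  # hypothetical token endpoint

def refresh_access_token(client_id: str, client_secret: str,
                         refresh_token: str) -> dict:
    """Exchange a refresh token for a fresh access token."""
    resp = requests.post(
        TOKEN_URL,
        data={
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=30,
    )
    resp.raise_for_status()
    # The response typically carries access_token, expires_in, and often a
    # rotated refresh_token that must be stored for the next renewal.
    return resp.json()
```

In practice a scheduled job (or an Azure Function called from the pipeline) would run this before the token expires and write the new token somewhere the linked service can read, such as Key Vault.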
Data Factory has so many features that it can be a little difficult or confusing to find some settings and configurations. I'm sure there's a way to make it a little easier to navigate. The main ADF web portal has a section for monitoring jobs that are currently running, so you can see if recent jobs have failed. There's an app for working with Azure in general where you can look at some parts of your account. It would be nice if Azure had an app that lets you access the monitoring layer of Data Factory from your phone or a tablet, so you could do a quick check-in on the status of certain jobs. That could be useful.
Some stuff could be better; however, overall it's fine. The performance and stability are touch and go. The deployment should be easier. We'd like the management of the solution to run a little more smoothly.
Chief Strategist & CTO at a consultancy with 11-50 employees
Real User
Jun 3, 2022
The documentation could be improved. They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas. I would like to see a better understanding of other common schemas, as well as a simplification of some of the more complex data normalization and standardization issues. It would be helpful to have visibility, or better debugging, and see parts of the process as they cycle through, to get a better sense of what is and isn't working. It's essentially just a black box. There is some monitoring that can be done, but when something goes wrong, even simple fixes are difficult to troubleshoot.
Microsoft is constantly upgrading its product. Changes can happen every week. Every time you open Data Factory you see something new and need to study what it is. I would like to be informed about the changes ahead of time, so we are aware of what's coming. In future releases, I would like to see Azure Data Factory simplify how the information of logs is presented. Currently, you need to do a lot of clicks and go through steps to find out what happened. It takes too much time. The log needs to be more user-friendly.
The performance could be better. It would be better if Azure Data Factory could handle a higher load. I have heard that it can get overloaded, and it can't handle it.
I was planning to switch to Synapse and was just looking into Synapse options. I wanted to plug things in and then put them into Power BI. Basically, I'm planning to shift some data and, leveraging existing skills, I wanted to use Synapse for performance. I am not a frequent user, and I am not an Azure Data Factory engineer or data engineer; I work as an enterprise architect. Data Factory, in essence, becomes a component of my solution. I see the fit and plan on using it. It could be Azure Data Factory or Data Lake, but I'm not sure what enhancements it would require. User-friendliness and user effectiveness are unquestionably important, and improving the user experience may be a good option here. I also believe that more sophisticated monitoring would be beneficial.
It does not appear to be as rich as other ETL tools. It has very limited capabilities. It simply moves data around; it's not very good beyond that, at taking the data to the next level and modeling it.
Interactive authoring turns off by default after 60 minutes of inactivity but it's needed for pretty much any debugging and previewing data. This gets super annoying after a while and slows productivity. Need to be able to override the 60 minutes of inactivity.
The dynamic content editor needs to mature in its syntax checking and coding assistance. For example, it is nice that it lists activity outputs, but (and this is a big but) if you select an output, it doesn't actually select the output variable value but rather the parent name. So it passes the syntax check but tries to feed the parent JSON name as the value. Seriously? Why? That single thing was the hardest issue to learn when I first started with the tool.
You have to look at the activity JSON output and know what to add to the activity output reference in the dynamic content to get what you need. If you don't, it runs, but you can go crazy trying to figure out why the logic isn't working. That's just an example of where the syntax checking and validation of pipelines don't check as much as you'd expect.
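To make the JSON-inspection step above concrete, here is a small sketch that walks a sample activity output and prints the dynamic-content path to each field, in the @activity('...').output... form that ADF expressions use. The activity name and the output shape are assumptions, loosely modeled on a typical Copy activity.

```python
# Sketch: derive dynamic-content paths from a sample activity output.
# The output shape below is an assumption modeled on a typical Copy
# activity; inspect your own run's JSON output in the monitoring view.
import json

sample_output = json.loads("""
{
  "output": {
    "rowsRead": 1200,
    "rowsCopied": 1200,
    "executionDetails": [{"status": "Succeeded"}]
  }
}
""")

def paths(node, prefix="@activity('CopyToStage')"):
    """Yield (dynamic-content path, value) pairs for every leaf field."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from paths(value, f"{prefix}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from paths(value, f"{prefix}[{i}]")
    else:
        yield prefix, node

for path, value in paths(sample_output):
    print(path, "=>", value)
# e.g. @activity('CopyToStage').output.rowsCopied => 1200
```

Printing the paths this way shows exactly which expression to paste into the dynamic content editor, instead of guessing at the parent name the editor inserts by default.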
IT Functional Analyst at an energy/utilities company with 1,001-5,000 employees
Real User
Mar 31, 2022
One area for improvement is documentation. At present, there isn't enough documentation on how to use Azure Data Factory in certain conditions. It would be good to have documentation on the various use cases. Sometimes, it's really difficult to find the answers to very technical questions regarding certain conditions.
Technical Director, Senior Cloud Solutions Architect (Big Data Engineering & Data Science) at NorthBay Solutions
Consultant
Nov 5, 2021
Data Factory is embedded in the new Synapse Analytics. The problem is if you're using the core Data Factory, you can't call a notebook within Synapse. It's possible to call Databricks from Data Factory, but not the Spark notebook and I don't understand the reason for that restriction. To my mind, the solution needs to be more connectable to its own services. There is a list of features I'd like to see in the next release, most of them related to oversight and security. AWS has a lake builder, which basically enforces the whole oversight concept from the start of your pipeline but unfortunately Microsoft hasn't yet implemented a similar feature.
Senior Data Engineer at a real estate/law firm with 201-500 employees
Real User
Nov 4, 2021
The only thing I wish it had was real-time replication when replicating data over, rather than just allowing you to drop all the data and replace it. It would be beneficial if you could replicate it. Real-time replication is required, and this is not a simple task.
Lead BI&A Consultant at a computer software company with 10,001+ employees
Real User
Aug 31, 2021
We didn't have a very good experience. The first steps were very easy, but it turned out that we used a Microsoft data center in Europe that was partly abroad. As soon as we started using Azure Data Factory, the bills got higher and higher. At first we couldn't understand why, but it is very expensive to put data into a data center abroad. So instead, we decided to use only Northern Europe, which worked out for a while in the beginning, and then we had nothing to show for it. They gave me a really hard time over this. Azure Data Factory should make it cheaper to move data to a data center abroad for calamities, in case of disasters. What I really miss is the integration of Microsoft Data Quality Services and Master Data Services. If they were to combine those features in Data Factory, I think they would have a very strong proposition. They promised something like that at a Microsoft conference; that was years ago and it's still not here.
There is always room to improve. There should be good examples of use, which, of course, customers aren't always willing to share. It is a Catch-22. It would help the user base if everybody had really good examples of deployments that worked, but when you ask people to put out their good deployments (and this includes me), you usually get, "No, I'm not going to do that." There aren't enough good examples. Microsoft probably just needs to pay one of their partners to build 20 or 30 examples of functional Data Factories and then share them with the user base.
Principal Engineer at a computer software company with 501-1,000 employees
Real User
May 17, 2021
Snowflake connectivity was recently added, and it would be helpful if the vendor provided some videos on how to create data pipelines with it. I think that everything is there, but we need more tutorials.
.NET Architect at a computer software company with 10,001+ employees
Real User
Apr 17, 2021
It would be better if it had machine learning capabilities. For example, at the moment, we're working with Databricks and Azure Data Factory. But Databricks is very complex to do the different data flows. It could be great to have more functionalities to do that in Azure Data Factory.
General Manager Data & Analytics at a tech services company with 1,001-5,000 employees
Real User
Mar 10, 2021
I'm more of a general manager. I don't have any insights in terms of missing features or items of that nature. Integration of data lineage would be a nice feature in terms of DevOps integration. It would make implementation for a company much easier. I'm not sure if that's already available or not. However, that would be a great feature to add if it isn't already there.
Senior Manager at a tech services company with 51-200 employees
Real User
Feb 14, 2021
My only problem is the seamless connectivity with various other databases, for example, SAP. Our transaction data and all the maintenance data are maintained in SAP, and that seamless connectivity is not there. Basically, it could have some specific APIs that allow it to connect to the traditional ERP systems; that would make it more powerful. With Oracle, it's pretty good at this already. However, SAP has its native applications, which are written the way they are. SAP Cloud is very much aligned with AWS, so when it comes to Azure, it's difficult to fetch data from SAP. The initial setup is a bit complex; it's likely a company may need to enlist assistance. Technical support is lacking in terms of responsiveness.
Director at a tech services company with 1-10 employees
Real User
Dec 9, 2020
We are too early into the entire cycle to really comment on the problems we face. We're mostly using it for transformations, like ETL tasks. I think we are comfortable with the fact tables and how they are set up, but for the other parts, it is too early to comment. We are still in the development phase, testing it on a very small set of data, and then maybe on a bigger set of data. You might only hit the pain points once we put it in place and run it; that's when it will be more effective for me to answer that.
Azure Technical Architect at Hexaware Technologies Limited
Vendor
Oct 27, 2020
Understanding the pricing model for Data Factory is quite complex. It needs to be simplified, and easier to understand. We have experienced some issues with the integration. This is an area that needs improvement.
Business Unit Manager Data Migration and Integration at a tech services company with 201-500 employees
Real User
Oct 21, 2020
The number of standard adaptors could be extended further. What we find is that if we develop data integration solutions with Data Factory, there's still quite a bit of coding involved, whereas we'd like to move in a direction with less coding and more select-and-click.
I find that Azure Data Factory is still maturing, so there are issues. For example, there are many features missing that you can find in other products. You cannot use a custom data delimiter, which means that you have problems receiving data in certain formats. For example, there are problems dealing with data that is comma-delimited.
CTO at a construction company with 1,001-5,000 employees
Real User
Aug 19, 2020
The pricing scheme is very complex and difficult to understand. Analyzing it upfront is impossible, so we just decided to start using it and figure out the costs on a weekly or monthly basis.
Azure Architect/Informatica ETL Developer at Relativity
Vendor
Jul 13, 2020
Azure Data Factory is a bit complicated compared to Informatica. There are a lot of connectors that are missing and there are a lot of instances where I need to create a server and install Integration Runtime. The support and the documentation can be improved. There are a lot of tasks that you need to write code for.
Data Strategist, Cloud Solutions Architect at BiTQ
Real User
Top 10
Jun 15, 2020
I'm not sure if I have any complaints about the solution at the moment. There are a few bits and pieces that we would like to see improved. These include improvements related to the solution's ease of use and some quality-of-life upgrades. However, these are minor complaints. If the user interface were more user-friendly and there were better error feedback, it would be helpful.
Azure Technical Architect at Hexaware Technologies Limited
Vendor
Dec 31, 2019
The user interface could use improvement. It's not a major issue, but it's something that can be improved. It has the ability to create separate folders to organize Data Factory objects, but any time we created a folder, we were not able to create objects directly inside it. We had to drag and drop them into the folder. There were no default options; it was manual work. We offered their team our feedback, and they accepted my request.
Team Leader at an insurance company with 201-500 employees
Real User
Dec 23, 2019
The only thing that we're struggling with is increasing the competency of my team. We think that the Microsoft documentation is too complicated; I would like to see it more connected. I know they're working on the Snowflake data warehouse connector, but more connectors would be helpful.
Head of IT at a logistics company with 10,001+ employees
Real User
Dec 16, 2019
Because I have not really done a deep benchmark against competitors, I may not be familiar enough with the potential of competing products and capabilities to say definitively what is missing or should be improved. From my perspective, the pricing seems like it could be more user-friendly; of course, nothing is ever as inexpensive as you want. Perhaps one good additional feature would be incorporating more ways to import and export data. It would be nice to have the product fit our service orchestration platform better to make transfers more fluid.
Sr. Technology Architect at Larsen & Toubro Infotech Ltd.
Real User
Dec 9, 2019
At this point in time, they should work on integrating the big data capabilities within it. I have not explored it, but it would be good if we could somehow call a Spark job, or something to do with Spark SQL, within ADF so that we wouldn't need a separate Spark cluster outside. On the UI side, they could make it a little more intuitive in terms of how to add the various components. Somebody who has been working with tools like Informatica or DataStage gets very used to how the UI looks and feels. In ADF, adding a new table, or joining a new table and overriding that with an override SQL that I could customize, would be helpful. Being able to debug from the design mode itself would also be helpful.
Microsoft Consultant at a tech services company with 201-500 employees
Consultant
Dec 5, 2019
It would be helpful if they could adjust the data capture feature so that when there are source-side changes ADF could automatically figure it out. The solution needs to integrate more with other providers and should have a closer integration with Oracle BI.
The solution could use some merge statements. It should offer better integration with Azure Machine Learning. We should be able to embed the cognitive services from Microsoft, for example as a web API. It should allow us to embed Azure Machine Learning in a more user-friendly way.
Principal Consultant at a tech services company with 11-50 employees
Real User
Jul 29, 2019
I think more integration with existing Azure platform services would be extremely beneficial. In the next release, it's important that some sort of scheduler for running tasks is added; a built-in scheduling mechanism would be a very helpful improvement.
Data Flow is in the early stages — currently public preview — and it is growing into a tool that will offer everything other ETL tools offer. There are a few features still to come. The thing we missed most was data update, but this is now available as of two weeks ago. A feature that is confirmed as coming soon is the ability to pass in a parameter and filter, etc.
Azure Data Factory efficiently manages and integrates data from various sources, enabling seamless movement and transformation across platforms. Its valuable features include seamless integration with Azure services, handling large data volumes, flexible transformation, user-friendly interface, extensive connectors, and scalability. Users have experienced improved team performance, workflow simplification, enhanced collaboration, streamlined processes, and boosted productivity.
Better integration with advanced coding options could cater to users needing more customization.
We require Azure Data Factory to be able to connect to Google Analytics.
Data Factory would be improved if it were a little more configuration-oriented and not so code-oriented and if it had more automated features.
I have not found any real shortcomings within the product.
The only challenge with Azure Data Factory is its exception-handling mechanism. When a record fails, it's tough to identify and log.
The solution can be improved by decreasing the warmup time which currently can take up to five minutes.
Data Factory's performance during heavy data processing isn't great.
Data Factory's monitorability could be better. In the next release, Data Factory should include integrations with open-source tools like Airflow.
Occasionally, there are problems within Microsoft itself that impact the Data Factory and cause it to fail.
They need to work more on developing out-of-the-box connectors for other products like Oracle, AWS, and others.
The setup and configuration process could be simplified.
The speed and performance need to be improved. This solution should be able to connect with custom APIs.