The current use is for extracting data from Google Analytics into Azure SQL Database as a source for our EDW. Extracting from GA was problematic with SSIS.
The broader goal is to assess whether the tool can be used more widely in our organization, both as a replacement for SSIS for our EDW and as an orchestration agent that replaces SQL Agent for firing SSIS packages through the Azure-SSIS IR.
The initial rollout was meant to solve the immediate problem while letting us assess the tool for other purposes within the organization and establish the development and administration pipeline process.
ADF allowed us to extract Google Analytics data (via BigQuery) without purchasing an adapter.
It has also helped with establishing how our team can operate within Azure using both PaaS and IaaS resources and how those can interact. Rolling out a small data factory has forced us to understand more about all of Azure and how ADF needs to rely upon and interact with other Azure resources.
It provides a learning ground for use of DevOps Git along with managing ARM templates as well as driving the need to establish best practices for CI.
The most valuable aspect has been a large list of no-cost source and target adapters.
It is also providing a PaaS ELT solution that integrates with other Azure resources.
Its graphical UI is very good and keeps improving significantly; the latest preview feature displays inner activities within container activities such as ForEach and If Condition.
Its built-in monitoring and ability to see each activity's JSON inputs/outputs provide an excellent audit trail.
The trigger scheduling options are decently robust.
The fact that it's continually evolving gives me hope that a feature missing today may be added soon. For example, it lacked a simple SQL activity until earlier this year. They have now added a "debug until" option for all activities. The Copy Activity Upsert option did not perform well at all when I first started using the tool but now seems to have acceptable performance.
The tool is designed to be metadata-driven for large numbers of patterned ETL processes, similar to what BIML is commonly used for in SSIS but much simpler to use than BIML. BIML now supports generating ADF code, although given ADF's capabilities I'm not sure BIML holds the same value that it did for SSIS.
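To illustrate what "metadata-driven" means here, the following is a minimal Python sketch of the idea: one copy pattern applied to many tables described by rows of control metadata. It is not ADF's pipeline JSON or any ADF API; `TableSpec`, `build_copy_tasks`, and the table names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """One row of hypothetical control metadata describing a table to copy."""
    source_schema: str
    source_table: str
    target_table: str
    load_type: str = "full"  # "full" or "incremental"; illustrative only


def build_copy_tasks(specs):
    """Turn metadata rows into generic copy-task descriptions.

    The dictionaries below are NOT ADF's pipeline schema; they only show
    a single pattern being stamped out once per metadata row.
    """
    tasks = []
    for spec in specs:
        tasks.append({
            "name": f"Copy_{spec.source_schema}_{spec.source_table}",
            "source_query": f"SELECT * FROM [{spec.source_schema}].[{spec.source_table}]",
            "sink_table": spec.target_table,
            "load_type": spec.load_type,
        })
    return tasks


if __name__ == "__main__":
    # Hypothetical metadata; in practice this would come from a control table.
    metadata = [
        TableSpec("ga", "sessions", "stg.ga_sessions"),
        TableSpec("ga", "pageviews", "stg.ga_pageviews", load_type="incremental"),
    ]
    for task in build_copy_tasks(metadata):
        print(task["name"], "->", task["sink_table"])
```

Adding a table then means adding a metadata row rather than building another pipeline by hand, which is the appeal of this approach over maintaining one package per table.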
The list of issues and gaps in this tool is extensive, although as time goes on, it gets shorter. It currently includes:
1) Missing email/SMTP activity
2) Mapping data flows incur a significant lag while Spark clusters spin up
3) Performance compared to SSIS. Expect a copy activity to take about ten times as long as SSIS for a simple data flow between tables in the same database
4) There is no way to debug a single activity. The workaround is to set a breakpoint on the activity and do a "rerun from activity", or to set debug on the activity and run up to that point
5) OAuth 2.0 adapters lack automated support for refresh tokens
6) Copy activity errors provide no guidance as to which column is causing a failure
7) There's no built-in pipeline exit activity when encountering an error
8) The AutoResolve integration runtime should never pick a region that you're not using (it should use the default region for your tenant)
9) IR (integration runtime) queue time lag. For example, a small table copy activity I just ran took 95 seconds of queuing and 12 seconds to actually copy the data. Often the queuing time greatly exceeds the actual runtime
10) Activity dependencies are always AND (OR is not supported). This is a significant missing capability that forces unnecessarily complex workarounds just to handle OR situations, when they could simply enhance the dependency to support OR like SSIS does. Did I just ask when ADF will be as good as SSIS?
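To make the AND/OR point in item 10 concrete, here is a minimal Python sketch of the two dependency semantics. It is a generic model and not how ADF or SSIS evaluates dependencies internally; `should_run` and the step names are hypothetical.

```python
def should_run(upstream_results, mode="AND"):
    """Decide whether a downstream step may start.

    upstream_results maps an upstream step name to True (its dependency
    condition was met) or False (it was not).
    """
    outcomes = list(upstream_results.values())
    if mode == "AND":
        return all(outcomes)   # every upstream condition must be met
    if mode == "OR":
        return any(outcomes)   # one met condition is enough
    raise ValueError(f"unknown mode: {mode}")


# Two alternative loaders: only one of them needs to succeed.
results = {"LoadFromApi": False, "LoadFromFile": True}
print(should_run(results, "AND"))  # False
print(should_run(results, "OR"))   # True
```

With AND-only semantics the downstream step in this example never starts, which is why an "either of these is enough" situation needs extra plumbing.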
They need to fix bugs. For example:
1) The debug sometimes stops picking up saved changes for a period of time, rendering this essential tool useless during that time
2) Interactive authoring (a critical tool for development) often doesn't turn on when you enable it unless you go into another part of the tool and enable it there. Then you have to wait several minutes for it to start, during which you're blocked from development. And then it only stays active for up to 120 minutes before you have to go through this all over again. I think Microsoft is trying to torture developers
3) Exiting an activity that contains other activities always causes the screen to jump to the beginning of the pipeline, so you have to navigate back to where you were (greatly slowing development productivity)
4) The AutoResolve integration runtime (using default settings) often picks remote regions (not necessarily even paired regions!) to operate in, which causes either an unnecessary slowdown or an error message saying it's unable to transfer the volume of data across regions
5) Copy activity often gets the error "mapping source is empty" for no apparent reason. If you fiddle with the activity, for example by importing new metadata, then it's happy again. This sort of thing makes you want to just change careers. Or tools.
I have been using this product for six months.
Production operation seems to run reliably so far. The development environment, however, seems very buggy: something works one day and not the next.
So far, the performance of this solution is abysmal compared to SSIS, especially with small tasks such as copying data from one table to another within the same database.
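The small-task penalty is easy to quantify using the one measurement cited earlier in this review (95 seconds of queuing for 12 seconds of actual copying). The sketch below is plain arithmetic in Python; the function name `queue_overhead` is made up for this example, and a single run is of course not a benchmark.

```python
def queue_overhead(queue_seconds, run_seconds):
    """Summarize how much of an activity's elapsed time was spent queuing."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    total = queue_seconds + run_seconds
    return {
        "total_seconds": total,
        "queue_share": queue_seconds / total,       # fraction of time spent waiting
        "total_vs_run_only": total / run_seconds,   # elapsed time relative to the copy itself
    }


stats = queue_overhead(queue_seconds=95, run_seconds=12)
print(f"total: {stats['total_seconds']} s")                          # total: 107 s
print(f"queue share: {stats['queue_share']:.0%}")                    # queue share: 89%
print(f"total vs. copy alone: {stats['total_vs_run_only']:.1f}x")    # total vs. copy alone: 8.9x
```

For that run, roughly nine tenths of the elapsed time was queuing, so the copy itself was not the bottleneck.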
Customer support is non-existent. I logged multiple issues only to hear back from first-level support weeks later, asking questions and offering no real help, which just wasted my time. In one situation the issue was a bug where the debug function stopped working for a couple of days. By the time they got back to me, the problem had gone away.
We have relied on SSIS for our ETL and still do. ADF seems to do ELT well, but I would not consider it for ETL at this time. Its mapping data flows are too slow (which is a large understatement) to be of practical use to us. Also, the ARM template situation is impractical for the hundreds of pipelines we would have if we converted all our SSIS packages into pipelines, as a single ADF couldn't take on all our pipelines.
Initial setup is the largest caveat for this tool. Once you've organized your Azure environment and set up DevOps pipelines, the rest is a breeze. But this is NOT a trivial step if you're the first one to establish the use of ADF at your organization or within your subscription(s). Instead of learning just an ETL tool, you have to get familiar with, and establish best practices for, Azure and DevOps as a whole. That's a lot to take on just to get some data movements operational.
I did this in-house with the assistance of another team that uses DevOps with Azure for other purposes (non-ADF use).
The setup cost is only the time it takes to organize Azure resources so you can operate effectively, to figure out how to manage different environments (dev/test/SIT/UAT/prod, etc.), and to work out how multiple developers can work on a single data factory without losing changes or conflicting with each other's changes.
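One common way to think about managing several environments is a shared base configuration plus a small set of per-environment overrides. The Python sketch below shows only that idea; it is not tied to ARM templates or to any ADF API, and every name and value in it (`BASE_SETTINGS`, `OVERRIDES`, `settings_for`, the server names) is a placeholder invented for this example.

```python
# Settings shared by every environment (placeholder values).
BASE_SETTINGS = {
    "sql_server": "placeholder-dev-server",
    "database": "edw_staging",
    "triggers_enabled": False,
}

# Only the values that differ per environment (placeholder values).
OVERRIDES = {
    "dev": {},
    "test": {"sql_server": "placeholder-test-server"},
    "prod": {"sql_server": "placeholder-prod-server", "triggers_enabled": True},
}


def settings_for(environment):
    """Return the effective settings for one environment."""
    if environment not in OVERRIDES:
        raise KeyError(f"unknown environment: {environment}")
    # Later keys win, so overrides replace the base values.
    return {**BASE_SETTINGS, **OVERRIDES[environment]}


print(settings_for("prod")["sql_server"])        # placeholder-prod-server
print(settings_for("test")["triggers_enabled"])  # False
```

Keeping the overrides small makes it easier to review what actually differs between environments when a release is promoted.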
We operate only with SSIS today, and it works very well for us. However, looking toward the future, we will eventually need to find a PaaS solution that is sustainable over the longer term.