Azure Data Factory is an all-in-one solution for ETL in our company.
My company doesn't use the product for development purposes.
I use the solution in my company as an ETL tool and for orchestration.
As a DevOps engineer, I feel that the CI/CD part and the tool's integration with GitHub are the product's best features. If you compare it with other tools, like AWS Glue and other solutions, I feel Azure Data Factory's deployment part is a lot easier to manage. The code promotions and the data pipeline promotions to higher environments are a lot easier with Azure Data Factory.
The product's technical support has certain shortcomings, making it an area where improvements are required. Instead of sending out documents, I think the tool's support team should focus on how to troubleshoot issues. I want the tool's support team to have real-time interaction with users.
The product's price can be problematic for small businesses, making it an area where improvements are required.
I have experience with Azure Data Factory. I am the end user of the tool. Azure Data Factory is a PaaS solution. I use the solution's latest version.
It is a stable solution since it is a PaaS product. Stability-wise, I rate the solution an eight out of ten.
The scalability of the product is impressive. Scalability-wise, I rate the solution an eight out of ten.
Most of the people in my company work on Azure, and those who want to use the native ETL capabilities provided by the product opt for Azure Data Factory.
The product is useful in medium to large-sized businesses. Smaller businesses can opt for options other than Azure Data Factory, considering the amount of money they are ready to spend. There are better options available in the market than Azure Data Factory.
I rate the technical support a five to six out of ten.
Neutral
I rate the product's initial setup phase a seven or eight on a scale of one to ten, where one is difficult and ten is easy.
In my company, we take care of the product's deployment process and maintenance phase.
The solution is deployed using Azure's cloud services.
The solution can be deployed in ten to fifteen minutes.
For deployments, my company usually writes code in Terraform so that we can have automated deployments, and it is connected to a CI/CD tool like Azure DevOps. Azure DevOps does the automated deployment for our company.
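To illustrate the idea of code-driven, repeatable deployments, here is a minimal sketch, assuming the azure-mgmt-datafactory Python SDK rather than Terraform; the subscription, resource group, factory, and dataset names are hypothetical placeholders.

```python
# A minimal sketch of code-driven ADF deployment, assuming the
# azure-mgmt-datafactory Python SDK; all names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Define a single copy activity referencing two pre-existing datasets.
copy = CopyActivity(
    name="CopyStagingToLake",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Idempotent create-or-update: the same call pattern a CI/CD job would run
# on every promotion to a higher environment.
client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "CopyPipeline",
    PipelineResource(activities=[copy]),
)
```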
During the setup phase, users may face issues with infrastructure deployment and the configuration around it, especially the integration runtime, which is too complicated for a normal developer to understand. A cloud expert with a good understanding is needed to take care of the deployment in the right manner and in a secure way. The networking setup and security part of the product are a bit complicated, which I might understand since I am a DevOps engineer, but a developer who is new to the product might not understand such parts of the tool. Deploying the service into an infrastructure is only possible if the person involved in the deployment has a basic level of understanding of the product.
I rate the product price as six on a scale of one to ten, where one is low price and ten is high price.
I wanted to compare Azure Data Factory with Fivetran.
Users rely on Azure Data Factory's connectors to meet data integration and transformation needs. Users use connectors that are native to Azure Data Factory. The tool offers more than 90 connectors that can be used to ingest data from different sources.
The feature of the solution I find to be the most beneficial for data management tasks is its connectors, and it can even be used for hybrid scenarios. The tool can connect to a different cloud, like AWS. The product can connect to your on-premises systems. In general, users are able to ingest data from everywhere, and the best part is that all of the aforementioned areas can be managed through the GUI. The tool is like a low-code/no-code solution.
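As an illustration of that connector model, here is a small sketch of registering a cross-cloud linked service programmatically, assuming the azure-mgmt-datafactory Python SDK; the credentials and names are hypothetical placeholders.

```python
# A small sketch of registering a cross-cloud connector (linked service),
# assuming the azure-mgmt-datafactory SDK; credentials and names are
# hypothetical placeholders, not real values.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AmazonS3LinkedService, SecureString,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Point ADF at an S3 bucket in AWS -- one of its 90+ built-in connectors.
s3 = AmazonS3LinkedService(
    access_key_id="<aws-access-key-id>",
    secret_access_key=SecureString(value="<aws-secret-access-key>"),
)
client.linked_services.create_or_update(
    "my-resource-group", "my-data-factory", "AwsS3LinkedService",
    LinkedServiceResource(properties=s3),
)
```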
The visual interface of the solution impacts workflow efficiency because I think it is easier to start with for any developer who wants to use the tool. It is easier to start with and also easier to troubleshoot or debug, especially at a time when you cannot expect all your developers to understand code. It is good to have an intuitive GUI, and Azure Data Factory does a pretty good job when you compare it with its competitors.
Most of the time, my company uses integration runtime, so we mostly use a self-hosted integration runtime. In short, my company has not seen much impact on any project from the product's scalability capabilities because we use it according to the needs of our customers.
I rate the tool an eight out of ten.
The current use is for extracting data from Google Analytics into Azure SQL Database as a source for our EDW. Extracting from GA was problematic with SSIS.
The larger use case is to assess the viability of the tool for larger use in our organization as a replacement for SSIS for our EDW and also as an orchestration agent to replace SQL Agent for firing SSIS packages using Azure SSIS-IR.
The initial rollout was to solve the immediate problem while assessing its ability to be used for other purposes within the organization, and also to establish the development and administration pipeline process.
ADF allowed us to extract Google Analytics data (via BigQuery) without purchasing an adapter.
It has also helped with establishing how our team can operate within Azure using both PaaS and IaaS resources and how those can interact. Rolling out a small data factory has forced us to understand more about all of Azure and how ADF needs to rely upon and interact with other Azure resources.
It provides a learning ground for use of DevOps Git along with managing ARM templates as well as driving the need to establish best practices for CI.
The most valuable aspect has been a large list of no-cost source and target adapters.
It is also providing a PaaS ELT solution that integrates with other Azure resources.
Its graphical UI is very good and is even now improving significantly with the latest preview feature of displaying inner activities within other activities such as forEach and If conditions.
Its built-in monitoring and ability to see each activity's JSON inputs/outputs provide an excellent audit trail.
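That audit trail is also reachable programmatically. Here is a brief sketch, assuming the azure-mgmt-datafactory Python SDK, with a placeholder pipeline run ID:

```python
# A brief sketch of pulling the monitoring audit trail programmatically,
# assuming the azure-mgmt-datafactory SDK; the run ID is a placeholder.
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Query every activity run inside one pipeline run from the last day.
runs = client.activity_runs.query_by_pipeline_run(
    "my-resource-group", "my-data-factory", "<pipeline-run-id>",
    RunFilterParameters(
        last_updated_after=datetime.utcnow() - timedelta(days=1),
        last_updated_before=datetime.utcnow(),
    ),
)
for act in runs.value:
    # Each record carries the same JSON inputs/outputs shown in the UI.
    print(act.activity_name, act.status, act.duration_in_ms, act.input, act.output)
```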
The trigger scheduling options are decently robust.
The fact that it's continually evolving gives hope that even if some feature is missing today, it may soon be resolved. For example, it lacked support for a simple SQL activity until earlier this year, when that was resolved. They have now added a "debug until" option for all activities. The Copy Activity Upsert option did not perform well at all when I first started using the tool but now seems to have acceptable performance.
The tool is designed to be metadata driven for large numbers of patterned ETL processes, similar to what BIML is commonly used for in SSIS but much simpler to use than BIML. BIML now supports generating ADF code although with ADF's capabilities I'm not sure BIML still holds its same value as it did for SSIS.
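A minimal sketch of that metadata-driven pattern, assuming a single parameterized pipeline named IngestTablePipeline already exists and the azure-mgmt-datafactory Python SDK is available; the control list would normally live in a metadata table:

```python
# A minimal sketch of metadata-driven ingestion: one parameterized
# pipeline run per metadata row, instead of hand-authoring a separate
# pipeline for every table (the role BIML played for SSIS). The
# pipeline name and parameters are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

tables = [
    {"schema": "dbo", "table": "Customers", "watermark": "ModifiedDate"},
    {"schema": "dbo", "table": "Orders", "watermark": "OrderDate"},
]

for t in tables:
    run = client.pipelines.create_run(
        "my-resource-group", "my-data-factory", "IngestTablePipeline",
        parameters={"schemaName": t["schema"],
                    "tableName": t["table"],
                    "watermarkColumn": t["watermark"]},
    )
    print(t["table"], "->", run.run_id)
```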
The list of issues and gaps in this tool is extensive, although as time goes on, it gets shorter. It currently includes:
1) Missing email/SMTP activity
2) Mapping data flows require significant lag time to spin up Spark clusters
3) Performance compared to SSIS. Expect a copy activity to take ten times what SSIS takes for a simple data flow between tables in the same database
4) It is missing the debug of a single activity. The workaround is setting a breakpoint on the task and doing a "rerun from activity" or setting debug on activity and running up to that point
5) OAuth 2.0 adapters lack automated support for refresh tokens
6) Copy activity errors provide no guidance as to which column is causing a failure
7) There's no built-in pipeline exit activity when encountering an error
8) Auto Resolve Integration runtime should never pick a region that you're not using (should be your default for your tenant)
9) IR (integration runtime) queue time lag. For example, a small table copy activity I just ran took 95 seconds of queuing and 12 seconds to actually copy the data. Often the queuing time greatly exceeds the actual runtime
10) Activity dependencies are always AND (OR not supported). This is a significant missing capability that forces unnecessary complex workarounds just to handle OR situations when they could just enhance the dependency to support OR like SSIS does. Did I just ask when ADF will be as good as SSIS?
They need to fix bugs. For example:
1) The debug sometimes stops picking up saved changes for a period of time, rendering this essential tool useless during that time
2) Enable interactive authoring (a critical tool for development) often doesn't turn on when enabled without going into another part of the tool to enable it. Then, you have to wait several minutes before it's enabled which is time you're blocked from development until it's ready. And then it only activates for up to 120 minutes before you have to go through this all over again. I think Microsoft is trying to torture developers
3) Exiting the inside of an activity that contains other activities always causes the screen to jump to the beginning of the pipeline, requiring you to re-navigate to where you were (greatly slowing development productivity)
4) Auto Resolve Integration runtime (using default settings) often picks remote regions (not necessarily even paired regions!) to operate, which causes either an unnecessary slowdown or an error message saying it's unable to transfer the volume of data across regions
5) Copy activity often gets the error "mapping source is empty" for no apparent reason. If you play with the activity such as importing new metadata then it's happy again. This sort of thing makes you want to just change careers. Or tools.
I have been using this product for six months.
Production operation seems to run reliably so far, however, the development environment seems very buggy where something works one day and not the next.
So far, the performance of this solution is abysmal compared to SSIS, especially with small tasks such as a copy activity from one table to another within the same database.
Customer support is non-existent. I logged multiple issues only to hear back from 1st level support weeks later asking questions and providing no help other than wasting my time. In one situation it was a bug where the debug function stopped working for a couple of days. By the time they got back to me, the problem went away.
Negative
We have been and still rely on SSIS for our ETL. ADF seems to do ELT well but I would not consider it for use in ETL at this time. Its mapping data flows are too slow (which is a large understatement) to be of practical use to us. Also, the ARM template situation is impractical for hundreds of pipelines like we would have if we converted all our SSIS packages into pipelines as a single ADF couldn't take on all our pipelines.
Initial setup is the largest caveat for this tool. Once you've organized your Azure environment and set up DevOps pipelines, the rest is a breeze. But this is NOT a trivial step if you're the first one to establish the use of ADF at your organization or within your subscription(s). Instead of learning just an ETL tool, you have to get familiar with and establish best practices for the entire Azure and DevOps technologies. That's a lot to take on just to get some data movements operational.
I did this in-house with the assistance of another team who uses DevOps with Azure for other purposes (non-ADF use).
The setup cost is only the time it takes to organize Azure resources so you can operate effectively and figure out how to manage different environments (dev/test/SIT/UAT/prod, etc.), as well as how to enable multiple developers to work on a single data factory without losing changes or conflicting with other changes.
We operate only with SSIS today, and it works very well for us. However, looking toward the future, we will need to eventually find a PaaS solution that will have longer sustainability.
My clients use Data Factory to exchange information between the on-premises environment and the cloud. Data Factory moves the data, and we use other solutions like Databricks to transform and clean up the data. My teams typically consist of three or four data engineers.
I like how you can create your own pipeline in your space and reuse those creations. You can collaborate with other people who want to use your code.
DataStage is easier to learn than Data Factory because it's more visual. Data Factory has some drag-and-drop options, but it's not as intuitive as DataStage. It would be better if they added more drag-and-drop features. You can start using DataStage without knowing the code. You don't need to learn how the code works before using the solution.
I think communication about the ADAs would be interesting to see in the platform: how to interact with that kind of information and use it in your pipelines.
I have used Data Factory for eight months.
I have never experienced downtime with Data Factory.
It isn't that expensive to scale Data Factory up. My client can ask for more resources on the tool, and paying more is never an issue.
I rate Azure support seven or eight out of 10. There is room for improvement. Sometimes, you don't know where the errors originate. You have to send a ticket to Azure, and they take two or three days to respond. The issue may resolve itself by then. The problem is fixed, but you don't know how to prevent it or what to do if it happens in the future.
The data transfer has stopped a few times for unknown reasons. We don't know if the resources are insufficient or if there is a problem with the platform. By the time we hear back from Microsoft, the issue has been resolved.
Positive
Data Factory is effortless to set up.
I rate Azure Data Factory nine out of 10. When implementing Data Factory, you should document where you are building so you can pass that information on. Sometimes you build something for a specific purpose, but you can use that information for other solutions. If you have a community where you are building things, you can reuse them on the platform, so you don't need to build everything from scratch.
We created data ingestion solutions. We have built interpreters, and we have data factories that pull data from our clients. They submit data via Excel spreadsheets, and we process them into a common homogeneous format.
It has helped with some automation. Instead of individual people reviewing these files, we were able to automate the ingestion process, which saved a bunch of time. It saved hours of repeated manual work.
The data mapping and the ability to systematically derive data are nice features. It worked really well for the solution we had. It is visual, and it did the transformation as we wanted.
I couldn't quite grasp it at first because it has a Microsoft footprint on it. Some of the nomenclature around sinks and other things is based on how SSRS or SSIS works, which works fine if you know those products. I didn't know them. So, some of the language and some of the settings were obtuse for me to use. It could be a little difficult if you're coming from the Java or AWS platform, but if you are coming from a Microsoft background, it would be very familiar.
For some of the data, there were some issues with data mapping. Some of the error messages were a little bit foggy. There could be more of a quick start guide or some inline examples. The documentation could be better.
There were some latency and performance issues. The processing time took slightly longer than I was hoping for. I wasn't sure if that was a licensing issue or construction of how we did the product. It wasn't super clear to me why and how those occurred. There was think time between steps. I am not sure if they can reduce the latency there.
I have been using this solution for a year and a half.
It is very stable.
It is very scalable. It is a cloud product. It is being used by business analysts, business managers, and Azure cloud architects. We have just one developer/integrator for deployment and maintenance purposes.
We have plans to increase its usage. We'll be rolling it out for other clients.
Microsoft has these things well-documented. There were videos. I was able to find answers when I needed them. To the uninitiated, it was a little difficult, but we got there.
It was of medium complexity. Because it goes to the cloud, the duration was short. The deployment took minutes to hours.
We are a consultant and integrator. You can use our company for its implementation.
I would rate this solution a nine out of ten.
We use Azure Data Factory for data transformation, normalization, bulk uploads, data stores, and other ETL-related tasks.
Azure Data Factory allows us to create data analytic stores in a secure manner, run machine learning on our data, and easily adapt to changing schema.
The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring.
The documentation could be improved. They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas.
I would like to see a better understanding of other common schemas, as well as a simplification of some of the more complex data normalization and standardization issues.
It would be helpful to have visibility, or better debugging, and see parts of the process as they cycle through, to get a better sense of what is and isn't working.
It's essentially just a black box. There is some monitoring that can be done, but when something goes wrong, even simple fixes are difficult to troubleshoot.
I have been working with Azure Data Factory for a couple of years.
There is only one version.
Overall, I believe the stability has been good, but there have been a couple of occasions when the Microsoft resources that needed to be allocated were overburdened, and we had to wait an unacceptable amount of time to get our slot. It has now happened twice, which is not ideal.
There is no limit to scalability.
We only have a few users. One is a data scientist, and the other is a data analyst.
We use it to push up various dashboards and reports; it's a transitional product for transferring, transforming, and transitioning data.
It is extensively used, and we intend to expand our use.
You don't really get that kind of support; it's more about documentation and the community support that is available. I would rate it a three out of five compared to others.
You could call them, and pay for their consulting hours directly, but for the most part, we try to figure it out or look through documentation.
I think their documentation is lagging because it's not as popular a tool; there's just not as much to fall back on.
Neutral
We had only our own tools, and we switched because you get to leverage all of the work done in a SaaS or platform as a service, or however they classify it. As a result, you get more functionality, faster, for less money.
The initial setup is straightforward.
It is a working tool. You can start using it within an hour and then make changes as needed.
We only need one person to maintain the solution; it doesn't take much to keep it running.
It's not a problem; it's a platform.
We completed the deployment ourselves.
We have seen a return on investment. I can't really share many details, but for us, this becomes something that we sell back to our clients.
You pay based on your workload. Depending on how much data you process through it, the cost could range from a few hundred dollars to tens of thousands of dollars.
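As a back-of-envelope illustration of that consumption model, the arithmetic below uses assumed pay-as-you-go list prices (roughly $1 per 1,000 activity runs and $0.25 per data-movement DIU-hour); actual rates vary by region and change over time, so treat it as illustrative only.

```python
# Back-of-envelope ADF consumption estimate. The rates below are
# assumptions based on published pay-as-you-go list prices; check the
# Azure pricing page for current, region-specific figures.
ACTIVITY_RUN_RATE = 1.00 / 1000  # USD per orchestration activity run (assumed)
DIU_HOUR_RATE = 0.25             # USD per data-movement DIU-hour (assumed)

activity_runs = 200 * 30         # 200 activity runs/day for a month
diu_hours = 4 * 8 * 30           # 4 copy-hours/day at 8 DIUs for a month

monthly_cost = activity_runs * ACTIVITY_RUN_RATE + diu_hours * DIU_HOUR_RATE
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")  # ~$246 under these assumptions
```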
Pricing is comparable, it's somewhere in the middle.
There are no additional fees to the standard licensing fee.
We looked at some other tools, such as Databricks, AWS Glue, and MuleSoft.
We already had most of our infrastructure connected to Azure in some way. So the integration of where our data resided appeared to be simpler and safer.
I believe it would be beneficial if they could find someone experienced in some of the tools that are a part of this, such as Spark (not necessarily Data Factory specifically), since those other tools will be very familiar and allow a very quick time to productivity. If you're used to doing things in a different way, it may take some time because there isn't as much documentation and community support as there is for some more popular tools.
I would rate Azure Data Factory a seven out of ten.
As a management consultancy company, we help our clients deploy Azure Data Factory or any other cloud-based solution depending on data integration needs. Regarding how we use Azure Data Factory within our company, we are on the Microsoft Stack, so we use the solution primarily for data warehousing and integration.
The feature I found most helpful in Azure Data Factory is the pipeline feature, including being able to connect to different sources.
I also found the ability to run Python code whenever you need to valuable in Azure Data Factory, especially for certain features of the solution, such as data integrations, aggregations, and manipulations.
Azure Data Factory also has built-in security, which is another valuable feature.
I also like that you get access to the whole Azure suite through Azure Data Factory: the overall architecture design, defining security and access, role-based access management, and so on. It's helpful to have the whole suite when designing applications.
Areas for improvement in Azure Data Factory include connectivity and integration.
When you use integration runtime, whenever there's a failure, the backup process in Azure Data Factory takes time, so this is another area for improvement.
Database support in the solution also has room for improvement because Azure Data Factory only currently supports MS SQL and Postgres. I want to see it supporting other databases.
If you want to connect the solution from on-premises to the cloud, you will have to go with a VPN or a pretty expensive ExpressRoute connection. A VPN connection might not work most of the time because you have to download a client and install it, so an interim solution for secure access from on-premises locations to the cloud is what I want to see in Azure Data Factory.
I've been using Azure Data Factory for about a year now.
Azure Data Factory is very stable, so it's a four out of five for me. In some instances, the solution failed, but I wouldn't wholly blame Azure Data Factory because my company connected to some on-premise databases in some cases. Sometimes, you'll get errors from self-hosted integration, faulty connections, or the on-premise server is down, so my rating for stability is a four.
Scalability-wise, Azure Data Factory is a four out of five because Microsoft is still developing certain tiers, which means you can't upgrade an older SKU or tier. In contrast, the more modern, newer tiers can be upgraded easily. Rarely will you get stuck on one platform where you have to completely destroy that container and then fit a new container. Most of the time, Azure Data Factory is pretty easy to scale.
We haven't used Microsoft support directly because whenever we have issues with Azure Data Factory, we can find resolutions through their online documentation.
We're using both Azure Data Factory and SSIS.
We had several in-house solutions, but we moved to Azure Data Factory because it was straightforward. From a deployment standpoint, the solution comes with different services, so we didn't have to worry about separate hardware or infrastructure for networking, security, etc.
The initial setup for Azure Data Factory was easy, so I'd rate the setup a four out of five.
The implementation strategy was looking into what my organization needed overall, then planning and direct deployment. My company first did a test, a dummy version, then a UAT with stakeholders before going into production.
It took about two months to complete the deployment for Azure Data Factory.
An in-house team, the digital data engineering team, deployed Azure Data Factory.
We're still computing the ROI from Azure Data Factory. It's too early to comment on that.
My company is on a monthly subscription for Azure Data Factory, but it's more of a pay-as-you-go model where your monthly invoice depends on how many resources you use.
On a scale of one to five, pricing for Azure Data Factory is a four.
It's just the usage fees my company pays monthly. No support fees because my company didn't need support from Microsoft.
If you're not using core Microsoft products, the cost could be slightly higher, for example, when using a Postgres database versus an MS SQL database.
My company uses Azure Data Factory, SSIS, and for a few other instances, Salesforce.
My company currently has about fifty Azure Data Factory users, though most are not directly exposed to the solution the way the developers are; for example, members of corporate management and other teams apart from the development team are exposed to Azure Data Factory.
Soon, there could be about two hundred users of Azure Data Factory within the company.
The developer team working directly on Azure Data Factory comprises ten individuals.
For the maintenance of the solution, my company has two to three staff, but it could reach up to eight or ten for the entire product. It's a mix of engineers and business analysts who handle Azure Data Factory maintenance.
I'd rate Azure Data Factory as eight out of ten.
My company is an end user of Azure.
We primarily use the solution in a data engineering context for bringing data from source to sink.
The solution is very comfortable to use. I'm happy with the user interface and dashboards. I'm pretty happy with everything about the solution.
We haven't had any issues connecting it to other products.
It's a stable product.
I have not found any real shortcomings within the product.
I've been using the solution for the past year.
The product has been very stable and reliable. I'd rate the stability nine out of ten. There are no bugs or glitches. It doesn't crash or freeze.
There is a team of 30 people working on the solution.
I've connected with technical support a few times.
They sent a support engineer or a field engineer to us, and he helped us out.
Positive
I'm not sure about the exact cost of the solution.
I'm a customer and end-user.
Our company chose to use this solution based on the fact that it is a Microsoft product. We're using a lot of solutions, including Outlook and Teams. We also use Power BI. We try to use Microsoft so that everything is under one umbrella. That way, there is no difficulty with connecting anything.
It's a good solution to use. There are lots of videos available on YouTube, and it is very easy to learn. It's very easy to perform things on it as well, which is one thing that a product like ThoughtSpot lacks. There is no training needed like Power BI.
I'd rate the solution nine out of ten.
We use Azure Data Factory to connect to clients' on-premise networks and data sources to bring the data into Azure. Additionally, Azure Data Factory orchestrates data movement and transformations. It can connect to a number of different cloud data sources to bring the information into something, such as a data lake or a formal SQL database. Azure Data Factory has the ability to handle large data workloads and can orchestrate them well.
The most valuable features of Azure Data Factory are the flexibility, ability to move data at scale, and the integrations with different Azure components.
Azure Data Factory can improve by having support in the drivers for change data capture.
I have been using Azure Data Factory for approximately three years.
Azure Data Factory is a very reliable and stable solution.
The solution is highly scalable.
The technical support is very good, they are responsive.
We previously used Attunity, and we switched to Azure Data Factory mainly because of cost reasons and integration.
The biggest difference between Azure Data Factory and Attunity is that Attunity has the ability to perform change data capture, whereas Azure Data Factory is more centered around batch or bulk loads.
The initial setup is of a moderate level of difficulty. However, it can be complex. The solution is able to fit both of our use cases.
We normally use one or two people to update and maintain Azure Data Factory.
There's no licensing for Azure Data Factory; they have a consumption payment model based on how often you run the service and how long each run takes. The price can be approximately $500 to $1,000 per month but depends on the scaling.
My advice to others that want to implement Azure Data Factory is to use a metadata approach.
I rate Azure Data Factory an eight out of ten.