Hello peers,
I am a BA at a medium-sized tech services company. I am currently researching ETL tools.
Which solution do you prefer: KNIME, Azure Synapse Analytics, or Azure Data Factory? Can you please provide a comparison between these three solutions?
Thank you for your help.
I know you're hoping someone has already done this research for you, but keep in mind that's something people get paid to do.
That said, you're asking about quite different tools once you throw KNIME into the mix. I don't know that tool, but it sounds like it's for a specific purpose, and it's not an Azure tool. Keep in mind there are endless ETL tools out there; I've used about half a dozen in my career. I currently use both ADF and SSIS. I only use ADF when I have to, because version management and dealing with ARM templates are overly complicated and it is very slow compared to SSIS. ADF can, however, be a good orchestrator for running SSIS: there's an Azure/PaaS version of SSIS, the Azure-SSIS integration runtime (Azure-SSIS IR), that can run from ADF. Synapse Analytics pipelines are actually ADF technology, but stripped down. And now there's Fabric Data Factory, which is again ADF but even more stripped down. Fabric is also bleeding edge.
ADF has been around for a long time now. Anything Azure is cloud based and integrates with Azure services. KNIME is not that. I'd advise first understanding your fundamental requirements: What are the skill levels of your staff with ETL? Are you an Azure shop? What kind of data volumes are you talking about? What sources do you need to connect to (that's a biggie, because not all tools talk to all sources)? What are you trying to do: build a data mart or EDW, or just copy some data from a source, or something else? Do you use Power BI? The answers will help drive what kind of tool you're looking for. If you want a tool that is as SaaS-like as possible because you have minimal requirements, low data volumes, and low staff expertise, and you're starting from scratch, I'd give Fabric a try, especially if you want low tech and are already into the Power Platform. Hope that helps.
Just to add to the previous answer: yes, in an Azure Synapse workspace you can create ETL/ELT pipelines. This even makes the data engineer's work easier, because the same Synapse workspace holds the data warehouse (dedicated SQL pool), your pipelines, and other miscellaneous workspace resources.
For additional information, pipelines in Azure Synapse are very similar to Azure Data Factory. I believe ADF is still a little more robust, but it's only a matter of time before that changes.
I believe Synapse is not an ETL tool. ADF is one optional ETL tool for a Synapse data warehouse. See "What Are the Top ETL Tools for Azure Data Warehouse?" on Integrate.io.
I'd like to step back and pose a bigger option. You see, ETL means making a copy of data you have already. Have you considered a data fabric or mesh, where the data is used where it lies now? Consider this if your data is already used by some systems, but you need to do a more comprehensive analysis of it.
I always want to reduce the replication of databases. The idea of building yet another database to "replace" all the others rarely works out that way. I'd rather beef up the originating system, or use a replica, than build a huge portfolio of ETL programs and an army of ops, data governance, and system support staff to keep them in sync.
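To make the copy-versus-in-place distinction concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is made up for illustration (the orders table, the sales_by_region table, the function names etl_copy and query_in_place); it is not how ADF, Synapse, or any data fabric product works internally, just the general idea.

```python
import sqlite3

# Hypothetical source system with an "orders" table (illustrative names only).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)
source.commit()


def etl_copy(src, dst):
    """ETL style: extract rows, transform them, load a second copy elsewhere."""
    rows = src.execute("SELECT region, amount FROM orders").fetchall()  # extract
    totals = {}
    for region, amount in rows:  # transform: total per region
        totals[region] = totals.get(region, 0.0) + amount
    dst.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    dst.execute("DELETE FROM sales_by_region")  # full refresh on every run
    dst.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))  # load
    dst.commit()


def query_in_place(src):
    """In-place style: run the analysis against the source, no copy to keep in sync."""
    return src.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


# Hypothetical target database standing in for a warehouse.
warehouse = sqlite3.connect(":memory:")
etl_copy(source, warehouse)
copied = warehouse.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
in_place = query_in_place(source)
```

Right after the load, both approaches give the same answer. The difference shows up when the source changes: the in-place query sees the new row immediately, while the copy stays stale until etl_copy runs again, and that re-running is the sync overhead I mean.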
Finally, if you really need an ETL tool, i.e. copies of all that data... look for existing talent in your staff. Otherwise, expect to hire some people experienced with the new tool that can advise on design and development and mentor existing staff.
A couple of questions before starting the feature comparison: i. Are you fine with an open-source solution? ii. Is there any specific reason you have listed ADF? iii. Who will be using these tools, and how steep a learning curve can the team handle? iv. What kind of data are you dealing with? v. Is data privacy an important factor? vi. Are you looking only for a cloud-based solution, or are you open to a hybrid solution as well? vii. What is the maturity level of the team when it comes to working in the cloud? These are just a few of the many questions we use to self-assess and measure our preparedness. Let me know if you need more insights. Happy to help!
Hi @Rahul-Sahay, if you do not have a cap on investment, the best solutions are:
1. Azure AD VM + Azure SQL as a managed service: higher cost, best in the industry.
2. Snowflake: lower charges than Azure, but the second-best solution.
3. Cloudera Hadoop distribution / Hortonworks: lower cost, but infrastructure management is a somewhat higher overhead.
Open source:
1. Delta Lake with MinIO cloud.
2. Hive, HDFS / Hadoop cluster.
With open source, scalability needs to be calculated based on usage, compared to the top three above.