What is our primary use case?
We have used the solution for a real estate agency in Singapore to stream their data, on a more or less real-time basis, to a mobile app. Their backend is primarily built on SAP. They used a combination of Kafka message queuing and Talend to stream the data into a NoSQL environment, MongoDB. The solution streams data from the SAP backend systems all the way to a NoSQL MongoDB database to support a real-time, or near real-time, data flow into a mobile app.
This is a consumer-facing mobile app. The end consumers are able to look at the status of their real estate space; users are mostly tenants who rent or lease commercial or residential spaces. The status of their leased property is made available via the mobile app. The data solution comprises Kafka for message queuing, Talend for data flow streaming, and MongoDB for storing the data. That's one use case.
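To give a sense of the sink side of that flow (the part Talend handled for us), here is a minimal Java sketch, not the actual implementation, assuming JSON lease-status messages arrive on a hypothetical Kafka topic called sap.lease.status and are upserted into a MongoDB collection; the broker address, topic, database, and field names are all illustrative:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.ReplaceOptions;
    import org.bson.Document;

    public class LeaseStatusStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("group.id", "lease-status-sink");
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {

                consumer.subscribe(Collections.singletonList("sap.lease.status")); // hypothetical topic
                MongoCollection<Document> col =
                        mongo.getDatabase("realestate").getCollection("lease_status");

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> rec : records) {
                        // Each message is assumed to be a JSON document keyed by lease ID.
                        Document doc = Document.parse(rec.value());
                        col.replaceOne(Filters.eq("leaseId", rec.key()), doc,
                                       new ReplaceOptions().upsert(true));
                    }
                }
            }
        }
    }

In the real project this logic sat inside Talend jobs built from Kafka and MongoDB components; the sketch only shows the shape of the message-to-document flow.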
The second problem solved using Talend was building a data lake for internal data analytics at a government organization, again based in Singapore. Their backend systems are primarily supported by Sybase databases. From the Sybase databases, selective data is brought into a staging area via Qlik Attunity, a real-time data synchronization tool that uses a change data capture mechanism. Once the data reaches the staging environment through Attunity, Talend takes over: it pulls that data and moves it into Cloudera's big data platform. Tableau is then used for data analytics and visualization. This solution is set up for internal data analytics within the government organization.
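For the movement from the staging area into Cloudera, the Talend jobs do the work, but as a rough illustration of the kind of load step involved, here is a small Java sketch against a hypothetical HiveServer2 endpoint, table, and staging path (all placeholders, with the Hive JDBC driver assumed to be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class StagingToHiveLoad {
        public static void main(String[] args) throws Exception {
            // Hypothetical HiveServer2 endpoint on the Cloudera cluster.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://cdh-edge:10000/analytics";
            try (Connection conn = DriverManager.getConnection(url, "etl_user", "etl_pass");
                 Statement stmt = conn.createStatement()) {
                // Move one staged, CDC-replicated extract into a managed Hive table.
                stmt.execute("LOAD DATA INPATH '/staging/attunity/customer/' "
                           + "OVERWRITE INTO TABLE analytics.customer_snapshot");
            }
        }
    }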
How has it helped my organization?
The tool offered essential connectors to connect to our various data sources and our big data platform. Our data sources were SAP, MS SQL, and Sybase. The Hadoop/Hive connectors are used to connect to the Cloudera-based big data platform. The components available to loop and dynamically pass source/target connections helped us to stream data based on the source tables' metadata. This helped us onboard new data sources quickly, rather than building a custom data pipeline for each new source.
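Inside Talend this looping is done with iterate links and context variables; purely as a hedged illustration of the metadata-driven idea outside the tool, the following Java sketch lists a source schema's tables via JDBC so that one generic job can process each of them (the Sybase URL, credentials, and schema name are placeholders):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;

    public class TableOnboarder {
        // Lists source tables so a single generic job can loop over them,
        // instead of a hand-built pipeline per table.
        public static List<String> listTables(String jdbcUrl, String user, String pass,
                                              String schema) throws Exception {
            List<String> tables = new ArrayList<>();
            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass)) {
                DatabaseMetaData meta = conn.getMetaData();
                try (ResultSet rs = meta.getTables(null, schema, "%", new String[] {"TABLE"})) {
                    while (rs.next()) {
                        tables.add(rs.getString("TABLE_NAME"));
                    }
                }
            }
            return tables;
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical Sybase source; the same loop works for MS SQL via JDBC.
            for (String table : listTables("jdbc:sybase:Tds:src-host:5000/appdb",
                                           "etl_user", "etl_pass", "dbo")) {
                System.out.println("Would stream table: " + table);
                // In Talend, a child job would receive the table name as a
                // context variable and run the generic extract/load.
            }
        }
    }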
What is most valuable?
The main differentiator I have seen between Talend and other data integration tools is the ability to view the data pipeline in the form of program code. Other tools hide the backend code behind the scenes, whereas Talend shows the backend code that gets generated from the UI (user interface) components used to create a data pipeline. In most data integration tools, all we do is leverage certain UI components meant for data processing, such as reading, selection, filtering, sorting, writing, et cetera, to build data pipelines. Tools such as Pentaho or Informatica convert the pipeline into backend code, probably written as a C++ or Java program, and this is not transparent to the developer or programmer. In the case of Talend, the tool converts the pipeline into a programmatic format and shows it to the user. This helps in debugging errors.
The JSON parsers can flatten an API response body into a structured output. This was useful when we used Google APIs for YouTube video analysis and checked corporate status using the ACRA/IRAS APIs from the Singapore government.
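Talend's JSON components do this for us; purely as an illustration of what "flattening" means, here is a small Java sketch using Jackson that turns a nested response body into flat key/value pairs (the sample payload is made up and only loosely shaped like a YouTube Data API response):

    import java.util.LinkedHashMap;
    import java.util.Map;

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonFlattener {
        // Recursively flattens a nested JSON node into "a.b[0].c" -> value pairs,
        // which is roughly the kind of flat schema we map columns onto in the ETL job.
        static void flatten(String prefix, JsonNode node, Map<String, String> out) {
            if (node.isObject()) {
                node.fields().forEachRemaining(e ->
                    flatten(prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey(),
                            e.getValue(), out));
            } else if (node.isArray()) {
                for (int i = 0; i < node.size(); i++) {
                    flatten(prefix + "[" + i + "]", node.get(i), out);
                }
            } else {
                out.put(prefix, node.asText());
            }
        }

        public static void main(String[] args) throws Exception {
            // Illustrative payload; field names are not the real API schema.
            String body = "{\"items\":[{\"id\":\"abc\",\"statistics\":{\"viewCount\":\"120\"}}]}";
            Map<String, String> flat = new LinkedHashMap<>();
            flatten("", new ObjectMapper().readTree(body), flat);
            flat.forEach((k, v) -> System.out.println(k + " = " + v));
            // Prints: items[0].id = abc and items[0].statistics.viewCount = 120
        }
    }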
If somebody has a passion and interest in software programming, they will like this feature. Of course, we can do without it, but this additional feature in Talend helps the data engineer to understand the code better, understand how to tune it for performance, and so on. The differentiation is the ability to view the data pipeline in the form of a traditional software program.
What needs improvement?
Comparing the pluses and minuses of other tools, I will try to suggest what could be improved in Talend based on how other tools have implemented certain features.
Both Talend and Pentaho leverage open-source plugins. There are lots of open-source communities that build solutions as per their requirements. Say, for example, someone needed to create a Tableau data extract from a certain database for the purpose of data visualization in Tableau; they may have created a plugin for this purpose and shared it with the marketplace. We can always go into the marketplace and search for a plugin that someone else created (for example, to generate TDE files) when we have similar needs. Both Talend and Pentaho offer third-party plugins in their marketplaces. This plugin marketplace automatically extends and improves the tool's capability. If somebody needs a particular feature, they can always build one and publish it for other developers to use.
Other than this, it would be easier to deploy solutions if Talend had a few more components; job scheduling is one. Currently, the job scheduler is a separate component within Nexus and is accessible over a browser. It could be integrated within TOS (Talend Open Studio) so that we don't need to switch between the two. Nexus holds the repository of compiled program code, however, it doesn't have the ability to keep a repository of the source code. That can be improved.
Currently, Talend has a versioning feature, meaning the previous version can be named 0.1 or 1.0 and subsequent minor/major versions can be given incremental version numbers, however, it is not that great as a version control repository (such as VSS, CVS, or Git). If a Nexus-like tool could be used for maintaining versions of the underlying source code (not just the compiled code), that would be great. Basically, version control can be improved.
For how long have I used the solution?
I've been using the solution since 2019.
What do I think about the stability of the solution?
The stability of the solution mainly depends on the server architecture behind the scenes. We have seen some instability issues with other tools around the Talend platform. For example, Qlik Attunity had been throwing exceptions every now and then, however, in the case of Talend, we haven't seen many. At times the server would run out of memory; based on the capacity we have, we may have to clean up some logs and other systems' data before resuming the processes manually.
The system's stability is okay, especially in the case of our data warehouse. Users understand that a data warehouse is less critical compared to operational systems. Users' reports are normally generated at the end of the day and made available the next day. We have a workable level of stability, which is good enough. We don't see any instability issues.
What do I think about the scalability of the solution?
Talend operates on a clustered job server setup, available on both Linux and Windows. The clusters can be scaled up with additional nodes as per demand. In the case of an on-premises setup, we assess the load and define the clusters based on that load. We did not have auto-scaling.
How are customer service and support?
Talend has a support portal and we can get access to it. The initial process of getting into support could be a little difficult; the initial access provisioning could be improved. It took about a week to get a support login created. Once that is done, you will get quality answers to your support questions.
There are two ways to get technical support, either from the dedicated support team from Talend, or from the open-source community. The dedicated team provides an answer for sure, however, we may not get answers to all our questions from the open-source forums.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
We are aware of other solutions such as Pentaho, but there was no switching. For most customers, I evaluate tools over one to two months and then finalize one particular tool. They don't really move from one ETL tool to another unless the tool itself is becoming obsolete or is no longer supported. In the case of Talend, from what I have seen, there is no switching. They must have initially evaluated a few tools and settled on one. The main reason could be that Talend supports a big data environment and has the ability to integrate with market solutions; third-party integration is possible. Talend is also becoming popular in Southeast Asia, so skilled resources are readily available in the market if we need people to maintain the environment. Those could be some of the factors that influenced the selection of Talend.
How was the initial setup?
The initial setup requires a systems engineer with knowledge of Linux or Windows, depending on the server type, to work with Talend support. It takes about two to three weeks for the basic setup. For additional plugins/components, guidance from the Talend support team is required.
After the initial setup, most of the ETL data pipelines take about a week at the minimum to build. Typically, we reap the benefits after a month or two from a particular data stream. Of course, it is based on the different use cases and depends on the number of data pipelines involved in solving that particular use case, however, smaller scoped items can be delivered in one to three months' time.
After development and testing, the time required for deployment varies, depending on the process we adopt and on the version control tools we have. Certain tools integrate easily into the Talend environment. In those cases, it would be a medium-complexity deployment; we can't say it's very simple, and we can't say it's too complex either. There is a medium level of difficulty involved in deployment.
The time it takes depends on the processes a particular organization follows in terms of change management and deployment. Some will require at least one week of review with the change management board. In certain organizations, the approval processes may take up to three weeks to get something deployed. If the process is a little simpler, if the developer has a fair level of control, it could be just one week.
What about the implementation team?
A vendor team was in charge of server maintenance, and their scope of service included deployment as well, as developers do not have access to production networks in established firms. Developers normally provide step-by-step instructions in a Word document that the vendor team follows. Some familiarity with the tool is enough, but the vendor personnel we worked with had reasonable experience, to the extent that they recommended code improvements.
What was our ROI?
Typically before embarking on a project, we analyze the return on investment, which could be over two to three years. It could be as quick as three months, depending on the scope of the data integration project, however, in the case of a data warehousing solution, it could be two to three years.
The real estate company we worked with must have already seen the return on investment. Their mobile app should be live with the data integrated at the backend.
For the internal data analytics use case, they would've also seen the benefits, as the data is already available in the data lake. The users will be able to do the analysis with the integrated data pool in the data lake.
Both companies have seen the returns.
What's my experience with pricing, setup cost, and licensing?
Pricing is competitive in comparison with Pentaho; both are on par. Solutions such as Informatica could be more expensive. Certain packaged solutions that use Talend as the ETL tool may be available if you are looking for a pre-built data warehouse model.
Which other solutions did I evaluate?
Yes, Apache NiFi was evaluated. Based on the POC (proof of concept) results, we chose Talend.
What other advice do I have?
Some of my comments are based on my limited understanding of the tool, so please do your due diligence before selecting a tool. I recommend a one-to-two-month POC on shortlisted tools before finalizing one.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.