It used to be Talend Open Studio; I've got the paid version, which is fully supported by Talend and is now called Talend Cloud. The Talend Open Studio parts are exactly the same as in the open-source version. But with the cloud platform, when you build a job or an API, you publish it to the cloud platform, which manages all the API functionality really well.
The best part is that when something goes wrong, the logging is phenomenal. It tells you exactly what's wrong. That, coupled with Talend Open Studio, makes it really easy to resolve any issues.
Talend Open Studio itself is fantastic. When you build a job or an API, it tells you what's wrong, what needs to be fixed, and how to fix it. Talend also maintains the support documentation, which is really straightforward to follow.
There are plenty of videos online on how to build jobs to transform data, build APIs, and link to other databases. Talend can handle quite major transformations and ETLs, and the documentation is intuitive, so it's simple and easy to follow.
We probably use about 10% of Talend's functionality. Our primary use is keeping multiple databases (nearly 200 different databases) in sync. Talend integrates that information through REST APIs. We're moving a lot of our integration onto the Talend platform, taking it from old XML-based integration to REST APIs.
We use Crunchy Data's Crunchy Bridge solution for a lot of our databases in the back end, and Talend, coupled with Postgres, gives us a very simple platform for building integrations really fast. It also means that we can react to change events that occur in the database and use them to keep multiple repositories in sync.
We're moving towards more of a single source of truth platform rather than multiple sources of truth that we've been dealing with for the last fifteen or so years.
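The change-event approach can be illustrated with a minimal sketch. This is plain Python, not a Talend job or a Postgres feature; the event shape (`op`, `key`, `data`) and the function names are hypothetical and chosen only for illustration. Each change event is replayed against every repository so that they all converge on the same state.

```python
def apply_event(replica, event):
    """Apply one change event to a single in-memory repository (a dict)."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Merge the changed fields into the existing record, if there is one.
        replica[key] = {**replica.get(key, {}), **event["data"]}
    elif op == "delete":
        # Deleting a key that is already absent is treated as a no-op.
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def sync(replicas, events):
    """Replay every event, in order, against every repository."""
    for event in events:
        for replica in replicas:
            apply_event(replica, event)


# Hypothetical usage: two repositories kept in step by the same event stream.
events = [
    {"op": "insert", "key": 1, "data": {"name": "Site A", "status": "new"}},
    {"op": "update", "key": 1, "data": {"status": "verified"}},
    {"op": "insert", "key": 2, "data": {"name": "Site B"}},
    {"op": "delete", "key": 2},
]
primary, reporting = {}, {}
sync([primary, reporting], events)
```

Because every repository receives the same events in the same order, both dictionaries end up holding identical records.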
Data transformation has proven to be the most effective feature for data management. It's amazing. We have a lot of external parties that gather information for us. We're a global government council, so we're transforming XML, JSON, comma-separated files, emails: every format in which data can be delivered to us. GeoJSON, geometry, and spatial data: we're feeding it all through the Talend platform and bringing it all into a single format in Postgres.
It's just so simple to take those foreign formats and transform them into the standard format that we have in-house. Likewise, we can take data from our database and transform it back into those other formats, so that external parties can keep using their existing platforms without a human having to handle the data in between. It has made a huge improvement in how fast we can get data out and also consume data for the scientists that we have internally.
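As a rough illustration of this kind of two-way transformation, here is a minimal sketch in plain Python using only the standard library. It is not how Talend implements it; the record shape (`site`, `value`), the XML layout, and the function names are all hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical in-house record shape: {"site": str, "value": float}


def from_csv(text):
    """Parse comma-separated text with a header row into standard records."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"site": row["site"], "value": float(row["value"])} for row in reader]


def from_json(text):
    """Parse a JSON array of objects into standard records."""
    return [{"site": item["site"], "value": float(item["value"])}
            for item in json.loads(text)]


def from_xml(text):
    """Parse <records><record site=".." value=".."/></records> into standard records."""
    root = ET.fromstring(text)
    return [{"site": el.get("site"), "value": float(el.get("value"))}
            for el in root.iter("record")]


def to_csv(records):
    """Export standard records back to comma-separated text for an external party."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["site", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


PARSERS = {"csv": from_csv, "json": from_json, "xml": from_xml}


def normalize(fmt, text):
    """Dispatch to the parser for the named incoming format."""
    return PARSERS[fmt](text)
```

Whatever format arrives, `normalize` returns the same list-of-dicts shape, and `to_csv` shows the reverse direction for one outgoing format.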
The error handling capability in Talend is really what sold us on the product. You can get everything from a full Java stack trace right down to putting a job through and stepping through it one step at a time.
Each component within that job can be set to a different logging level, from warnings to the actual error itself. It's incredible how easy it is to find a problem and resolve it. Nine times out of ten, it prompts you on what you should be doing. So, if you've done something wrong, for example you haven't got the schema quite right, it'll tell you that you have an error in the schema before you publish the job.
It's subtle, but when it finds errors, the job won't go live to production or development until you fix them. And if an email does go through, or an ETL runs with data, and something fails, it tells you exactly what's wrong. Most of the time, it also gives you some indication of what you need to do to fix it.
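The idea of blocking a publish until schema errors are fixed can be sketched as follows. This is an analogy in plain Python, not Talend's own validation; the schema, column names, and the `publish` function are hypothetical.

```python
# Hypothetical expected schema: column name -> required Python type.
EXPECTED_SCHEMA = {"id": int, "name": str, "area_km2": float}


def schema_errors(row, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema problems for one row."""
    errors = []
    for column, expected in schema.items():
        if column not in row:
            errors.append(f"missing column '{column}'")
        elif not isinstance(row[column], expected):
            actual = type(row[column]).__name__
            errors.append(
                f"column '{column}': expected {expected.__name__}, got {actual}"
            )
    for column in row:
        if column not in schema:
            errors.append(f"unexpected column '{column}'")
    return errors


def publish(rows):
    """Refuse to publish while any row has schema errors; return the row count otherwise."""
    problems = {}
    for index, row in enumerate(rows):
        errors = schema_errors(row)
        if errors:
            problems[index] = errors
    if problems:
        raise ValueError(f"publish blocked: {problems}")
    return len(rows)
```

The error messages name the offending column and the expected type, which mirrors the kind of "here is what is wrong and what to fix" feedback described above.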
Data Governance Maturity
We're on the low end of maturity when it comes to data governance. But one of the beauties of Talend is that we now have one place to go to define business rules around data quality and data ownership. We're moving down the path of a more domain-oriented architecture, giving business owners more control of their own data by structuring our architecture around our business services.
Empowering Analysts
Talend Cloud makes it easy for us to support both developers and data analysts. Data analysts can manipulate the data through the cloud solution without having to write jobs and APIs. We publish the APIs and jobs, and the analysts can even manipulate things through pipelines within Talend Cloud.
This means you don't need a highly trained developer. You can just have an analyst who knows how to use the toolset but doesn't need to know the details or mechanics of how everything works.
Streamlined Data Management
This makes it easier for us to align our services with our data. Validation and other processes are pre-built into the jobs themselves, which means that data quality, data control, and data ownership are a lot clearer and in line with our business services.
This streamlines things and makes them a lot easier. However, our maturity level around data governance is still low. We didn't really have any until about two years ago. It was a bit of a wild west when it came to data.