We use ITRS Geneos for availability and performance monitoring. With respect to availability, we have not observed any business functionality being impacted, and all capacity parameters are okay. On the performance side, we monitor latency, message counts, message rates, and the number of external client logins. Lastly, we use the solution to monitor hardware capacity parameters such as CPU, disk, and memory usage for all applications.
We use the application logs, as well as OEM logs. We monitor the processing rates, the number of messages processed, and the number of external client logins. We also monitor whether any exceptions or error codes have been raised. Because our application does not involve a database, we are not monitoring database activity. Inside the applications, we monitor whether the server is in primary mode or has gone to secondary mode, and whether a failure has happened. We also monitor the configuration, checking that all the application configs are intact and unchanged. We monitor the applications to make sure connectivity is in place across the various modules and servers, which are interconnected either through PCP or our protocol. We are also monitoring the downstream and upstream systems.
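To illustrate the kind of check behind "configs are intact", here is a minimal Python sketch that compares each config against a stored fingerprint. It is not Geneos configuration; the file names, contents, and function names are hypothetical and exist only for the example.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 hex digest of the config contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def changed_configs(baseline: dict, current: dict) -> list:
    """Return the names of configs that are missing or differ from the baseline.

    baseline maps config name -> expected fingerprint.
    current maps config name -> current config text.
    """
    changed = []
    for name, expected in baseline.items():
        text = current.get(name)
        if text is None or fingerprint(text) != expected:
            changed.append(name)
    return sorted(changed)


# Hypothetical config names and contents, for illustration only.
baseline = {
    "gateway.cfg": fingerprint("port=9000\n"),
    "matcher.cfg": fingerprint("mode=primary\n"),
}
current = {
    "gateway.cfg": "port=9000\n",
    "matcher.cfg": "mode=secondary\n",
}

print(changed_configs(baseline, current))  # ['matcher.cfg']
```

A missing config is treated the same as a changed one, since either means the application is no longer running with the configuration that was signed off.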
Our previous solution alerted us through an SMS or email. The alert would go to one person whose job was to monitor, and if they were busy with another activity, we would be delayed in responding to the situation. With ITRS Geneos, we can see everything on the dashboard. We can see the relationship between two alerts and the whole picture of where the issue is, which allows us to take action quickly. Suppose one alert comes from a server and another alert comes from a client: we then know there is a potential connectivity issue between that server and that client. An email or SMS alert never gave us the full picture, so I had to go into the system and work out what the issue actually was. That was quite difficult. With the new system, monitoring is more proactive. We now monitor the system more closely because many alerts that we didn't have before are enabled. For example, we have thresholds that warn us when we need to free up space.
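The server-and-client example can be sketched as a small correlation rule. This is a minimal Python illustration, not how ITRS Geneos implements its dashboards or rules; the `Alert` fields, host names, and the 60-second window are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # host that raised the alert
    peer: str         # host at the other end of the connection
    timestamp: float  # seconds since some common reference


def correlated_links(alerts, window_s=60.0):
    """Return host pairs where both ends alerted about each other within window_s."""
    links = set()
    for a in alerts:
        for b in alerts:
            both_ends = a.source == b.peer and a.peer == b.source
            close_in_time = abs(a.timestamp - b.timestamp) <= window_s
            if both_ends and close_in_time:
                links.add(tuple(sorted((a.source, a.peer))))
    return sorted(links)


# Hypothetical alerts, for illustration only.
alerts = [
    Alert("server-1", "client-7", 100.0),
    Alert("client-7", "server-1", 130.0),
    Alert("server-2", "client-9", 500.0),
]

print(correlated_links(alerts))  # [('client-7', 'server-1')]
```

Two separate emails would show only the first two alerts in isolation; pairing them points directly at the link between `server-1` and `client-7`, while the unmatched alert from `server-2` is left for separate investigation.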
We first started to use ITRS Geneos in our department in 2014. After that, the solution was implemented in other departments, and the downstream systems inherited the monitoring. Trading is just one part of the organization where monitoring is now taking place; other systems, such as risk surveillance and collateral management, have also been upgraded. As an organization, we need to have a holistic view. This means that if I am a head of department (HOD), I need a holistic view of not only trading but also the downstream systems. Previously, the senior manager had no view of what the issue was, where it was, or what the impact was, because many downstream systems can be impacted. Now, with the holistic view in place, the senior manager is able to see where the issue is and where the impact is.
The solution provides lightweight data collection. We have recently added a business dashboard as well. Along with order messages and client logins, we are also monitoring capacity reports, view logs, and business reports. We capture all of this information and store it in our database, so we can see a historical view of user logins and latency, and whether or not they have improved.
With the help of the solution, we can predict and prevent failures. Often, one incident leads to another. For example, suppose we have three modules and there is a buildup in module C, with observations and issues. ITRS Geneos helps us monitor the buildup. While one of our people is trying to resolve the issues in module C, the person monitoring can also watch the back pressure going to module A, and I can then start checking the impact on modules A and C. Because we monitor the complete end-to-end connectivity with ITRS Geneos, I can make sure that an issue on one server or one module is prevented from affecting the other modules as well. We have prevented multiple P1 and P2 issues using ITRS Geneos.
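The module C example can be sketched as a simple check of which upstream modules are exposed to back pressure. This is a minimal Python illustration under stated assumptions: the modules form a linear chain A, B, C (the chain order and module B are assumed), and the queue depths and the limit are made-up numbers, not values from our systems.

```python
def congestion_report(pipeline, queue_depth, limit):
    """Split modules into congested ones and upstream ones at risk of back pressure.

    pipeline is ordered from upstream to downstream.
    queue_depth maps module name -> number of queued messages.
    """
    congested = [m for m in pipeline if queue_depth.get(m, 0) > limit]
    if not congested:
        return [], []
    # Everything upstream of the most downstream congested module can be pushed back on.
    last = max(pipeline.index(m) for m in congested)
    at_risk = [m for m in pipeline[:last] if m not in congested]
    return congested, at_risk


# Hypothetical pipeline and queue depths, for illustration only.
pipeline = ["A", "B", "C"]
queue_depth = {"A": 10, "B": 40, "C": 900}

print(congestion_report(pipeline, queue_depth, limit=500))  # (['C'], ['A', 'B'])
```

While someone works on the congested module, the at-risk list tells the person monitoring which modules to watch before the buildup turns into a second incident.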
We track the number of proactively detected incidents every month, based on the incidents that we receive from ITRS Geneos. Roughly 20 to 30 percent of all incidents are proactively detected and avoided using the solution. Whenever an incident comes up, the first question we ask is why it wasn't caught by the monitoring. Over the last eight years, we have progressed a great deal with the solution: because we learn from each and every incident, we have configured multiple rules and multiple samplers in ITRS Geneos.
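The monthly figure is a simple ratio. As a minimal Python sketch, with made-up counts chosen only to show the arithmetic (they are not our actual incident numbers):

```python
def proactive_share(proactive_incidents, total_incidents):
    """Return the percentage of incidents that were detected proactively."""
    if total_incidents <= 0:
        raise ValueError("total_incidents must be positive")
    return 100.0 * proactive_incidents / total_incidents


# Hypothetical month: 12 of 48 incidents were caught by monitoring first.
print(proactive_share(12, 48))  # 25.0
```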