What is our primary use case?
We use ITRS Geneos for availability and performance monitoring. For availability, we check that no business functionality is impacted and that all capacity parameters are within limits. For performance, we monitor latency, message counts, message rates, and the number of external client logins. Lastly, we use the solution to monitor hardware capacity parameters such as CPU, disk, and memory usage for all applications.
We use the application logs as well as OEM logs. We monitor the processing rates, the number of messages processed, and the number of external client logins. We also monitor whether any exceptions or error codes have been generated. Because our application does not involve a database, we do not monitor database activity. Inside the applications, we monitor whether a server is in primary mode or has gone to secondary mode, whether a failure has happened, and whether the application configs are intact and unchanged. We also monitor connectivity across the various modules and servers, which are interconnected either through PCP or our own protocol. We also monitor the downstream and upstream systems.
How has it helped my organization?
Our previous solution alerted us through SMS or email. The alert went to one person whose job was to monitor, and if that person was busy with another activity, our response was delayed. With ITRS Geneos, we can see everything on the dashboard. We can see how two alerts relate to each other and get the whole picture of where the issue is, which lets us take action quickly. For example, if one alert comes from a server and another comes from a client, we know there is a potential connectivity issue between that server and the client. An email or SMS alert never gave us the full picture, so I had to go into the system and work out what the issue actually was, which was quite difficult. With the new system, monitoring is more proactive. We monitor the system more closely because many alerts that we didn't have before are now enabled. For example, we have thresholds that warn us when we need to free up space.
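To illustrate the idea of relating two alerts, here is a minimal Python sketch. It is not Geneos configuration or a Geneos API; the `Alert` fields, the host names, and the `window` parameter are all assumptions made for the example. It flags a connection as suspect when both ends of it raise an alert within a short time of each other.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    source: str       # hypothetical: host that raised the alert
    peer: str         # hypothetical: host at the other end of the connection
    timestamp: float  # seconds since an arbitrary reference point

def correlated_links(alerts: list[Alert], window: float) -> set[frozenset[str]]:
    """Return host pairs where both ends alerted within `window` seconds."""
    links: set[frozenset[str]] = set()
    for i, a in enumerate(alerts):
        for b in alerts[i + 1:]:
            # Two alerts describe the same link if each names the other as its peer.
            same_link = a.source == b.peer and a.peer == b.source
            if same_link and abs(a.timestamp - b.timestamp) <= window:
                links.add(frozenset((a.source, a.peer)))
    return links
```

A single email or SMS carries only one of these alerts; seeing both side by side is what points to the connection rather than to either host.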
We first started to use ITRS Geneos in our department in 2014. After that, the solution was implemented in other departments, and the downstream systems inherited the monitoring. Trading is just one part of the organization where monitoring now takes place; other systems, such as risk surveillance and collateral management, have also been upgraded. As an organization, we need a holistic view. If I am a head of department (HOD), I need a holistic view of not only trading but also the downstream systems. Previously, the senior manager had no view of what the issue was, where it was, or what the impact was, because many downstream systems can be impacted. Now, with the holistic view in place, the senior manager can see where the issue is and where the impact is.
The solution provides lightweight data collection. We have recently added a business dashboard as well. Along with order messages and client logins, we also monitor capacity reports, view logs, and business reports. We capture all of this information and store it in our database. We can see a historical view of user logins and latency, and whether or not they have improved.
With the help of the solution, we can predict and prevent failures. Often, one incident leads to another. For example, suppose we have three modules and messages start to build up in module C. ITRS Geneos helps us monitor the buildup. While one person works on resolving the issue in module C, the person monitoring can watch the back pressure going to module A and start checking the impact on modules A and C. Since we have complete end-to-end connectivity monitoring in ITRS Geneos, I can make sure that an issue on one server or module is prevented from spreading to the other modules. We have prevented multiple P1 and P2 issues using ITRS Geneos.
We track the number of proactively detected incidents every month, based on the incidents we receive from ITRS Geneos. Roughly 20 to 30 percent of all incidents are proactively detected and avoided using the solution. Whenever an incident comes up, the first question we ask is why it wasn't caught in the monitoring. Over the last eight years, we have progressed a great deal with the solution. Because we learn from each and every incident, we have configured multiple rules and samplers in ITRS Geneos.
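The back-pressure reasoning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pipeline order A -> B -> C, the queue-depth figures, and the `threshold` value are hypothetical, and the function is not part of ITRS Geneos.

```python
# Hypothetical pipeline order: messages flow A -> B -> C.
PIPELINE = ["A", "B", "C"]

def modules_at_risk(queue_depth: dict[str, int], threshold: int) -> list[str]:
    """Return upstream modules to watch when a module's queue exceeds threshold."""
    at_risk: list[str] = []
    for i, name in enumerate(PIPELINE):
        if queue_depth.get(name, 0) > threshold:
            # Everything upstream of the congested module may feel back pressure.
            for upstream in PIPELINE[:i]:
                if upstream not in at_risk:
                    at_risk.append(upstream)
    return at_risk
```

With a buildup in module C, the function returns the modules ahead of it, which is the list the person monitoring would start checking while someone else works on C.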
What is most valuable?
One of the most valuable features of ITRS Geneos is the active time feature, which helps with the trading applications that I support. The HP OpenView solution we previously used worked on a 24-hour, seven-days-a-week cycle, but market hours are 9:00 AM to 3:30 PM.
ITRS Geneos has good GUIs that give us a single view of all the various applications. We have around 500 servers, and monitoring them on individual devices is challenging. Putting the various applications into a single GUI is very difficult, but in ITRS Geneos the GUI and user-friendly dashboards are all there. All the information is available at a glance. We have five segments, and if a problem occurs in any one segment, we can highlight it and follow the steps. We then open the Active Console, which gives a detailed, step-by-step view to resolve the issue.
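To show why an active-time window matters, here is a minimal Python sketch of the concept. It is not how Geneos implements or configures active times. The 9:00 AM to 3:30 PM window comes from the market hours mentioned above; restricting it to weekdays and the function names are assumptions made for the example.

```python
from datetime import datetime, time

# Market hours from the review; the weekday-only rule is an assumption.
MARKET_OPEN = time(9, 0)
MARKET_CLOSE = time(15, 30)

def in_active_time(ts: datetime) -> bool:
    """Return True if ts falls inside the trading window on a weekday."""
    is_weekday = ts.weekday() < 5  # Monday=0 .. Friday=4
    return is_weekday and MARKET_OPEN <= ts.time() <= MARKET_CLOSE

def should_alert(ts: datetime, breached: bool) -> bool:
    """Raise an alert only for threshold breaches during active time."""
    return breached and in_active_time(ts)
```

A tool that runs on a 24x7 cycle would raise the same breach at 4:00 PM or on a Saturday, when it has no trading impact and only adds noise.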
What needs improvement?
Currently, the most valuable thing for an individual is a mobile device. Since that is where people track everything, and we already have multiple apps for various products, I would like ITRS Geneos to develop an app. Instead of going to specific login terminals or logging into laptops or desktops to check alerts, we could have visibility in the app itself. Using an ITRS Geneos app, we could see error messages while traveling or wherever we are.
I would like to see the message capacity for forecasting increased. Since the NSE has been the number one derivative stock exchange in the world for three consecutive years, the number of messages is important. We use the capacity planner in ITRS to forecast our data needs for the next two months. The planner is important because the volume of data we produce is becoming more and more volatile compared to when we first started using ITRS Geneos in 2014.
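As a rough illustration of what a two-month forecast involves, here is a minimal Python sketch that fits a straight line to monthly message volumes and extrapolates it. This is a plain least-squares trend, not the method used by the ITRS capacity planner, and the sample figures are made up. With volatile volumes, a simple trend like this would be a starting point at best.

```python
def forecast_linear(history: list[float], steps_ahead: int) -> float:
    """Extrapolate a least-squares straight line fitted to evenly spaced history."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # The last observation sits at x = n - 1; project steps_ahead beyond it.
    return intercept + slope * (n - 1 + steps_ahead)

# Hypothetical monthly message volumes (in millions), forecast two months out.
projected = forecast_linear([100.0, 110.0, 120.0, 130.0], steps_ahead=2)
```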
For how long have I used the solution?
I have been using the solution for eight years.
What do I think about the stability of the solution?
ITRS Geneos is a stable product and we have not had any downtime in the past eight years. The stability portion is good, but we do need to configure new applications correctly for the solution to stay stable.
What do I think about the scalability of the solution?
The solution is scalable. We are the number one derivative stock exchange, and our number of servers has increased significantly in recent years. In 2014, we had 50 servers, but now we have 500, ten times as many. Our gateway servers were also limited to around five, but now there are 15. Because the solution scales, our maintenance team has been able to deploy more gateway servers and improve our monitoring gateways. We are also looking into redundancy and unused data to further improve performance.
We currently have 120 people using the solution in our department and around 400 from all of the departments, including vendors.
How are customer service and support?
The technical support for ITRS is currently very good. The person who developed and implemented our ITRS dashboard previously provided technical support for us, so he has a good understanding of the system. Additionally, many of the deadlines for ITRS are set by top management. ITRS is seen as a high priority, so the response time from the ITRS managed service is currently good.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
I work in a trading production support role. Eight years ago, we made the switch from using HP OpenView for monitoring purposes to using ITRS Geneos. This was because there was no GUI available with HP OpenView.
We have never considered any lower-cost or open-source alternatives because the team is comfortable with the current system. The L1 monitoring team is the team that checks the dashboards and provides alerts. They are so familiar with the system that we have never thought of getting away from it.
How was the initial setup?
A few team members were involved in the deployment. My role was to provide details to ITRS Geneos, handle the signup portion of the solution, and test the ITRS setup, such as whether alerts were showing properly. I was involved in all these areas of deployment because we have various segments in our module. We started with only one specific segment. Initially, we focused on the GUI design: how it should look, what rules should be configured, and where the data should come from. All those nitty-gritty details are important for the ITRS dashboard. I was not responsible for creating the samplers, installing Netprobes, or entering rules. A separate team was responsible for those tasks.
Developers faced a challenge because there was no database, but they soon found ways to overcome this obstacle. They had to monitor the application, logs, and other aspects of the system. After four to six months, they started to figure out how to monitor specific aspects of the system when multiple items needed to be monitored. We needed to consider three things: first, whether ITRS would have a file or tooling mechanism; second, whether there would be any impact on production; and third, whether ITRS would have a built-in script. We considered these three aspects in order to go deeper into the application, so that Netprobes could act as the agents on the production machine and eliminate any impact. After all the items were gathered, the most important step was putting everything into one Active Console. This is where I accessed the files and did all of the reading.
In the fifth month, we had all the GUIs created, and in the sixth month, we started testing to make sure all the alerts were configured and displaying correctly. We did all the testing at the end of the day because it was in the production environment. It took hard work, and the initial two months were difficult because we didn't know how to do it or what needed to be planned. We completed deployment for all five segments we support within one year. In the first six months, we completed deployment for only one segment. Over the next four months, we deployed another two segments, and in the last two months, the entire deployment was completed.
What about the implementation team?
The implementation was completed in-house.
What other advice do I have?
I give the solution a nine out of ten.
We had only two people doing the setup, the manager and one other person. I was mostly a tester, testing the applications and making sure everything was in order for the solution. There were four of us in total during the deployment process. There are currently five to six people on a separate tooling team that handles ITRS Geneos. I am no longer part of the ITRS team.
From 2014 to 2015, ITRS Geneos was implemented only in trading operations. In 2016, we started to use it in other departments as well. By 2017, all the departments were using ITRS Geneos. There are five to six departments where the solution is used. In fact, we have created an exchange one view as well as a dashboard where we can look at six to seven departments in a single view. If any specific department has an issue, the warning or critical alert will show up in that view.
The solution requires maintenance. There is a separate tooling team that takes care of ITRS Geneos. Five people look after all the maintenance, patching, and all areas of the gateway servers for the solution. Additionally, there are two more activities. The first relates to the uptime and availability of the gateway servers. The second relates to releases: if there are any application changes, the team makes sure the corresponding change has been made in the solution. We are also evaluating new dashboards and features, such as our use of ICT and forecasting. There is a separate team that looks into all of this maintenance and development. There is also an onsite team in Manila.
For any mission-critical projects, I recommend ITRS Geneos because time is crucial. Everything needs to be resolved within five minutes, and the SLA is strict. To resolve incidents within a five-minute window, we need to monitor and escalate within 30 seconds. The team should focus on monitoring and recovery within the first 30 seconds.
Which deployment model are you using for this solution?
On-premises
*Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.