We use ITSI in the health industry. In the UK, the NHS currently uses ITSI as one of its sources of monitoring information. In ITSI, service components are structured around each area of the NHS. For any digitally transformed solutions that require monitoring related to our vaccination campaigns, the logs are ingested through Splunk and monitored through ITSI.
We realized ITSI's benefits immediately after it was deployed. When the COVID pandemic broke out, it triggered an enormous amount of activity across the UK. Having a powerful tool to aggregate data and enable real-time monitoring helped our campaign.
ITSI can help us right-size resources, but it depends on how you do things, and our practice differs from what Splunk recommends. Ideally, you skim only what you need at the source and then push that into Splunk. Having that centralized logging and analytics is great, especially when so many things are tied to ingestion, storage, and so on. However, for what we do, it leaves much to be desired. You're talking about an enterprise solution on the scale of the NHS, with multiple people, contractors, and all these moving parts. Some services do it well and only send in what's needed; some services just dump everything, so you end up with a huge volume of logs. We could right-size appropriately, but honestly, for us, it's not really done well at the moment.
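To give a rough idea of how you might spot the services that dump everything, here's a minimal sketch using the Splunk Python SDK (splunklib). The hostname and credentials are placeholders, and the license-usage search is a common Splunk pattern rather than our exact query:

```python
# Sketch: rank sourcetypes by indexed volume to spot services that
# "dump everything" instead of skimming at the source.
# Assumes the Splunk Python SDK (pip install splunk-sdk);
# host and credentials are placeholders.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="splunk.example.nhs.uk",  # hypothetical host
    port=8089,
    username="admin",
    password="changeme",
)

# license_usage.log records bytes indexed (b) per sourcetype (st).
query = (
    "search index=_internal source=*license_usage.log type=Usage "
    "| stats sum(b) as bytes by st "
    "| sort - bytes | head 10"
)

reader = results.JSONResultsReader(
    service.jobs.oneshot(query, output_mode="json")
)
for row in reader:
    if isinstance(row, dict):  # skip diagnostic messages in the stream
        print(row["st"], row["bytes"])
```

The ten noisiest sourcetypes are usually where the right-sizing conversations start.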
ITSI has helped us streamline our incident management. We have a 24/7 service team working around the clock, responding to the alerts that Splunk produces. Splunk is linked to ServiceNow, our service management tool: the team inputs the information from Splunk, and tickets are raised in ServiceNow. Previously, we used software called Cherwell, which looked horrendous. This helps bring the whole package together.
We've reduced our alerts, but it requires a conscious effort to configure them, and that depends on how you use the platform. It goes back to getting the right metrics out of the logs you're producing. The tool itself is powerful, but if you don't use it properly, things can get noisy; ours is quite noisy at times, and that's sometimes down to our configuration.
Reducing alert noise also takes some tweaking. You've got KPIs and correlation searches that are great for real-time monitoring, but if you set them up straight out of the box, you will get a lot of noise. It depends on how you configure it. There are options on the forwarders to ingest only alert logs or error logs, so you pick up only on whatever those error logs would trigger.
That helps with the accuracy of your ITSI alerting. However, it can get a bit noisy if you're ingesting more than that and it isn't configured for the exact use case you need. Overall, it's been a conscious effort to ensure we've got everything configured right.
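As one illustration of that conscious configuration effort, here's a hedged sketch of the kind of scheduled alert you'd tune: it only fires past a threshold and is throttled so it doesn't re-alert on every run. The index name, threshold, and recipients are hypothetical; the parameters are standard savedsearches.conf properties passed through the Splunk Python SDK:

```python
# Sketch: a threshold-based, throttled alert to cut noise.
# Index name, threshold, schedule, and recipients are hypothetical.
import splunklib.client as client

service = client.connect(
    host="splunk.example.nhs.uk", port=8089,
    username="admin", password="changeme",
)

service.saved_searches.create(
    "vaccination_error_spike",             # hypothetical alert name
    "index=vaccination_service log_level=ERROR",
    **{
        "cron_schedule": "*/5 * * * *",    # run every five minutes
        "is_scheduled": "1",
        "dispatch.earliest_time": "-5m",
        "dispatch.latest_time": "now",
        "alert_type": "number of events",  # fire on raw event count
        "alert_comparator": "greater than",
        "alert_threshold": "50",           # tune per service
        "alert.suppress": "1",             # throttle repeat firings
        "alert.suppress.period": "30m",    # stay quiet for 30 minutes
        "actions": "email",
        "action.email.to": "oncall@example.nhs.uk",  # hypothetical
    },
)
```

The throttle window does most of the noise reduction here: without it, a sustained error burst pages the team every five minutes.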
It has reduced our mean time to detect. For Microsoft/CrowdStrike incidents, we can have an SLA as short as three minutes. The feeds come in quickly, so our detection time is between three and ten minutes. For major outages, an SLA of a few minutes is good, especially when it's not a cyber-level threat.
The resolution time is determined by how quickly we can pass the detection along to the IT team and triage the logs to determine the issue. We've had quite quick resolutions because everything is partitioned so that data is specifically service-bound: you can look through the data for a specific area and optimize from there. The search system in Splunk is powerful and helps speed up resolutions.
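Because the data is service-bound, a first-pass triage search can be scoped to a single service's index. A minimal sketch of that (again via splunklib, with made-up index and field names):

```python
# Sketch: scope triage to one service's index so the on-call team can
# see where the failure sits. Index and field names are hypothetical.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="splunk.example.nhs.uk", port=8089,
    username="admin", password="changeme",
)

# Most recent errors for one service only, grouped by component.
query = (
    "search index=booking_service log_level=ERROR earliest=-1h "
    "| stats count latest(_time) as last_seen by component, message "
    "| sort - count"
)

for row in results.JSONResultsReader(
    service.jobs.oneshot(query, output_mode="json")
):
    if isinstance(row, dict):
        print(row.get("component"), row.get("count"), row.get("message"))
```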
ITSI helps to automate routine tasks. That's what the saved searches are for. It's a complete package with Splunk Cloud and ITSI for deeper drill-downs, but not everyone can be on the ITSI dashboard all day. Automation gives us those alert structures, especially at night. When you've got a file that's meant to come in at 3 a.m., you don't need someone waiting around to look at it.
This is what those alerts and automation are for. You can put custom wrappers around things to produce custom output, though Splunk is trying to make something more standardized at the moment. It saves our IT services multiple hours a week because you don't have to sit and look through dashboards to ensure everything is all right. Those constant checks every five minutes add up over the week, so across many different services that equals tens of hours a week.
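For the 3 a.m. file example, the pattern is an inverted alert: a scheduled search that fires only when the expected feed has not arrived. This is a sketch under the same assumptions as above, with a hypothetical index, source pattern, and schedule:

```python
# Sketch: alert when the overnight feed is MISSING, so nobody has to
# wait up for it. Index, source pattern, and schedule are hypothetical.
import splunklib.client as client

service = client.connect(
    host="splunk.example.nhs.uk", port=8089,
    username="admin", password="changeme",
)

service.saved_searches.create(
    "overnight_feed_missing",              # hypothetical name
    'index=overnight_feeds source="*/vaccine_batch_*.csv" | stats count',
    **{
        "cron_schedule": "30 3 * * *",     # check just after 3 a.m.
        "is_scheduled": "1",
        "dispatch.earliest_time": "-2h",   # look back over the drop window
        "dispatch.latest_time": "now",
        "alert_type": "custom",
        "alert_condition": "search count=0",  # fire only if nothing arrived
        "actions": "email",
        "action.email.to": "oncall@example.nhs.uk",  # hypothetical
    },
)
```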