We use every component of the Server Automation tool except for physical provisioning. We use it for compliance on servers and remediation. We do application installation, patching, and server builds. We have an external SaaS tool that we actually use to build the framework of the server, and then TrueSight Server Automation is used to push down the post-build steps: which ancillary applications are needed and what's needed for operational support. We do qualifications of our servers through TrueSight Server Automation. We do configuration management, data collection, and inventory reporting. Pretty much everything.
Take patching as an example. Prior to using TrueSight Server Automation, we used SMS for patching. It was very manually intensive. Every month we had one week where we had seven implementers who were on call the whole time executing these jobs. There was no scheduling. They were trying to triage and fix issues on the fly. It was very expensive and it was demotivating for the employees, knowing they had to do this every month. They were scheduling their vacations around it. It was rough. It's not the position you want to be in.
Once we got TrueSight Server Automation in, and we were able to take a step back and re-analyze our process, we saw that it provides the capabilities we needed to move to a more automated process. Now the data is all driven from the CMDB, which is owner-controlled data, not IT-controlled data. So the owners get to tell us when we're going to do this effort, and if they want to make a change, they change it in the source, and we then reflect that in all of our automation processes without any manual intervention.
Now, right before patch week starts, I have an automated job that schedules all of the jobs for patching. We've created a set of triage scripts that we've handed down to operators - not even operations staff, but literally operators - who manage all of our patching process now. They're the ones that do the analysis of what the issues are. They follow their triage scripts. They find issues. They know what to do to execute. If there are outliers, there are on-call people they would call, which doesn't happen too often. We've been able to take this very heavy manual process and turn it into a fully automated process which we've been able to hand down to lower-tier staff who are going to be on call anyway. They're already there. Now our staff can schedule their vacations and they can have a life outside of IT.
We also took it one step further and we created a portal site so that when a user logs in they're presented with any of the servers they own or support, again based on CMDB data. We give them the ability to enable/disable patching. They can initiate reboots on their servers. We've also taken that from just patching to being able to control the patching process without user intervention. So if the Exchange group says, "Oh, we're doing this big maintenance procedure this weekend. We can't patch our servers," they can go to this site, disable patching for a whole block of servers, give their justification, and it just happens. The only user involved is the owner who made the initial request.
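To make the flow above concrete, here is a minimal sketch of how owner-controlled CMDB data could drive the patch schedule, including the portal's "disable patching with a justification" step. It is only an illustration under stated assumptions: `CmdbServer`, `disable_patching`, and `build_patch_schedule` are hypothetical names, the record fields are invented for the example, and nothing here calls the TrueSight Server Automation API or reflects how our actual jobs are written.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CmdbServer:
    """Hypothetical CMDB record; the field names are illustrative only."""
    name: str
    owner: str
    patch_window: datetime           # window chosen by the owner in the CMDB
    patching_enabled: bool = True    # toggled by the owner through the portal
    disable_reason: Optional[str] = None


def disable_patching(servers, owner, names, reason):
    """Opt a block of servers out of patching on behalf of their owner.

    Only servers that belong to `owner` are changed, and a justification
    is required. Returns the names of the servers that were disabled.
    """
    if not reason.strip():
        raise ValueError("a justification is required")
    changed = []
    for server in servers:
        if server.name in names and server.owner == owner:
            server.patching_enabled = False
            server.disable_reason = reason
            changed.append(server.name)
    return changed


def build_patch_schedule(servers):
    """Split servers into scheduled jobs and skipped (opted-out) servers.

    Scheduled entries are (patch_window, name) tuples sorted by window;
    skipped entries are (name, reason) tuples in input order.
    """
    scheduled = sorted(
        (s.patch_window, s.name) for s in servers if s.patching_enabled
    )
    skipped = [
        (s.name, s.disable_reason) for s in servers if not s.patching_enabled
    ]
    return scheduled, skipped
```

In this sketch the scheduling job would call `build_patch_schedule` right before patch week, so a change an owner made through the portal is picked up on the next run without anyone in IT editing a job by hand.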
This solution has helped to reduce IT ops costs. It's tough to estimate by how much. The tool has been in place in our company for around nine years. There was very heavy adoption at first. Millions of dollars were saved with some of the processes. What makes it really hard to guesstimate is that, when we came in, there were 1,000 servers and we had no automation tool. We couldn't do compliance. We only checked whether we were meeting our standards when an auditor requested it, so we were doing these tasks on demand. We always found problems. Then we were trying to fix them at the last minute so that we could present audit with something clean.
Being able to create a compliance job that identifies and fixes these problems ahead of time has saved a whole lot of man-hours. We've really looked more at our time savings than our cost savings. At the end of the day, if we're saving time on having operations staff doing some repeatable event, we can reallocate them to do something else. I don't really see the cost savings, I see the time savings. And then we can have them working on things that are closer to the level they should be working at, building more content.
Among the operations staff, in the first year alone, we probably saved 6,000 hours. We were then able to increase that. It's at a pretty set level now. We're very mature in the product, so it's now just a matter of utilizing the content we have. Now we just get efficiencies, like not having to manually log in to a server and install software. We still get some time savings, but we don't really build metrics around those anymore.