The most valuable aspects of this product are the integrations, which are pretty much limitless. I also enjoy the reporting, as I use this product to assist with RCAs (root cause analyses).
The most valuable feature of the product is how it handles scheduling compared to the other companies. I switched from PagerDuty to VictorOps because creating a full-on schedule for a single item in PagerDuty would take me anywhere from 45 minutes to an hour and a half. That same schedule takes me no more than ten minutes in VictorOps.
That's because in VictorOps you create the calendar or schedule in a single pane. In PagerDuty, once you set up a single session, you have to go back and set up another session for your escalation policy. In VictorOps, the escalation policy is a drop-down menu where you add a task in the same window where you see all of your people and the times the on-call rolls over, so you stay on the same screen the entire time. In PagerDuty you have to go to three separate screens to do the same thing, and it takes so much longer.
Also, the Transmogrifier feature is really awesome. We use it quite heavily to make sure the incoming messages are properly formatted. We can add whatever customization we want for a given type of message, so that the drops can accept it and then give us valuable feedback based on it.
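To give a rough idea of what this kind of message rewriting looks like, here is a minimal Python sketch of rule-based alert transformation. It is only an illustration of the idea, not VictorOps' actual rules engine or API: the rule format, the field names (`entity_id`, `routing_key`, `annotation`) and the `apply_rules` helper are all assumptions made up for this example.

```python
import fnmatch

# Hypothetical rules: when a field matches a wildcard pattern,
# set or overwrite other fields on the alert.
RULES = [
    {
        "match_field": "entity_id",
        "pattern": "db-*",
        "set": {
            "routing_key": "database",
            "annotation": "Check replication lag first",
        },
    },
]


def apply_rules(alert, rules):
    """Return a copy of the alert with every matching rule applied in order."""
    out = dict(alert)
    for rule in rules:
        value = str(out.get(rule["match_field"], ""))
        if fnmatch.fnmatchcase(value, rule["pattern"]):
            out.update(rule["set"])
    return out


incoming = {"entity_id": "db-01", "state_message": "disk full"}
formatted = apply_rules(incoming, RULES)
```

In this sketch, an alert whose `entity_id` starts with `db-` gets a routing key and a note attached, while alerts that match no rule pass through unchanged.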
Another great feature, which we got because we were part of their beta program, is being able to start a call within the timeline. That was awesome. Once you start a conference bridge, it shows the users, and they are notified within VictorOps. You can see in the timeline, for RCA purposes, who joined the call, when they joined, and anything they typed into the timeline concerning that incident. I currently use it for RCA tracking. For any incident that takes place, we tag an incident number to it, tell everybody associated with that incident to record all of their findings against that incident number, and then print out a report at the end as our RCA.
We are using this product to streamline our RCA process. We have all our engineers enter notes against the specific monitor we have designated for the issue, and other teams collaborate by adding their notes to that problem. Later we can run reports on the entire timeline of the event, so we don't have to document it twice. I can upload the logs for management to review what took place and who was involved in troubleshooting.
Obviously MTTA (mean time to acknowledge) and MTTR (mean time to resolve) are huge in the DevOps community, so one thing we like to do is look at our MTTA and MTTR per route and try to figure out where we have gaps and why. For example, we might have an employee who simply doesn't answer his on-call, which counts against your MTTA and ultimately against your MTTR. Then there are the reports: for the RCA, our upper management expects to see those media files when issues arise.
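As a rough illustration of the per-route numbers described above (average time to acknowledge and average time to resolve, commonly called MTTA and MTTR), here is a minimal Python sketch. The incident records and their field names (`route`, `triggered`, `acknowledged`, `resolved`) are hypothetical and do not reflect an actual VictorOps export format. It also assumes both averages are measured from the moment the incident was triggered.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident records; the field names are assumptions.
incidents = [
    {"route": "database", "triggered": "2024-01-01T10:00:00",
     "acknowledged": "2024-01-01T10:05:00", "resolved": "2024-01-01T10:35:00"},
    {"route": "database", "triggered": "2024-01-02T09:00:00",
     "acknowledged": "2024-01-02T09:15:00", "resolved": "2024-01-02T09:45:00"},
    {"route": "network", "triggered": "2024-01-03T12:00:00",
     "acknowledged": "2024-01-03T12:02:00", "resolved": "2024-01-03T12:20:00"},
]


def minutes_between(start, end):
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mtta_mttr_by_route(records):
    """Average acknowledge and resolve times per route, in minutes."""
    ack_times = defaultdict(list)
    resolve_times = defaultdict(list)
    for rec in records:
        ack_times[rec["route"]].append(
            minutes_between(rec["triggered"], rec["acknowledged"]))
        resolve_times[rec["route"]].append(
            minutes_between(rec["triggered"], rec["resolved"]))
    return {
        route: {"mtta_min": mean(ack_times[route]),
                "mttr_min": mean(resolve_times[route])}
        for route in ack_times
    }


report = mtta_mttr_by_route(incidents)
```

A route whose MTTA is noticeably higher than the others is a hint that pages on that route are not being answered promptly, which is the kind of gap we look for.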