A lot of things that Foglight does could be derived from DMVs and extended events. I'm going to sound like a salesperson, but I have to be a salesperson to sell the value of a product to my company. They need to understand that without a tool like this, my answers to some of the questions they ask will be very speculative.
As a DBA, you have to be able to answer three questions. The first is: What's happening right now? Why is the system slow or why are things not responding? That is probably the easiest one for an experienced DBA. That is where the tool's value might not be as obvious, as you can look at sp_who or the DMVs and pretty much tell what's going on without having to pay money for a license for a product like Foglight.
The second question is: What just happened? There could be just a couple of seconds difference between the first and second questions. But the effort to answer the second question is significantly higher because it is water under the bridge. You need some kind of monitoring solution implemented, even if it's just a basic solution where you capture a certain timeframe, so you can roll back and review what just happened. However, there will still be a significant amount of speculation because, usually, you can't afford to monitor every single metric, and there are hundreds of them. The issue could be the OS, it could be infrastructure-related, or it could be that the SQL code is not performing well because it's not written well. So the second question is significantly harder to answer, and that's where a tool like this will become very helpful.
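To make the "capture a certain timeframe so you can roll back and review" idea concrete, here is a minimal Python sketch of a rolling capture buffer. It is only an illustration under my own assumptions: the `Sample` fields, the `RollingCapture` class, and the window length are hypothetical names chosen for the example, and this is not how Foglight stores its data.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical metric set; a real collector would track many more.
    timestamp: float        # seconds since some epoch
    cpu_percent: float
    batches_per_sec: float
    blocked_sessions: int


class RollingCapture:
    """Keep only the most recent `window_seconds` of samples in memory."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._samples = deque()

    def add(self, sample: Sample) -> None:
        # Append the new sample, then drop anything older than the window.
        self._samples.append(sample)
        cutoff = sample.timestamp - self.window_seconds
        while self._samples and self._samples[0].timestamp < cutoff:
            self._samples.popleft()

    def between(self, start: float, end: float) -> list:
        # "What just happened?" becomes a lookup over the retained window.
        return [s for s in self._samples if start <= s.timestamp <= end]


capture = RollingCapture(window_seconds=60)
for t in (0, 30, 60, 90):
    capture.add(Sample(timestamp=t, cpu_percent=50.0,
                       batches_per_sec=400.0, blocked_sessions=0))

recent = capture.between(50, 100)
```

The trade-off is the one described above: the buffer only answers questions about the metrics and the timeframe you chose to keep, so anything outside that is still speculation.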
The third question is: What has been going on? That is by far the hardest question to answer without this type of tool. This is the type of question a manager might ask for the purposes of resource planning. Or a senior VP might say, "Hey, how are we doing? Can we bring on another customer? Can we sustain a 20 percent increase in workload?" I don't know how I would answer that question without having this type of solution. I work in the industry quite a bit and there is, unfortunately, a lot of misunderstanding due to a lack of a comprehensive view of infrastructure.
There's no way to answer that question without getting some kind of baseline tool. Unfortunately, in most of the shops I have worked for, only one question is usually being answered relatively accurately, and that is "What's happening right now?" And even that is a luxury because by the time a customer escalates an issue and it gets interpreted by support people, there's a gap. That gap could be a couple of minutes, a couple of hours, or a couple of days.
Ultimately, when you negotiate how many licenses you need, you always find the most problematic instances. You also have to evaluate the culture and maturity of an organization. Unfortunately, there is often a lot of legacy code to maintain. It's not always easy to identify those things quickly.
In that context, Foglight has been pretty spectacular in terms of the number of times I have been able to answer questions that nobody could answer before. I used the tool and showed my team how to use it to answer a lot of those questions, and some of those questions were pretty complex. We'll have deadlocks, we'll have locking conflicts, we will have blocking, and we'll have unexpected CPU spikes. Obviously, there is some complexity involved with the architecture and that is not always clear, but the tool is phenomenally helpful in enabling us to change and repair things.
We have also been able to predict a problem. A lot of times you can see a particular process starting to misbehave. Visually, you can see the spike, and it is something that could potentially lead to a bigger problem because the process will not scale. It gives you an opportunity to address things before they become real problems.
When it comes to displaying intensive database queries, Foglight is the best tool. Spotlight does not do that very well and Foglight is fantastic. It enables what they call a multidimensional analysis. You have a visual presentation of query resource utilization and you can slice it by the type of resource. You can also slice it by the number of executions.
For example, a few times I've seen a server running very hot, with the CPU at 80-plus percent, and people starting to freak out. But in reality, the box is very healthy. It has no locks or blocking. Rather, it's utilizing the CPU because that's what you want it to do. You always need to juxtapose multiple metrics simultaneously, and Foglight is really good for that. It has a dashboard where you can look at multiple parameters and components at the same time. If I see the CPU go up and I also see the number of connections go up and the number of batches per second go up, to me it just means that SQL Server is working hard because we are processing fast and we are able to get more work done in a particular time frame.
A lot of times, when you do have problems, you actually see the CPU go down. People say, "Well, what's the problem?" The problem is that you have some internal blocking or locking, or some kind of resource contention, and the CPU cannot process as many batches per second.
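The reasoning in the last two paragraphs can be sketched as a small rule of thumb in Python. This is a rough illustration, not anything Foglight does internally: the `Snapshot` fields, the `interpret` function, and the three labels are hypothetical, and a real diagnosis would look at far more than four numbers.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical set of metrics read side by side from a dashboard.
    cpu_percent: float
    connections: int
    batches_per_sec: float
    blocked_sessions: int


def interpret(before: Snapshot, after: Snapshot) -> str:
    """Compare two snapshots and juxtapose the metrics, never CPU alone."""
    cpu_up = after.cpu_percent > before.cpu_percent
    conns_up = after.connections > before.connections
    batches_up = after.batches_per_sec > before.batches_per_sec

    # CPU, connections, and throughput all rising with no blocking:
    # the server is simply doing more work.
    if cpu_up and conns_up and batches_up and after.blocked_sessions == 0:
        return "busy but healthy"

    # CPU and throughput falling while sessions are blocked:
    # work is waiting on something rather than being processed.
    if not cpu_up and not batches_up and after.blocked_sessions > 0:
        return "possible contention"

    return "inconclusive"


baseline = Snapshot(cpu_percent=40, connections=100,
                    batches_per_sec=500, blocked_sessions=0)
hot_box = Snapshot(cpu_percent=85, connections=180,
                   batches_per_sec=900, blocked_sessions=0)
stalled = Snapshot(cpu_percent=20, connections=180,
                   batches_per_sec=150, blocked_sessions=12)

print(interpret(baseline, hot_box))   # busy but healthy
print(interpret(baseline, stalled))   # possible contention
```

The point of the sketch is that an 80-plus percent CPU reading and a 20 percent CPU reading can each be the good or the bad case, depending on what the other metrics are doing at the same moment.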
When it comes to identifying the least performant queries, or queries that are performant but that are just very hyper with a lot of calls to them, that is where the tool really shines. It allows you to identify those things quickly.
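As a rough picture of what slicing the same workload by different dimensions means, here is a short Python sketch. The `QueryStats` fields, the `top_by` helper, and the sample numbers are all made up for illustration; they are not Foglight's data model or real measurements.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    # Hypothetical per-query totals over some collection interval.
    name: str
    executions: int
    total_cpu_ms: float


def top_by(stats, key, n=1):
    """Return the n queries that rank highest on the chosen dimension."""
    return sorted(stats, key=key, reverse=True)[:n]


workload = [
    QueryStats("nightly_report", executions=10, total_cpu_ms=50_000),
    QueryStats("lookup_by_id", executions=1_000_000, total_cpu_ms=200_000),
    QueryStats("update_status", executions=5_000, total_cpu_ms=10_000),
]

# Slice 1: cost per call finds the least performant query.
slowest = top_by(workload, key=lambda q: q.total_cpu_ms / q.executions)

# Slice 2: call count finds the query that is cheap but called constantly.
busiest = top_by(workload, key=lambda q: q.executions)

# Slice 3: total resource use shows which one actually loads the server.
heaviest = top_by(workload, key=lambda q: q.total_cpu_ms)

print(slowest[0].name)    # nightly_report
print(busiest[0].name)    # lookup_by_id
print(heaviest[0].name)   # lookup_by_id
```

In this made-up workload the slow report is the worst per call, but the fast lookup consumes more CPU overall simply because of how often it runs, which is why you need more than one slice.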