We pull in information from cloud resources like AWS and Azure, and we just recently got into GCP. Pulling data directly from there was a little easier than trying to do it from on-prem, and we can now do that more easily.
We have had a lot of cases where business units that were not even in Splunk got compromised for whatever reason. We could get security logs from those and import them quickly and easily with Splunk Cloud. We have had several use cases like that. In our company, we do not monitor logs from laptops, and we have had issues with users getting compromised on laptops. We could get the logs from there as well.
I also use it to monitor my universal forwarders so that I can see what versions they are on. We had CVEs come out on the universal forwarders, and we had to replace them. I have dashboards to keep track of our progress as we migrate and upgrade all those agents.
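The kind of tally behind that migration dashboard can be sketched in plain Python. Everything here is illustrative: the target version, the hostnames, and the inventory format are made-up stand-ins for whatever forwarder data you pull from Splunk.

```python
from collections import Counter

# Hypothetical minimum universal forwarder version we consider patched;
# in practice this would be whatever release fixes the CVEs in question.
TARGET_VERSION = (9, 0, 4)

def parse_version(v: str) -> tuple:
    """Turn '9.0.4' into (9, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def migration_progress(forwarders):
    """Given (host, version) pairs, count patched vs. still-to-upgrade."""
    counts = Counter(
        "patched" if parse_version(ver) >= TARGET_VERSION else "needs_upgrade"
        for _, ver in forwarders
    )
    return dict(counts)

# Made-up sample of what a forwarder inventory might look like.
sample = [("web01", "9.0.4"), ("web02", "8.2.9"), ("db01", "9.1.0")]
print(migration_progress(sample))
```

A real version of this would feed the same grouping from a Splunk search over forwarder phone-home data rather than a hard-coded list.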
The biggest, heaviest use of Splunk Cloud Platform for us right now is people going into our firewall logs to find the denies and to figure out which firewall is doing the blocking. We are a medium-sized company, but we are heavily segmented because of all the PCI and SOC 2 compliance audits that we have. We have so many firewalls that there is always another firewall down the line that is blocking. The firewall team is in there all day, every day, and then we have other teams that go in to see whether the issue they are having with their app is a firewall issue or not.
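In Splunk that lookup is a dashboard search, but the underlying logic is simple to sketch. The field names below (host, action, src, dest) are illustrative, not any particular firewall vendor's schema.

```python
from collections import defaultdict

def denies_by_firewall(events):
    """Group denied connections by the firewall that blocked them.

    `events` is a list of dicts shaped like parsed firewall logs;
    the field names are assumptions for this sketch.
    """
    blocked = defaultdict(list)
    for ev in events:
        if ev.get("action") == "deny":
            blocked[ev["host"]].append((ev["src"], ev["dest"]))
    return dict(blocked)

# Made-up events: two firewalls, one allow, two denies.
sample = [
    {"host": "fw-dmz-01", "action": "deny", "src": "10.1.1.5", "dest": "10.9.0.2"},
    {"host": "fw-core-02", "action": "allow", "src": "10.1.1.5", "dest": "10.2.0.7"},
    {"host": "fw-core-02", "action": "deny", "src": "10.1.1.9", "dest": "10.9.0.2"},
]
print(denies_by_firewall(sample))
```

Grouping by the reporting firewall is what answers the "which firewall down the line is blocking" question when traffic crosses several segments.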
I have done health checks several times now, and those have been very valuable in getting more information about what is going on in my platform, along with recommendations about my environment. Sometimes it flags something I already know about, and I explain why it has to be that way: for compliance reasons, or because there are certain break-glass accounts we have to keep in case our Okta is offline. It points out things like that.
One of the things we had to do was find out how much Splunk on-prem was costing us, because so many different groups were involved. We had the storage group, and then we had the hardware team. The indexers and the search heads were physical servers handled by the data center teams, which bought all the hardware. Everything else was virtual, and the virtual servers were still owned by us, which is fine, but then there was storage on top of that, so we did not know the full cost. As I am trying to migrate from one data center to another, the teams do not want to migrate hardware. They want to buy new hardware, which, of course, is a cost to their department. They are another group, not our group, so we wanted to go to Splunk Cloud. We first had to find out the total cost of Splunk for our company so that we could show that moving to Splunk Cloud was going to save the company money, which it did. It is saving at least a million dollars a year. We are oversized in some areas and running pretty close in others, but it is saving us money in the long term.
We monitor multiple cloud environments. We have data in AWS, Azure, and GCP, as well as our own on-premises environment, which is technically our own private cloud. We are a cloud customer for our clients. We are in four different environments. It has been fairly simple to monitor multiple cloud environments using Splunk Cloud Platform. The documentation and the TAs have been updated and tell you which piece is which, so there is no confusion between a client ID, a tenant ID, a secret, a key, and the tokens. That has been very handy. We had an incident where there was an S3 bucket somewhere, and one of our teams was unable to communicate with the Cloud Infrastructure team. The bucket had been set up as a file share only instead of a type the TA supported, so that became a challenge. We had to work with them, and they basically had to rebuild that bucket, because you cannot just add that function to an existing bucket. They made a whole new bucket and put the logs in there. That was a challenge, but other than that, it has been very smooth and easy. We have had teams with incidents take all the data, put it into an S3 bucket, and Splunk Cloud took it right in.
Splunk Cloud Platform has helped reduce our mean time to resolve because teams can get the data in faster. I have even automated things. We have a Python script that takes CSV files and sends them to the endpoint, populating everything the teams need to do their evaluations, such as whether users went to bad sites. They can see all that information, and I can get it in quickly. With on-prem, I could do that, but it had to run through so many hoops because of the PCI requirements that our company has. It is still PCI-compliant, but it is just so much easier to work with. We have had mean times of 60 days; we are reducing that to one or two weeks now, so it is getting a lot better.
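A CSV-to-endpoint script along those lines might look something like this. I am assuming the endpoint is Splunk's HTTP Event Collector (`/services/collector/event`); the sourcetype, the sample data, and the token are placeholders, and the actual POST is shown only as a comment.

```python
import csv
import io
import json

def csv_to_hec_batch(csv_text, sourcetype="proxy:csv"):
    """Convert CSV rows into a newline-delimited batch of JSON events
    for Splunk's HTTP Event Collector. One event per CSV row; the
    sourcetype name is a placeholder for this sketch."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(
        json.dumps({"sourcetype": sourcetype, "event": row}) for row in rows
    )

# Hypothetical export the security team wants loaded.
sample_csv = "user,url,verdict\nalice,badsite.example,blocked\nbob,ok.example,allowed\n"
payload = csv_to_hec_batch(sample_csv)

# Sending it would be one authenticated POST (host and token are placeholders):
# requests.post("https://splunk.example.com:8088/services/collector/event",
#               headers={"Authorization": "Splunk <hec-token>"}, data=payload)
print(payload)
```

HEC accepts multiple JSON events in a single request body, which is what makes "just pop the CSV in" a one-call operation.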
Splunk Cloud Platform has helped improve our organization’s business resilience. That was something I had issues with on-prem. I have had issues with an indexer, whether it was a hardware issue, a software issue, or an OS issue. With Splunk Cloud Platform, everything has been a lot more stable, so I do not have as many worries or problems there. If it is a heavy forwarder, I can still troubleshoot that on my side; that is on me, but there are far fewer things to look at and worry about. It took away a lot of headaches.
In terms of Splunk’s ability to predict, identify, and solve problems in real-time, real-time is a touchy word because being truly real-time means you are searching the data as it is indexed. There are a few people in my company who are allowed real-time access, but for everyone else it is pretty close, within seconds. You have access to all that data, so it has been handy. I had to explain to the teams how scheduled searches work in the background. Running a search every 5 minutes sounds great, but if there is any kind of delay in the data, you can miss something, so 15 minutes is a little better, and you are still seeing things within minutes and getting alerted about them. We connect to Microsoft Teams and Slack, and we send things to ServiceNow for the monitoring team. They are 24/7, so if something needs to be watched 24/7, there is a group for it. They are tied into ServiceNow, so they can get all that data in one place for that team, pulled from different monitoring tools besides Splunk. It is handy to be able to pop it all in there quickly.
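The delayed-data problem can be made concrete with a little window arithmetic: back the search window off by a few minutes so late-arriving events still land inside it. The interval and delay values here are illustrative, not a recommendation.

```python
from datetime import datetime, timedelta

def search_window(now, interval_min=15, delay_min=5):
    """Compute the earliest/latest bounds for a scheduled search,
    shifted back by `delay_min` so events that arrive late are not
    missed. With a 15-minute schedule and a 5-minute buffer this is
    the equivalent of searching earliest=-20m latest=-5m."""
    latest = now - timedelta(minutes=delay_min)
    earliest = latest - timedelta(minutes=interval_min)
    return earliest, latest

now = datetime(2024, 6, 1, 12, 0)
earliest, latest = search_window(now)
print(earliest, "->", latest)
```

The trade-off the teams had to accept is exactly this buffer: alerts arrive a few minutes later, but a search running every 5 minutes with no buffer can silently skip events that were still in transit.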
The firewall stuff is huge. Everybody is in there all day long, hitting that dashboard to search for firewall blocks or denies. Sometimes they access it just to see whether a connection is being made, because we do drop a lot of data. A great thing about Splunk is that we can drop some of the data at ingest if we need to. We do not keep all the connects, but we can see whenever a connection is closed, which tells us the connection was made successfully and then closed. One way or the other, we can see whether things are being blocked or are able to connect. That information is handy. We have a complex network, and there are times when we have routing issues. We can see in the logs that there is no route and say that it is a routing issue, and then they bring in the network team. The firewall is the front point for all of that, but the network team has to work closely with them.
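Dropping events at ingest like that is typically done in Splunk with a nullQueue transform. A minimal sketch, assuming the connection-open events can be matched with a regex; the sourcetype name and the regex here are placeholders for whatever your firewall actually emits.

```ini
# props.conf -- attach the transform to the firewall sourcetype
[firewall:traffic]
TRANSFORMS-drop_opens = drop_connection_opens

# transforms.conf -- route matching events to the nullQueue (discard)
[drop_connection_opens]
REGEX = connection\s+built
DEST_KEY = queue
FORMAT = nullQueue
```

Events that match the regex are discarded before indexing, while the connection-closed events still come through, which is what lets us confirm a connection succeeded without paying to index every open.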