Both are techniques aimed at reducing the number of active alerts an operator receives from the monitoring tool.
I don't fully agree with the previous descriptions of correlation and aggregation, welcome though they are.
Let's take a typical scenario. Assume a network interface on a large switch fails, resulting in many systems experiencing a failure. In the 'raw' state, i.e. with no correlation or aggregation, the monitoring system would receive potentially thousands of events - possibly multiple SNMP traps from other network devices or servers, event log records from Windows servers, Syslog entries from Linux, errors from the database management system, errors from web servers relying on that database, and probably lots of incidents raised by users on the help desk. Good correlation algorithms will be able to distinguish between "cause" alarms and "symptom" alarms. In this scenario, the "cause" is the failing network switch port and the symptoms are the database failures and log file entries. Simplistically, fixing the cause will also address the symptoms.
Typically, aggregation is used to "combine" events into a single alarm. Again, there are multiple methods to do this. A simple one would be - as previously described - duplicate reduction. In a poorly configured monitoring environment, every check that breaches a threshold results in an alarm. If monitoring is granular - say, CPU utilization is measured every 30 seconds and an alarm is raised whenever it exceeds 80% - then the operator would very quickly be overwhelmed by meaningless alarms, especially if the CPU is doing work where high utilization is expected. Handling these 'duplicates' helps operators identify real issues; it may be enough to update the original alarm with the duration of the threshold breach.
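To make the duplicate-reduction idea concrete, here is a rough sketch in Python (the names and structure are illustrative only, not any monitoring product's actual API) showing repeated threshold breaches updating a single alarm with the breach duration instead of raising new alarms:

```python
# A minimal sketch of duplicate reduction: repeated CPU-threshold breaches update
# the existing alarm instead of raising new ones. Illustrative names, not a real product API.
from dataclasses import dataclass

@dataclass
class Alarm:
    key: str            # e.g. "host01/cpu>80%"
    first_seen: float   # epoch seconds of the first breach
    last_seen: float    # epoch seconds of the most recent breach
    count: int = 1

    @property
    def duration(self) -> float:
        """How long the threshold has been breached, in seconds."""
        return self.last_seen - self.first_seen

active_alarms: dict[str, Alarm] = {}

def handle_breach(key: str, timestamp: float) -> Alarm:
    """Create an alarm on the first breach; on duplicates, just update it."""
    alarm = active_alarms.get(key)
    if alarm is None:
        alarm = Alarm(key=key, first_seen=timestamp, last_seen=timestamp)
        active_alarms[key] = alarm
    else:
        alarm.last_seen = timestamp
        alarm.count += 1
    return alarm

# Example: six 30-second samples above 80% become one alarm, not six.
for t in range(0, 180, 30):
    handle_breach("host01/cpu>80%", t)
print(active_alarms["host01/cpu>80%"].count, "samples,",
      active_alarms["host01/cpu>80%"].duration, "seconds of breach")
```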
There are many techniques for aggregation and correlation beyond identifying cause and symptom events or ignoring duplicates. For instance, time-based event handling: consider a scenario where an event is only considered relevant if another event hasn't happened within a given timeframe before or after the focus event, or a scenario where event aggregation occurs based on reset thresholds rather than alarm thresholds.
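As a rough illustration of that time-based handling (the event names and the 5-minute window below are assumptions, not from any specific tool), the focus event is only surfaced when no related event falls inside the window:

```python
# A minimal sketch of time-based event handling: a "focus" event is only treated as
# relevant if a related event has NOT occurred within a window around it.
WINDOW_SECONDS = 300  # assumed 5-minute window

def is_relevant(focus_time: float, related_times: list[float],
                window: float = WINDOW_SECONDS) -> bool:
    """Return True if no related event falls within +/- window of the focus event."""
    return not any(abs(t - focus_time) <= window for t in related_times)

# Example: a "backup job missed" event is ignored if a "maintenance window started"
# event occurred within 5 minutes of it (both event names are hypothetical).
maintenance_events = [1000.0]
print(is_relevant(1100.0, maintenance_events))  # False: suppressed
print(is_relevant(2000.0, maintenance_events))  # True: raise the alert
```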
There are also some solutions that purport to intelligently correlate events using AI, although, speaking personally, this seems more marketing speak than a one-click feature. In reality, these advanced (i.e. $$$$$$) solutions need to maintain a dynamic infrastructure topology in near-real-time and map events to service components in order to assess root-cause correlation. In these days of rapidly flexing and shrinking infrastructures, cloud services, and containerization, it is extremely difficult to maintain an accurate, near-real-time view of an entire IT infrastructure from users through to lines of application code. A degree of machine learning has helped, but the cost-benefit simply isn't there yet for these topology-based event correlation features.
APM Observability Consultant | BMC TrueSight SME at World Opus Technologies
User
May 29, 2021
Yes, both of them are needed. Since their concepts have been well discussed here, I will just give a few examples of event processing rules I developed in BMC TrueSight for BMC customers in the last 17 years.
'Aggregation' requests are usually about combining multiple occurrences of a single event type into one event in order to minimize the number of redundant incident tickets.
One type of aggregation is to combine multiple events that occurred on different instances within a certain time period into one event. I call this type horizontal aggregation. One aggregation rule I developed combined all ping-failure events within the last 10 minutes into one event if more than 10 servers failed the ping test. Another aggregation rule I developed combined all Remedy server process-down events on a single Remedy server within the last 10 minutes into one event.
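A rough sketch of that kind of horizontal aggregation follows; it is plain Python for illustration, not actual BMC TrueSight rule syntax, and the window and threshold mirror the ping example above:

```python
# Horizontal aggregation sketch: ping failures from many servers within one window
# become a single summary event. Illustrative code, not TrueSight rule syntax.
WINDOW = 600        # 10 minutes, as in the ping-failure example above
MIN_SERVERS = 10    # only aggregate when more than 10 servers fail

def aggregate_ping_failures(events):
    """events: list of (timestamp, server) ping-failure tuples, assumed sorted by time.
    Returns one summary per window in which more than MIN_SERVERS servers failed."""
    summaries = []
    window_start, servers = None, set()
    for ts, server in events:
        if window_start is None or ts - window_start > WINDOW:
            if len(servers) > MIN_SERVERS:
                summaries.append({"window_start": window_start, "servers": sorted(servers)})
            window_start, servers = ts, set()
        servers.add(server)
    if len(servers) > MIN_SERVERS:
        summaries.append({"window_start": window_start, "servers": sorted(servers)})
    return summaries

# Example: 12 servers fail ping within 10 minutes -> one summary event, not 12.
failures = [(i * 30, f"server{i:02d}") for i in range(12)]
print(aggregate_ping_failures(failures))
```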
Another type of aggregation is vertical aggregation along the timeline. Take the example of CPU utilization on a single server. If CPU utilization exceeds a threshold at 10:00 am, an event will occur and thus a ticket is created. If CPU utilization continues to exceed the threshold, BMC TrueSight won't generate more events. But if CPU utilization falls below the threshold at 10:15 am, BMC TrueSight will close the previous event. What if CPU utilization exceeds the threshold again at 10:30 am? Another event will occur and another ticket will be created. If this pattern goes on 20 times in a day, we will get 20 tickets. This is sometimes called event flapping. The aggregation rule I developed combined all occurrences of the same type of event into one event based on user-defined criteria - either by a fixed time period (e.g. all high-CPU events that happen within an 8-hour fixed window go to one event/ticket) or by idle time (e.g. all re-occurrences that happen after 3 hours of normal CPU utilization go to a new event/ticket).
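That vertical-aggregation decision can be sketched as follows - again plain Python for illustration rather than TrueSight rule syntax, using the 8-hour fixed window and 3-hour idle time from the examples above:

```python
# Vertical aggregation sketch: decide whether a new high-CPU breach joins the previous
# ticket or opens a new one. Field and function names are illustrative assumptions.
FIXED_WINDOW = 8 * 3600   # all re-occurrences within 8 hours share one ticket
IDLE_TIME = 3 * 3600      # after 3 hours of normal CPU, a new ticket is opened

def needs_new_ticket(mode: str, prev_ticket_opened: float,
                     prev_event_closed: float, new_event_time: float) -> bool:
    """Return True if the new threshold breach should create a new ticket."""
    if mode == "fixed_window":
        return new_event_time - prev_ticket_opened > FIXED_WINDOW
    if mode == "idle_time":
        return new_event_time - prev_event_closed > IDLE_TIME
    raise ValueError(f"unknown aggregation mode: {mode}")

# Example: breach closed at 10:15, re-occurs at 10:30 -> same ticket under the idle-time rule.
print(needs_new_ticket("idle_time", prev_ticket_opened=36000,
                       prev_event_closed=36900, new_event_time=37800))  # False
```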
'Correlation' requests are usually about grouping different event types (which often occurred on different servers) together in order to identify the root cause - though it can sometimes reduce the number of redundant incident tickets as well. Correlation may even add one higher-level ticket that links to all related lower-level tickets, especially if those lower-level tickets are assigned to different support groups. One example is to correlate an event from a synthetic transaction failure, an event from app server log monitoring, and an event from Oracle alert log monitoring that occurred within the last 10 minutes. One challenge in event correlation is to correlate only the related events, so having an accurate infrastructure topology is critical. As discussed here previously, relying purely on a discovery tool to keep a real-time topology is difficult and expensive. In BMC TrueSight, I sometimes had to develop an add-on data collection (custom PATROL KM) to extract the component relationships from configuration files on the server and execute this custom PATROL KM on the same schedule as the BMC out-of-the-box PATROL KMs.
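To illustrate how such a component-relationship map feeds correlation, here is a simplified sketch; the topology entries, event fields, and 10-minute window are assumptions for illustration, not the output of an actual PATROL KM:

```python
# Topology-aware correlation sketch: events of different types within 10 minutes that
# belong to the same service (per a relationship map) are grouped under one incident.
WINDOW = 600  # 10 minutes

# Relationship data, e.g. extracted from configuration files: component -> service.
topology = {
    "web01":       "order-service",
    "appserver01": "order-service",
    "oracledb01":  "order-service",
}

def correlate(events):
    """events: list of dicts with 'time', 'type', 'component'; returns groups per service."""
    groups = {}
    for ev in sorted(events, key=lambda e: e["time"]):
        service = topology.get(ev["component"])
        if service is None:
            continue  # unknown component: cannot correlate, leave as a standalone event
        group = groups.setdefault(service, [])
        if group and ev["time"] - group[0]["time"] > WINDOW:
            continue  # outside this group's correlation window (simplified handling)
        group.append(ev)
    return groups

events = [
    {"time": 100, "type": "synthetic_txn_failed", "component": "web01"},
    {"time": 160, "type": "appserver_log_error",  "component": "appserver01"},
    {"time": 220, "type": "oracle_alert_log",     "component": "oracledb01"},
]
print(correlate(events))  # all three roll up under 'order-service'
```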
Aggregation and correlation are necessary in enterprise SIEM in order to realize positive ROI.
They are not the same. For event monitoring (log management), aggregation is enough, but if you need correlation then a SIEM is required. Aggregation means log parsing, and correlation means developing rules to detect attacks.
Aggregation is taking several events and turning them into one single event, while Correlation enables you to find relationships between seemingly unrelated events in data from multiple sources and to understand which events are most relevant.
SIEM event correlation is an essential part of any SIEM solution. It aggregates and analyzes log data from across your network applications, systems, and devices, making it possible to discover security threats and malicious patterns of behavior that otherwise go unnoticed and can lead to compromise or data loss.
Aggregation and correlation: agreeing with the responses below.
Aggregation takes place during the flow of real-time events to reduce duplicate events generated from the same source. Event aggregation can be adjusted in a few of the SIEM solutions to reduce logging, EPS, storage, CPU, etc. (The solution architect or platform engineer has to decide the aggregation settings depending on what is to be achieved in the environment.)
Ex: reducing the same/similar sync events from a security device before they are saved in the SIEM.
Correlation is the process of connecting/relating two different event properties from the same or different log sources; those events may or may not hold the same parsed fields. But correlation can only occur once the events are aggregated >> parsed >> mapped to their respective fields, so that SIEM rules can check the required fields to trigger a correlated offense/alert.
Ex: detecting and triggering security threat alerts from different security appliances (firewall, IDS/IPS, WAF, EDR, HIPS, AV, etc.)
Suppression: let's not confuse suppression of alerts with aggregation. Suppression is used to reduce the same offenses being generated multiple times, and it takes place after aggregation >> parsing >> mapping >> correlation >> offense triggered >> suppression.
Ex: a device not reporting for the last hour - this offense can be suppressed while the security team works to resolve the event, until the device is back in action.
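A small sketch of where suppression sits in that chain - after an offense has already been triggered - might look like this (the offense key and the 1-hour window are illustrative assumptions):

```python
# Suppression sketch: keep the same offense from being surfaced repeatedly while it is
# being worked. Runs after correlation has already triggered the offense.
import time
from typing import Optional

SUPPRESS_SECONDS = 3600  # e.g. suppress repeats of "device not reporting" for 1 hour

_last_raised: dict = {}  # offense key -> timestamp it was last surfaced

def raise_offense(offense_key: str, now: Optional[float] = None) -> bool:
    """Return True if the offense should be surfaced, False if it is suppressed."""
    now = time.time() if now is None else now
    last = _last_raised.get(offense_key)
    if last is not None and now - last < SUPPRESS_SECONDS:
        return False   # suppressed: same offense raised again within the window
    _last_raised[offense_key] = now
    return True        # new offense, or the suppression window has elapsed

print(raise_offense("device-not-reporting/fw01", now=0))     # True: first offense
print(raise_offense("device-not-reporting/fw01", now=1800))  # False: suppressed
print(raise_offense("device-not-reporting/fw01", now=4000))  # True: window elapsed
```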
Yes. You need aggregation to show sustained activity over time, which can indicate an attack, an attempted breach, or exfiltration. You need correlation to show things that happen contemporaneously, which is especially useful if they should not, or normally do not.
IT Executive Leader / Innovator at a tech consulting company with 11-50 employees
Consultant
May 21, 2021
Not sure anything could be added to what Mr. Collier already stated. The aggregation of any events is to collect and combine events to develop a pool of raw data which can be analyzed later. To correlate events in any given situation is to look for similarities or disparities between those events. I do not see any applications or platforms on the market (yet) that can provide a solid foundation for correlating events. Given a very small sample of variables and LOTS of data, correlations can be surmised -- but they still need a very manual validation process.
Program Manager - Enterprise Command Center at a financial services firm with 10,001+ employees
Real User
Jun 9, 2020
Agree with all the answers posted here, and I especially like Dave's explanation of the more advanced solutions available on the market. Excellent call-outs on the need for deep and well-maintained relationship mapping to enable an AI algorithm to connect the dots between aggregated alerts firing from multiple separate source tools. Having a mature ITSM implementation with CI discovery, automated dependency mapping, and full integration between your correlation engine and CMDB will help too.
Works at a healthcare company with 5,001-10,000 employees
Real User
Jun 9, 2020
The other answers pretty much sum this up, but there is one important point to make. With some technologies it's important to take into account the number of events that were aggregated, and for your SIEM device to be able to treat them as individual events for the purpose of correlation.
Information Security Business Development Officer at a tech services company with 501-1,000 employees
User
Jun 9, 2020
Aggregation is a combining of the same events into one. So, we can reduce the number of events.
Correlation is an identification of patterns or interdependencies between various events.
Both processes are needed for effective SIEM. The first of them can be replaced by a highly scalable computing architecture. And without the second one, we will not have the main analytical purpose of SIEM.
DBA at a computer software company with 51-200 employees
User
Jun 9, 2020
As previously mentioned, correlation is the comparing of the same type of events. In my experience, alerts are created to notify when a series of these occurs and reaches the prescribed threshold.
Aggregation, based on my experience, is the means of clumping/combining objects of similar nature together and providing a record of the "collection"; of deriving group and subgroup data by analysis of a set of individual data entries. Alerts for this are usually created for prognostication and forecasting. Often the "grouping" is not detailed information so there is a requirement for digging into the substantiating data to determine how this data was summarized.
Alerts/Alarms can be set for both, but usually only for the former and not the latter.
Information security officer at a financial services firm with 1-10 employees
Real User
Jun 10, 2020
You cannot process and generate advanced correlated alerts without aggregation: limiting your correlation to one set of sources will leave your SIEM blind and unaware of the global context.
So yes, to get 'EFFECTIVE' event monitoring with the goal of correlating events, you need to aggregate many different sources.
Deputy business unit manager Security Solutions at a tech services company with 201-500 employees
User
Jun 10, 2020
"Aggregation is a mechanism that allows two or more events that are raised to be merged into one for more efficient processing" from www.ibm.com "Event correlation takes data from either application logs or host logs and then analyzes the data to identify relationships. " from digitalguardian.com
So yes, you need both for a SIEM. For simple monitoring you don't. There's a big difference between what a SIEM does and what simple event monitoring does.
Simply: correlation is the process of tracking relations between events based on defined conditions. Aggregation is nothing but combining similar events. Both are required for effective monitoring.
Works at a tech services company with 201-500 employees
Reseller
Jun 9, 2020
Aggregation is taking several events and turning them into one single event, while Correlation enables you to find relationships between seemingly unrelated events in data from multiple sources and to understand which events are most relevant.
Both aggregation and correlation are needed for effective event monitoring and SIEM. In Enterprise Security (ES), correlation searches can search many types of data sources, including events from any security domain (access, identity, endpoint, network), asset lists, identity lists, threat intelligence, and other data in the Splunk platform. The searches then aggregate the results of an initial search with functions in SPL and take action in response to events that match the search conditions with an adaptive response action.
Aggregation example - Splunk Stream lets you apply aggregation to network data at capture-time on the collection endpoint before data is sent to indexers. You can use aggregation to enhance your data with a variety of statistics that provide additional insight into activities on your network. When you apply aggregation to a Stream, only the aggregated data is sent to indexers. Using aggregation can help you decrease both storage requirements and license usage. Splunk Stream supports a subset of the aggregate functions provided by the SPL (Splunk Processing Language) stats command to calculate statistics based on fields in your network event data. You can apply aggregate functions to your data when you configure a stream in the Configure Streams UI.
Correlation example - Identify a pattern of high numbers of authentication failures on a single host, followed by a successful authentication by correlating a list of identities and attempts to authenticate into a host or device. Then, apply a threshold in the search to count the number of authentication attempts.
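Expressed outside of Splunk for illustration, the logic of that correlation search is roughly the following; this is a Python sketch with assumed threshold and window values, not SPL:

```python
# Sketch of "many authentication failures followed by a success" correlation logic.
FAIL_THRESHOLD = 10   # assumed: failures that make a following success suspicious
WINDOW = 300          # assumed: 5-minute lookback window, in seconds

def brute_force_then_success(auth_events):
    """auth_events: (timestamp, host, user, outcome) tuples sorted by timestamp,
    where outcome is 'failure' or 'success'. Returns (host, user, time) worth alerting on."""
    suspicious = []
    recent_failures = {}  # (host, user) -> timestamps of recent failures
    for ts, host, user, outcome in auth_events:
        key = (host, user)
        fails = [t for t in recent_failures.get(key, []) if ts - t <= WINDOW]
        if outcome == "failure":
            fails.append(ts)
        elif outcome == "success" and len(fails) >= FAIL_THRESHOLD:
            suspicious.append((host, user, ts))   # many failures, then a success
        recent_failures[key] = fails
    return suspicious
```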
Deputy General Manager - Network Security at a tech services company with 201-500 employees
Real User
Jun 9, 2020
Correlation is a method of consolidating the same type of security events from multiple sources and generating a single alarm or alert, to reduce multiple security events from each of the devices of similar types. Aggregation, on the other hand, is the process of generating an alarm after multiple occurrences of an event take place, usually within a fixed timeframe. Example: 10 failed logins on a device within 60 seconds will generate a single alarm once, not on each individual failure.
Consultant at a tech services company with 11-50 employees
Reseller
Jun 9, 2020
Event correlation is an analytical process that looks for trends, patterns, thresholds, or sequences of events in your data. Even when they may not be the same event type (ex: a VPN authentication event followed by a door badge access event in a different location). There are many different ways to do this, including the very common real-time stream of events flowing through an analysis engine or batch queries for specific event sequences.
Aggregation is simply the process of collecting log data together in one place so that you can search and analyze it. Think of aggregation as centralized storage, and correlation as analysis. One note: some SIEM and LM vendors also use the term aggregation to mean consolidation of identical events separated only by time. For example, if you fail login 5 times in a row from the same source IP and same username all within one minute, some products "aggregate" that into a single event that shows the same details but a count and start/end time. That's done mostly for storage optimization, but it actually limits the chain of custody by taking away the legally useful original events.
The users of PeerSpot evaluated event monitoring software to determine the most important aspects of a product. The consensus was that the tool must provide strong data collection with an intuitive filtering system, so as to present enough, but not too much, information that can be drilled into. Users were also concerned with the software's ability to customize displays per user requirements. Other key features included accuracy, a dynamic but simple user interface, and alerts.