We work as independent consultants, but our main focus is on a crisp and reliable base layer for Service Level and Business Service Management with a working CMDB. To map the data and events correctly, you need a solid foundation.
CEO at a tech services company
Pros and Cons
- "The Event Management is outstanding; it is still the most interesting part of the product."
- "The sizing (which is difficult), its maintenance, and the upgrade paths. This is a difficult area to cover, as every client has a different approach to implementing the product."
How has it helped my organization?
What is most valuable?
The Event Management is outstanding; it is still the most interesting part of the product.
What needs improvement?
The sizing (which is difficult), its maintenance, and the upgrade paths. This is a difficult area to cover, as every client has a different approach to implementing the product.
What do I think about the stability of the solution?
Stability is mainly a sizing issue. The product needs to be correctly sized and architected, which takes skill and experience. If you follow this advice, you will have no issues. If you implement without a plan or architecture, you will be lost.
Buyer's Guide
BMC TrueSight Operations Management
November 2024
Learn what your peers think about BMC TrueSight Operations Management. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
What do I think about the scalability of the solution?
This is related to stability. You need to know what you have; then all will go well.
How are customer service and support?
Customer Service:
People buy from people. If your account rep is a good one, all goes well. It is hard to give a general answer; I have seen both light and shadow, as one could say.
Technical Support:
Support has room for improvement. Very often, you find yourself answering the very same questions over and over again. I would give it a 6 or 7.
Which solution did I use previously and why did I switch?
Some of my clients came from other solutions, mainly because those solutions were outdated or had been discontinued. The same applies in the other direction, especially if the clients had the wrong account rep.
How was the initial setup?
Initial setup seems to be easy. The deeper you go, the more you need to know about the product, especially about its agents. Some functions are under-represented, especially the Agent Consoles, which are a little too basic compared to the old versions. So you still use a mix of versions, which leads to no savings in hardware at all. HA setups are complex (it is best to use vMotion). Ports are not well documented. Again, experience is the key. If you have known the products under the hood for a long time, you will do well; otherwise, you might run into problems. This is the same for lots of products in this area: if you know what you are doing, all goes well.
What about the implementation team?
We normally do these kinds of implementations ourselves; I am a consultant, not an end user, as the clients no longer have the expertise in-house (no matter which product they use).
What was our ROI?
Monitoring is like insurance. If you have it, you feel safe. If you do not have it and run into an accident, you wish you had it.
What's my experience with pricing, setup cost, and licensing?
Use conservative figures for hardware, monitored servers, and effort. The product is not cheap, but as with other products, you get what you pay for.
Which other solutions did I evaluate?
Before choosing this product, we evaluated other options, and we still do. Mainly, it ends in a mixture of tools, with open source-based tools such as Zabbix, OP5, or Nagios XI reporting up into it.
What other advice do I have?
Estimate enough time for the implementation. Never trust anyone who tells you that you will be finished in three months. Calculate at least one year for all tasks.
Disclosure: My company has a business relationship with this vendor other than being a customer: We are a consulting partner of BMC, as we are for other vendors. But we do not sell any licenses at all, for any vendor. We do pure consulting, also for other products. We simply report and present different options, and the client decides what to use.
Technical Consultant at Intercom Enterprises
Beneficial dashboard, simple setup, and reliable
Pros and Cons
- "The most valuable feature of BMC TrueSight Operations Management is the dashboard presentation server."
- "BMC TrueSight Operations Management could use some enhancements in the application visibility tools."
What is our primary use case?
I am using BMC TrueSight Operations Management to monitor infrastructure and applications that are hosted on servers.
What is most valuable?
The most valuable feature of BMC TrueSight Operations Management is the dashboard presentation server.
What needs improvement?
BMC TrueSight Operations Management could use some enhancements in the application visibility tools.
For how long have I used the solution?
I have been using BMC TrueSight Operations Management for approximately three months.
What do I think about the stability of the solution?
BMC TrueSight Operations Management is stable.
What do I think about the scalability of the solution?
The solution is scalable.
I am the only one using this solution in my organization.
How was the initial setup?
The initial setup of BMC TrueSight Operations Management is easy. It takes approximately 15 minutes to do the installation.
What other advice do I have?
I would advise others to use BMC TrueSight Operations Management.
I rate BMC TrueSight Operations Management an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Senior Performance Analyst and BMC ProactiveNet administrator at a government with 10,001+ employees
The tailoring of the knowledge modules has been particularly useful
Pros and Cons
- "The tailoring of the knowledge modules has been particularly useful as I can streamline the agents to only report on critical events."
- "The knowledge modules could be more lightweight in size. At present, the installation packages can be quite large."
What is our primary use case?
Monitoring applications and servers. We also monitor individual pieces of management software, like WebLogic.
How has it helped my organization?
It proactively monitors all of our servers 24/7/365. This allows technical staff to focus on other areas while our operators monitor the systems.
What is most valuable?
The tailoring of the knowledge modules has been particularly useful as I can streamline the agents to only report on critical events.
What needs improvement?
The knowledge modules could be more lightweight in size. At present, the installation packages can be quite large.
For how long have I used the solution?
One to three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Information Systems Computer System Controller at an insurance company with 11-50 employees
Provides great support for the business tools and IT service
Pros and Cons
- "The solution has a very good business event manager tool."
- "The solution is overly complex."
What is our primary use case?
Our company is currently moving to consolidate the different programs we use. We regularly use Patrol and TrueSight, both BMC products, which provide the same functionality through completely different solutions. We are evaluating which is the right product for us, and we're taking everything into consideration because the economy is not great and we have budget issues. Our business requires several complex system configurations: web servers, databases and processing environments. All of them must work together with proper performance, and this is where we need a complex product like TrueSight.
What is most valuable?
The business event manager tool, which consolidates detailed information from a single instance of equipment, is the most valuable thing for me. It provides support for the business tools and the IT services, which come from several systems. Some are replicated, and service tools provide the same functionality for some things. The end-user service is made up of a lot of systems; that is what I'm interested in, and it is how I discovered that BMC TrueSight is good for us. I don't use the event management or monitoring capabilities; I work with the user management capabilities.
What needs improvement?
I think the solution is overly complex and requires a lot of resources.
For how long have I used the solution?
I've been using this solution for around 18 months.
What do I think about the stability of the solution?
I haven't noticed any issues with stability. We sometimes have to call about failures or questions, but on the whole, it's fine.
What do I think about the scalability of the solution?
There are no concerns about scalability; it's performing well. We have deployed it where we have the most critical applications. We have changed our approach to new architecture and mobile: instead of big servers, we have now deployed formal servers for web services. We're working on increasing the number of servers available. Our only concern is that it requires some investment at the beginning of the project, and we have budget concerns.
How are customer service and technical support?
We don't use BMC technical support directly, we go through a partner.
How was the initial setup?
I am involved in the planning and development of the solution, so from my perspective, the initial setup is a little complex, not in itself but because managing the user services requires access to a CMDB. To get the best from this kind of product, other processes and tools need to be aligned with it. These tools provide very good functionality, but getting the benefits requires other processes and tools. Our deployment is still in progress; we've been working on it for six months using a consultant from a third party, a BMC partner.
What's my experience with pricing, setup cost, and licensing?
We haven't yet established what the final cost would be for licensing this solution, we're still working on that.
What other advice do I have?
These kinds of products provide benefits if you have other processes that require alignment with other IT solutions, such as sales, deployment and CMDB. Without that, you don't get the full benefits. At the end of every phase, we stop and check the software products before starting the next phase of the project.
I would rate this solution an eight out of 10.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Enterprise Monitoring Automation Administrator at a healthcare company with 10,001+ employees
We can verify uptimes as another source of keeping devices in compliance
Pros and Cons
- "The ability to pull hosts together to show what processes are running, so it can be used for change management."
- "We can verify uptimes as another source of keeping devices in compliance."
- "More modules for less popular applications and better documentation."
What is our primary use case?
We use it to scan and monitor our server environment. This allows us to monitor devices as they are spun up and to confirm that there are no unknown devices; we can then verify uptimes as well as patching as another way of keeping devices in compliance.
How has it helped my organization?
It allows reliable access to server hardware info, uptime statuses, current patching, and much more. This allows us to keep our inventory up to date, as we feed this into our inventory system along with info from Atrium CMDB.
What is most valuable?
The ability to pull hosts together to show what processes are running, so it can be used for change management.
What needs improvement?
More modules for less popular applications and better documentation. Documentation can be great at times, but lacking in other areas.
For how long have I used the solution?
One to three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Software Engineer with 201-500 employees
Online documentation is often incorrect/incomplete. It is helpful to be able to apply rule-based routing to alerts.
Pros and Cons
- "It is very helpful to be able to apply rule-based routing to alerts."
- "TSOM's ability to consolidate alerts into a single location and provide filtering of alerts is great."
- "It has provided us with a single location to host all events to be viewed/monitored by our NOC. This has greatly helped them to streamline their processes."
- "BMC's solutions for cloud monitoring (monitoring of AWS and Azure resources) are very poor in stability and customization."
- "BMC's online documentation is often incorrect or incomplete."
What is our primary use case?
We utilize BMC TSOM to monitor our entire infrastructure and all applications that lie therein. Our infrastructure is hosted both in our datacenters and in cloud hosted services (AWS and Azure).
How has it helped my organization?
It has provided us with a single location to host all events to be viewed/monitored by our NOC. This has greatly helped them to streamline their processes.
What is most valuable?
TSOM's ability to consolidate alerts into a single location and provide filtering of alerts is great. It is very helpful to be able to apply rule-based routing to alerts as well.
What needs improvement?
For how long have I used the solution?
One to three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Consultant at a tech consulting company with 51-200 employees
BPPM has the potential to be a market-beating product; however, the investment required is significant
This article is a review of BMC ProactiveNet Performance Manager (BPPM) version 8.6 and its key sub-components.
The key sub-components are:
> ProactiveNet Analytics
> ProactiveNet Event Management (formerly Mastercell)
> ProactiveNet Performance Manager (i.e. PATROL)
Versions Reviewed
Component | Version
BPPM Event Manager | 8.6
BPPM Analytics | 8.6
PATROL Central | 7.8.10
PATROL Central Operator – Web Edition | 7.8.10
PATROL Agent | 3.9.00.1i
PATROL for UNIX Servers | 9.10.00.02
Key Capabilities
Event Management
BPPM Event Management (previously known as Mastercell or BEM) is the component that replaces PATROL Enterprise Manager or PEM (previously known as CommandPost).
BPPM introduces a programming language called MRL. MRL is not as flexible as PERL or REXX, which can both be used in PEM, but MRL does include many built-in features, such as policies, that make the design of rules slightly easier.
PEM performed event management using up to 5 transformers, or scripts written in PERL. PEM was effectively a toolbox in which all the intelligence was provided by the PERL scripts, which enriched the events using a number of lookup files.
Which product is better, PEM or BPPM? BPPM is arguably the better event management platform. Although MRL is frustrating to work with, the built-in capabilities mean that you don’t have to develop everything from scratch. Overall, BPPM is a good event management platform.
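To make the PEM model concrete, here is a minimal Python sketch of a chain of transformer stages that normalize an event and then enrich it from a lookup file. It is an illustration under stated assumptions and does not reproduce PEM's actual interface. The real transformers were PERL scripts, and the event fields, the `host`-keyed lookup layout, the severity mapping and every function name below are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List

Event = Dict[str, str]
Transformer = Callable[[Event], Event]

MAX_TRANSFORMERS = 5  # PEM allowed up to five transformer stages


def load_lookup(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Parse a lookup file (hypothetical layout) keyed on its 'host' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["host"]: row for row in reader}


def make_enricher(lookup: Dict[str, Dict[str, str]]) -> Transformer:
    """Build a stage that copies the lookup columns for the event's host into the event."""
    def enrich(event: Event) -> Event:
        extra = lookup.get(event.get("host", ""), {})
        return {**event, **{k: v for k, v in extra.items() if k != "host"}}
    return enrich


def normalize_severity(event: Event) -> Event:
    """Map text severities to numbers (hypothetical mapping; unknown values become '4')."""
    mapping = {"CRITICAL": "1", "MAJOR": "2", "MINOR": "3"}
    return {**event, "severity": mapping.get(event.get("severity", ""), "4")}


def run_pipeline(event: Event, stages: List[Transformer]) -> Event:
    """Pass the event through each stage in order, as a chain of transformers."""
    if len(stages) > MAX_TRANSFORMERS:
        raise ValueError(f"at most {MAX_TRANSFORMERS} transformers are supported")
    for stage in stages:
        event = stage(event)
    return event


if __name__ == "__main__":
    lookup = load_lookup("host,owner,site\nsrv01,unix-team,London\n")
    stages = [normalize_severity, make_enricher(lookup)]
    print(run_pipeline({"host": "srv01", "severity": "MAJOR"}, stages))
```

The design point the review makes is visible here: the "framework" only runs stages in order, and all of the intelligence lives in the stages and their lookup files.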
Threshold Management
PATROL Configuration Manager (PCM) is one of the best threshold management tools in the industry. The threshold management capabilities in BPPM (aka ProactiveNet) are poor in comparison. BMC state that they will include PCM functionality in the next release of BPPM.
The limitations of Threshold management in BPPM are numerous:
- BPPM has no local thresholds that can be applied across multiple servers.
- Local thresholds can only be defined via the GUI.
- Local thresholds can’t be migrated from one environment to another.
- Migration of global thresholds can be performed using an export/import utility, but it is not simple.
- The GUI for managing thresholds is cumbersome and not intuitive.
On the plus side, the different types of thresholds in BPPM are very powerful. BPPM has Absolute, Intelligent, Signature and Predictive thresholds. These thresholds are statistically based and will generate events when a statistical anomaly is detected. The product will automatically calculate trends using linear regression, and variations based on hourly, daily or weekly patterns. However, the statistics will not eliminate threshold management, as BMC have sometimes claimed. Many thresholds are Boolean in nature (either good or bad) and are therefore not appropriate for statistical analysis. Statistical analysis is only appropriate for about 20% to 30% of thresholds, and the analysis consumes a lot of CPU cycles.
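The two statistical ideas mentioned above can be illustrated with a short Python sketch. The first is an ordinary least-squares trend used to predict when a metric will reach a limit. The second is a per-hour mean and standard deviation used to flag anomalies. This is a generic textbook approach, not BMC's algorithm for Intelligent, Signature or Predictive thresholds. The function names and the 3-sigma default are hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


def fit_trend(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept over (t, y) samples."""
    ts = [t for t, _ in samples]
    ys = [y for _, y in samples]
    t_bar, y_bar = mean(ts), mean(ys)
    den = sum((t - t_bar) ** 2 for t in ts)
    if den == 0:
        raise ValueError("need samples at two or more distinct times")
    slope = sum((t - t_bar) * (y - y_bar) for t, y in samples) / den
    return slope, y_bar - slope * t_bar


def time_to_breach(samples: List[Tuple[float, float]], limit: float) -> Optional[float]:
    """Time at which the fitted trend reaches `limit`, or None if it is flat or falling."""
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


def hourly_baseline(history: List[Tuple[int, float]]) -> Dict[int, Tuple[float, float]]:
    """Mean and population standard deviation of a metric for each hour of the day."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def is_anomaly(baseline: Dict[int, Tuple[float, float]], hour: int,
               value: float, k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations from that hour's mean."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    mu, sigma = baseline[hour]
    return abs(value - mu) > k * max(sigma, 1e-9)
```

As the review points out, this only makes sense for continuous metrics such as CPU or disk usage. A Boolean status metric has no meaningful trend or spread, so a fixed threshold is still the right tool for it.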
Ease of Implementation
BPPM is undeniably a complex product; far too complex, in my opinion. There are many much simpler solutions, such as HP SiteScope or CA Nimsoft, which can be implemented much faster. In addition, the BMC product set has gradually become more and more complex over the years. The solution is really three products bundled together:
- MasterCell which BMC purchased about 7 years ago.
- ProactiveNet which BMC purchased about 4 years ago.
- PATROL which BMC purchased about 20 years ago.
MasterCell is a great event management product. ProactiveNet has perhaps been oversold by BMC, and its value is overstated. The autonomous thresholds can only be applied to 20% to 30% of parameters anyway. PATROL was originally a great product but has become bloated and complex after years of poor product management.
As an illustration of how complex the BPPM solution has become, consider the following table:
Component / Feature | Old Solution with PEM | New BPPM Solution (version 8.6)
Number of Servers | 3 (DEV, DR and PROD) | 11 (3 DEV, 3 TEST, 5 PROD)
Number of Connections to the Agents | 2 (PEM and RT Server) | 3 (BIIP3, BPPM Adaptor, RT Server)
Number of Adaptors | 1 (RT Server) | 3 (RT Server, BPPM Adaptor, BIIP3)
Dynamic Policy Files (for Rules) | 5 Rule Files | 12 Rule Files
Forms for Threshold Management | 1 (PCM) | 2 (TEST and PROD BPPM Servers)
Extensibility
The PATROL agent has always been very extensible. There is a rich API and many different ways to write an interface. PATROL Central has no API and therefore cannot be extended. Both BPPM and PEM are very extensible and can be extended through a variety of scripting languages, such as PHP or PERL.
Blackout
BMC has never provided a web form that allows staff in the Operations Bridge to black out servers or services for upcoming outages due to planned maintenance. This customer (mentioned in this review) had to write its own web GUI for blackouts: an Apache and PHP solution that allows the shift operators to configure blackouts. It required 25 days of development to alter the blackout web form and migrate this functionality from PEM to BPPM.
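At its core, a blackout form like the one described above only has to maintain a list of maintenance windows and suppress events from hosts inside an active window. The following Python sketch shows that data structure and check. It assumes a hypothetical record layout (host, start, end) and events carrying `host` and `time` fields. It is not the customer's Apache/PHP tool and not BPPM's "Schedule downtime" facility. Because it filters on the event's host rather than on the collection path, it would apply equally to events arriving through BIIP3.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Blackout:
    """One planned-maintenance window for a host (hypothetical record layout)."""
    host: str
    start: datetime
    end: datetime

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("a blackout must end after it starts")

    def covers(self, host: str, when: datetime) -> bool:
        # Half-open interval: the window includes its start time but not its end time.
        return self.host == host and self.start <= when < self.end


def filter_blacked_out(events: List[Dict], windows: List[Blackout]) -> List[Dict]:
    """Keep only events whose host is not inside an active blackout window."""
    return [
        e for e in events
        if not any(w.covers(e["host"], e["time"]) for w in windows)
    ]


if __name__ == "__main__":
    window = Blackout("srv01", datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 2, 2, 0))
    events = [
        {"host": "srv01", "time": datetime(2024, 1, 1, 23, 0)},  # inside the window
        {"host": "srv02", "time": datetime(2024, 1, 1, 23, 0)},  # different host
    ]
    print(filter_blacked_out(events, [window]))
```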
Administration
Routine Daily Admin Tasks
For an environment of 500 Agents, BPPM requires from 0.5 to 1 FTE to keep the lights on, depending on the experience of the person. Typical daily tasks include the following:
- Restarting Agents. For an environment of 500 Agents, you can expect that 1 agent will crash per day. The most common cause is probably history file corruption. History files can grow to beyond 4 GB if not managed.
- Checking the Consoles. Most environments will end up with a hierarchy of BPPM Event cells. The Administrator needs to log into each Console to verify that events are being:
  - de-duplicated properly;
  - propagated correctly from one cell to the next;
  - raised as incidents correctly, if Automatic Incident Generation (AIG) is configured.
- Managing Thresholds. The Administrator will get on average one request per day to change a threshold or verify that a threshold is in place. For example, an ORACLE DBA may say that there was a SEV2 incident last night related to table locking: "Could you please check that instance DW_PROD is monitored for locking?" It can take from 30 minutes to 2 hours to investigate each request and write an email suggesting and agreeing the new threshold, perhaps longer if a meeting is required.
- Managing Rules. Changes to the BPPM Rules occur about once per month and need to be performed using change control. Rule changes require a code change to the MRL and the cells will need to be bounced.
- Commissioning and Decommissioning Agents. Agent commissioning occurs every few months and may involve up to 20 virtual hosts associated with one physical machine. The commissioning process is fairly involved (in fact, all the admin steps are complex). See below.
- Deploying KMs. When the support teams deploy new infrastructure software such as WebSphere or ORACLE, the associated PATROL Knowledge Module (KM) will also need to be deployed. Each deployment may take 1-3 hours and will require change control. Input will be required from the SME; for example, the ORACLE DBA may be required to type in the system password for ORACLE during the KM configuration process.
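Because history file growth is given above as a common cause of agent crashes, a simple size check can be part of the daily routine. The Python sketch below walks a directory and reports files above a limit, largest first. The history directory location is deliberately left as a command-line argument because it depends on the installation. The script name, the function name and the 4 GB default taken from the text are illustrative assumptions. It only reports; it does not trim or reset PATROL history.

```python
import os
import sys
from typing import List, Tuple

LIMIT_BYTES = 4 * 1024 ** 3  # roughly the 4 GB size mentioned in the review


def oversized_files(root: str, limit: int = LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (path, size) for every file under `root` larger than `limit`, biggest first."""
    hits: List[Tuple[str, int]] = []
    # os.walk silently skips directories it cannot read (and yields nothing for a missing root).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            if size > limit:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_history.py <history_directory>")  # hypothetical script name
    else:
        for path, size in oversized_files(sys.argv[1]):
            print(f"{size / 1024 ** 3:.1f} GB  {path}")
```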
PATROL Agent Commissioning
The Agent commissioning process for configuring monitoring for a new server consists of the steps shown below:
Step Number | Step | Description
1 | Ping Host | Ping the host to verify that the hostname is correct.
2 | Install Agent | Install the Agent using the Solaris package.
3 | Update Event Rules | Edit the BPPM enrichment file abc_host.csv.
4 | Apply to PROD Cell | Import abc_host.csv into the PROD cell.
5 | Apply to TEST Cell | Import abc_host.csv into the TEST cell.
6 | Update PING Test (primary) | Update the PING test configuration on the primary server to ensure the host is up.
7 | Update PING Test (secondary) | Update the PING test configuration on the secondary server to ensure the host is up.
8 | Configure UNIX KM | Use PCM to give the Agent the standard configuration for the UNIX KM.
9 | Update BIIP3 | Update the BIIP3 configuration so that the Agent can talk to the Event Management cell.
10 | Agent Restart | Restart the Agent to ensure that the Agent configuration takes effect.
11 | Update PCO Web Console | Update the PCO Web Console so that the Agent appears in the PATROL console.
12 | Update Work Request | Update the work request to indicate the job is complete.
If additional Monitoring is required for ORACLE or WEBLOGIC or some other Application, then there are additional configuration steps that are required.
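Steps 3 to 5 of the commissioning process edit one enrichment file and import the same file into both the PROD and TEST cells. Scripting the edit makes it repeatable and keeps the two cells consistent. The Python sketch below adds or replaces a host row idempotently and keeps the file sorted. The column names are hypothetical, because the real layout of abc_host.csv is not given. The import into the cells is not shown, because no command for it is documented here.

```python
import csv
import io
from typing import Dict

FIELDS = ["host", "support_group", "location"]  # hypothetical column layout for abc_host.csv


def add_host(csv_text: str, row: Dict[str, str]) -> str:
    """Return the CSV text with `row` added, replacing any existing row for the same host."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["host"] != row["host"]]
    rows.append(row)
    rows.sort(key=lambda r: r["host"])  # stable ordering keeps diffs readable under change control
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    current = "host,support_group,location\nsrv02,unix,London\n"
    print(add_host(current, {"host": "srv01", "support_group": "db", "location": "Leeds"}), end="")
```

Running the function twice with the same host leaves a single row. This is useful when a commissioning request for up to 20 virtual hosts is re-run after a partial failure.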
Programming Languages
There are two languages to learn with BPPM:
- MRL or Mastercell Rule Language. This is a fairly unique programming language.
- PSL or PATROL Script Language. This language is similar to PERL. The complexity lies in the functions that need to be learned.
Summary of Administration
Administration of BPPM is overly complex. The product has evolved over the course of the last 20 years. As each new component has been added via acquisition, the product has become increasingly complex and time-consuming to administer.
Architectural Considerations
Any Solution Design for BPPM should consider the following key questions:
Question | Details
How does the design allow for rule tracing? | Using the trace log is not practical due to the volume of events. A good solution is to assign a unique ID to each rule and then configure each rule to add an entry to a new slot called “matching_rules”.
How does the design specify rule execution order? | It is often difficult to design rules because of confusion about rule execution order. It is good practice to split all mrl files into mrl files for new rules and mrl files for refine rules, so you get new_mcxp.mrl and refine_mcxp.mrl. The files should then be grouped in the .load file by stage, so you have refine rules followed by new rules, etc.
Does the DEV environment have the same number of cells as the TEST and OAT environments? | Don’t be tempted to have fewer cells in the DEV environment in order to limit the number of zones (servers) required. This is a mistake. Rule execution order is greatly affected by the propagation (or not) of slots between cells and by the configuration of mcell.propagate.
Does the design specify the configuration of mcell.propagate? | The design should specify the configuration of all mcell config files, including mcell.propagate, mcell.dir, etc.
Is BIIP3 included in the design? | BIIP3 is essential for forwarding PATROL events to the cells for any events that are not of event class 11 or 39. These events are explicitly generated by the PSL event_trigger() function. BPPM Analytics (ProactiveNet) cannot collect these events because they have no associated metric.
Threshold Management | If thresholds are being migrated from PCM to BPPM, how will the thresholds be migrated from one BPPM server to another? Has the export/import process been thoroughly tested? (It has serious issues.) I would advise migrating the thresholds to BPPM as a Phase II activity or waiting for BPPM v9.
Export Thresholds from PCM | Does the design specify using a tool for extracting all the thresholds from PCM into a spreadsheet? (I have a PERL tool to do this.)
Testing | Does the design provide for at least a month of end-to-end testing once the rules have been completed?
Monitoring the Monitoring | Does the design incorporate monitoring of the monitoring? Will an event be generated if the BIIP3 Adapter fails?
Event Storm | If the BIIP3 Adaptor loses connection to multiple agents every half an hour and then regains the connection 30 seconds later, this will create 200 new AGENT_DOWN events (mc_adapter_control). The de-dup rule will not work because the AGENT_UP event closes the AGENT_DOWN event. What rule is going to prevent this event storm?
Time-out Policies | Does the design specify timeout policies for all the main top-level event classes, such as MC_CELL.. and EVENT? Does the cell start reasonably quickly with 2,000 events? What about 20,000 events?
DDE Enrichment | Does the design fully specify the enrichment files that will be used?
DDE Synchronization | Are the DDE config files pulled or pushed into the cells? How are the DDE cfg files synchronized between cells?
Blackout | Has a web site been included in the design for blackout by the Operations Bridge? BPPM does have a “Schedule downtime” facility, but this is entirely inappropriate for operators and does not account for BIIP3 events.
Blackout Dev | If a blackout GUI is a requirement, has a month of development been allocated (using something like Apache and PHP)?
BPPM Analytics | Does the design discuss the possibility of implementing BPPM Analytics as a second phase?
Reporting | Does the design include event reporting to drive continuous improvement? Key reports are total events grouped by:
Reporting DEV | If reporting is a requirement, does the design include time to implement the BMC reporting tool, or 2 weeks of development using PHP and mquery?
AIG | Does the design include Automatic Incident Generation (AIG)? Semi-automatic incident generation is an option, whereby an operator creates a ticket by right-clicking on an event. Is this option considered and discussed in the design?
Failover | Is failover considered? How is the configuration replicated? Replicated disk?
Training | Does the project plan include time for training the staff in the Operations Bridge? What about 2nd-level support?
Go-live | Is the go-live big bang or phased? Phased is preferred for risk mitigation but will require operators to run two consoles in parallel.
Audible Alarm | Is an audible alarm a requirement? If so, this will require a few days of development to configure a web page that uses a sound file and “mquery -s COUNT”.
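One common answer to the "Event Storm" question is a hold-down timer. Instead of de-duplicating, each AGENT_DOWN is held for a grace period. If the matching AGENT_UP arrives within that period, both events are dropped. The Python sketch below shows this technique. It is a swapped-in, generic approach written in Python rather than MRL. It does not model the mc_adapter_control slots, and the class name, the 60-second default and the event tuples are hypothetical. A caller is expected to invoke `tick()` periodically, much as a cell timer would.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Forwarded = Tuple[str, str]  # (event type, host)


@dataclass
class FlapSuppressor:
    """Hold AGENT_DOWN events for a grace period and drop short connection flaps."""
    grace_seconds: float = 60.0
    pending: Dict[str, float] = field(default_factory=dict)  # host -> time DOWN was first seen

    def on_event(self, kind: str, host: str, ts: float) -> List[Forwarded]:
        """Process one adapter event and return whatever should reach the console now."""
        if kind == "AGENT_DOWN":
            self.pending.setdefault(host, ts)  # repeated DOWNs keep the first timestamp
            return []
        if kind == "AGENT_UP":
            if host in self.pending:
                del self.pending[host]  # reconnected within the grace period: suppress both
                return []
            return [("AGENT_UP", host)]  # closes a DOWN that was already forwarded
        return [(kind, host)]  # other event types pass straight through

    def tick(self, now: float) -> List[Forwarded]:
        """Forward AGENT_DOWN for hosts still disconnected after the grace period."""
        expired = [h for h, t in self.pending.items() if now - t >= self.grace_seconds]
        for h in expired:
            del self.pending[h]
        return [("AGENT_DOWN", h) for h in expired]
```

With a 60-second grace period, the 30-second reconnections described in the table produce no console events at all. A genuine outage is still reported one grace period late, and that delay is the trade-off to agree with the Operations Bridge.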
BPPM Classes
BPPM has a number of event classes, as shown below, which all inherit from the CORE_EVENT class.
CORE_EVENT
- EVENT
- MC_CELL_EVENT
- MC_UPDATE_EVENT
- MC_SMC_ROOT
- MC_MCCS
- MC_CLIENT_BASE
- MC_CLIENT_CONTROL
- MC_CLIENT_ERROR
- MC_ADAPTOR_BASE
- MC_ADAPTER_CONTROL
- WIN_EVENTLOG
- LOGFILE_BASE
- SNMP_TRAP
- PEM_EV
- PATROL_EV
- PPM_EV
- ALARM
- MC_CELL_CONTROL
- MC_CELL_START
- MC_CELL_STOP
- MC_CELL_TICK
- MC_CELL_STATBLD_START
- MC_CELL_STATBLD_STOP
- MC_CELL_DB_CLEANUP
- MC_CELL_CONNECT
- MC_CELL_CLIENT
- MC_CELL_DESTINATION_UNREACHABLE
- MC_CELL_HEARTBEAT_EVT
- MC_CELL_RESOURCES
- MC_CELL_ACTION_RESULT
- MC_CELL_PUBLISH_RESULT
- IAS_EVENT
- IAS_START
- IAS_STOP
- IAS_SYNCH_EVENT
- IAS_REINIT
- IAS_LOGIN
- IAS_ERROR
Mastercell Rule Language (MRL)
Mastercell Rule Language (or MRL) is the language used to develop event management rules within BPPM. The administrator can develop 11 different types of rules, as shown in the table in the "Rule Phases" section below. The language is simple and relatively easy to learn in terms of both the syntax and the built-in functions. The most difficult concept to grasp is the execution order, as explained below. One of the most common problems is misunderstanding the execution order and finding that the rules are not executing in the desired sequence. The other cause of frustration is the lack of common statements such as looping structures (do, while, for, until), which one takes for granted in other languages. It is possible to iterate over a list structure using the listwalk() function call, and the New rule phase also has a limited capability to loop over events using the Updates clause. Fortunately, the need to loop is fairly rare, although at times the lack of standard statements can be frustrating.
The biggest problem with MRL is the slow cycle time when debugging code. Compared to PHP or PERL, it takes about ten times as long to stop, compile and restart, so debugging cycles are ten times as long and productivity is similarly affected. True, it is not necessary to write pages and pages of code, but typically one will write about 8-15 pages of MRL for each project. 8 pages of PHP (tested and debugged) takes 1 to 2 days; 8 pages of MRL (tested and debugged) takes 2-4 weeks. In addition, one should allow an additional month of end-to-end testing before production go-live to test the rules with real events, and to allow all possible scenarios to play out and all the bugs to emerge. These rules of thumb apply to companies of 5,000 to 10,000 employees. For larger organizations, you should allow more time.
Execution Order
- Rules are processed in order according to their rule phase, as shown below.
- Rules are executed in the order in which they appear in the .load file.
- Rules are executed in the order in which they appear in the mrl file.
- Policies are executed in the order given by their specified “execution order”.
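The ordering rules above amount to a three-part sort key: rule phase, then position in the .load file, then position within the mrl file. The Python sketch below expresses that key. It combines the key with the "matching_rules" tracing slot recommended in the architecture table, so you can see which rules fired and in what order. It is a simplified model under stated assumptions, not MRL. It covers only the eight phases listed in the Rule Phases table, ignores policies, and does not model Filter rules discarding events. The `Rule` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Only the eight phases listed in the Rule Phases table, in execution order.
PHASES = ["Refine", "Filter", "Regulate", "New", "Abstract", "Correlate", "Execute", "Threshold"]


@dataclass
class Rule:
    rule_id: str                      # unique ID recorded in the matching_rules slot
    phase: str                        # one of PHASES
    load_position: int                # position of the rule's mrl file in the .load file
    file_position: int                # position of the rule within its mrl file
    condition: Callable[[Dict], bool]


def execution_order(rules: List[Rule]) -> List[Rule]:
    """Sort by phase first, then .load file order, then order within the mrl file."""
    return sorted(rules, key=lambda r: (PHASES.index(r.phase), r.load_position, r.file_position))


def trace(event: Dict, rules: List[Rule]) -> Dict:
    """Return a copy of the event whose matching_rules slot lists every rule that matched."""
    traced = dict(event, matching_rules=[])
    for rule in execution_order(rules):
        if rule.condition(traced):
            traced["matching_rules"].append(rule.rule_id)
    return traced
```

When a rule does not fire as expected, comparing `matching_rules` on a real event against the intended sequence shows quickly whether the problem is the condition or the execution order.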
Rule Phases
Rules are executed in the order shown below.
Execution Order | Rule Phase | Description |
1 | Refine | A Refine rule verifies the validity of incoming events and collects additional data for an event before it is sent through the remaining rule phases, where further processing takes place. |
2 | Filter | Filter rules limit the number of incoming events by discarding those events that need no additional processing or analysis. Filter rules compare incoming events to the event condition formulas (ECFs) contained in the rule to determine whether an event is discarded or proceeds to further processing. An incoming event is processed through each Filter rule until a Filter rule discards the event or all Filter rules are exhausted. An event must match all the Filter rules to be accepted. |
3 | Regulate | Use Regulate rules to handle time-frequency accumulations of events or repetitive occurrences of events. An event is considered a repetition of another if the event has the same values for all the slots that are defined with the dup_detect=yes facet in the BAROC definition of its event class. |
4 | New | Use New rules to execute an action when a new event is received, for example increasing the severity level of an event or updating an existing event with new event data. New rules determine whether an event becomes permanent and is placed in the repository. |
5 | Abstract | Abstract rules create high-level, or abstract, events based on low-level events. A newly created abstract event starts at the New rule phase, skipping the Filter and Regulate rule phases. With Abstract rules, you can keep low-level events in cells at the lower levels of the cell hierarchy, abstract the data from low-level events into high-level events, and propagate them to a higher-level cell. A high-level cell in the hierarchy can consolidate abstract events from several low-level cells and prevent a large number of abstracted technical events for which no consolidating rules apply. |
6 | Correlate | Correlate rules build an effect-to-cause relationship between an event and another event that caused it. Correlate rules execute whenever a cause or an effect event is received. The relationship between correlated events can be broken. |
7 | Execute | The Execute rule performs a specified action when a slot value has changed in the repository. The specified action, which is either internal to the cell or runs an external executable, is based on the characteristics of one or more events. |
8 | Threshold | The Threshold rule counts the number of events that match the criteria you specify; if the number of these events exceeds the amount allowed within a time frame, the Threshold rule executes. An event is considered a repetition of another if the event has the same values for all the slots that are defined with the dup_detect=yes facet in the BAROC definition of its event class. |
9 | Propagate | A cell uses Propagate rules to forward events or messages to one or more destination cells or gateways. For example, a Propagate rule can escalate an event from a lower-level cell to a higher-level cell. |
10 | Timer | Use Timer rules to create timed triggers that call a rule. Timer rules are evaluated when a timer expires. |
11 | Delete | Delete rules perform actions before an event is discarded from the repository, such as a rule that suppresses data that has no meaning without an event instance. Delete rules are evaluated whenever an event is deleted from the repository, or when events are deleted using the Delete flag in the mposter command. |
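The precedence described in the Execution Order list and the Rule Phases table can be modelled as a three-part sort key: rule phase first, then the position of the rule's .mrl file in the .load file, then the rule's position within that file. The Python sketch below is a simplified model under that assumption, not the cell's actual engine; the Rule type, its fields and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Rule phases in execution order, as listed in the Rule Phases table.
PHASES = ["Refine", "Filter", "Regulate", "New", "Abstract", "Correlate",
          "Execute", "Threshold", "Propagate", "Timer", "Delete"]
PHASE_RANK = {name: rank for rank, name in enumerate(PHASES)}


@dataclass(frozen=True)
class Rule:
    """Hypothetical record describing where an MRL rule is defined."""
    name: str
    phase: str       # one of PHASES
    mrl_file: str    # .mrl file that contains the rule
    position: int    # 0-based position of the rule inside its .mrl file


def execution_order(rules: List[Rule], load_file: List[str]) -> List[Rule]:
    """Order rules by phase, then by .load file order, then by position in the .mrl file."""
    file_rank = {mrl: rank for rank, mrl in enumerate(load_file)}
    return sorted(
        rules,
        key=lambda r: (PHASE_RANK[r.phase], file_rank[r.mrl_file], r.position),
    )


if __name__ == "__main__":
    load = ["base.mrl", "custom.mrl"]  # hypothetical .load file contents
    rules = [
        Rule("enrich_host", "New", "custom.mrl", 0),
        Rule("drop_test_hosts", "Filter", "custom.mrl", 1),
        Rule("set_priority", "New", "base.mrl", 3),
        Rule("check_slots", "Refine", "custom.mrl", 2),
    ]
    for rule in execution_order(rules, load):
        print(rule.phase, rule.mrl_file, rule.name)
```

The example shows a common surprise: enrich_host is the first rule in custom.mrl, yet it runs after set_priority, because both are New rules and base.mrl is listed earlier in the .load file.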
PATROL Configuration Manager (PCM)
PATROL Configuration Manager (PCM) is a configuration tool used for PATROL agents. The tool is mainly used for configuring Thresholds and is very effective at this task.
Operation
PCM is similar in concept to the Windows registry editor. The Main Form consists of two TreeView panes, as shown below. The left TreeView is used to configure hosts, which are arranged in groups such as ORACLE (shown below). The right-hand TreeView is used to manage the rules, which can also be arranged into groups. RuleSets are linked to hosts by dragging them from right to left and dropping them onto the leaves marked "LinkedRuleSets". The user then invokes a command called "Apply RuleSets". The RuleSets are applied to each Agent in the same order as they appear in the hierarchy on the left. RuleSets linked to lower-level nodes take precedence over, and "override", RuleSets linked to higher-level groups.
Typical Use Case
The use of PCM typically follows a four-step process. Administrators must perform the following:
- Select an Agent as a master and configure this Agent using the PATROL Central Operator (PCO) Console.
- Copy the configuration into PCM.
- Apply the configuration to other similar Agents using PCM.
- Restart the Agents in order for the configuration to take effect.
Weakness
The key weaknesses of this configuration process are the following:
- PCM and PCO are separate tools. Ideally, the configuration tool (PCO) and the configuration distribution tool (PCM) should be the same product. This would eliminate step 2 above.
- Step 4 should not be necessary. Restarting the Agents can easily be performed using PCM, but the problem is that all active events are regenerated. This means that all agents must be blacked out for up to an hour before any restart; otherwise staff in the Operations Bridge will see hundreds of duplicate events that they have already handled over the last few hours.
Desired State Management
The key benefit of PCM is that it can be used to manage a Desired State for each Agent: whether you apply the configuration once or a thousand times, the result is exactly the same. The hierarchy allows one to set global or default configuration using the higher nodes in the left TreeView and then to override that configuration with local (host-specific) configuration using the lower nodes. This hierarchy works extremely well.
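The "lower node wins" override and the Desired State property can be sketched in a few lines of Python. This is a simplified model, assuming each RuleSet is a flat key/value mapping and that an override is a plain key-by-key overwrite; PCM's actual RuleSet format is not modelled, and the FSCapacity.* and Other.setting keys are hypothetical.

```python
from typing import Dict, List

# A RuleSet is modelled as a flat mapping of configuration keys to values.
# The keys used below are hypothetical and only illustrate the override behaviour.
RuleSet = Dict[str, str]


def effective_config(levels: List[List[RuleSet]]) -> RuleSet:
    """Merge the RuleSets linked along one path of the left TreeView.

    `levels` runs from the top-level group down to the host leaf; each level
    may have several linked RuleSets, applied in the order listed. Values
    from lower levels overwrite values from higher levels.
    """
    merged: RuleSet = {}
    for level in levels:
        for ruleset in level:
            merged.update(ruleset)
    return merged


def apply_desired_state(current: RuleSet, desired: RuleSet) -> RuleSet:
    """Return an agent's configuration after applying the desired state.

    Only the keys in `desired` are written, so re-applying the same desired
    state leaves the result unchanged (the operation is idempotent).
    """
    updated = dict(current)
    updated.update(desired)
    return updated


if __name__ == "__main__":
    global_defaults = {"FSCapacity.warning": "80", "FSCapacity.alarm": "90"}
    oracle_group = {"FSCapacity.alarm": "95"}
    host_local = {"FSCapacity.warning": "85"}
    desired = effective_config([[global_defaults], [oracle_group], [host_local]])
    print(desired)  # {'FSCapacity.warning': '85', 'FSCapacity.alarm': '95'}
    once = apply_desired_state({"Other.setting": "x"}, desired)
    twice = apply_desired_state(once, desired)
    print(once == twice)  # True
```

In this sketch, keys that are absent from the desired state are left untouched on the agent; removing them would need extra logic and would be a separate design decision.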
Policies
The Policies feature within BPPM Event Management is generally well executed and has sufficient flexibility to meet most customers' needs. The Dynamic Data Enrichment (DDE) policies allow the user to manage the rules externally using Comma Separated Values (CSV) files.
The key thing to keep in mind is that the DDE policies match on Best Fit, not First Match. For example, suppose the CSV file contains a row for "fred*" (the star is a wildcard) and a row for "frederick". The host frederick will match the "frederick" row, even if the "fred*" row appears first in the CSV file. The rules are loaded into a hash memory structure within the product. The benefit of "Best Fit" is that the execution time for finding a match is predictable, irrespective of the number of lines in the CSV file (and there could be thousands). The disadvantage of "Best Fit" is that the matching can be out of sequence and counter-intuitive. Best practice in this case is to keep the CSV files simple. Each enrichment file should also have only one purpose. For example, the customer described in this review originally started with 5 enrichment files in their old PATROL Enterprise Manager (PEM) environment. After implementing BPPM, the customer ended up with 11 DDE enrichment files. The total number of lines was lower, but the number of files was higher.
When migrating from PEM to BPPM, the enrichment files should be "normalized", that is, the number of lookup columns should be minimized in order to reduce the probability of out-of-order rule matching.
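To see why rules migrated from PEM can misbehave, the two matching strategies can be compared side by side. The Python sketch below assumes that "best fit" means "the matching pattern with the most literal characters wins"; BMC's actual scoring is not described in this review, the Location values are hypothetical, and the linear scan does not reproduce the predictable hash-based lookup time mentioned above.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (HostName pattern, data value), in CSV file order


def first_match(rows: List[Row], hostname: str) -> Optional[str]:
    """PEM-style lookup: the first row (in file order) whose pattern matches wins."""
    for pattern, value in rows:
        if fnmatchcase(hostname, pattern):
            return value
    return None


def specificity(pattern: str) -> int:
    """Assumed ranking: a pattern with more literal (non-wildcard) characters is more specific."""
    return sum(1 for ch in pattern if ch not in "*?")


def best_fit(rows: List[Row], hostname: str) -> Optional[str]:
    """Best Fit lookup: the most specific matching row wins, whatever its position."""
    candidates = [(specificity(p), value) for p, value in rows if fnmatchcase(hostname, p)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    host_csv = [("fred*", "LONDON"), ("frederick", "PARIS")]  # hypothetical Host.csv rows
    print(first_match(host_csv, "frederick"))  # LONDON
    print(best_fit(host_csv, "frederick"))     # PARIS
    print(best_fit(host_csv, "freddie"))       # LONDON
```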
BMC Standard Policies
Policy | Description |
Closure | A closure policy closes a specified event when a separate specified event is received. |
Blackout Policy | A blackout policy might be used during a maintenance window or holiday period. |
Component Based Enrichment | Enriches the definition of an event associated with a component by assigning selected component slot definitions to the event slots. |
Enrichment | Enriches the definition of an event associated with a component by assigning selected component slot definitions to the event slots. |
Correlation | Correlation relates one or more cause events to an effect event, and can close the effect event. The cell maintains the association between these cause-and-effect events. |
Escalation | Escalation raises or lowers the priority level of an event after a specified period of time. A specified number of event recurrences can also trigger escalation of an event. For example, if the abnormally high temperature of a storage device goes unchecked for 10 minutes, or if a cell receives more than five high-temperature warning events in 25 minutes, an escalation event management policy might increase the priority level of the event to critical. |
Notification | Notification sends a request to an external service to notify a user or group of users of the event. For example, a notification event management policy might page a system administrator about the imminent unavailability of a mission-critical piece of storage hardware. |
Propagation | Propagation forwards events to other cells or to integrations with other products. |
Recurrence | Recurrence combines duplicate events into one event that maintains a counter of the number of duplicates. |
Remote | Remote action automatically calls a specified action rule, provided the incoming event satisfies the remote execution policy's event criteria. |
Suppression | Suppression specifies which events the receiving cell should delete. Unlike a blackout event management policy, the suppression event management policy maintains no record of the deleted event. |
Threshold | Threshold specifies a minimum number of duplicate events that must occur within a specific period of time before the cell accepts the event. For events allowed to pass through to the cell, the event severity can be escalated or de-escalated by a relative number of levels, or set to a specific level. If the event occurrence rate falls below a specified level, the cell can take action against the event, such as changing the event to closed or acknowledged status. |
Timeout | Timeout changes an event status to closed after a specified period of time elapses. |
Component Based Blackout | Specifies which events the receiving cell should classify as unimportant and therefore not process. The events are logged for reporting purposes. A Component Based Blackout event management policy might specify that the cell ignore events generated from a component or device, based on the component selection criteria for this policy. |
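The Escalation and Threshold policies above both depend on counting duplicate events within a time window. The Python sketch below models the Escalation example (more than five high-temperature warning events within 25 minutes) with a sliding window. It illustrates the counting logic only, assumes timestamps arrive in non-decreasing order, and the class name is hypothetical rather than part of BPPM.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Counts occurrences of one (de-duplicated) event within a sliding time window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit                  # fire when the count exceeds this value
        self.window_seconds = window_seconds
        self.timestamps: Deque[float] = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one occurrence and return True if the limit is exceeded in the window."""
        self.timestamps.append(timestamp)
        # Forget occurrences older than the window, measured from the newest one.
        while timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.limit


if __name__ == "__main__":
    # "More than five high-temperature warning events in 25 minutes" -> escalate.
    counter = SlidingWindowCounter(limit=5, window_seconds=25 * 60)
    for minute in (0, 4, 8, 12, 16, 20):
        escalate = counter.observe(minute * 60)
        print(minute, "escalate to CRITICAL" if escalate else "no change")
```

Only the sixth event, at minute 20, exceeds the limit; an event arriving much later finds the old occurrences expired and does not escalate.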
Typical DDE Enrichment Files
CSV File Name | Description | Lookup Columns | Data Columns |
Host.csv | Assign Location and HostType (DEV, TEST or PROD) based on host name | HostName | Location, Physical Server, HostType |
HostSuppress.csv | Filter out events based on host name (e.g. when a new Agent is installed) | HostName | HostSuppress (YES, NO) |
Application.csv | Assign an application name to each event | ApplicationClass, Parameter | Application |
ObjectSuppress.csv | Filter out troublesome parameters based on event class | ApplicationClass, Parameter, EventClass | ObjectSuppress (YES, NO) |
ApplicationSupress.csv | Filter out events based on application | Application | ApplicationSuppress (YES, NO) |
HostBlackout.csv | Blackout hosts for planned outages based on time frame | HostName, PhysicalServer, Location | TimeFrame |
Service.csv | Assign a service name to all events | Host, Instance, HostType | Service, SupportGroup |
ServiceSuppress.csv | Filter out events based on service | Service | ServiceSuppress (YES, NO) |
ServiceBlackout.csv | Blackout services for planned outages during a particular time frame | Service | TimeFrame |
ServiceDowngrade.csv | Downgrade severity for particular services | Service | SeverityCode (e.g. 12333) |
TextMessage | Change message text for certain parameters | ApplicationName, Parameter, EventClass | NewMesaage |
Note: a SeverityCode of 12333 downgrades MAJOR (4) and CRITICAL (5) to MINOR (3).
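The note implies a positional encoding: reading 12333 digit by digit, original severities 1 to 5 map to 1, 2, 3, 3 and 3, which is exactly the stated downgrade of MAJOR and CRITICAL to MINOR. The Python sketch below decodes a SeverityCode under that interpretation. The encoding is inferred from this single example and is not confirmed by BMC documentation; the function names are hypothetical.

```python
from typing import Dict

# Severity names given in the note above; levels 1 and 2 are not named there.
SEVERITY_NAMES = {3: "MINOR", 4: "MAJOR", 5: "CRITICAL"}


def parse_severity_code(code: str) -> Dict[int, int]:
    """Map each original severity (1-5) to its new severity.

    Assumed encoding: the i-th digit (1-based) is the new severity for original severity i.
    """
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a 5-digit SeverityCode, got {code!r}")
    return {original: int(digit) for original, digit in enumerate(code, start=1)}


def apply_severity_code(severity: int, code: str) -> int:
    """Return the downgraded severity for one event."""
    return parse_severity_code(code)[severity]


if __name__ == "__main__":
    for sev in (3, 4, 5):
        new = apply_severity_code(sev, "12333")
        print(SEVERITY_NAMES[sev], "->", SEVERITY_NAMES[new])
```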
Issues
PATROL Agent Restart
If the PATROL Agent's configuration is changed, the Agent usually requires a restart. Unfortunately, the PATROL Agent regenerates all active events (any parameter that exceeds a threshold) when it is restarted. This means that an Agent must be blacked out whenever it is restarted.
PATROL Agent History Corruption
The Agent history file will always get corrupted if it exceeds 4 GB; there is a 4 GB file size limit on Solaris. The history file will frequently exceed this limit on busy servers running messaging services such as Tuxedo or MQ (simply because there is a lot to monitor). The history file may also get corrupted for other reasons. When the history file is corrupted, the Agent generates an event for every attempt to store a parameter value. This problem can generate hundreds of events every few minutes from just one host, and this number of events can easily overload a cell and a BIIP3 Adaptor (see BIIP3 Cache File Corruption below).
With 500 UNIX Agents, you should expect one agent to get corrupt history about every 2 weeks.
BIIP3 Cache File Corruption
If the BIIP3 cache file is corrupted, the BIIP3 can get stuck on one event and keep generating the event. I have seen 4 million repeated events in a cell due to this problem.
BIIP3 cache file corruption may be caused by overload (see PATROL Agent History Corruption above).
I have seen this problem occur twice within 3 months.
The workaround is to clear the cache file and restart the BIIP3 Adaptor.
BIIP3 Agent Connection Drops
In certain situations, the BIIP3 Adaptor may lose its connection with all the agents every half an hour. The connection is then re-established almost immediately. This causes a flapping AGENT_DOWN and AGENT_UP condition that is not de-duplicated, because the AGENT_UP event clears the AGENT_DOWN event. This issue can generate thousands of events and thousands of new Incidents (assuming Automatic Incident Generation is implemented).
The best workaround is to create a new rule for MC_ADAPTER_CONTROL (AGENT_DOWN) events that initially sets them to severity INFO. If the Agent is truly down, the rule should set the second AGENT_DOWN event (which occurs 3 minutes later) back to WARNING or ALARM.
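The workaround amounts to a small piece of per-host state: remember the first AGENT_DOWN, downgrade it to INFO, and only escalate if another AGENT_DOWN arrives before an AGENT_UP. The Python sketch below models that logic only; in practice it would be an MRL rule on MC_ADAPTER_CONTROL events, and the class and method names here are hypothetical. The escalation severity is a parameter because the text allows either WARNING or ALARM.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AgentDownDamper:
    """Per-host damping of flapping AGENT_DOWN / AGENT_UP events."""

    escalated_severity: str = "WARNING"  # the text allows WARNING or ALARM
    pending_down: Set[str] = field(default_factory=set)  # hosts with an unanswered AGENT_DOWN

    def handle(self, host: str, kind: str) -> Optional[str]:
        """Return the severity to assign to the event, or None if it is not an AGENT_DOWN."""
        if kind == "AGENT_UP":
            # Connection restored: forget the earlier AGENT_DOWN (a flap, not an outage).
            self.pending_down.discard(host)
            return None
        if kind != "AGENT_DOWN":
            return None
        if host not in self.pending_down:
            # First AGENT_DOWN: record it, but only as INFO.
            self.pending_down.add(host)
            return "INFO"
        # Repeated AGENT_DOWN with no AGENT_UP in between: the agent is really down.
        return self.escalated_severity


if __name__ == "__main__":
    damper = AgentDownDamper()
    events = [("host1", "AGENT_DOWN"), ("host1", "AGENT_UP"),    # flap
              ("host1", "AGENT_DOWN"), ("host1", "AGENT_DOWN")]  # genuine outage
    for host, kind in events:
        print(host, kind, damper.handle(host, kind))
```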
The problem is also solved by restarting the BIIP3 Adapter. I therefore suggest that all customers schedule a restart of the BIIP3 adapters once per day. No events are lost, because the BIIP3 adapter (and the PATROL Agent) caches all events.
I have seen this problem about once per month with a population of 500 agents.
BPPM Threshold Migration
The migration of both global and local thresholds from one BPPM Analytics instance to another must be performed by hand. There is an export/import mechanism for global thresholds, but as of July 2012 this mechanism is unreliable. There is no export/import mechanism for local (host-specific) thresholds.
BPPM Local Instance thresholds
BPPM Analytics does not support instance-specific thresholds that apply across hosts. In other words, you cannot set a default threshold for FSCapacity across all file systems, then set an instance-specific threshold that applies only to the root file system, and then apply this instance-specific threshold to all hosts. The instance-specific threshold must be defined individually on each host. If there are 500 hosts, this becomes unfeasible, and there is no script or API that can be used to automate the task.
BPPM – Missing Hosts
With this release of BPPM, the PATROL Agents are connected to BPPM Analytics using the BPPM Adaptor. When you use the graphing facility to graph parameters in BPPM, some of the hosts do not appear, even though they are connected via the Adapter. At the time of writing, this case is open with BMC and is unresolved.
BPPM does not support Custom Event Catalogues
PATROL Events that are triggered using the event_trigger() PSL function are not supported by BPPM Analytics (ProactiveNet). This forces all customers (who use PATROL agents) to implement both the BIIP3 Adapter (for event_trigger() events) and the BPPM Adapter for all standard PATROL metrics (that have an underlying parameter).
This means that the adapter layer with a BPPM implementation is quite complex. There are three Adapters attached to every agent on three separate ports. The Adapters are the RTServer, the BIIP3 Adapter, and the BPPM Adapter.
This complexity means that the implementation becomes fragile, complex to administer and fundamentally unreliable.
LOG monitoring
It is difficult to define catch-all rules using the standard BMC Log monitoring KM. For example, it is possible to create a catch-all rule that triggers on the search string "ALARM". You then give this definition a custom origin, which might be something like "LOG.BANKING_app_log.alarm". You then create a custom event message that inserts the line from the log file into the text of the message. This can be done with the syntax "%1-". The problem occurs at the event management layer: all events that match this rule get rolled up into one event as duplicates, despite the fact that each event represents a different line from the log file and a different problem.
The workaround is to change the de-duplication rules at the event management layer. Be careful: if the rules are improperly defined, you can make the product vulnerable to an event storm, which may only manifest itself a month or two later.
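The roll-up happens because duplicates are detected on the slots marked dup_detect=yes, and in this scenario the log line text is presumably not one of them. The Python sketch below shows the effect of adding the message slot to the duplicate key. The slot names (mc_host, mc_origin, msg) and the event values are illustrative; which slots are actually marked dup_detect must be checked in the relevant BAROC class definition.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Event = Dict[str, str]


def dup_key(event: Event, dup_detect_slots: Sequence[str]) -> Tuple[str, ...]:
    """Two events are duplicates when every dup_detect slot has the same value."""
    return tuple(event.get(slot, "") for slot in dup_detect_slots)


def roll_up(events: List[Event], dup_detect_slots: Sequence[str]) -> Counter:
    """Count how many incoming events collapse onto each distinct event."""
    return Counter(dup_key(e, dup_detect_slots) for e in events)


if __name__ == "__main__":
    origin = "LOG.BANKING_app_log.alarm"
    events = [
        {"mc_host": "bank01", "mc_origin": origin, "msg": "ALARM: queue full"},
        {"mc_host": "bank01", "mc_origin": origin, "msg": "ALARM: disk error"},
        {"mc_host": "bank01", "mc_origin": origin, "msg": "ALARM: queue full"},
    ]
    print(len(roll_up(events, ["mc_host", "mc_origin"])))         # 1 event, counter 3
    print(len(roll_up(events, ["mc_host", "mc_origin", "msg"])))  # 2 distinct events
```

The same sketch shows the risk of over-correcting: if the message text contains anything that changes on every occurrence, such as a timestamp, every event gets a unique key and de-duplication stops working altogether.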
Monitoring of the monitoring is insufficient.
Typical Project
Project Background
The review was conducted after an upgrade project in which every component within an old PATROL environment was upgraded. The project was driven by the customer's internal audit organization, which reviewed the company's products and determined that PATROL Enterprise Manager (PEM) was no longer supported and that the whole environment should therefore be upgraded.
Project Phases
The project consisted of a number of separate projects, which could have been undertaken individually. The customer chose to perform all three projects simultaneously, which increased the risk, complexity and length of the overall project.
Phase | Description |
Phase 1 | Solution Design |
Phase 2 | Upgrade of the PATROL Agents and Knowledge Modules |
Phase 3 | Replacement of PEM with BPPM Event Manager |
Phase 4 | Introduction of BPPM Analytics |
Project Timescales
The Solution Design phase was conducted in late 2011 and the implementation was started immediately after the New Year in 2012. Phase 3 of the solution was finally put into production on Thursday 28th June 2012.
Phase 4 of the project has not yet been completed. Phase 4 was removed from the project scope when the customer fell behind on delivery. Currently, there are no plans to complete this phase of the project.
The customer contracted several months of consultancy from BMC Software. BMC performed the initial solution design and much of the initial configuration of the event management rules.
Resources
The resources assigned to the project consisted of the following:
Resource | Time Allocation |
BMC Consultant | ~3 months |
Customer SME | 7 months full time |
Independent Consultant | 4 months |
Customer UNIX Engineers (2 engineers) | 4 months |
Customer Infrastructure Architect | 1 month |
Customer Project Manager | 2 months |
Customer Delivery Manager | 2 months |
Management Involvement (Project Sponsor + Resource Manager) | 1 month |
Total | 24 months |
Lessons Learned
The project overran initial estimates in terms of both schedule and budget. The following issues were encountered:
Issue | Description |
Solution Design | The event management rules had to be completely redesigned, which delayed the project by about a month. The customer's old rules used First Match, whereas BPPM only supports Best Fit. The complexity of the customer's rules was not properly analysed or understood during the design phase. |
Documentation | The design of the event management rules was not properly documented. When it became evident that the design had to be changed, the lack of documentation slowed understanding and meant that some thinking had to be repeated and the design documented properly. |
Thresholds | The customer spent over a month trying to migrate their thresholds from PATROL to BPPM. This task was complex due to the different format of the thresholds. The customer also experienced many issues with the migration tools, which did not work properly. Managing thresholds in BPPM is not as easy as managing thresholds in PATROL (using PATROL Configuration Manager). In the end, the customer abandoned the attempt to introduce BPPM Analytics. The autonomous alerts only covered 20% of the thresholds anyway, so the benefit of BPPM Analytics was not compelling. |
Testing | The customer underestimated the time required for comprehensive testing. Testing should have been planned earlier, started earlier and resourced appropriately. At least a full month of end-to-end testing was required. |
Technical Lead | Technical leadership was lacking through some parts of the project. Initially, the BMC consultant was the technical lead; towards the end, an independent consultant took over the role. There were issues of continuity. |
Project Phases | The project consisted of 4 project phases. Phase 2 and Phase 4 were optional and were not required in order for the customer to meet its audit deadline. In the end, Phase 4 was abandoned. |
Summary and Conclusion
Component Rating (1-5 Stars)
BMC ProactiveNet Performance Manager (BPPM) is really 3 products bundled into one suite, so it makes sense to rate each component individually.
Product | Summary | Score 1-5 |
BMC BPPM v8.6 Analytics (formerly ProActiveNet) | The product appears to have reasonably good quality control. The graphing is good. The threshold management features are poor, but BMC says this is being fixed in the next release. I am not convinced by the whole concept of using statistics. Statistical analysis uses a lot of CPU, which makes scalability an issue. Only about 30% of monitored metrics are appropriate for statistical analysis. BMC's claim that this product removes the need for threshold management is an exaggeration: 70% of thresholds will still need to be managed using absolute-value (i.e. standard) thresholds. | 3 |
BMC BPPM v8.6 Event Mgmt (formerly Mastercell) | This product is one of the strongest event management products around. There are challenges with using the MRL rule language, but generally this product works well. I question BMC's bundling of this product with ProactiveNet and would like to see the product available as a stand-alone component. Developing and debugging rules is time-consuming and difficult. Only time will tell if this product continues to be a good event management platform. | 3 |
BMC PATROL 7.8.10 | Twenty years ago, PATROL was the best monitoring solution of its type. Since then the product has become bloated and overly complex. PCM was a great addition and makes the management of thresholds relatively easy and repeatable. The product has not changed much in about 8 years. Four years ago, BMC was going to retire the product; today PATROL is an integral part of BMC's BPPM strategy. The KMs and the breadth of monitoring save this product from a lower rating. | 3 |
Rating according to Capabilities (Score 1-10)
Component/Capability | Previous Version (with PEM) | Latest Version (BPPM v8.6) |
Event Management | 3 | 4 |
Threshold Management | 5 | 2 |
Analytics / Graphs | 3 | 5 |
Ease of Implementation | 3 | 2 |
Extensibility / Interfaces | 4 | 4 |
Operator Form for Blackout | 1 | 1 |
Average Score | (3.2) | (3) |
Components | PATROL and associated KMs, PATROL Central Operator, PATROL Enterprise Manager (PEM) | PATROL and associated KMs, PATROL Central Operator, BPPM Event Management, BPPM Analytics (ProactiveNet) |
Conclusion
The score for BPPM has not improved with this revision. The product is more complex and more difficult to implement, and thresholds are more difficult to administer. The improvement in capability associated with anomaly detection is not convincing, has not been proven to this customer, and is only relevant for 30% of parameters. BMC must work hard to improve administration and ease of implementation.
The combination of BPPM Analytics (ProactiveNet), BPPM Event Management (Mastercell) and PATROL has the potential to be a market beating product. However, the investment required is significant. Time will tell if BMC delivers on this vision.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
IT Operations Monitoring Specialist at a tech services company with 51-200 employees
Robust, and responsive technical support, but setup could be simplified
Pros and Cons
- "BMC TrueSight Operations Management is easily scalable."
- "The graphs are extremely limited. We don't have a lot of dashboard options. To make reports and dashboards more useful, we usually need to integrate some dashboard solutions."
What is our primary use case?
BMC TrueSight Operations Management is used to monitor the infrastructure, applications, and databases.
What is most valuable?
It's very good. I like it. It's a great product, but there are some things that could be improved, such as the dashboards.
What needs improvement?
The dashboards could be better. The graphs are extremely limited. We don't have a lot of dashboard options. To make reports and dashboards more useful, we usually need to integrate some dashboard solutions.
The initial setup could be simplified.
For how long have I used the solution?
I have been working with BMC TrueSight Operations Management for approximately 12 years.
We are working with version 11.304.
What do I think about the stability of the solution?
After you configure everything, it's stable.
What do I think about the scalability of the solution?
BMC TrueSight Operations Management is easily scalable.
In our company, we have four people who use this solution.
How are customer service and support?
Technical support used to be better a few years ago; lately the level has been slightly lower than expected. At the moment it is not great, although occasionally they are good, depending on the consultant who answers the phone.
They usually respond quickly, but the answer is not always the solution we require and is not always effective, although it can be. Technical training would help.
Which solution did I use previously and why did I switch?
I used Entuity. I also have basic knowledge of PRTG and Nagios. From those three, I have more working knowledge of Entuity.
I started working with Entuity nine or ten years ago. We stopped using it two years ago, so we are not familiar with the current versions.
I am currently working with Helix Operations Management and the ServiceNow ITOM.
How was the initial setup?
In general, it is not easy to install. It's complex: there are too many components, and you must set them up and work with the infrastructure team on permissions and file reports. Because there are so many components, this becomes more complicated and difficult, particularly in terms of infrastructure management.
What about the implementation team?
We have a monitoring team. We work alongside them to manage and support the solution.
What's my experience with pricing, setup cost, and licensing?
I'm not familiar with it. They have changed the licensing fees.
What other advice do I have?
You will face some difficulties unless you have someone with advanced knowledge of the solution.
I would rate BMC TrueSight Operations Management a seven out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer:
I would like to concur with the statement "I question BMC's bundling of this product with ProactiveNet and would like to see the product available as a stand-alone component." Also, regarding MRL tracing, I have had some success using the relatively new tracewrite() function.