End User Management or Monitoring (EUM) falls within the area of Application Performance Management (APM). When EUM is discussed, most people think of synthetic monitoring: using robots that run synthetic transactions from several locations. Synthetic transactions are just one of many options, and certainly not the best. This article discusses the main ways of performing end user monitoring and the pros and cons of each method.
- Robot. Robots running synthetic transactions.
- Passive Appliance. Appliances in the datacentre that passively collect network traffic and perform deep packet inspection.
- Web Page Instrumentation. Javascript that is injected or placed inside web pages.
- Java (or .NET) Profiler. An agent that monitors the Java Virtual Machine or the Common Language Runtime (CLR) using a technique called byte code injection (or by using an API such as the JVMPI).
Figure 1: Real User Monitoring - Showing URL Performance
The use of synthetic transactions is perhaps the best known of the EUM methods, but arguably the least effective, and definitely the method with the highest administrative overhead and Total Cost of Ownership (TCO). There are many issues with synthetic transactions:
- Incomplete Location Coverage. The use of synthetic transactions does not monitor real users; it monitors transactions executed by robots from certain fixed locations. It is very expensive to monitor all locations due to the number of robots required.
- Not Monitoring Real Users. It is impossible to measure the experience of real users, especially for highly distributed users of external customer-facing applications such as internet banking. The synthetic transaction may work fine, but real users may experience issues due to their location, type of browser or local workstation.
- Page Abandonment. Synthetic transaction monitoring does not report on user behaviour such as page abandonment rates.
- High Administrative Cost. Synthetic transactions are costly to maintain. First, the scripts must be developed; it usually takes 3-5 days to develop a robust, production-ready script for each synthetic transaction. Most products have a "record" capability for capturing the transaction, but the script will always need editing to make it robust and production ready. Scripts need to be updated every time the application is updated. Robots occasionally fail and need restarting. For a mid-sized organization with 6,000 employees (such as a small bank) you could easily require one FTE support person to maintain the solution. A stripped-down example of such a scripted check is sketched below.
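To make the scripting effort concrete, here is a minimal sketch of what a synthetic check does at its core: fetch a page on a schedule, time it, and raise an alert on failure or an SLA breach. The URL, SLA threshold and interval are hypothetical placeholders, and real robots replay far richer multi-step scripts, which is exactly why they need ongoing maintenance.

```typescript
// Minimal synthetic check (Node.js 18+ / TypeScript sketch, not any vendor's real script).
// The target URL and SLA threshold below are hypothetical placeholders.

const TARGET_URL = "https://example.com/login"; // page the robot probes
const SLA_MS = 2000;                            // alert if the page takes longer than this

async function runSyntheticCheck(): Promise<void> {
  const start = Date.now();
  try {
    const response = await fetch(TARGET_URL);
    await response.text(); // read the body so the timing covers the full download
    const elapsed = Date.now() - start;

    if (!response.ok) {
      console.error(`AVAILABILITY ALERT: HTTP ${response.status} from ${TARGET_URL}`);
    } else if (elapsed > SLA_MS) {
      console.warn(`SLA BREACH: ${elapsed} ms (threshold ${SLA_MS} ms)`);
    } else {
      console.log(`OK: ${elapsed} ms`);
    }
  } catch (err) {
    console.error(`AVAILABILITY ALERT: request failed - ${err}`);
  }
}

// A real robot would replay a scripted, multi-step user journey from several
// locations; here a single request is simply repeated every five minutes.
setInterval(runSyntheticCheck, 5 * 60 * 1000);
runSyntheticCheck();
```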
Passive Collector in the Datacentre
The second method for performing End User Monitoring (EUM) is to install an appliance in the data centre which collects network packets and performs deep packet inspection. The level of statistics that can be gathered using this approach is very good and exceeds the detail gathered by synthetic monitoring. These types of products can report on individual web page load times for any user in any location. User satisfaction measures such as abandonment rates can be measured and reported. The total transaction latency can be divided into client latency, network latency and server latency to quickly determine which tier is responsible for a performance issue. Depending on where the probe is located, the issue can be isolated to the specific tier in the datacentre. It should be emphasized that the probe is entirely passive; it is attached to a span port on a main switch and simply captures packets. Once installed, the solution requires almost no maintenance. There are no scripts to update and no robots to keep running.
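As a rough illustration of the latency breakdown described above, the sketch below shows how timestamps captured passively at the datacentre switch could, in principle, be split into network, server and transfer components. The interface, field names and arithmetic are my own simplified assumptions for illustration; real appliances perform far more sophisticated TCP analysis.

```typescript
// Conceptual sketch only: splitting one passively observed HTTP exchange into
// network, server and transfer time using timestamps seen at a datacentre probe.
// All timestamps are in milliseconds and the structure is hypothetical.

interface ObservedExchange {
  synAckSentMs: number;        // server SYN-ACK leaving the datacentre (seen at the probe)
  handshakeAckSeenMs: number;  // client's handshake ACK arriving back (seen at the probe)
  requestLastByteMs: number;   // last byte of the HTTP request arriving
  responseFirstByteMs: number; // first byte of the HTTP response leaving
  responseLastByteMs: number;  // last byte of the HTTP response leaving
}

interface LatencyBreakdown {
  networkRttMs: number;   // one WAN round trip, estimated from the TCP handshake
  serverTimeMs: number;   // time the server spent producing the response
  transferTimeMs: number; // time spent streaming the response body out
}

function breakDownLatency(x: ObservedExchange): LatencyBreakdown {
  return {
    // Both handshake packets pass the probe, so their gap approximates one WAN round trip.
    networkRttMs: x.handshakeAckSeenMs - x.synAckSentMs,
    // Request arrival and response departure are both observed next to the
    // server, so the gap between them is essentially server think time.
    serverTimeMs: x.responseFirstByteMs - x.requestLastByteMs,
    transferTimeMs: x.responseLastByteMs - x.responseFirstByteMs,
  };
}

// Example: 40 ms of network round trip, 180 ms of server time, 60 ms of transfer.
console.log(breakDownLatency({
  synAckSentMs: 0,
  handshakeAckSeenMs: 40,
  requestLastByteMs: 45,
  responseFirstByteMs: 225,
  responseLastByteMs: 285,
}));
```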
The main disadvantages to this method are the following:
- Does Not Work Without Users. If there are no users (e.g. at night), the passive probe cannot detect a problem with the application. Problems may go undetected until the first user logs on in the morning.
- Protocol Support. The passive probes perform deep packet inspection, and the range of protocols supported is limited. The technique works best for protocols that have a defined start and end, such as HTTP (or HTTP over SSL). The probes can decode SSL traffic if provided with the decryption key. The probes do not work so well for custom in-house applications that use a custom protocol. Protocols that are generally supported include HTTP, HTTP over SSL, SQL, Tuxedo/Jolt, Citrix and MQ. Depending on the product, support for AJAX may be limited. The solution works best for HTTP.
One of the best solutions in this category is Gomez Data Center Real User Monitoring (formerly Vantage Agentless or Adlex). I have deployed this product at two customer sites and can recommend it highly.
Web Page Instrumentation
The third approach is Real User Monitoring implemented using client-side Javascript. Google Analytics and Gomez User Experience Management (UEM) are two products that use this method, although each provides a different solution. Google Analytics fits in the market called "Web Site Analytics". Gomez UEM is a Real User Monitoring (RUM) solution that reports on page load times and abandonment rates and raises alerts if performance exceeds an SLA. The solution is asynchronous: once the piece of Javascript in the web page has been executed by the user's browser, it runs independently of the page load, so subsequent client-side activity is not affected. This solution can be offered as Software as a Service (SaaS) because the Javascript can send the information to a vendor-managed server located anywhere in the world. A minimal sketch of the technique is shown after the list below. Problems with this approach include the following:
- Security. Most banks and governments will not allow applications to send information off-site from code embedded in web pages, so SaaS may not be allowed. On the other hand, the Javascript runs in the client's browser and sends performance data to a cloud-based SaaS server; the data may never enter the corporate network, so security need not be an issue.
- Server Latency. This solution cannot break down server latency into its components. If tracing transactions through the datacentre is important, then another solution (such as Dynatrace) may also be needed.
- Alteration to Web Pages. All web pages delivered to clients must be updated. Depending on the architecture of the application, this will probably be relatively easy from a technical perspective, but the development teams will need to be involved, and many different development teams may need to be contacted.
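To show how little is involved on the client side, here is a minimal sketch of the technique: a small script waits for the page to load, reads the browser's standard Navigation Timing data and posts it asynchronously to a collector. The collector URL is a placeholder; products such as Gomez UEM and Google Analytics ship their own tags and collectors.

```typescript
// Minimal page-timing beacon. Placed near the end of each page (or injected),
// it waits for the load event, reads Navigation Timing data and sends it
// asynchronously so the page itself is not slowed down.
// The collector URL below is a placeholder, not a real vendor endpoint.

const COLLECTOR_URL = "https://collector.example.com/beacon";

window.addEventListener("load", () => {
  // Defer slightly so loadEventEnd has been finalised by the browser.
  setTimeout(() => {
    const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];
    if (!nav) {
      return; // very old browsers: silently skip rather than break the page
    }

    const payload = {
      page: window.location.pathname,
      // Key user-perceived milestones, all in ms relative to navigation start.
      timeToFirstByte: Math.round(nav.responseStart),
      domContentLoaded: Math.round(nav.domContentLoadedEventEnd),
      pageLoad: Math.round(nav.loadEventEnd),
    };

    // sendBeacon posts the data in the background without blocking the page
    // or any subsequent client-side activity.
    navigator.sendBeacon(COLLECTOR_URL, JSON.stringify(payload));
  }, 0);
});
```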
Java (or .NET) Profiler
Many of the vendors sell Java (or .NET) profilers that are able to track transactions inside the JVM. These tools do not monitor end user experience, but they can help diagnose performance issues related to the code, which usually account for the majority of performance issues. These profilers can trace transactions and determine exactly how much time is being spent in each method invoked for each transaction. However, these products are more than simple profilers; if the agents are loaded into all backend JVMs and linked together to a single management server, then it is possible to track transactions through the various tiers and draw maps of how the backend tiers fit together. Products such as AppDynamics and Compuware's Dynatrace work in this fashion.
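Byte code injection itself cannot be shown in a few lines, but the idea behind it, wrapping each method of interest with timing code without touching the source, can be illustrated. The sketch below does this with a plain function wrapper; JVM and CLR agents achieve the same effect by rewriting compiled byte code as classes are loaded. All names here are purely illustrative.

```typescript
// Conceptual illustration of what a profiling agent does: wrap a method so
// that every call is timed and reported, without modifying the original code.
// A real JVM/CLR agent performs this wrapping via byte code rewriting at
// class-load time; this is only a plain function wrapper for illustration.

function instrument<T extends unknown[], R>(
  name: string,
  fn: (...args: T) => R
): (...args: T) => R {
  return (...args: T): R => {
    const start = performance.now();
    try {
      return fn(...args);
    } finally {
      const elapsed = performance.now() - start;
      // A real agent would aggregate timings per method and per transaction
      // and forward them to a central management server.
      console.log(`${name} took ${elapsed.toFixed(2)} ms`);
    }
  };
}

// Hypothetical usage: the "business method" is now transparently timed.
const lookupAccount = instrument("AccountService.lookupAccount", (id: number) => {
  // ... real work would happen here ...
  return { id, balance: 0 };
});

lookupAccount(42);
```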
Recommendation
I have implemented Synthetic Transactions (both Compuware and BMC solutions) and am not convinced by this technology. Due to the high maintenance cost required, these solutions generally fall into disrepair and stop working after a few years.
A very good solution for End User Monitoring is Gomez Data Center Real User Monitoring (formerly Vantage Agentless or Adlex), which, as noted above, I have deployed at two customer sites and recommend highly. For HTTP-based applications such as internet banking, this product is a great fit and invaluable. Just make sure you investigate support for AJAX (if this is a requirement). The product is relatively easy to integrate into the event management layer, and integration can be performed using SNMP, so don't be concerned about integration. The initial up-front cost may be high, but TCO will be low; the product requires almost no maintenance. Compuware can perform a virtual POC (they just capture some network traffic), so purchasing the product can be relatively painless too.
If the business application is a cloud-based application, then installing a sniffer device is not possible, so a browser-based Javascript solution is the best option.
Most performance and availability issues are caused in the datacentre, so it is important to be able to break down server time into measurements for each tier. Tracing performance issues into the JVM using a Java profiler is standard practice nowadays for DEV environments. However, tracing production transactions through the back-end messaging layer or into the database is a more complex task and requires the capabilities of products such as Compuware's Dynatrace and AppDynamics.
RUM first?
Most customers implement component level monitoring (bottom up monitoring) first. Most customers consider Real User Monitoring (RUM) to be a luxury. Is this view correct? I have been raving about RUM - but would I implement RUM before Component level monitoring?
The answer is definitely Yes. For HTTP applications, I would implement Data Center RUM before any other type of monitoring. Top down monitoring gives you immediate information and alerts about overall application availability and performance.
RUM will tell you about the state of your application now, at the current point in time. However, RUM will not tell you about potential issues that might occur in the future, in an hour or tomorrow. RUM does not monitor things that fill up, such as storage, and it is useless for capacity issues. So customers must implement component level monitoring as well; for all customers, both top down and bottom up monitoring are essential.
Vendors
Vendor | Synthetic Transactions | Deep Packet Analysis | Browser Javascript | JVM or CLR Monitoring
Compuware | Gomez Synthetic Monitoring | Gomez Data Center RUM | Gomez User Experience Management (formerly Gomez Browser RUM) | DynaTrace
OpNet | | AppInternals Xpert | |
AppDynamics | | | AppDynamics Pro (RUM) | AppDynamics Pro
New Relic | | | New Relic RUM |
IBM | Tivoli Composite Application Manager for Transactions (Rational) | Tealeaf | |
CA | | CA Wily Customer Experience Manager | APM Cloud Monitor |
BMC | ProactiveNet TMART (licensed from Borland SilkTest or SilkPerformer) | Coradiant | |
HP | LoadRunner | HP BAC - RUM | |
Disclosure: I am a real user, and this review is based on my own experience and opinions.